Difference between revisions of "Statistical analysis"

Latest revision as of 06:11, 17 July 2006

If you'd like to contribute to the experimental data, see search odds condensed or search odds
If you'd like to view the summarized results, see search odds results
If you'd like to learn about or discuss results, see search odds discussion
If you'd like to learn about or discuss the underlying assumptions about game mechanics, see how searching works
If you'd like to learn about or discuss statistical techniques being used to analyze data, see statistical analysis

This page introduces the statistical analysis used to determine find rates.

Approximate find rate

Suppose that we have performed n searches resulting in s successes and f failures (where s + f = n). Our best guess for the find percentage p is given by:

p = s / n

As an example, suppose 371 searches on the beach yield 25 pieces of driftwood, so n = 371, s = 25 and p = 25 / 371 ≈ 0.067. Given these numbers, the chance of finding driftwood on the beach is around 6.7%, or 1 find in 15 AP.
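The point estimate above can be sketched in a few lines of Python. This is only an illustration; the function name `find_rate` is a made-up helper, not something used elsewhere on this wiki.

```python
def find_rate(successes: int, searches: int) -> float:
    """Best-guess find rate p = s / n."""
    if searches <= 0:
        raise ValueError("need at least one search")
    return successes / searches


# Beach driftwood example from the text: 25 finds in 371 searches.
p = find_rate(25, 371)
print(f"p = {p:.3f}")                    # p = 0.067
print(f"about 1 find in {1 / p:.0f} AP")  # about 1 find in 15 AP
```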

General 95% confidence interval

How accurate is our experimentally estimated find rate? Usually, the true find rate is reported as a range, such as 6.7% ±5.1% (or 1.6% to 11.8%), with an associated confidence level (here 95%). There are several ways to compute the maximum error E.

A simple 95% confidence interval around our determined probability can be estimated by approximating with a normal distribution, using the central limit theorem. For n trials, the margin of error E around our experimental mean is ±0.98 / sqrt(n) with 95% confidence.

E = 0.98 / sqrt(n)

Loosely speaking, we can be "95% certain" that the true probability is somewhere within ±E of our experimental probability. For example, 371 searches yield a maximum error of E = 0.98 / sqrt(371) ≈ 0.051 = 5.1%, so we are 95% sure that the actual find rate is within 5.1 percentage points of 6.7%. (For more information, see Margin of error and Checking if a coin is fair#Estimator of true probability on Wikipedia.)
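As a rough sketch, the worst-case margin of error can be computed as follows. The helper name `worst_case_margin` and the default z = 1.96 (the usual two-sided 95% normal quantile, which halves to the 0.98 in the formula) are assumptions made for this example.

```python
import math


def worst_case_margin(n: int, z: float = 1.96) -> float:
    """Margin of error assuming p = 0.5: z * sqrt(0.25 / n) = (z / 2) / sqrt(n)."""
    if n <= 0:
        raise ValueError("need at least one trial")
    return z * 0.5 / math.sqrt(n)


# 371 beach searches, as in the example above.
e = worst_case_margin(371)
print(f"E = {e:.3f}")  # E = 0.051
```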

Narrower 95% confidence interval

E = 1.9599 × sqrt(p × (1 − p) / n)

The formula in the previous section uses the worst possible error bar: it replaces sqrt(p × (1 − p)) with 0.5, where p, as before, is the probability of success. (This assumes that p = 0.50, which gives the greatest margin of error; 1.9599 × 0.5 ≈ 0.98.) This is great when testing a fair coin, but in our case the find rates are much lower, so the generic error bars are too large. As long as the number of successes in our experimental data is large enough (most sources require at least 5 or 10 successes), we can use our experimental p in the above formula for E.

Using the above example where n = 371 and p = 0.067, E ≈ 2.6%. This suggests a driftwood find range of 4.1% to 9.3% (with 95% confidence).
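The narrower interval can be sketched the same way. The function name `normal_interval` is hypothetical, and z = 1.96 stands in for the 1.9599 used above.

```python
import math


def normal_interval(successes: int, searches: int, z: float = 1.96):
    """Normal-approximation interval p +/- z * sqrt(p * (1 - p) / n)."""
    p = successes / searches
    e = z * math.sqrt(p * (1 - p) / searches)
    return p - e, p + e


low, high = normal_interval(25, 371)
print(f"{low:.1%} to {high:.1%}")
```

With unrounded inputs this gives roughly 4.2% to 9.3%; the 4.1% lower bound quoted above comes from rounding p to 6.7% and E to 2.6% before subtracting.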

Exact binomial tests

When the number of successes s = n × p in our data is smaller than 5 (or 10 if you're feeling conservative), then our data is too skewed to use a normal distribution approximation. (Of course, there also need to be at least 5 (or 10) failures in our data, but in all of our examples successes are much rarer than failures.) In this case, it makes the most sense to compare to an exact binomial distribution for n trials of success probability p. In fact, it always makes the most sense to analyze our results using an exact binomial test, since that is exactly what is happening.

Our experimental data directly supports the hypothesis that the true find rate p = 6.7%. But how well does our data support the hypothesis that the true p = 9.8%? Assuming a true find rate of 9.8%, we would expect about 36 successes (371 × 0.098 ≈ 36.4). The chance of 25 or fewer successes out of 371 searches is the area under the lower tail of the binomial distribution. We can compute this probability in Excel using the BINOMDIST formula:

A₋ = BINOMDIST(25,371,0.098,TRUE) = 2.45%

When the true value of p is 9.8% (or more), our experimental data is extremely aberrant and unlikely, occurring in fewer than 1 in 40 collections.

Similarly, for a true p = 4.4%, we would expect about 16 successes (371 × 0.044 ≈ 16.3). The chance of 25 or more successes is the area under the upper tail:

A₊ = 1 - BINOMDIST(24,371,0.044,TRUE) = 2.45%

With a true p of 4.4% (or less), our experimental data is extremely aberrant and unlikely, occurring in fewer than 1 in 40 collections.

This method yields a driftwood find range of 4.4% to 9.8%.

Statistical analysis can be somewhat contentious, but these values are in good agreement with the range determined previously.
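For readers without Excel, the two tail areas can be reproduced with a short Python sketch. The helper `binom_cdf` is written here to mirror BINOMDIST(k, n, p, TRUE). The bisection helpers `upper_bound` and `lower_bound` are illustrative additions, not part of the original analysis; they search for the p at which each tail area equals a chosen threshold (2.5% by default).

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); mirrors BINOMDIST(k, n, p, TRUE)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(s: int, n: int, alpha: float = 0.025, tol: float = 1e-6) -> float:
    """p at which the lower tail P(X <= s) has shrunk to alpha (bisection)."""
    lo, hi = s / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(s, n, mid) > alpha:
            lo = mid  # tail still too large: true bound is higher
        else:
            hi = mid
    return (lo + hi) / 2


def lower_bound(s: int, n: int, alpha: float = 0.025, tol: float = 1e-6) -> float:
    """p at which the upper tail P(X >= s) has grown to alpha (bisection)."""
    lo, hi = 0.0, s / n
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - binom_cdf(s - 1, n, mid) < alpha:
            lo = mid  # tail still too small: true bound is higher
        else:
            hi = mid
    return (lo + hi) / 2


# Tail areas for the driftwood data (the text reports 2.45% for both).
print(f"lower tail at p = 9.8%: {binom_cdf(25, 371, 0.098):.2%}")
print(f"upper tail at p = 4.4%: {1 - binom_cdf(24, 371, 0.044):.2%}")
print(f"range: {lower_bound(25, 371):.1%} to {upper_bound(25, 371):.1%}")
```

The bisection relies on each tail area changing monotonically with p, so it needs no statistics library beyond `math.comb`.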

How many failures to be sure?

After a dozen or a hundred unsuccessful searches, the question arises: Is there absolutely no chance of a successful search in my current terrain?

For example, Water (blue), Swamps (brown), Caves (gray), Tunnels (gray), and Mountain Paths (gray) currently appear to be derelict. Are we reasonably certain of the emptiness of these locations, or should we perform additional (probably fruitless) searches?

The Excel formula BINOMDIST(0,n,p,TRUE) gives the exact probability of failing to find something in n searches, given a find rate of p. When this probability falls below 2.5% (the same tail area used for the 95% range above), we can be fairly certain that a find rate of p will return at least one success from n trials. (If this probability falls below 1%, we are 99% certain of at least one success in n trials.)

Hence 100 trials are enough to strongly reject a 5% find rate (99% certainty), because BINOMDIST(0,100,0.05,TRUE) = 0.0059 < 0.01. Similarly, 200 trials will reasonably reject a 2% find rate (95% certainty) and strongly reject a 2.5% find rate (99%).

Since we only have 25 searches logged on a gray Mountain Path, it is reasonably possible to find an unknown item there. With 60 searches in Water, it is unlikely that a non-worthless item will be found, but we still do not have enough data to confidently reject even a 5% find rate. (For example, driftwood could potentially be found in water at a 5% find rate, even after 60 unsuccessful searches.)
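Because BINOMDIST(0,n,p,TRUE) is simply (1 − p)^n, these checks are easy to reproduce. A minimal sketch, with `prob_no_finds` as a made-up helper name:

```python
def prob_no_finds(n: int, p: float) -> float:
    """Probability of zero successes in n searches at find rate p: (1 - p)**n."""
    return (1 - p) ** n


# 100 empty searches strongly reject a 5% find rate (below the 1% threshold).
print(f"{prob_no_finds(100, 0.05):.4f}")  # 0.0059

# 60 empty searches in Water do not reject a 5% find rate (above 2.5%).
print(prob_no_finds(60, 0.05) > 0.025)    # True
```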

I believe that this analysis is duplicated on the Urban Dead wiki, so at some point I'll compare results…

Unsuccessful searches	Unlikely find rate	Extremely unlikely find rate
n	p (95%)	p (99%)
35	10%	12.5%
75	5%	6%
91	4%	5%
122	3%	4%
183	2%	2.5%
367	1%	1.25%
460	0.8%	1%

Thus, after 91 unsuccessful searches, we are 95% sure that the odds of finding anything are less than 4% and 99% sure that they're less than 5%.
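The table can also be read in the other direction: given a find rate p, how many empty searches are needed before the chance of seeing nothing drops below a threshold? Solving (1 − p)^n < α for n gives n > ln(α) / ln(1 − p). The sketch below assumes α = 0.025 for the "95%" column and α = 0.01 for the "99%" column, as in the text; `searches_needed` is a hypothetical helper name.

```python
import math


def searches_needed(p: float, alpha: float) -> int:
    """Smallest whole n with (1 - p)**n <= alpha."""
    if not 0 < p < 1 or not 0 < alpha < 1:
        raise ValueError("p and alpha must be strictly between 0 and 1")
    return math.ceil(math.log(alpha) / math.log(1 - p))


print(searches_needed(0.04, 0.025))  # 91
print(searches_needed(0.05, 0.01))   # 90
```

A strict threshold can come out one search higher than a rounded table entry. For example, 35 empty searches at a 10% find rate leave a probability of about 2.503%, so this function returns 36.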

Technical notes

The above examples used the following data:

Beaches (May 16, 2006)
Total searches	Nothing	Piece of driftwood	Gold coin	Bottle of beer	Crab	Bite from crab
371	322	25	16	2	4	2

@@ Line 1: / Line 1: @@
+{| cellpadding=5 align=center style="width:100%;"
+| valign=top style="width:40%;" |
+__TOC__
+| valign=top style="width:60%;" |
+{| style="width:100%;margin-bottom:10px;padding:6px;border:1px solid #AAAAAA;background:#eee2e2;-moz-border-radius:1em;"
+| '''Guide to searching'''
+* If you'd like to contribute to the experimental data, see [[search odds condensed]] or [[search odds]]
+* If you'd like to view the summarized results, see [[search odds results]]
+* If you'd like to learn about or discuss results, see [[search odds discussion]]
+* If you'd like to learn about or discuss the underlying assumptions about game mechanics, see [[how searching works]]
+* If you'd like to learn about or discuss statistical techniques being used to analyze data, see [[statistical analysis]]
+|}
+|}
 Here are some introductory discussions to the statistical analysis used to determine find rates.
 ==Approximate find rate==
-Suppose that we have performed N searches resulting in S successes (and F = N-S failures).
+Suppose that we have performed ''n'' searches resulting in ''s'' successes and ''f'' failures (where ''s'' + ''f'' = ''n''). Our best guess for the find percentage ''p'' is given by:
+:''p'' = ''s'' / ''n''
-Our best guess for the find percentage p is given by:
- p = S/N
-Example: 371 searches on the beach yield 25 pieces of driftwood. N=371, S=25, p = '''6.7%'''. The chance of finding driftwood on the beach is around 6.7%. (May 16, 2006)
+As an example, suppose 371 searches on the beach yield 25 pieces of driftwood, so ''n'' = 371, ''s'' = 25 and ''p'' = 25 / 371 &asymp; '''0.067'''. Given these numbers, the chance of finding driftwood on the beach is around 6.7%, or 1 find in 15 [[AP]].
 ==General 95% confidence interval==
-How accurate is our experimentally estimated find rate? Usually, the underlying true find rate will be speculated to be within a range, such as 6.7% +/-5.1% (or 1.6%-11.8%), with an associated confidence (95% certainty). There are several ways to compute the maximum error E.
+How accurate is our experimentally estimated find rate? Usually, the underlying true find rate will be speculated to be within a range, such as 6.7% &plusmn;5.1% (or 1.6%-11.8%), with an associated confidence (95% certainty). There are several ways to compute the maximum error ''E''.
-A simple 95% confidence interval around our determined probability can be estimated by approximating with a normal distribution, using the central limit theorem. For N trials, the margin of error around our experimental mean is +/- 0.98/sqrt(N) with 95% confidence. Loosely speaking, we can be "95% certain" that the true probability is somewhere within +/- 0.98/sqrt(N) of our experimental probability.
- E = 0.98/sqrt(N)
-(See http://en.wikipedia.org/wiki/Margin_of_error and http://en.wikipedia.org/wiki/Checking_if_a_coin_is_fair true probability section.)
-Example: 371 searches yield a maximum error of E = 0.051 = '''+/-5.1%'''.
+A simple 95% confidence interval around our determined probability can be estimated by approximating with a normal distribution, using the central limit theorem. For ''n'' trials, the margin of error ''E'' around our experimental mean is &plusmn;0.98 / sqrt(''n'') with 95% confidence.
+:''E'' = 0.98 / sqrt(''n'')
+Loosely speaking, we can be "95% certain" that the true probability is somewhere within &plusmn;''E'' of our experimental probability. For example, 371 searches yield a maximum error of ''E'' = 0.98 / sqrt(371) &asymp; 0.051 = '''5.1%''', so we are 95% sure that the actual find rate is within 5.1% of 6.7%. (For more information, see [http://en.wikipedia.org/wiki/Margin_of_error Margin of error] and [http://en.wikipedia.org/wiki/Checking_if_a_coin_is_fair#Estimator_of_true_probability Checking if a coin is fair#Estimator of true probability] on Wikipedia.)
 ==Narrower 95% confidence interval==
- E = 1.9599*sqrt(p*(1-p)/N)
-The previous formula uses the worst possible error bar, by replacing sqrt( p*(1-p)), where p is the probability of success, by 1/2. (This assumes that the p = 0.50, to obtain the maximum error bar.) This is great when testing a fair coin, but in our case the find rates are much lower, so generic error bars are too large. As long as the number of successes in our experimental data is large enough (most sources require at least 5 or 10 successes), then we can use our experimental p in the above formula for E.
+:''E'' = 1.9599 &times; sqrt(''p'' &times; (1 &minus; ''p'') / ''n'')
-Example: 371 searches and 25 finds is p = 6.7%, yielding a maximum error of E = 2.6%. This suggests a driftwood find range of '''4.1%-9.3%''' (with 95% confidence).
+The previous formula uses the worst possible error bar, by replacing sqrt(''p'' &times; (1 &minus; ''p'')) with 0.5, where ''p'', as before, is the probability of success. (This assumes that ''p'' = 0.50, which gives the greatest margin of error.) This is great when testing a fair coin, but in our case the find rates are much lower, so generic error bars are too large. As long as the number of successes in our experimental data is large enough (most sources require at least 5 or 10 successes), then we can use our experimental ''p'' in the above formula for ''E''.
+Using the above example where ''n'' = 371 and ''p'' = 0.067, ''E'' &asymp; '''2.6%'''. This suggests a driftwood find range of 4.1% to 9.3% (with 95% confidence).
 ==Exact binomial tests==
-When the number of successes S = n*p in our data is smaller than 5 (or 10 if you're feeling conservative), then our data is too skewed to use a normal distribution approximation. (Of course, there also need to be at least 5 (or 10) failures in our data, but in all of our examples successes are much rarer than failures.) In this case, it makes the most sense to compare to an exact binomial distribution for N trials of success probability p. In fact, it always makes the most sense to analyze our results using an exact binomial test, since that is exactly what is happening.
+When the number of successes ''s'' = ''n'' &times; ''p'' in our data is smaller than 5 (or 10 if you're feeling conservative), then our data is too skewed to use a normal distribution approximation. (Of course, there also need to be at least 5 (or 10) failures in our data, but in all of our examples successes are much rarer than failures.) In this case, it makes the most sense to compare to an exact binomial distribution for ''n'' trials of success probability ''p''. In fact, it always makes the most sense to analyze our results using an exact binomial test, since that is exactly what is happening.
-Our experimental data directly supports the hypothesis that the true find rate p = 6.7%. But how well does our data support the hypothesis that true p = 9.8%? Well, assuming a true find of 9.8%, we usually will have at least 36 successes. The chance of 25 or less successes out of 371 searches is the area under the lower tail of the binomial distribution. We can compute this probability in Excel using the BINOMDIST formula:
+Our experimental data directly supports the hypothesis that the true find rate ''p'' = 6.7%. But how well does our data support the hypothesis that true ''p'' = 9.8%? Well, assuming a true find of 9.8%, we usually will have at least 36 successes. The chance of 25 or less successes out of 371 searches is the area under the lower tail of the binomial distribution. We can compute this probability in Excel using the BINOMDIST formula:
- A<sub>-</sub> = BINOMDIST(25,371,0.098,TRUE) = 2.45%
+:''A''<sub>-</sub> = BINOMDIST(25,371,0.098,TRUE) = 2.45%
-When the true value of p is 9.8% (or more), our experimental data is extremely aberrant and unlikely – occurring in less than 1 in 40 collections.
+When the true value of ''p'' is 9.8% (or more), our experimental data is extremely aberrant and unlikely, occurring in less than 1 in 40 collections. '
-Similarly, for true p = 4.4%, we usually have at most 17 successes. The chance of 25 or more successes is the area under the upper tail:
+Similarly, for true ''p'' = 4.4%, we usually have at most 17 successes. The chance of 25 or more successes is the area under the upper tail:
- A<sub>+</sub> = 1 - BINOMDIST(24,371,0.044,TRUE) = 2.45%
+:''A''<sub>+</sub> = 1 - BINOMDIST(24,371,0.044,TRUE) = 2.45%
-With a true p of 4.4% (or less), our experimental data is extremely aberrant and unlikely – occurring in less than 1 in 40 collections.
+With a true ''p'' of 4.4% (or less), our experimental data is extremely aberrant and unlikely – occurring in less than 1 in 40 collections.
-This method yields a driftwood find range of '''4.4%-9.8%'''.
+This method yields a driftwood find range of 4.4% to 9.8%.
 Statistical analysis can be somewhat contentious, but these values are in good agreement with the range determined previously.
@@ Line 50: / Line 56: @@
 ==How many failures to be sure?==
-The Excel formula BINOMDIST(0,N,p,TRUE) gives the exact probability of failing to find something in N searches, given a find rate of p. When this probability falls below 2.5%, we can be fairly certain that a find rate of p will return at least one success from N trials. (If this probability falls below 1%, we are 99% certain of at least one success in N trials.)
+After a dozen or a hundred unsuccessful searches, the question arises: Is there absolutely no chance of a successful search in my current terrain?
+For example, Water (blue), Swamps (brown), Caves (gray), Tunnels (gray), and Mountain Paths (gray) currently appear to be derelict. Are we reasonably certain of the emptiness of these locations, or should we perform additional (probably fruitless) searches?
+The Excel formula BINOMDIST(0,N,p,TRUE) gives the exact probability of failing to find something in ''n'' searches, given a find rate of ''p''. When this probability falls below 2.5%, we can be fairly certain that a find rate of ''p'' will return at least one success from ''n'' trials. (If this probability falls below 1%, we are 99% certain of at least one success in ''n'' trials.)
 Hence 100 trials is good enough to strongly reject a 5% find rate (99% certainty), because BINOMDIST(0,100,0.05,TRUE) = 0.0059 < 0.01. Similarly, 200 trials will reasonably reject a 2% find rate (95% certainty) and strongly reject a 2.5% find rate (99%).
+Since we only have 25 searches logged on a gray Mountain Path, it is reasonably possible to find an unknown item there. With 60 searches in Water, it is unlikely that a non-worthless item will be found, but we still do not have enough data to confidently reject even a 5% find rate. (For example, driftwood could potentially be found in water at a 5% find rate, even after 60 unsuccessful searches.)
 I believe that this analysis is duplicated on the Urban Dead wiki, so at some point I'll compare results…
-{| border="1" cellpadding="2" style="text-align:center"
+{| border="1" cellpadding="5" cellspacing="1" style="background-color: #f9f9f9; border: solid 1px #ccc; border-collapse: collapse" |
-| Unsuccessful Searches || Unlikely Find Rate || Almost Certainly Not
+|-align="left" bgcolor="#ccc" valign="top"
-|-
+! Unsuccessful searches !! Unlikely find rate !! Extremely unlikely find rate
-| N || p (95%) || p (99%)
+|-bgcolor="#ddd"
+! ''n'' !! ''p'' (95%) !! ''p'' (99%)
 |-
 | 35 || 10% || 12.5%
@@ Line 75: / Line 88: @@
 | 460 || 0.8% || 1%
 |}
+Thus, after 91 unsuccessful searches, we are 95% sure that the odds of finding anything are less than 4% and 99% sure that they're less than 5%.
 ==Technical notes==
-The following data used in above examples:
+The above examples used the following data:
-{| border="1" cellpadding="2" style="text-align:center"
+{| border="1" cellpadding="5" cellspacing="1" style="background-color: #f9f9f9; border: solid 1px #ccc; border-collapse: collapse" |
-| colspan="7"| Beaches (May 20, 2006)
+|-bgcolor="#ccc" valign="top"
-|-
+!colspan="7"| Beaches (May 16, 2006)
-| Total Searches || Nothing || Piece of Driftwood || Gold Coin || Bottle of Beer || Crab || Bite from Crab
+|-align="left" bgcolor="#ddd" valign="top"
+! Total searches !! Nothing !! Piece of driftwood !! Gold coin !! Bottle of beer !! Crab !! Bite from crab
 |-
 | 371 || 322 || 25 || 16 || 2 || 4 || 2
 |}
+[[Category:Statistics]]
