A rare disease post
This post looks at issues associated with sampling when an
event or occurrence is rare.
Question: A team of statisticians is collecting data
on a wide range of medical conditions.
The likelihood that a person has one particular disease is 1/50,000, or 0.00002.
The team questions 50,000 people. What is the likelihood that no person in the
sample has the disease in question?
What is the likelihood that the point estimate of the incidence of this
disease in this population obtained from this sample is greater than or equal
to 0.00004?
What are the answers to these questions if the team
questions 100,000 people?
Methodology:
The number of people in the sample with the disease is binomially
distributed, with the probability of having the disease equal to 0.00002 and the
probability of not having the disease equal to 0.99998.
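For reference, the binomial model described above can be written out as follows. The symbols are notation chosen for this note: n is the number of people questioned, p = 0.00002 is the probability that any one person has the disease, and X is the number of people in the sample with the disease.

```latex
P(X = k) = \binom{n}{k} \, p^{k} (1 - p)^{n - k}, \qquad k = 0, 1, \dots, n

P(X = 0) = (1 - p)^{n}

P(X \ge k) = 1 - \sum_{j=0}^{k-1} \binom{n}{j} \, p^{j} (1 - p)^{n - j}
```

The second line is the special case needed for the "no person has the disease" question, and the third is the tail probability needed for the point-estimate question.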
The easiest way to calculate binomial distribution
probabilities is to use the BINOM.DIST function in Excel.
The BINOM.DIST function has four arguments: the number of successes (in this case the
number of people with the disease), the number of trials, the probability of a
success on each trial, and a logical variable set to 0 for the probability mass
function (the probability that X=k) or set to 1 for the cumulative distribution function
(the probability that X<=k).
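For readers without Excel, the same calculation can be sketched in Python using only the standard library. The function name binom_dist and its cumulative flag are made up here to mirror the four BINOM.DIST arguments; this is an illustration of the formula, not Excel's own implementation.

```python
from math import comb

def binom_dist(k, n, p, cumulative=False):
    """Hypothetical stand-in for BINOM.DIST(k, n, p, cumulative)."""
    def pmf(j):
        # Probability of exactly j successes in n independent trials
        return comb(n, j) * p**j * (1 - p)**(n - j)

    if cumulative:
        # P(X <= k): add up the mass function from 0 through k
        return sum(pmf(j) for j in range(k + 1))
    return pmf(k)  # P(X = k)

# Probability that nobody in a sample of 50,000 has the disease
p_zero = binom_dist(0, 50_000, 0.00002)
# Probability that at most one person in the sample has the disease
p_at_most_one = binom_dist(1, 50_000, 0.00002, cumulative=True)
print(round(p_zero, 6), round(p_at_most_one, 6))
```

Python's integers do not overflow, so comb(n, j) stays exact for the small values of j that matter in this problem.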
Answers: The table below contains the probability
P(X=k) for a sample of 50,000.
| Number of people with the disease | Trials | Probability of having the disease | Binomial probability |
|---|---|---|---|
| 0 | 50000 | 0.00002 | 0.367876 |
| 1 | 50000 | 0.00002 | 0.367883 |
| 2 | 50000 | 0.00002 | 0.183942 |
| 3 | 50000 | 0.00002 | 0.061313 |
| 4 | 50000 | 0.00002 | 0.015328 |
| 5 | 50000 | 0.00002 | 0.003065 |
| 6 | 50000 | 0.00002 | 0.000511 |
| 7 | 50000 | 0.00002 | 0.000073 |
| 8 | 50000 | 0.00002 | 0.000009 |
| 9 | 50000 | 0.00002 | 0.000001 |
| 10 | 50000 | 0.00002 | 0.000000 |
| 11 | 50000 | 0.00002 | 0.000000 |
| 12 | 50000 | 0.00002 | 0.000000 |

The probability that no one in a sample of 50,000
people has this disease is 0.368.
When the number of successes is greater than or equal to 2, the
estimated incidence of the disease is greater than or equal to 0.00004, since 2/50,000 = 0.00004. The likelihood we obtain an estimate of the
disease incidence greater than or equal to 0.00004 is 1 - 0.367876 - 0.367883 = 0.264.
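The 0.264 figure can be checked with a few lines of Python. This is a minimal sketch using the binomial formula directly; the helper name pmf is chosen here for illustration.

```python
from math import comb

n, p = 50_000, 0.00002

def pmf(k):
    # Probability that exactly k of the n people sampled have the disease
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The estimate k/n reaches 0.00004 only when k >= 2, so take the
# complement of the two smaller outcomes (k = 0 and k = 1).
p_at_least_two = 1 - pmf(0) - pmf(1)
print(round(p_at_least_two, 3))
```

Working from the complement avoids summing the long right tail of the distribution term by term.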
The table below contains the probability P(X=k) for a sample
of 100,000 people.
| Number of people with the disease | Trials | Probability of having the disease | Binomial probability |
|---|---|---|---|
| 0 | 100000 | 0.00002 | 0.135333 |
| 1 | 100000 | 0.00002 | 0.270671 |
| 2 | 100000 | 0.00002 | 0.270673 |
| 3 | 100000 | 0.00002 | 0.180449 |
| 4 | 100000 | 0.00002 | 0.090224 |
| 5 | 100000 | 0.00002 | 0.036089 |
| 6 | 100000 | 0.00002 | 0.012029 |
| 7 | 100000 | 0.00002 | 0.003437 |
| 8 | 100000 | 0.00002 | 0.000859 |
| 9 | 100000 | 0.00002 | 0.000191 |
| 10 | 100000 | 0.00002 | 0.000038 |
| 11 | 100000 | 0.00002 | 0.000007 |
| 12 | 100000 | 0.00002 | 0.000001 |

The probability that no one in a sample of 100,000
people has this disease is 0.135.
When the number of successes (people with the disease) is
greater than or equal to 4, the estimated incidence of the disease is greater
than or equal to 0.00004, since 4/100,000 = 0.00004. This
likelihood is 0.143.
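Both sample sizes can be handled by one small routine that first works out how many cases are needed to reach the threshold estimate and then sums the binomial tail. The sketch below is one way to do it; the function name and the choice to pass the threshold as a string (so the case count is computed exactly rather than with floating-point rounding) are assumptions made for this example.

```python
from fractions import Fraction
from math import ceil, comb

def prob_estimate_at_least(n, p, threshold):
    """P(k/n >= threshold) for k ~ Binomial(n, p); threshold is a decimal string."""
    # Smallest case count whose estimate k/n reaches the threshold,
    # computed with exact fractions to avoid float rounding at the boundary
    k_min = ceil(Fraction(threshold) * n)

    def pmf(k):
        return comb(n, k) * p**k * (1 - p)**(n - k)

    # Complement of the outcomes that fall short of k_min cases
    return 1 - sum(pmf(k) for k in range(k_min))

for n in (50_000, 100_000):
    print(n, round(prob_estimate_at_least(n, 0.00002, "0.00004"), 3))
```

Doubling the sample size cuts the chance of an estimate at least twice the true incidence from about 0.264 to about 0.143, which matches the two tables.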
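Concluding Thoughts: General population surveys like the MEPS are not
useful for studying issues like the costs associated with a relatively rare
disease or the characteristics of people with a particularly rare disease.
Even a sample of 50,000 or 100,000 people is expected to contain only one or two
people with a disease this rare, and it may well contain none.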
People seemed to like my previous post on high-cost patients and health plan type:
http://dailymathproblem.blogspot.com/2014/04/highcostpatientsandhealthplantype.html