I know what you are thinking…what are all of these crazy images and video with this extremely exciting biostatistics and epidemiology material? Isn’t it interesting enough without adding all of this extra stuff?? Well, I have been hanging out at PICMONIC recently, and I got to thinking…what if I added my own funny pics to make the material more memorable!
Here is my experiment…quiz tomorrow! Gulp!
Categorical variables (Qualitative)
Nominal : blood type (A, B, AB, O).
Ordinal : GCS (3–15).
Numerical (Metric, Quantitative)
Discrete (Counts) : frequency of taking medicine.
Continuous (Measures) : weight, height.
Characteristics of NOMINAL variables:
Data do not have any unit of measurement.
The ordering of the categories is completely arbitrary.
Characteristics of ORDINAL variables:
The data do not have any units of measurement .
The ordering of the categories is not arbitrary.
The difference between any pair of adjacent scores is not necessarily the same as
the difference between any other pair of adjacent scores.
Characteristics of CONTINUOUS METRIC variables:
Can be properly measured and have units of measurement.
Produce data that are real numbers (located on the number line)
Can apply mathematical operations to them.
Interval property : Birth wt : the 4000–4001 gram interval = the 4001–4002 gram
interval
Ratio property : Blood cholesterol : 8.4 μg/ml is exactly 2x a blood cholesterol of 4.2
μg/ml
Characteristics of DISCRETE METRIC variables:
Can be properly counted and have units of measurements
Produce data which are real numbers located on the number line.
Relative frequency
Cross-tabulation
To examine the association between two variables, within a single group of
individuals
Example : A cross-tabulation of the variables ‘Mother smoked during
pregnancy? (Y/N)’ and ‘Apgar score’
Pie chart disadvantages (for NOMINAL and ORDINAL DATA)
A disadvantage of a pie chart is that it can only represent one variable.
It can lose clarity if it is used to represent more than four or five categories.
– Each segment (slice) of a pie chart should be proportional to the frequency of the
category it represents.
Simple bar chart
Frequency on vertical axis, category on the horizontal axis
appropriate if only one variable is to be shown
Example: a simple bar chart of hair colour for the group of children receiving
Malathion in the nit lotion study.
Equal width, and spaces between bars (emphasize the categorical nature of the data).
The clustered bar chart
To present more than one group.
This arrangement is helpful if you want to compare the relative sizes of the
groups within each category
Example : Breast cancer by age and race
The stacked bar chart
Bars are stacked on top of each other.
Appropriate if you want to compare the total number of subjects in each group
(total number of boys and girls for example), but not so good if you want to
compare category sizes between groups,
Charting discrete metric data
We can use bar charts to graph discrete metric data in the same way as with
ordinal data
Example : numbers of measles cases (discrete metric) in 37 schools in Kentucky
in a school year (Prevots et al. 1997).
Charting continuous metric data
The histogram
Impractical to plot continuous metric variable without first grouping the values.
Frequency histogram : frequency plotted on the vertical axis and group size on the
horizontal axis.
No gaps between adjacent bars (this emphasizes the continuous nature of the underlying
variable)
Limitation of the histogram : it can represent only one variable at a time
Advantage of charting cumulative data with a STEP-CHART
– can show cumulative frequency
– uses ordinal/discrete data
The cumulative frequency chart (Ogive)
– continuous data / cumulative
– e.g., birth weight and % frequency
Choosing the right chart
Data type      Pie chart   Bar chart   Histogram (if grouped)   Step chart   Ogive
Nominal        yes         yes         no                       no           no
Ordinal        no          yes         no                       yes (cum.)   no
Metric disc.   no          yes         yes                      yes (cum.)   yes (cum.)
Metric cont.   no          no          yes                      no           yes (cum.)
– Values fairly evenly spread throughout their possible range: a uniform
distribution.
– Most of the values concentrated towards the bottom of the range, with
progressively fewer values towards the top of the range: a right or
positively skewed distribution.
Mean > median > mode
– Most of the values concentrated towards the top of the range, with progressively fewer values towards the bottom of
the range: a left or negatively skewed distribution.
Mean < median < mode
– Most of the values clump together around one particular value, with
progressively fewer values both below and above this value: a symmetric or
mound-shaped distribution.
– Most of the values clump around two or more particular values: a
bimodal or multimodal distribution.
Nominal scales do not have numerical values
Nominal data
can be measured using: proportions/percentages/ratios/ rates
Proportions and percentages
• A proportion : the number, a, of observations with a given characteristic divided
by the total number of observations in a given group.
•Always defined as a part divided by the whole
(by contrast, the BUN/CREAT ratio is a ratio, not a proportion, because neither value is a part of the other)
• Useful for ordinal and numerical data as well as nominal data
•E.g., The proportion of physicians trained in domestic violence who screened patients is 175/202
•E.g., The proportion without training who subsequently screened patients is 155/266
Ratios
Ratio : the number of observations in a group with a given characteristic
divided by the number of observations without the given characteristic
Ex. Among the physicians who trained, the ratio of those who screened patients to
those who did not is 175/27
LDL/HDL ratio
A ratio is a relation between two amounts.
Example: the ratio of men in my house to women is 1 to 3.
Proportion is what part something is of a larger number.
Example: the proportion of people in my house who are men is 1 in 4.
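To see the proportion vs. ratio difference with actual numbers, here is a minimal Python sketch using the physician counts quoted in these notes (175 trained physicians screened, 27 did not). The variable names are my own, and the total of 202 is simply 175 + 27.

```python
# Counts taken from the physician example in these notes.
screened_trained = 175       # trained physicians who screened patients
not_screened_trained = 27    # trained physicians who did not screen

# The "whole" is everyone who was trained.
total_trained = screened_trained + not_screened_trained

# Proportion: a part divided by the whole.
proportion = screened_trained / total_trained

# Ratio: those with the characteristic divided by those without it.
ratio = screened_trained / not_screened_trained

print(f"proportion screened = {proportion:.3f}")
print(f"ratio screened : not screened = {ratio:.2f}")
```

The proportion can never exceed 1, while the ratio can be any non-negative number.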
Rates
are similar to proportions except that a multiplier (eg, 1000, 10,000, or
100,000) is used, and they are computed over a specified period of time.
For example, if a study lasted exactly 1 year and the proportion of patients with
a given condition was 0.002, the rate per 10,000 patients would be (0.002) ×
(10,000), or 20 per 10,000 patients per year.
RATES ARE SIMILAR TO PROPORTIONS EXCEPT THAT A MULTIPLIER IS USED!!!
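The rate arithmetic above is easy to check in code. This is just a sketch: the function name rate_per is made up for illustration, and the 0.002 and 10,000 come from the example in the notes.

```python
def rate_per(proportion, multiplier):
    """Convert a proportion observed over a fixed period into a rate per `multiplier`."""
    return proportion * multiplier

# Proportion of patients with the condition over exactly one year.
proportion = 0.002
rate = rate_per(proportion, 10_000)

print(f"{rate:.0f} per 10,000 patients per year")
```

Changing the multiplier only changes the scale, so the same proportion is 2 per 1,000 or 200 per 100,000.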
Prevalence and the incidence rate
Point prevalence vs. Period prevalence
The prevalence figure will include existing cases, as well as those first getting
the disease
The incidence or inception rate of a disease : the number of new cases
occurring per 1000, or per 10 000, of the population, during some period,
usually 12 months.
Collecting the data – type of sample
The simple random sample
Systematic sample
fixed fraction of the sampling frame is selected, say every 10th or every 50th member,
until a sample of the required size is obtained.
Stratified sample
The sampling frame is broken down into strata relevant to the study
Then each separate stratum is sampled using a systematic sampling approach
Finally these strata samples are combined.
Contact or consecutive samples
Taking a sample from individuals in current or recent contact with the clinical
setting, such as consecutive attendees at a clinic
Study a group of subjects in situ ( in a ward, or in school)
Types of studies
Observational versus experimental studies.
An observational study : researchers actively observe the subjects involved (asking
questions/taking measurements/looking at clinical records), but they don’t
control, change or affect in any way their selection, treatment or care.
An experimental study : involves active intervention with the subjects.
– Prospective versus retrospective studies.
– Longitudinal versus cross-sectional studies.
Observational studies
Types of observational study:
Case-series.
Cross-section studies.
Cohort studies.
Case-control studies.
Case series
A health carer may see a series of patients (cases) with similar but
unusual symptoms or outcomes, find something interesting and write
it up as a study
Ex : Pentamidine and Pneumocystis carinii pneumonia in gay men,
Kaposi sarcoma in gay men, Creutzfeldt-Jakob disease (CJD) and
mad cow disease
Cross-section studies
Taking a ‘snapshot’ of some situation at some particular point in time, but
notably data on one or more variables from each subject in the study is
collected only once
Example : A total of 2 542 subjects aged 20–70 years from a rural area of
Anqing, China, participated in a cross-sectional survey, and 1 610
provided blood samples. Mean BMI (kg/m2) was 20.7 for men and 20.9 for
women. . .
Characteristics:
Take only one measurement from each subject at one moment in,
or during one period of, time. Data from one or more than one variable may be collected.
Can be used to investigate a link between two or more variables, but not the
direction of any causal relationship.
Are not particularly helpful if the condition being investigated is rare
Those that aim to uncover attitudes, opinions or behaviours are often referred to as
surveys.
Cohort studies
Goal : to identify risk factors causing a particular outcome, for example death, or lung
cancer, or stroke, or low-birthweight babies.
Also known as : follow-up, prospective, or longitudinal study
The principle :
Random selection of sample
The group is followed forward over a period of time and monitored on their exposure to
suspected risk factors or different clinical interventions.
At the end of the study, a comparison is made between groups with and without the outcome of
interest in terms of their exposure over the course of the study to a suspected risk factor.
A reasoned conclusion is drawn about the relationship between the outcome of interest and the
suspected risk factor or intervention.
Example: The connection between mortality and cigarette smoking (Doll and Hill)
Case-control studies
Goal : Can the outcome of interest be related to the candidate risk factor
Also known as a retrospective or longitudinal study
The principle
Two groups of subjects are selected on the basis of whether they have or do not have some
condition of interest (for example, sudden infant death, or stroke, or depression, etc.).
One group, the cases, will have the condition of interest.
The other group, the controls, will not have the condition, but will be as
similar to the cases as possible in all other ways.
Individuals in both groups are then questioned about past exposure to possible risk
factors.
A reasoned conclusion is then drawn about the relationship between the condition in
question and exposure to the suspected risk factor.
It was the outcome from such a case-control study by Doll and Hill that led them to
conduct the later cohort study.
Confounding variable
Arises when an association between an exposure and an outcome is being
investigated, but the exposure and outcome are both strongly associated with a
third variable.
Example : mothers who smoke more have fewer Down syndrome babies than
mothers who smoke less (or don’t smoke at all). Confounder: Maternal Age.
Characteristics of confounder:
a variable must be associated with both the risk factor (smoking) and the outcome of
interest (Down syndrome).
Age and sex are common confounders.
Matched case-control
Matching is a way to make cases and controls more similar
Matching is often, but not always, applied to case-control studies
Case control studies are divided (based on matching method) into
Matched designs
Each control must be individually matched (or paired) with a case
Unmatched designs (Frequency matching)
Cases and controls are independently selected, or are only broadly
matched (the same broad mix of ages, same proportions of males and
females), which is known as frequency matching
Comparing cohort and case-control designs
Advantages of case-control studies
The availability of potential cases is much greater and sample size can be smaller,
cases will often be contact samples(selected from attending particular clinics)
Much cheaper and easier to conduct.
Give results more quickly.
Limitations of case-control studies
Problems with the selection of suitable controls.
Problems with the selection of cases.
The problem of recall bias
Case-control studies often provide results which
seem to conflict with findings of other
apparently similar case-control studies.
For reliable conclusions, cohort studies are generally preferred
(but are not always a practical alternative)
Clinical Trials
Clinical trials are experiments to compare two or more clinical treatments.
Ideal clinical trial: randomized, double-blinded
Traditionally 4 phases of research
I=Establish safety, dose-finding, PK studies
II=Establish biological activity or potential efficacy
III=Randomized comparison of treatment
IV=Long-term surveillance in broader population
Example : A new drug, Arabarb, has been developed for treating hypertension.
You want to investigate its efficacy compared to the existing drug of choice.
Decide on an outcome measure – diastolic blood pressure.
Select a sample of individuals with hypertension. Divide into two groups
Ensure that the two groups are as similar as possible with regard to variables such as sex,
age, emotional state of mind, lifestyles, genetic differences.
Give one group the new drug (treatment group).
Give the other group the existing drug (standard of care) or placebo.
Randomization
Randomised Controlled Trial (RCT) : if randomization is successful, and the original
sample is large enough, then the two groups should be more or less identical,
differing only by chance.
Methods of randomization
Coin tossing (ex. Heads → Treatment Group T, Tails → Control Group C)
Table of random numbers (ex. Even # → T, Odd # → C)
Example: Allocating 12 subjects (aiming for 6 subjects to T and 6 subjects to C)
The numbers: 2 3 1 5 7 0 5 5 4 5 1 4
The allocations: T C C C C T C C T C C T
Simple randomization does not guarantee equal groups: here only 4 subjects end up in T and 8 in C.
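Here is a quick Python sketch of the random-number allocation above. Reading the listed allocations against the listed digits, even digits go to T and odd digits go to C, so that is the rule the code assumes.

```python
# Random digits from the example in the notes.
digits = [2, 3, 1, 5, 7, 0, 5, 5, 4, 5, 1, 4]

# Rule assumed from the listed allocations: even digit -> T, odd digit -> C.
allocation = ["T" if d % 2 == 0 else "C" for d in digits]

n_treatment = allocation.count("T")
n_control = allocation.count("C")

print(" ".join(allocation))
print(f"T: {n_treatment}, C: {n_control}")
```

Counting the letters shows the groups come out unequal, which is the practical weakness of simple randomization with small samples.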
Block randomisation
Block1 : CCTT
Block2 : CTCT
Block3 : CTTC
Block4 : TCTC
Block5 : TCCT
Block6 : TTCC
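The six blocks listed above are every possible ordering of two Cs and two Ts. A minimal sketch of block randomization in Python follows; the function names and the fixed seed are my own choices for illustration, not part of any standard trial software.

```python
import itertools
import random


def balanced_blocks(block="CCTT"):
    """Return every distinct ordering of a balanced block, sorted alphabetically."""
    return sorted({"".join(p) for p in itertools.permutations(block)})


def block_randomise(n_subjects, rng, block="CCTT"):
    """Build an allocation sequence by drawing whole blocks at random."""
    blocks = balanced_blocks(block)
    sequence = []
    while len(sequence) < n_subjects:
        sequence.extend(rng.choice(blocks))
    return sequence[:n_subjects]


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
allocation = block_randomise(12, rng)

print(balanced_blocks())
print("".join(allocation))
print("T:", allocation.count("T"), "C:", allocation.count("C"))
```

Because every block holds two Ts and two Cs, any sample size that is a multiple of 4 ends up with equal groups, whichever blocks are drawn.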
Blinding
Blinding the patient to eliminate response or placebo bias
Blinding the investigator to eliminate treatment bias and researcher
expectancy.
Entrusting a disinterested third party to obtain the random numbers and decide
on the allocation rules (code of trial)
Assessment bias can be overcome by blinding the investigator.
double-blind randomized controlled trial – the gold standard
The cross-over randomised controlled trial
Wash-out time to eliminate remaining effect of treatment
Bias
any force that causes the answer obtained in a study to deviate from the “true
answer” or the answer that would have been obtained if there was no bias.
Types of bias
Selection bias (sampling bias):
the sample selected is not representative of the population.
Examples:
Predicting rates of heart disease by gathering subjects from a local health club
Using only hospital records to estimate population prevalence (Berkson’s bias)
People included in a study are different than those who are not (nonrespondent
bias)
Measurement bias:
information is gathered in a manner that distorts the information.
Examples:
Measuring patients’ satisfaction with their respective physicians by
using leading questions, e.g., “You don’t like your doctor, do you?”
Subjects’ behavior is altered because they are being studied (Hawthorne
effect).
Only a factor when there is no control group in a prospective study
Experimenter expectancy (Pygmalion effect):
experimenter’s expectations inadvertently communicated to subjects, who then
produce the desired effects.
Can be avoided by double-blind design, where neither the subject nor the
investigators who have contact with them know which group receives the
intervention under study and which group is the control
Lead-time bias:
gives a false estimate of survival rates.
Example: Patients seem to live longer with the disease after it is uncovered by a
screening test. Actually, there is no increased survival, but because the disease is
discovered sooner, patients who are diagnosed seem to live longer.
Recall bias:
subjects fail to accurately recall events in the past.
Example: “How many times last year did you kiss your mother?” Likely problem in
retrospective studies
Late-look bias:
individuals with severe disease are less likely to be uncovered in a survey because
they die first.
Example: a recent survey found that persons with AIDS reported only mild
symptoms.
Confounding bias:
factor being examined is related to other factors of less interest. Unanticipated
factors obscure a relationship or make it seem like there is one when there is not.
More than one explanation can be found for the presented results.
Example: comparing the relationship between exercise and heart disease in two
populations when one population is younger and the other is older. Are differences in
heart disease due to exercise or to age?
INFERENTIAL STATS…having fun yet?
The process of generalizing the sample findings, first to the study population and
ultimately to the target population.
Some basic ideas about probability:
The probability of a particular outcome from an event will lie between zero and one.
The probability of an event that is certain to happen is equal to one. For example, the
probability that everybody dies eventually.
The probability of an event that is impossible is zero. For example, throwing a seven
with a normal die.
If an event has as much chance of happening as of not happening (like tossing a coin
and getting a head), then it has a probability of 1/2 or 0.5.
If the probability of an event happening is p, then the probability of the event not
happening is 1 – p.
Probability and the Normal distribution
If data are Normally distributed then about 95% of the values will lie no further
than two standard deviations from the mean.
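The "about 95%" figure can be checked with Python's standard library. This is only a sketch using the standard Normal distribution (mean 0, standard deviation 1); the variable names are mine.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

# Share of values within two standard deviations of the mean.
within_2_sd = z.cdf(2) - z.cdf(-2)

# Share of values within 1.96 standard deviations (the multiplier used for 95% intervals).
within_1_96_sd = z.cdf(1.96) - z.cdf(-1.96)

print(f"within 2 SD:    {within_2_sd:.4f}")
print(f"within 1.96 SD: {within_1_96_sd:.4f}")
```

Two standard deviations actually covers slightly more than 95%, which is why 1.96 turns up in confidence interval formulas.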
Risk and RELATIVE RISK
Risk is the same as a probability, but the term is preferred in medical settings
Definition : the risk of any particular outcome from an event is equal to the number
of favorable outcomes divided by the total number of outcomes
Varies between 0 and 1
Example : Cohort study of weight at one year and its effect on the presence of coronary heart
disease (CHD) in adult life, expressed in the form of a contingency table
What is the risk that those adults who as infants weighed 18 lbs or less at one year will have
CHD? Risk = 4/15 = 0.2667
Odds
The odds for an event is equal to the number of outcomes favourable to the
event divided by the number of outcomes not favourable to the event.
While probability (or risk) of a particular outcome is the number of outcomes
favourable to the event divided by the total number of outcomes.
Notice that:
The value of the odds for an outcome can vary from zero to infinity.
When the odds for an outcome are < 1 the odds are unfavourable to the outcome
(less likely to happen than not to happen).
When the odds = 1 the outcome is as likely to happen as not.
When the odds are > 1 the odds are favourable to the outcome (more likely to happen
than not)
Among those patients who’d had a stroke, 55 had exercised and 70 had not, so the odds that
those with a stroke had exercised is 55/70 = 0.7857.
Why you can’t calculate risk in a
case-control study
– In a case-control study you don’t select on the basis of whether people have
been exposed to the risk or not, but on the basis of whether they have some
condition or not
– You can select whatever number of cases and controls you want (i.e the column
totals, which you would otherwise need for your risk calculation, are
meaningless).
The link between probability and odds
The link means that it is possible to derive one from another:
Risk or probability = odds/(1 + odds)
Odds = probability/(1 – probability)
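The two conversion formulas above can be sketched in a few lines of Python. The function names are my own; the 55 and 70 are the stroke and exercise counts from the odds example earlier in these notes.

```python
def odds_to_probability(odds):
    """Risk or probability = odds / (1 + odds)."""
    return odds / (1 + odds)


def probability_to_odds(probability):
    """Odds = probability / (1 - probability)."""
    return probability / (1 - probability)


# Among stroke patients, 55 had exercised and 70 had not.
odds_exercised = 55 / 70
proportion_exercised = odds_to_probability(odds_exercised)

print(f"odds = {odds_exercised:.4f}")
print(f"proportion = {proportion_exercised:.2f}")
print(f"back to odds = {probability_to_odds(proportion_exercised):.4f}")
```

Converting the odds gives the same answer as dividing 55 by the total of 125 directly, and converting back recovers the original odds.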
Risk Ratio
– Risk ratio : risk for one group (usually the group exposed to the risk factor)
divided by the risk for the second, non-exposed, group
– The risk of disease among those exposed to the risk factor = a/(a + c).
– The risk of disease among those not exposed = b/(b + d).
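Here is a small sketch of the risk ratio using the CHD figures quoted in these notes (4 of 15 lighter infants and 38 of 275 heavier infants developed CHD). The cell layout is an assumption on my part: a and c are the exposed with and without disease, b and d are the unexposed with and without disease, which matches the a/(a + c) and b/(b + d) formulas above.

```python
def risk_ratio(a, b, c, d):
    """Risk in the exposed group divided by risk in the unexposed group.

    Assumed layout: a = exposed with disease, c = exposed without disease,
                    b = unexposed with disease, d = unexposed without disease.
    """
    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    return risk_exposed / risk_unexposed


# Exposed = weighed 18 lbs or less at one year: 4 with CHD, 11 without (15 total).
# Unexposed = weighed more than 18 lbs: 38 with CHD, 237 without (275 total).
rr = risk_ratio(a=4, b=38, c=11, d=237)

print(f"risk ratio = {rr:.2f}")
```

A risk ratio above 1 means the exposed group has the higher risk; a value of 1 would mean no difference between the groups.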
The odds ratio
The odds that those with a disease will have been exposed to the risk factor
divided by the odds that those who don’t have the disease will have been
exposed.
Number needed to treat (NNT)
NNT is the number of patients who would need to be treated with the active
procedure, rather than a placebo (or alternative procedure), in order to reduce
by one the number of patients experiencing the condition
The absolute risk of CHD among those weighing 18 lbs or less = 4/15 = 0.2667
The absolute risk of CHD for those weighing more than 18 lbs was 38/275 = 0.1382
ARR (Absolute risk reduction) = AR1 – AR2 = 0.2667 – 0.1382 = 0.1285
NNT = 1/ARR = 7.78
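The NNT arithmetic can be reproduced with a short Python sketch. The variable names are mine, and the counts are the CHD figures from the lines above.

```python
# Absolute risks from the CHD example in these notes.
ar_light = 4 / 15      # weighed 18 lbs or less at one year
ar_heavy = 38 / 275    # weighed more than 18 lbs at one year

# Absolute risk reduction is the difference between the two absolute risks.
arr = ar_light - ar_heavy

# Number needed to treat is the reciprocal of the absolute risk reduction.
nnt = 1 / arr

print(f"AR1 = {ar_light:.4f}, AR2 = {ar_heavy:.4f}")
print(f"ARR = {arr:.4f}")
print(f"NNT = {nnt:.2f}")
```

In practice the NNT is usually rounded up to the next whole patient, since you cannot treat a fraction of a person.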
The standard error of the mean
Ex: sample of 30 infants produced a sample mean birth weight of 3644.4 g
Characteristics of sample means
These means were Normal in distribution.
The sample means were centered around the true population mean.
In other words, the mean of all possible sample means is the same as the population
mean (on average, the sample mean estimates the population mean exactly).
s.e.(x̄) = s/√n
s.e. : standard error
x̄ : sample mean
s : sample standard deviation
n : sample size
For example,
suppose you take a sample of size n = 100 from a population, measure systolic blood
pressure, and obtain a sample mean of 135 mmHg and a sample standard deviation of
3 mmHg. What is the estimated standard error?
s.e.(x̄) = 3/√100 = 3/10 = 0.3 mmHg
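The standard error calculation is one line of Python. This sketch uses the blood pressure numbers from the example (standard deviation 3 mmHg, n = 100); the function name is my own.

```python
import math


def standard_error(sample_sd, n):
    """Estimated standard error of the mean: s divided by the square root of n."""
    return sample_sd / math.sqrt(n)


se = standard_error(sample_sd=3, n=100)
print(f"s.e. = {se:.1f} mmHg")

# Quadrupling the sample size halves the standard error.
se_bigger_sample = standard_error(sample_sd=3, n=400)
print(f"s.e. with n = 400: {se_bigger_sample:.2f} mmHg")
```

The second call shows why bigger samples give more precise estimates: the standard error shrinks with the square root of n.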
– The standard error is a measure of the precision of the sample mean as an estimator of
the population mean.
– If you are comparing the precision of two different sample means as estimates of a
population mean, the sample mean with the smaller standard error is likely to be the
more precise.
Confidence interval for a population
proportion
95% confidence interval: {[p − 1.96 × s.e.(p)] to [p + 1.96 × s.e.(p)]}
99% confidence interval: {[p − 2.58 × s.e.(p)] to [p + 2.58 × s.e.(p)]}
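Here is a hedged sketch of the interval above in Python. The notes do not say how s.e.(p) is calculated, so I am assuming the usual large-sample formula, the square root of p(1 − p)/n. The data are the 175 out of 202 trained physicians from the proportions section, and the function name is made up.

```python
import math


def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a proportion.

    Assumes s.e.(p) = sqrt(p * (1 - p) / n); z = 1.96 gives a 95% interval
    and z = 2.58 gives a 99% interval.
    """
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


low_95, high_95 = proportion_ci(175, 202, z=1.96)
low_99, high_99 = proportion_ci(175, 202, z=2.58)

print(f"95% CI: {low_95:.3f} to {high_95:.3f}")
print(f"99% CI: {low_99:.3f} to {high_99:.3f}")
```

The 99% interval is wider than the 95% one: more confidence costs precision.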
Incidence versus Prevalence
Attack Rates and Case-Fatality Rates
The tradeoff between sensitivity and specificity
Receiver Operator Curves
Incidence rate
The rate at which new events occur in population.
Numerator : # of new events that occur in a defined period
Denominator is the population at risk of experiencing this new event
during the same period
10^5 is the standard multiplier for incidence of diseases
10^3 is the standard multiplier for incidence of infant mortality
10^2 is the standard multiplier for incidence of marriages
Ex: what is the incidence of rubella in the United States?
A. 197/1000
B. 286/10000
C. 256/100 000
D. 10/100 000
E. 987/100 000
Incidence of diseases
must be presented as per 100 000
Will be under 50 / 100 000
Prevalence
All the persons who experience an event in a population.
Numerator : all individuals who have an attribute or disease at
particular period in time
Denominator : the population at risk of having the attribute or
disease at this point in time or midway through the period
An important measure for chronic conditions
– Point prevalence : prevalence during a specific point in time
– Period prevalence : prevalence during a specific period or
span of time
– Prevalence = incidence x average duration of the disease
Default multiplier for all diseases is / 100,000!!
MORTALITY RATE:
– Mortality rate is INCIDENCE (acute transition)
only, because it counts the transition from life to death during the period, not all dead people
MORBIDITY – prevalence/incidence
1. “The goal of medicine is not to wipe out diseases, it is to
INCREASE prevalence of disease (chronic d.)”
2. The goal of PRIMARY prevention is to REDUCE INCIDENCE
The goal of SECONDARY prevention is to REDUCE SEVERITY of disorder
The goal of Tertiary prevention is to IMPROVE THE OUTCOME of EXISTING disorder
(no prevention, no reduction of severity)
Q. What is the simplest measure of variability?
The Range!!!
Q.Clinical trials compare TWO or more Tx!!!
How are Incidence or Prevalence affected?
1. New effective treatment is initiated
INCIDENCE – no change, Prevalence – Dec
2. New effective vaccine gains widespread use
INCIDENCE – Dec, Prevalence – Dec
3. Number of persons dying from the condition increases
INCIDENCE – no change, Prevalence – Dec
4. Contacts between infected persons and non infected persons are reduced for airborne infection
INCIDENCE – Dec, Prevalence – Dec
5. Recovery from the disease is more rapid than it was one year ago
INCIDENCE – no change, Prevalence – Dec
6. Long term survival rates for the disease are increasing
INCIDENCE – no change, Prevalence – Inc
– Hey! Is the MEAN appropriate for Discrete data?
Noooooooooo!
– what is the MEDIAN most appropriate for?
ORDINAL data…
– MODE and MEDIAN have poor sampling stability…
so there…
P-Value
The traditional cutoff (significance level) for the p-value is 0.05
If p < 0.05 reject the null hypothesis (reached statistical significance)
If p > 0.05 do not reject the null hypothesis (has not reached statistical
significance).
TWO parts of a confidence interval
1. Standard error of the mean 2. Standard Z-score
Standard error of the mean
An estimation of QUALITY of the SAMPLE for the estimate. If the mean is 10 and the standard error of the mean is 2, then the true score is likely to fall somewhere between 8 and 12 or 10 +/- 2.
Standard Z-Score
The degree of confidence provided by the interval (Zorro was CONFIDENT that he could make that Z-score sign!!)
Power
the capacity to detect a difference if there is one
Increasing sample size (n) leads to
Increase in power
why use p-value?
provides criterion for making decisions about the null hypothesis
limits to the p-value: the p-value does NOT tell us –
The chance that an individual will benefit, the percentage of pts who will benefit, the degree of benefit expected for a given pt
The most common method to increase the power is by
increasing the sample size
Two kinds of prospective studies
Longitudinal, Experimental
Experimental studies (e.g., clinical trials, lab experiment)
Studies the possible “cause and effect” relationship between two variables
repeated cross sectional study
CHANGES OVER TIME: Data from 2+ points in time from different samples
cohort
A population group unified by a specific common characteristic, such as age, and subsequently treated as a statistical unit.
4 types of observational studies
Case-series, Cross-sectional, Cohort, Case-control
case-series study
studying several patients, similar but unusual symptoms, ex – Kaposi sarcoma in homosexual males
cross sectional study characteristics
collecting data only once from each subject, NOT for DIRECTION of any CAUSAL relationship, NOT GOOD for rare things
COHORT studies characteristics, AKA Follow-up, prospective, longitudinal
IDENTIFY RISK factors, FOLLOW over time-SMOKING (DOLL & HILL study on smoking doctors),
CASE-control studies characteristics
RETROSPECTIVE, longitudinal – TWO groups…one with, one without condition – but similar characteristics. Then BOTH groups are QUESTIONED about past exposure to possible RISK FACTORS.
Doll and hill
Both a cohort and case-control study on smoking and lung cancer – first a case-control study, then a cohort study
cohort vs case-control study
Plus: many potential cases, small sample size ok, cheap, quick results, NEG: controls, case selection, Recall bias of subjects, FINDINGS may conflict with other studies, COHORT more reliable, but not always a practical alternative
Cross sectional study limitations
Provide info on PREVALENCE, not INCIDENCE..no distribution data
Observational study PROS..
suitable for common diseases, prolonged study time, larger number of subjects, less selection bias, subjects usually volunteer, incidence is determined
Clinical trials
Experiments to compare two or more clinical treatments
IDEAL is RANDOMIZED, DOUBLE-BLIND
Four phases of clinical trials
1. Establish safety, dose finding, PK studies 2. Establish biological activity or potential efficacy 3. Randomized comparison of treatment 4. Long-term surveillance in broader population
Sensitivity: If a person has a disease, how often will the test be positive (true positive rate)? Put another way, if the test is highly sensitive and the test result is negative you can be nearly certain that they don’t have disease. A sensitive test helps rule out disease (when the result is negative). Sensitivity rule out or “Snout”
Sensitivity = true positives/(true positives + false negatives)
Specificity: If a person does not have the disease how often will the test be negative (true negative rate)? In other terms, if the test result for a highly specific test is positive you can be nearly certain that they actually have the disease. A very specific test rules in disease with a high degree of confidence. Specificity rule in or “Spin”
Specificity = true negatives/(true negatives + false positives)
Predictive value for a positive result (PV+): PV+ asks “If the test result is positive what is the probability that the patient actually has the disease?”
PV+ = true positives/(true positives + false positives)
Predictive value for a negative result (PV-): PV- asks “If the test result is negative what is the probability that the patient does not have disease?”
PV- = true negatives/(true negatives + false negatives)
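All four test measures come from the same 2x2 table, so here is a small Python sketch that computes them together. The counts are completely made up for illustration (100 people with disease, 200 without) and the function name is my own.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Return sensitivity, specificity, PV+ and PV- from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),   # positives among those WITH disease
        "specificity": tn / (tn + fp),   # negatives among those WITHOUT disease
        "pv_positive": tp / (tp + fp),   # diseased among positive results
        "pv_negative": tn / (tn + fn),   # disease-free among negative results
    }


# Hypothetical counts: 90 true positives, 10 false negatives,
# 30 false positives, 170 true negatives.
measures = diagnostic_measures(tp=90, fp=30, fn=10, tn=170)

for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are calculated down the disease columns, while the predictive values are calculated across the test result rows, which is why predictive values shift with how common the disease is in the group tested.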
Types of error
Type I error (α error)
Rejecting the null hypothesis when it is really true
Assuming a statistically significant effect on the basis of the sample when
there is none in the population
The chance of a Type I error is set by the significance level (α), the cutoff the p-value is compared against
Ex: α = 0.05, then the chance of a Type I error is 5 in 100, or 1/20
Type II error (ß error)
failing to reject the null hypothesis when it is really false
Declaring no significant effect on the basis of the sample when there really is
one in the population (does not work when it really does)
The chance of a Type II error cannot be directly estimated from the p-value
(yup..this is just about the
wimpiest Superman I have ever seen!)
Statistical power
The capacity to detect a difference if there is one.
Directly related to type II error ( 1 – ß = Power )
The most common method to increase the power is by
increasing the sample size
Estimating Differences Between 2
Population Parameters
Estimating the difference between the means of two
independent populations – using a method based on the
two-sample t test
prerequisites that need to be met:
Data for both groups must be metric (estimating the mean! ).
The distribution of the relevant variable in each population must be reasonably
Normal
check this assumption from the sample data using a histogram.
The population standard deviations of the two variables concerned should be
approximately the same (this becomes less important as sample sizes get larger).
Just memorize this!!
Estimating the difference between two matched population
means – using a method based on the matched-pairs t test
Useful when the data within each of the two groups whose means you are comparing are widely
spread compared to the difference in the spreads between the groups
Therefore, data are matched to reduce much of the within-group variation, which
makes it easier to detect any differences between groups
Disadvantage
– Difficulty to find a sufficiently large number of matches
Procedure
– For each group : mean is computed separately
Confidence interval for the difference in two means is calculated
Parametric versus non-parametric
methods
There are two types of statistical test: parametric and non-parametric. A parametric test assumes that the data follow a particular distribution, usually the bell-shaped Normal distribution. A non-parametric test does the opposite: it makes no such assumption about the shape of the distribution.
A parametric procedure
can be applied to data which are metric and follow a particular distribution (most commonly the
Normal distribution).
A non-parametric procedure
Has no Normal distribution requirement.
Data that are metric but not Normal, or ordinal: use a non-parametric
approach.
Non Parametric Procedures
the Mann–Whitney rank-sums method
– Estimating the difference between two independent population medians
– The mean may not be the most representative measure of location if the data
is skewed
– If the data is ordinal
Wilcoxon signed-ranks method
Estimating the difference between two matched population medians