# Understanding Effect Size from Differences in Means

## Synopsis

• Working definition: we work from a sample of a population and try to estimate something about the world.

• In other words, we are trying to generalize from what we measure in a sample to the population it came from.

• Statistical inference involves:

• Estimating the effect size.

• Quantifying the precision of the result (confidence interval / standard error).

• Hypothesis testing: what is the chance that the difference is zero, i.e. that it happened by chance?
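As a preview of those steps, here is a minimal sketch (with synthetic samples; the parameters are the illustrative height values used later in this notebook) of estimating a difference in means together with its standard error and an approximate 95% confidence interval:

```python
import numpy as np

rng = np.random.default_rng(17)
group1 = rng.normal(178, 7.7, 1000)  # synthetic sample, illustrative parameters
group2 = rng.normal(163, 7.3, 1000)

# effect size: the difference in sample means
diff = group1.mean() - group2.mean()

# precision: standard error of the difference in means
se = np.sqrt(group1.var(ddof=1)/len(group1) + group2.var(ddof=1)/len(group2))

# approximate 95% confidence interval
ci = (diff - 1.96*se, diff + 1.96*se)
```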

## Imports

import numpy as np
import scipy as sp
import scipy.stats as stats
import matplotlib.pyplot as plt
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets

# seed the random number generator so we all get the same results
np.random.seed(17)

# some nice colors from http://colorbrewer2.org/
COLOR1 = '#7fc97f'
COLOR2 = '#beaed4'
COLOR3 = '#fdc086'
COLOR4 = '#ffff99'
COLOR5 = '#386cb0'

%matplotlib inline


## Part 1

### Population distribution

mu1, sig1 = 178, 7.7
male_height = stats.norm(mu1, sig1); male_height

<scipy.stats._distn_infrastructure.rv_frozen at 0x7f18e0daac70>

mu2, sig2 = 163, 7.3
female_height = stats.norm(mu2, sig2); female_height

<scipy.stats._distn_infrastructure.rv_frozen at 0x7f1819eb5730>


### Function to evaluate the normal (Gaussian) PDF

def eval_pdf(rv, n=4):
    mean, std = rv.mean(), rv.std()
    xs = np.linspace(mean - n*std, mean + n*std, 100)
    ys = rv.pdf(xs)
    return xs, ys

xs_male, ys_male = eval_pdf(male_height)
xs_female, ys_female = eval_pdf(female_height)

fig = plt.figure()
ax = fig.gca()
ax.plot(xs_male, ys_male, label='male', linewidth=4, color=COLOR2)
ax.plot(xs_female, ys_female, label='female', linewidth=4, color=COLOR3)
plt.xlabel('height (cm)')
plt.legend(loc='upper left')
plt.show()

### Random sampling from the distribution/population

male_sample = male_height.rvs(1000); male_sample.shape

(1000,)

female_sample = female_height.rvs(1000); female_sample.shape

(1000,)


The mean of a sample will be close to the population mean, but not exactly the same.

mean1, std1 = male_sample.mean(), male_sample.std()
mean1, std1

(178.16511665818112, 7.84199617128995)

mean2, std2 = female_sample.mean(), female_sample.std()
mean2, std2

(163.48610226651135, 7.382384919896662)

difference_in_mean = mean1-mean2
difference_in_mean

14.679014391669767

relative_difference_by_sample = (2*difference_in_mean/(mean1+mean2))*100
relative_difference_by_sample

8.59298230392402

relative_difference_by_distribution = (2*difference_in_mean/(mu1+mu2))*100
relative_difference_by_distribution

8.609392605084908

relative_difference_by_male = difference_in_mean/mu1*100
relative_difference_by_male

8.24663729869088

relative_difference_by_female = difference_in_mean/mu2*100
relative_difference_by_female

9.005530301637894


Which one is correct?

• Here the situation is symmetric, so you can express the relative difference either way.

• There may be situations where you have a treatment group and a control group. There you might want to report

$$Diff_{treatment} / mean_{control}$$
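That convention can be sketched with hypothetical treatment/control measurements (the numbers below are made up for illustration):

```python
import numpy as np

control = np.array([10.1, 9.8, 10.5, 10.0])     # hypothetical control measurements
treatment = np.array([11.0, 10.9, 11.4, 10.7])  # hypothetical treatment measurements

# express the difference relative to the control mean, in percent
relative_effect = (treatment.mean() - control.mean()) / control.mean() * 100
```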

## Part 2

### Understanding overlap, threshold, and misclassification rate

An alternative way to quantify the difference is to see how much the two distributions overlap.

#### Simple threshold

simple_thresh = (mean1 + mean2)/2; simple_thresh

170.82560946234622


#### Threshold (more appropriate)

thresh = (mean1*std2+mean2*std1)/(std1+std2); thresh

170.6040359174722


#### Overlap

How many males are below the threshold?

male_below_thresh = sum(male_sample<thresh); male_below_thresh

164


How many females are above the threshold?

female_above_thresh = sum(female_sample>thresh); female_above_thresh

174


The “overlap” is the area under the curves that ends up on the wrong side of the threshold.

male_overlap = male_below_thresh / len(male_sample)
female_overlap = female_above_thresh / len(female_sample)
male_overlap, female_overlap

(0.164, 0.174)


#### Misclassification Rate

misclassification_rate = (male_overlap+female_overlap)/2; misclassification_rate

0.16899999999999998
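The sample-based rate can be cross-checked analytically: since both populations are normal, the fraction on the wrong side of the threshold follows directly from the normal CDF. A sketch using the population parameters from Part 1 (the sample estimate above should land in the same ballpark):

```python
import scipy.stats as stats

mu1, sig1 = 178, 7.7   # population parameters from Part 1
mu2, sig2 = 163, 7.3
thresh = (mu1*sig2 + mu2*sig1) / (sig1 + sig2)

# probability mass on the wrong side of the threshold for each group
p_male_below = stats.norm(mu1, sig1).cdf(thresh)
p_female_above = 1 - stats.norm(mu2, sig2).cdf(thresh)
misclassification = (p_male_below + p_female_above) / 2
```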


#### Probability of superiority

Another way to quantify the difference between distributions is what’s called the “probability of superiority”, which is a problematic term.

• In this context it’s the probability that a randomly chosen man is taller than a randomly chosen woman.

sum(male_sample > female_sample), (male_sample > female_sample).mean()

(918, 0.918)
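This too has an analytic counterpart: the difference of two independent normals is itself normal, so a randomly chosen man is taller than a randomly chosen woman with probability $\Phi\big((\mu_1-\mu_2)/\sqrt{\sigma_1^2+\sigma_2^2}\big)$. A check against the sample estimate of 0.918:

```python
import numpy as np
import scipy.stats as stats

mu1, sig1 = 178, 7.7
mu2, sig2 = 163, 7.3

# X - Y ~ Normal(mu1 - mu2, sqrt(sig1**2 + sig2**2)), so P(X > Y) = P(X - Y > 0)
p_superiority = stats.norm.cdf((mu1 - mu2) / np.sqrt(sig1**2 + sig2**2))
```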

##### Distribution of Superiority

male_sample[:5]

array([178.44105417, 170.49028233, 170.94885314, 188.07981499,
171.51692076])

np.random.shuffle(male_sample)
male_sample[:5]

array([180.02059072, 185.18602683, 177.45366243, 176.95014285,
186.13226315])

s = []
for i in range(100):
    np.random.shuffle(male_sample)
    np.random.shuffle(female_sample)
    s.append((male_sample > female_sample).mean())

prob_sup_sample = np.array(s)

plt.hist(prob_sup_sample)
plt.show()

@interact
def plot_prob_sup(n_trials=(100, 1000), n_bins=(20, 40)):
    s = []
    for i in range(n_trials):
        np.random.shuffle(male_sample)
        np.random.shuffle(female_sample)
        s.append((male_sample > female_sample).mean())
    prob_sup_sample = np.array(s)
    mean = np.mean(prob_sup_sample)
    std = np.std(prob_sup_sample)
    plt.title(f"$\\mu$ {mean:0.3f}\n$\\sigma$ {std:0.3f}\nn {n_trials}")
    plt.hist(prob_sup_sample, bins=n_bins, color=COLOR1)
    plt.axvline(x=mean, color=COLOR3)
    plt.axvline(x=mean - 2*std, color=COLOR2)
    plt.axvline(x=mean + 2*std, color=COLOR2)
    plt.xlim(.890, .950)
    plt.xlabel("N[male_height > female_height]")
    plt.show()


Note

Overlap (or misclassification rate) and “probability of superiority” have two good properties:

• As probabilities, they don’t depend on units of measurement, so they are comparable between studies.

• They are expressed in operational terms, giving a sense of what practical effect the difference makes.

#### Cohen’s Effect Size

The difference in means, standardized by dividing by the standard deviation:

$$d = \frac{\bar{x_1} - \bar{x_2}}{s}$$

Pooled standard deviation

$$s = \sqrt{\frac{n_1{s_1}^2+n_2{s_2}^2}{n_1+n_2}}$$

##### Implementation: Cohen’s effect size

def cohen_effectsize(grp1, grp2):
    diff = grp1.mean() - grp2.mean()

    n1, n2 = len(grp1), len(grp2)
    var1 = grp1.var()
    var2 = grp2.var()

    pooled_var = (n1*var1 + n2*var2)/(n1 + n2)

    d = diff/np.sqrt(pooled_var)
    return d

cohen_effectsize(male_sample, female_sample)

1.9274780043619493
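As a sanity check, the same statistic computed from the population parameters (with equal group sizes, the pooled variance is just the average of the two variances) comes out very close to 2:

```python
import numpy as np

mu1, sig1 = 178, 7.7
mu2, sig2 = 163, 7.3

# with equal group sizes, the pooled variance is the plain average
pooled_std = np.sqrt((sig1**2 + sig2**2) / 2)
d_population = (mu1 - mu2) / pooled_std
```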

##### Implementation: overlap_superiority

def overlap_superiority(control, treatment, n=1000):
    control_sample = control.rvs(n)
    treatment_sample = treatment.rvs(n)
    thresh = (control.mean() + treatment.mean())/2

    # count the samples on the wrong side of the threshold; this convention
    # assumes the treatment mean is larger than the control mean
    control_above = sum(control_sample > thresh)
    treatment_below = sum(treatment_sample < thresh)
    overlap = (control_above + treatment_below)/n
    superiority = (treatment_sample > control_sample).mean()
    return overlap, superiority


overlap_superiority(female_height, male_height)

(Since the samples are random, the exact values vary from run to run; expect an overlap around 0.32 and a superiority around 0.92.)

def plot_pdfs(cohen_d=2):
    control = stats.norm(0, 1)
    treatment = stats.norm(cohen_d, 1)
    xs, ys = eval_pdf(control)
    plt.fill_between(xs, ys, label='control', color=COLOR3, alpha=0.7)
    xs, ys = eval_pdf(treatment)
    plt.fill_between(xs, ys, label='treatment', color=COLOR2, alpha=0.7)
    o, s = overlap_superiority(control, treatment)
    plt.text(0, 0.05, 'overlap ' + str(o))
    plt.text(0, 0.15, 'superiority ' + str(s))
    plt.show()


plot_pdfs(2)

slider = widgets.FloatSlider(min=0, max=4.1, value=2)
interactive(plot_pdfs, cohen_d=slider)


Note

• Cohen’s effect size is dimensionless.

• It can be used to calculate both the overlap and the probability of superiority.
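For two unit-variance normal distributions separated by $d$, both quantities have closed forms: with the midpoint threshold, each group puts $\Phi(-d/2)$ of its mass on the wrong side, and the treatment beats the control with probability $\Phi(d/\sqrt{2})$. A sketch (using this notebook's overlap convention of summing the two tails):

```python
import numpy as np
import scipy.stats as stats

def overlap_superiority_from_d(d):
    # two unit-variance normals separated by d, midpoint threshold at d/2:
    # each group has Phi(-d/2) of its mass on the wrong side
    overlap = 2 * stats.norm.cdf(-d / 2)
    # treatment - control ~ Normal(d, sqrt(2)), so P(treatment > control):
    superiority = stats.norm.cdf(d / np.sqrt(2))
    return overlap, superiority

overlap_superiority_from_d(2)
```

For $d = 2$ (close to the value measured above) this gives roughly (0.317, 0.921), consistent with the sample-based estimates.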