Probability
What does conditional probability mean?
%load_ext autoreload
%autoreload 2
%matplotlib inline
Imports
from fastai import *
from aiking.data.external import *
import numpy as np
import pandas as pd
Create Dataset
push_ds("https://raw.githubusercontent.com/AllenDowney/BiteSizeBayes/master/gss_bayes.csv", dsname='thinkbayes')
list_ds()
(#4) ['oxford-iiit-pet','mktr','california-housing-prices','thinkbayes']
ds = get_ds('thinkbayes'); ds.ls()
(#1) [Path('/gdrive/PPV/S_Personal_Study/aiking/data/thinkbayes/gss_bayes.csv')]
Note
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?
Linda is a bank teller.
Linda is a bank teller and is active in the feminist movement.
df = pd.read_csv(ds/'gss_bayes.csv'); df.head()
| | caseid | year | age | sex | polviews | partyid | indus10 |
---|---|---|---|---|---|---|---|
0 | 1 | 1974 | 21.0 | 1 | 4.0 | 2.0 | 4970.0 |
1 | 2 | 1974 | 41.0 | 1 | 5.0 | 0.0 | 9160.0 |
2 | 5 | 1974 | 58.0 | 2 | 6.0 | 1.0 | 2670.0 |
3 | 6 | 1974 | 30.0 | 1 | 5.0 | 4.0 | 6870.0 |
4 | 7 | 1974 | 48.0 | 1 | 5.0 | 4.0 | 7860.0 |
Note
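caseid: Respondent id. The source describes it as the index of the table, but here read_csv loads it as an ordinary column, and its values are not unique (df.nunique() below reports 4376 distinct ids across 49290 rows).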
year: Year when the respondent was surveyed.
age: Respondent’s age when surveyed.
sex: Male or female.
polviews: Political views on a range from liberal to conservative.
partyid: Political party affiliation: Democrat, Independent, Republican, or other party.
indus10: Code for the industry the respondent works in.
Banking and related activities: indus10 code 6870
Probability Function
df.nunique()
caseid 4376
year 29
age 72
sex 2
polviews 7
partyid 8
indus10 270
dtype: int64
banker = (df['indus10'] == 6870); banker
0 False
1 False
2 False
3 True
4 False
...
49285 False
49286 False
49287 False
49288 False
49289 False
Name: indus10, Length: 49290, dtype: bool
banker.sum()
728
np.round(banker.mean()*100, 1)  # percentage of bankers in the dataset, i.e. the probability (in %) that a randomly chosen respondent is a banker
1.5
def prob(A):
    """Computes probability of a proposition"""
    return A.mean()
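Since True counts as 1 and False as 0, the mean of a boolean Series is the number of True values divided by the number of rows, which is exactly the fraction of respondents for whom the proposition holds. A minimal sketch on a small, made-up DataFrame (the toy values are hypothetical, not GSS data; prob is copied from the definition above):

```python
import pandas as pd

def prob(A):
    """Computes probability of a proposition"""
    return A.mean()

# Hypothetical toy data: 2 of 8 respondents work in industry code 6870
toy = pd.DataFrame({'indus10': [6870, 4970, 9160, 6870, 2670, 7860, 4970, 9160]})
toy_banker = toy['indus10'] == 6870

# The mean of a boolean Series is (number of True) / (number of rows)
assert prob(toy_banker) == toy_banker.sum() / len(toy_banker)
print(prob(toy_banker))  # 0.25
```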
prob(banker)
0.014769730168391155
Note
sex codes: 1 = male, 2 = female
female = (df['sex'] == 2)
prob(female)
0.5378575776019476
Note
polviews codes (1 to 7):
1 = Extremely liberal
2 = Liberal
3 = Slightly liberal
4 = Moderate
5 = Slightly conservative
6 = Conservative
7 = Extremely conservative
liberal = (df['polviews'] <=3)
prob(liberal)
0.27374721038750255
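The threshold polviews <= 3 relies on the 1 to 7 coding listed above. A comparison with NaN is False, so if the column contains missing answers (its float dtype hints that it may), those respondents count as not liberal while still sitting in the denominator of prob. A sketch on hypothetical codes; the label dictionary is an assumption transcribed from the list above:

```python
import pandas as pd

# Assumed polviews coding, transcribed from the list above
polviews_labels = {
    1: 'Extremely liberal', 2: 'Liberal', 3: 'Slightly liberal',
    4: 'Moderate', 5: 'Slightly conservative', 6: 'Conservative',
    7: 'Extremely conservative',
}

# Hypothetical respondents: one per code, plus one missing answer (NaN)
codes = pd.Series([1, 2, 3, 4, 5, 6, 7, float('nan')])
liberal_toy = codes <= 3

# Labels selected by the threshold: the three "liberal" categories only
selected = [polviews_labels[int(c)] for c in codes[liberal_toy]]
print(selected)  # ['Extremely liberal', 'Liberal', 'Slightly liberal']

# NaN <= 3 is False, so the missing answer stays in the denominator: 3/8
print(liberal_toy.mean())  # 0.375
```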
partyid codes (0 to 7):
0 = Strong democrat
1 = Not strong democrat
2 = Independent, near democrat
3 = Independent
4 = Independent, near republican
5 = Not strong republican
6 = Strong republican
7 = Other party
democrate = (df['partyid'] <= 1); prob(democrate)
0.3662609048488537
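Conjunction
prob(A and B): the probability that both A and B are true. With boolean Series, the conjunction is computed element-wise with the & operator.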
prob(banker & democrate)
0.004686548995739501
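Because & works element-wise, the result is True only for respondents where both propositions hold. Two properties follow and are worth checking: the conjunction is symmetric, and it can never be more probable than either proposition on its own, which is the point of the Linda question above. A sketch on hypothetical toy data:

```python
import pandas as pd

def prob(A):
    """Computes probability of a proposition"""
    return A.mean()

# Hypothetical toy propositions over 6 respondents
A = pd.Series([True, True, False, True, False, False])
B = pd.Series([True, False, False, True, True, False])

both = A & B  # element-wise AND: True only where both are True
print(both.tolist())  # [True, False, False, True, False, False]

# Conjunction is symmetric ...
assert prob(A & B) == prob(B & A)
# ... and never more probable than either proposition alone
assert prob(A & B) <= min(prob(A), prob(B))
```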
Conditional Probability
A conditional probability is the probability of a proposition given that some condition is true.
For example: of all respondents who are liberal, what fraction are Democrats?
prob(democrate[liberal])
0.5206403320240125
len(democrate[liberal])
13493
len(democrate)
49290
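Indexing democrate with the boolean Series liberal keeps only the rows where liberal is True, which is why the subset has 13493 entries instead of 49290. The mean of that subset is therefore the count of respondents who are both liberal and Democrats divided by the count of liberals. A sketch on hypothetical toy data (it assumes both Series share the same index, which holds in the notebook because both are built from df):

```python
import math
import pandas as pd

# Hypothetical toy propositions over 5 respondents (same default index 0..4)
liberal_toy = pd.Series([True, True, False, True, False])
democrat_toy = pd.Series([True, False, True, True, False])

# Boolean indexing keeps only the rows where liberal_toy is True (rows 0, 1, 3)
subset = democrat_toy[liberal_toy]
print(subset.tolist())  # [True, False, True]

# Mean of the subset vs. (count of both) / (count of the condition)
by_indexing = subset.mean()
by_counting = (democrat_toy & liberal_toy).sum() / liberal_toy.sum()
assert math.isclose(by_indexing, by_counting)  # both are 2/3
```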
len(female)
49290
len(female[banker])
728
prob(female[banker])
0.7706043956043956
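def conditional(proposition, given):
    """Probability of proposition conditioned on given"""
    # keep only the rows where `given` is True, then take the fraction where `proposition` is True
    return prob(proposition[given])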
conditional(liberal, given=female)
0.27581004111500884
conditional(female, given=liberal)
0.5419106203216483
prob(liberal & female)
0.14834652059241227
prob(female & liberal)
0.14834652059241227
conditional(female, given=banker), conditional(banker, given=female)
(0.7706043956043956, 0.02116102749801969)
dummy = pd.DataFrame({
'A':[1,0,0],
'B':[1,1,0],
'C':[1,1,1]
}) >0; dummy
| | A | B | C |
---|---|---|---|
0 | True | True | True |
1 | False | True | True |
2 | False | False | True |
conditional(dummy.A, dummy.B)
0.5
conditional(dummy.B, dummy.A)
1.0
conditional(dummy.B, dummy.C)
0.6666666666666666
conditional(dummy.C, dummy.B)
1.0
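One edge case to keep in mind: P(A|B) is undefined when no respondent satisfies B. With this implementation, proposition[given] is then an empty Series, and pandas returns NaN for its mean instead of raising an error, so a mistyped condition can silently produce NaN. A minimal sketch with a hypothetical always-False condition:

```python
import pandas as pd

def prob(A):
    """Computes probability of a proposition"""
    return A.mean()

def conditional(proposition, given):
    """Probability of proposition conditioned on given"""
    return prob(proposition[given])

# Hypothetical case: no row satisfies the condition
A = pd.Series([True, False, True])
never = pd.Series([False, False, False])

result = conditional(A, given=never)
print(result)  # nan: the mean of an empty Series is undefined
assert pd.isna(result)
```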
Condition and Conjunction
Probability that a respondent is female, given that they are a liberal Democrat
conditional(female, given=liberal & democrate)
0.576085409252669
Probability that they are a liberal female, given that they are a banker
conditional(liberal & female, given=banker)
0.17307692307692307
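Conditioning on a conjunction such as liberal & democrate is the same as narrowing the data in two steps: first to liberals, then to the Democrats among them. A sketch on hypothetical toy data showing that both routes give the same conditional probability:

```python
import math
import pandas as pd

# Hypothetical toy propositions over 6 respondents
female_toy = pd.Series([True, True, False, True, False, True])
liberal_toy = pd.Series([True, False, True, True, True, False])
democrat_toy = pd.Series([True, True, True, False, True, False])

# One step: condition on the conjunction directly
one_step = female_toy[liberal_toy & democrat_toy].mean()

# Two steps: keep the liberals, then the Democrats among them
# (democrat_toy[liberal_toy] shares its index with female_toy[liberal_toy])
two_steps = female_toy[liberal_toy][democrat_toy[liberal_toy]].mean()

print(one_step, two_steps)  # both equal 1/3
assert math.isclose(one_step, two_steps)
```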
Laws of probability
In the next few sections, we’ll derive three relationships between conjunction and conditional probability:
Theorem 1: Using a conjunction to compute a conditional probability.
Theorem 2: Using a conditional probability to compute a conjunction.
Theorem 3: Using conditional(A, B) to compute conditional(B, A).
Theorem 3 is also known as Bayes’s Theorem.
I’ll write these theorems using mathematical notation for probability:
P(A) is the probability of proposition A.
P(A and B) is the probability of the conjunction of A and B, that is, the probability that both are true.
P(A|B) is the conditional probability of A given that B is true. The vertical line between A and B is pronounced “given”.
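Written in this notation, the three theorems are P(A|B) = P(A and B) / P(B) (Theorem 1), P(A and B) = P(B) P(A|B) (Theorem 2), and P(A|B) = P(A) P(B|A) / P(B) (Theorem 3, Bayes's Theorem). Ahead of the derivations, a quick numerical sanity check using the prob and conditional functions defined earlier, on hypothetical toy data:

```python
import math
import pandas as pd

def prob(A):
    """Computes probability of a proposition"""
    return A.mean()

def conditional(proposition, given):
    """Probability of proposition conditioned on given"""
    return prob(proposition[given])

# Hypothetical toy propositions over 8 respondents
A = pd.Series([True, True, False, True, False, False, True, False])
B = pd.Series([True, False, False, True, True, False, True, True])

# Theorem 1: P(A|B) = P(A and B) / P(B)
assert math.isclose(conditional(A, given=B), prob(A & B) / prob(B))
# Theorem 2: P(A and B) = P(B) P(A|B)
assert math.isclose(prob(A & B), prob(B) * conditional(A, given=B))
# Theorem 3 (Bayes's Theorem): P(A|B) = P(A) P(B|A) / P(B)
assert math.isclose(conditional(A, given=B),
                    prob(A) * conditional(B, given=A) / prob(B))
```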
Tip
What fraction of bankers are female?
female[banker].mean()
0.7706043956043956
conditional(female, given=banker)
0.7706043956043956
Another way to compute it is to divide the fraction of respondents who are female bankers by the fraction of respondents who are bankers:
prob(female & banker)/prob(banker)
0.7706043956043956