Probability#

What does conditional probability mean?

%load_ext autoreload
%autoreload 2
%matplotlib inline

Imports#

from fastai import *
from aiking.data.external import *
import numpy as np  # np.round is used later in this notebook
import pandas as pd

Create Dataset#

push_ds("https://raw.githubusercontent.com/AllenDowney/BiteSizeBayes/master/gss_bayes.csv", dsname='thinkbayes')
list_ds()
(#4) ['oxford-iiit-pet','mktr','california-housing-prices','thinkbayes']
ds = get_ds('thinkbayes'); ds.ls()
(#1) [Path('/gdrive/PPV/S_Personal_Study/aiking/data/thinkbayes/gss_bayes.csv')]

Note

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?

Linda is a bank teller.

Linda is a bank teller and is active in the feminist movement.

df = pd.read_csv(ds/'gss_bayes.csv'); df.head()
caseid year age sex polviews partyid indus10
0 1 1974 21.0 1 4.0 2.0 4970.0
1 2 1974 41.0 1 5.0 0.0 9160.0
2 5 1974 58.0 2 6.0 1.0 2670.0
3 6 1974 30.0 1 5.0 4.0 6870.0
4 7 1974 48.0 1 5.0 4.0 7860.0

Note

caseid: Respondent id. In the table shown above it is loaded as an ordinary column, because read_csv is called without index_col.

year: Year when the respondent was surveyed.

age: Respondent’s age when surveyed.

sex: Male or female.

polviews: Political views on a range from liberal to conservative.

partyid: Political party affiliation, Democrat, Independent, or Republican.

indus10: Code for the industry the respondent works in.

The indus10 code for banking and related activities is 6870.

Probability Function#

df.nunique()
caseid      4376
year          29
age           72
sex            2
polviews       7
partyid        8
indus10      270
dtype: int64
banker = (df['indus10'] == 6870); banker
0        False
1        False
2        False
3         True
4        False
         ...  
49285    False
49286    False
49287    False
49288    False
49289    False
Name: indus10, Length: 49290, dtype: bool
banker.sum()
728
np.round(banker.mean()*100, 1) # Percentage of bankers in the dataset, i.e. the probability (in percent) that a randomly chosen respondent is a banker
1.5
def prob(A):
    """Computes probability of a proposition"""
    return A.mean()
prob(banker)
0.014769730168391155
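
The reason `A.mean()` can be read as a probability is that pandas counts `True` as 1 and `False` as 0, so the mean of a Boolean Series is the fraction of `True` values. A minimal sketch on made-up data (the name `toy` and its values are illustrative and not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data: 2 of the 5 entries are True.
toy = pd.Series([True, False, False, True, False])

count_true = toy.sum()      # True counts as 1, so the sum is the number of True values
fraction_true = toy.mean()  # sum divided by length: 2 / 5

print(count_true, fraction_true)  # 2 0.4
```

The same reasoning explains why `banker.sum()` gives a count and `banker.mean()` gives a fraction.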

Note

The sex column is encoded as 1 = Male, 2 = Female.

female = (df['sex'] == 2)
prob(female)
0.5378575776019476

Note

The polviews column is encoded on a seven-point scale:

  1. Extremely liberal

  2. Liberal

  3. Slightly liberal

  4. Moderate

  5. Slightly conservative

  6. Conservative

  7. Extremely conservative

liberal = (df['polviews'] <=3)
prob(liberal)
0.27374721038750255

The partyid column is encoded with codes starting at 0:

  0. Strong democrat

  1. Not strong democrat

  2. Independent, near democrat

  3. Independent

  4. Independent, near republican

  5. Not strong republican

  6. Strong republican

  7. Other party

democrate = (df['partyid'] <= 1); prob(democrate)
0.3662609048488537

Conjunction#

prob(A & B) is the probability that A and B are both true.

prob(banker & democrate)
0.004686548995739501
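
For Boolean Series, `&` is the element-wise AND, so the conjunction is true only in rows where both propositions are true. It follows that a conjunction can never be more probable than either of its parts, which is the point of the Linda question above. A small sketch with made-up data (the names `a`, `b`, and `both` are hypothetical):

```python
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

# Hypothetical propositions over four respondents.
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

both = a & b  # True only in rows where a and b are both True

print(prob(a), prob(b), prob(both))         # 0.5 0.5 0.25
print(prob(a & b) == prob(b & a))           # True: the order of the operands does not matter
print(prob(both) <= min(prob(a), prob(b)))  # True: a conjunction is never more probable than its parts
```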

Conditional Probability#

A conditional probability is a probability computed only over the cases where some condition holds.

For example: of all respondents who are liberal, what fraction are Democrats?

prob(democrate[liberal])
0.5206403320240125
len(democrate[liberal])
13493
len(democrate)
49290
len(female)
49290
len(female[banker])
728
prob(female[banker])
0.7706043956043956
def conditional(proposition, given):
    """Probability of proposition, conditioned on given"""
    return prob(proposition[given])
conditional(liberal, given=female)
0.27581004111500884
conditional(female, given=liberal)
0.5419106203216483
prob(liberal & female)
0.14834652059241227
prob(female & liberal)
0.14834652059241227
conditional(female, given=banker), conditional(banker, given=female)
(0.7706043956043956, 0.02116102749801969)
dummy = pd.DataFrame({
    'A':[1,0,0],
    'B':[1,1,0],
    'C':[1,1,1]
}) >0; dummy
A B C
0 True True True
1 False True True
2 False False True
conditional(dummy.A, dummy.B)
0.5
conditional(dummy.B, dummy.A)
1.0
conditional(dummy.B, dummy.C)
0.6666666666666666
conditional(dummy.C, dummy.B)
1.0
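
To see what `proposition[given]` does, it may help to look at the intermediate selection. Indexing a Series with a Boolean mask keeps only the rows where the mask is `True`, and the mean is then taken over that smaller set. The sketch below reuses the toy table from above; `subset` and `by_counting` are names introduced here for illustration:

```python
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """Probability of proposition among the rows where given is True."""
    return prob(proposition[given])

# Same toy table as above; each row is one hypothetical respondent.
dummy = pd.DataFrame({'A': [1, 0, 0], 'B': [1, 1, 0], 'C': [1, 1, 1]}) > 0

subset = dummy.A[dummy.B]  # keeps rows 0 and 1, the rows where B is True
print(subset.tolist())     # [True, False]

# The same number by counting: rows where both hold / rows where the condition holds.
by_counting = (dummy.A & dummy.B).sum() / dummy.B.sum()
print(conditional(dummy.A, dummy.B), by_counting)  # 0.5 0.5
```

Swapping the arguments changes which rows are kept, which is why `conditional(dummy.A, dummy.B)` and `conditional(dummy.B, dummy.A)` differ.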

Conditions and Conjunctions#

Probability that a respondent is female, given that they are a liberal Democrat

conditional(female, given=liberal & democrate)
0.576085409252669

Probability that a respondent is a liberal female, given that they are a banker

conditional(liberal & female, given=banker)
0.17307692307692307
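
The two calls above put the conjunction in different places, and that changes the question being asked. A conjunction in `given` narrows the group being examined, while a conjunction in the proposition narrows what counts as a hit. A sketch on made-up data (`x`, `y`, and `z` are hypothetical propositions, not columns of the dataset):

```python
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """Probability of proposition among the rows where given is True."""
    return prob(proposition[given])

# Three hypothetical propositions over six respondents.
x = pd.Series([True, True, True, False, False, False])
y = pd.Series([True, True, False, True, False, False])
z = pd.Series([True, False, True, True, True, False])

# Conjunction as the condition: only rows 0 and 3 satisfy y & z, and x holds in one of them.
print(conditional(x, given=y & z))  # 0.5

# Conjunction as the proposition: z holds in rows 0, 2, 3, 4, and x & y holds in one of them.
print(conditional(x & y, given=z))  # 0.25
```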

Laws of probability#

In the next few sections, we’ll derive three relationships between conjunction and conditional probability:

Theorem 1: Using a conjunction to compute a conditional probability.

Theorem 2: Using a conditional probability to compute a conjunction.

Theorem 3: Using conditional(A, B) to compute conditional(B, A).

Theorem 3 is also known as Bayes’s Theorem.

I’ll write these theorems using mathematical notation for probability:

P(A) is the probability of proposition A.

P(A and B) is the probability of the conjunction of A and B, that is, the probability that both are true.

P(A|B) is the conditional probability of A given that B is true. The vertical line between A and B is pronounced “given”.

Tip

What fraction of bankers are female?

female[banker].mean()
0.7706043956043956
conditional(female, given=banker)
0.7706043956043956

Another way to compute the same fraction is to divide one probability by another:

  • Fraction of respondents who are female bankers

  • Fraction of respondents who are bankers

prob(female & banker)/prob(banker)
0.7706043956043956
\[ P(A|B) = \frac{P(A~\mathrm{and}~B)}{P(B)} \]
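
The identity above can be checked on data other than the survey. The following is a minimal sketch, assuming two randomly generated Boolean Series; the names `A` and `B`, the seed, the sample size, and the thresholds 0.3 and 0.6 are arbitrary choices made for illustration:

```python
import math

import numpy as np
import pandas as pd

def prob(A):
    """Fraction of True values in a Boolean Series."""
    return A.mean()

def conditional(proposition, given):
    """Probability of proposition among the rows where given is True."""
    return prob(proposition[given])

# Hypothetical data: two independent random propositions over 1000 rows.
rng = np.random.default_rng(seed=0)
A = pd.Series(rng.random(1000) < 0.3)
B = pd.Series(rng.random(1000) < 0.6)

lhs = conditional(A, given=B)  # P(A|B), computed by selecting the rows where B is True
rhs = prob(A & B) / prob(B)    # P(A and B) / P(B)

# Compare with a tolerance, since the two sides are computed with different floating-point steps.
print(math.isclose(lhs, rhs))  # True
```

The comparison uses `math.isclose` rather than `==` because the two expressions are mathematically equal but need not be bit-for-bit identical in floating point.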