# Probability

What does conditional probability mean?

```
%load_ext autoreload
%autoreload 2
%matplotlib inline
```

```
The autoreload extension is already loaded. To reload it, use:
%reload_ext autoreload
```

## Imports

```
from fastai import *
from aiking.data.external import *
import numpy as np  # np.round is used further down
import pandas as pd
```

## Create Dataset

```
push_ds("https://raw.githubusercontent.com/AllenDowney/BiteSizeBayes/master/gss_bayes.csv", dsname='thinkbayes')
```

```
list_ds()
```

```
(#4) ['oxford-iiit-pet','mktr','california-housing-prices','thinkbayes']
```

```
ds = get_ds('thinkbayes'); ds.ls()
```

```
(#1) [Path('/gdrive/PPV/S_Personal_Study/aiking/data/thinkbayes/gss_bayes.csv')]
```

Note

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?

1. Linda is a bank teller.
2. Linda is a bank teller and is active in the feminist movement.

```
df = pd.read_csv(ds/'gss_bayes.csv'); df.head()
```

| | caseid | year | age | sex | polviews | partyid | indus10 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1974 | 21.0 | 1 | 4.0 | 2.0 | 4970.0 |
| 1 | 2 | 1974 | 41.0 | 1 | 5.0 | 0.0 | 9160.0 |
| 2 | 5 | 1974 | 58.0 | 2 | 6.0 | 1.0 | 2670.0 |
| 3 | 6 | 1974 | 30.0 | 1 | 5.0 | 4.0 | 6870.0 |
| 4 | 7 | 1974 | 48.0 | 1 | 5.0 | 4.0 | 7860.0 |

Note

caseid: Respondent id (an ordinary column here, because the CSV is read without `index_col`).

year: Year when the respondent was surveyed.

age: Respondent’s age when surveyed.

sex: Male or female.

polviews: Political views on a range from liberal to conservative.

partyid: Political party affiliation, Democrat, Independent, or Republican.

indus10: Code for the industry the respondent works in.

The `indus10` code for banking and related activities is 6870.

## Probability Function

```
df.nunique()
```

```
caseid 4376
year 29
age 72
sex 2
polviews 7
partyid 8
indus10 270
dtype: int64
```

```
banker = (df['indus10'] == 6870); banker
```

```
0 False
1 False
2 False
3 True
4 False
...
49285 False
49286 False
49287 False
49288 False
49289 False
Name: indus10, Length: 49290, dtype: bool
```

```
banker.sum()
```

```
728
```

```
np.round(banker.mean()*100, 1)  # Percentage of bankers in the dataset, i.e. the chance (in percent) that a randomly chosen respondent is a banker
```

```
1.5
```

```
def prob(A):
    """Computes probability of a proposition"""
    return A.mean()
```

```
prob(banker)
```

```
0.014769730168391155
```
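`prob` works because pandas treats `True` as 1 and `False` as 0 when averaging, so the mean of a Boolean Series is the fraction of `True` values. The following is a minimal sketch on made-up values (the Series `is_banker` is hypothetical, not taken from the GSS data) comparing the mean with an explicit count.

```python
import pandas as pd

def prob(A):
    """Computes probability of a proposition"""
    return A.mean()

# Hypothetical Boolean Series: True marks a respondent who is a banker.
is_banker = pd.Series([True, False, False, True, False])

# Counting the True values and dividing by the total ...
fraction_by_count = is_banker.sum() / len(is_banker)

# ... gives the same number as taking the mean.
fraction_by_mean = prob(is_banker)

print(fraction_by_count, fraction_by_mean)
```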

Note

The `sex` column is coded as 1 = Male, 2 = Female.

```
female = (df['sex'] == 2)
prob(female)
```

```
0.5378575776019476
```

Note

`polviews` codes:

```
1 Extremely liberal
2 Liberal
3 Slightly liberal
4 Moderate
5 Slightly conservative
6 Conservative
7 Extremely conservative
```

```
liberal = (df['polviews'] <= 3)  # polviews codes 1-3 count as liberal
prob(liberal)
```

```
0.27374721038750255
```

`partyid` is coded as follows:

0. Strong democrat
1. Not strong democrat
2. Independent, near democrat
3. Independent
4. Independent, near republican
5. Not strong republican
6. Strong republican
7. Other party

```
democrate = (df['partyid'] <= 1); prob(democrate)
```

```
0.3662609048488537
```

## Conjunction

`prob(A & B)` is the probability of the conjunction of A and B, that is, the fraction of respondents for whom both A and B are true.

```
prob(banker & democrate)
```

```
0.004686548995739501
```
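A conjunction can never be more probable than either of its parts, which is the point of the Linda question earlier on this page. The sketch below uses a small made-up table (the `toy` DataFrame and its column names are hypothetical) to show that `&` combines two Boolean Series row by row, that the order of the operands does not matter, and that the conjunction is at most as probable as each proposition alone.

```python
import pandas as pd

def prob(A):
    """Computes probability of a proposition"""
    return A.mean()

# Hypothetical data: six respondents, two Boolean propositions.
toy = pd.DataFrame({
    "teller":   [True, True, False, False, True, False],
    "feminist": [True, False, True, True, False, False],
})

p_teller = prob(toy["teller"])                        # 3 of 6 rows
p_both = prob(toy["teller"] & toy["feminist"])        # only row 0 satisfies both
p_both_swapped = prob(toy["feminist"] & toy["teller"])

print(p_teller, p_both, p_both_swapped)
```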

## Conditional Probability

A conditional probability is a probability that depends on a condition.

For example: of all respondents who are liberal, what fraction are Democrats?

```
prob(democrate[liberal])
```

```
0.5206403320240125
```

```
len(democrate[liberal])
```

```
13493
```

```
len(democrate)
```

```
49290
```

```
len(female)
```

```
49290
```

```
len(female[banker])
```

```
728
```

```
prob(female[banker])
```

```
0.7706043956043956
```

```
def conditional(proposition, given):
    """Probability of proposition conditioned on given"""
    return prob(proposition[given])
```

```
conditional(liberal, given=female)
```

```
0.27581004111500884
```

```
conditional(female, given=liberal)
```

```
0.5419106203216483
```

```
prob(liberal & female)
```

```
0.14834652059241227
```

```
prob(female & liberal)
```

```
0.14834652059241227
```

```
conditional(female, given=banker), conditional(banker, given=female)
```

```
(0.7706043956043956, 0.02116102749801969)
```

```
dummy = pd.DataFrame({
'A':[1,0,0],
'B':[1,1,0],
'C':[1,1,1]
}) >0; dummy
```

| | A | B | C |
|---|---|---|---|
| 0 | True | True | True |
| 1 | False | True | True |
| 2 | False | False | True |

```
conditional(dummy.A, dummy.B)
```

```
0.5
```

```
conditional(dummy.B, dummy.A)
```

```
1.0
```

```
conditional(dummy.B, dummy.C)
```

```
0.6666666666666666
```

```
conditional(dummy.C, dummy.B)
```

```
1.0
```
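The `dummy` results can be reproduced by hand in two steps: filter the rows where the condition holds, then take the fraction of those rows where the proposition holds. This is only a sketch of what the Boolean indexing inside `conditional` does; the intermediate name `rows_where_b` is introduced here for illustration.

```python
import pandas as pd

# Same toy table as above, rebuilt so this block runs on its own.
dummy = pd.DataFrame({
    'A': [1, 0, 0],
    'B': [1, 1, 0],
    'C': [1, 1, 1],
}) > 0

# Step 1: keep only the rows where the condition B is true (rows 0 and 1).
rows_where_b = dummy[dummy.B]

# Step 2: within those rows, take the fraction where A is also true.
a_given_b = rows_where_b.A.mean()

# Swapping the roles changes the answer: A is true only in row 0,
# and B is true there as well.
b_given_a = dummy[dummy.A].B.mean()

print(a_given_b, b_given_a)
```

The two values differ, which is why `conditional(A, B)` and `conditional(B, A)` cannot be used interchangeably.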

## Conditions and Conjunctions

Probability that a respondent is female, given that they are a liberal Democrat:

```
conditional(female, given=liberal & democrate)
```

```
0.576085409252669
```

Probability that a respondent is a liberal female, given that they are a banker:

```
conditional(liberal & female, given=banker)
```

```
0.17307692307692307
```

## Laws of probability

In the next few sections, we’ll derive three relationships between conjunction and conditional probability:

Theorem 1: Using a conjunction to compute a conditional probability.

Theorem 2: Using a conditional probability to compute a conjunction.

Theorem 3: Using conditional(A, B) to compute conditional(B, A).

Theorem 3 is also known as Bayes’s Theorem.

I’ll write these theorems using mathematical notation for probability:

P(A) is the probability of proposition A.

P(A and B) is the probability of the conjunction of A and B, that is, the probability that both are true.

P(A|B) is the conditional probability of A given that B is true. The vertical line between A and B is pronounced “given”.
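For orientation, the three relationships named above are usually stated as follows. This is the standard textbook form, written out here ahead of any derivation, and it assumes $P(B) > 0$.

$$
\begin{aligned}
\text{Theorem 1:}\quad & P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \\
\text{Theorem 2:}\quad & P(A \text{ and } B) = P(B)\, P(A \mid B) \\
\text{Theorem 3:}\quad & P(A \mid B) = \frac{P(A)\, P(B \mid A)}{P(B)}
\end{aligned}
$$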

Tip

What fraction of bankers are female?

```
female[banker].mean()
```

```
0.7706043956043956
```

```
conditional(female, given=banker)
```

```
0.7706043956043956
```

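Another way to compute the same value is as the ratio of two fractions:

1. the fraction of respondents who are female bankers, and
2. the fraction of respondents who are bankers.

Dividing the first by the second gives the fraction of bankers who are female.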

```
prob(female & banker)/prob(banker)
```

```
0.7706043956043956
```
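The agreement between the two computations is not specific to this dataset. As a rough check, the sketch below repeats the comparison on a small made-up table (the `toy` DataFrame is hypothetical) and also evaluates the Bayes form, which reaches the same value from `conditional(banker, given=female)`.

```python
import pandas as pd

def prob(A):
    """Computes probability of a proposition"""
    return A.mean()

def conditional(proposition, given):
    """Probability of proposition conditioned on given"""
    return prob(proposition[given])

# Hypothetical data: eight respondents.
toy = pd.DataFrame({
    "female": [True, True, False, True, False, True, False, False],
    "banker": [True, False, False, True, True, False, False, True],
})
female, banker = toy["female"], toy["banker"]

# Direct computation: filter to bankers, then take the fraction who are female.
direct = conditional(female, given=banker)

# Ratio of the conjunction to the condition.
ratio = prob(female & banker) / prob(banker)

# Bayes form: start from the reversed conditional.
bayes = prob(female) * conditional(banker, given=female) / prob(banker)

print(direct, ratio, bayes)
```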