{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "55d50fe5-130b-455c-8e12-568b08c4b849",
   "metadata": {},
   "source": [
    "# Customer Lifetime Value "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e03e9518-688c-46dc-8064-69b877cd8dbb",
   "metadata": {},
   "source": [
    "```{admonition} Methodology\n",
    "- Do managerial segmentation \n",
    "- Identify behaviour for the customer for 1 year. [ Assume current year is N, Look at period from N-1 to N to evaluate transition matrix(TM)]\n",
    "    - Calculate Year N-1 & Year N customer matrix\n",
    "    - Join them on customer_id\n",
    "    - Tabulate to calculate Matrix \n",
    "    - Divide by row_sum to calculate transition probablities\n",
    "- Assume the behaviour displayed by user remains the same per segment over the years.\n",
    "- Assume Year N as 1st column in Segment_customer matrix \n",
    "- segment_customer(j+1) = segment_customer(j)*TM + New Acquisition(j) where j goes from 0-> M where years are N-> N+M\n",
    "- Assume avg/revenue for year N as revenue for first column \n",
    "- Evaluated yearly_revenue vector by multiplying with segment_customer matrix [ Assumption: Avg revenue per customer is same]\n",
    "- Multiply by discount_factor matrix with yearly_revenue vector to calculate net present which is your CLV\n",
    "- What is the value of your database?=> CLV - year1 revenue. \n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f7a40229-8f76-4a56-aab1-4fcd73bc740f",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2dfec938-6cbe-4871-8e24-7bce9eec06b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import scipy as sp\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "import datetime as dt\n",
    "from sklearn.cluster import AgglomerativeClustering\n",
    "from scipy.cluster.hierarchy import dendrogram, linkage\n",
    "from sklearn.pipeline import Pipeline, make_pipeline\n",
    "from sklearn.preprocessing import StandardScaler, RobustScaler, FunctionTransformer\n",
    "from sklearn.linear_model import LogisticRegression, LinearRegression\n",
    "import statsmodels.api as sm\n",
    "import statsmodels.formula.api as smf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97979947-3f9e-46f3-acee-ed58c507bdae",
   "metadata": {},
   "outputs": [],
   "source": [
    "sns.set()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6a929ac-dcc4-472d-815c-e4aac5df7491",
   "metadata": {},
   "source": [
    "## Data Collection & Feature Engineering"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1b44d915-9f79-4e15-b36d-5b2aea83d20a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>consumer_id</th>\n",
       "      <th>purchase_amount</th>\n",
       "      <th>date_of_purchase</th>\n",
       "      <th>year_of_purchase</th>\n",
       "      <th>days_since</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>760</td>\n",
       "      <td>25.0</td>\n",
       "      <td>2009-11-06</td>\n",
       "      <td>2009</td>\n",
       "      <td>2247</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>860</td>\n",
       "      <td>50.0</td>\n",
       "      <td>2012-09-28</td>\n",
       "      <td>2012</td>\n",
       "      <td>1190</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1200</td>\n",
       "      <td>100.0</td>\n",
       "      <td>2005-10-25</td>\n",
       "      <td>2005</td>\n",
       "      <td>3720</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1420</td>\n",
       "      <td>50.0</td>\n",
       "      <td>2009-07-09</td>\n",
       "      <td>2009</td>\n",
       "      <td>2367</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1940</td>\n",
       "      <td>70.0</td>\n",
       "      <td>2013-01-25</td>\n",
       "      <td>2013</td>\n",
       "      <td>1071</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   consumer_id  purchase_amount date_of_purchase  year_of_purchase  days_since\n",
       "0          760             25.0       2009-11-06              2009        2247\n",
       "1          860             50.0       2012-09-28              2012        1190\n",
       "2         1200            100.0       2005-10-25              2005        3720\n",
       "3         1420             50.0       2009-07-09              2009        2367\n",
       "4         1940             70.0       2013-01-25              2013        1071"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv(\"W106_purchases.txt\", sep='\\t', names=['consumer_id','purchase_amount', 'date_of_purchase'], parse_dates=['date_of_purchase'] )\n",
    "df['year_of_purchase'] = df['date_of_purchase'].dt.year\n",
    "max_date = df['date_of_purchase'].max() + dt.timedelta(days=1)\n",
    "df['days_since'] = (max_date - df['date_of_purchase']).dt.days\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "025c3128-ac32-4f2e-a8a3-86e83ced6c93",
   "metadata": {},
   "source": [
    "## Managerial Segmentation "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "56d08072-e81a-4d4c-a749-9791fe6b5e26",
   "metadata": {},
   "source": [
    "### Calculate Customer and segment summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d63777e5-e603-4666-9937-d196e10a79ea",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_consumer_df(df, offset=0):\n",
    "    df = df.copy()\n",
    "    df = df[df['days_since']>offset]\n",
    "    consumer_df = df.groupby('consumer_id').agg(recency=('days_since', 'min'),\n",
    "                                            frequency=('date_of_purchase', 'count'),\n",
    "                                            monetary=('purchase_amount', 'mean'),\n",
    "                                            # avg_purchase_amount=('purchase_amount', 'mean'),\n",
    "                                            max_purchase_amount=('purchase_amount', 'max'),\n",
    "                                            first_purchase=('days_since', 'max'), \n",
    "                                            revenue_till_date=('purchase_amount', 'sum'),\n",
    "                                            first_purchase_date=('date_of_purchase', 'min'),\n",
    "                                            first_purchase_year=('year_of_purchase', 'min'),\n",
    "                                            last_purchase_date=('date_of_purchase', 'max'),\n",
    "                                            last_purchase_year=('year_of_purchase', 'max'),\n",
    "                                           ); consumer_df.head()\n",
    "    consumer_df['recency'] = consumer_df['recency']  - offset\n",
    "    consumer_df['first_purchase'] = consumer_df['first_purchase'] - offset\n",
    "    consumer_df['segment'] = np.nan\n",
    "    rec_filter  = consumer_df['recency'] \n",
    "    first_purchase_filter = consumer_df['first_purchase']\n",
    "    consumer_df.loc[(rec_filter <=365)  & (consumer_df['monetary'] >= 100), 'segment'] = 'active high'\n",
    "    consumer_df.loc[(rec_filter <=365)  & (consumer_df['monetary'] < 100), 'segment'] = 'active low'\"\"\n",
    "    consumer_df.loc[(rec_filter <=365) & (first_purchase_filter <=365) , 'segment'] = 'new active'\n",
    "    # consumer_df.loc[(rec_filter <=365) & (first_purchase_filter <=365) & (consumer_df['monetary'] < 100) , 'segment'] = 'new active low'\n",
    "    consumer_df.loc[(rec_filter>365) & (rec_filter<=2*365) & (consumer_df['monetary'] >= 100), 'segment'] = 'warm high'\n",
    "    consumer_df.loc[(rec_filter>365) & (rec_filter<=2*365) & (consumer_df['monetary'] < 100), 'segment'] = 'warm low'\n",
    "    consumer_df.loc[(rec_filter>365) & (rec_filter<=2*365) & (first_purchase_filter <=2*365) & (consumer_df['monetary'] >= 100) , 'segment'] = 'new warm high'\n",
    "    consumer_df.loc[(rec_filter>365) & (rec_filter<=2*365) & (first_purchase_filter <=2*365) & (consumer_df['monetary'] < 100) , 'segment'] = 'new warm low'\n",
    "    consumer_df.loc[(rec_filter>2*365) & (rec_filter<=3*365), 'segment'] = 'cold'\n",
    "    consumer_df.loc[(rec_filter>3*365), 'segment'] = 'inactive'\n",
    "    return consumer_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a502267-00fa-41fa-8b20-289198d62c92",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_segment_summary(consumer_df):\n",
    "    if 'revenue' in consumer_df.columns:\n",
    "        segment_df = consumer_df.groupby('segment').agg(size=('recency', 'count'),\n",
    "                                       revenue_today=('revenue', 'sum'),\n",
    "                                       avg_revenue_today=('revenue', 'mean'),\n",
    "                                       revenue_till_date=('revenue_till_date', 'sum'),\n",
    "                                       avg_revenue_till_date=('revenue_till_date', 'mean'),\n",
    "                                       recency_min=('recency', 'min'),\n",
    "                                       recency=('recency', 'mean'),\n",
    "                                       recency_max=('recency', 'max'),\n",
    "                                       frequency_min=('frequency', 'min'),\n",
    "                                       frequency=('frequency', 'mean'),\n",
    "                                       frequency_max=('frequency', 'max'),\n",
    "                                       monetary_min=('monetary', 'min'),\n",
    "                                       monetary=('monetary', 'mean'),\n",
    "                                       monetary_max=('monetary', 'max'),\n",
    "                                       first_purchase=('first_purchase', 'max'),\n",
    "                                       first_purchase_date=('first_purchase_date', 'min'),\n",
    "                                       first_purchase_year=('first_purchase_year', 'min'),\n",
    "                                       last_purchase_date=('last_purchase_date', 'max'),\n",
    "                                       last_purchase_year=('last_purchase_year', 'max')\n",
    "                                      ).sort_values(by=['last_purchase_year', 'first_purchase_year','monetary'], ascending=[False, False, False])\n",
    "    else:\n",
    "        segment_df = consumer_df.groupby('segment').agg(size=('recency', 'count'),\n",
    "                                       revenue_till_date=('revenue_till_date', 'sum'),\n",
    "                                       avg_revenue_till_date=('revenue_till_date', 'mean'),\n",
    "                                       recency_min=('recency', 'min'),\n",
    "                                       recency=('recency', 'mean'),\n",
    "                                       recency_max=('recency', 'max'),\n",
    "                                       frequency_min=('frequency', 'min'),\n",
    "                                       frequency=('frequency', 'mean'),\n",
    "                                       frequency_max=('frequency', 'max'),\n",
    "                                       monetary_min=('monetary', 'min'),\n",
    "                                       monetary=('monetary', 'mean'),\n",
    "                                       monetary_max=('monetary', 'max'),\n",
    "                                       first_purchase=('first_purchase', 'max'),\n",
    "                                       first_purchase_date=('first_purchase_date', 'min'),\n",
    "                                       first_purchase_year=('first_purchase_year', 'min'),\n",
    "                                       last_purchase_date=('last_purchase_date', 'max'),\n",
    "                                       last_purchase_year=('last_purchase_year', 'max')\n",
    "                                      ).sort_values(by=['last_purchase_year', 'first_purchase_year','monetary'], ascending=[False, False, False])   \n",
    "    return segment_df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2317eda2-df27-49b4-b2a8-69026090c906",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>recency</th>\n",
       "      <th>frequency</th>\n",
       "      <th>monetary</th>\n",
       "      <th>max_purchase_amount</th>\n",
       "      <th>first_purchase</th>\n",
       "      <th>revenue_till_date</th>\n",
       "      <th>first_purchase_date</th>\n",
       "      <th>first_purchase_year</th>\n",
       "      <th>last_purchase_date</th>\n",
       "      <th>last_purchase_year</th>\n",
       "      <th>segment</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>consumer_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>3829</td>\n",
       "      <td>1</td>\n",
       "      <td>30.000000</td>\n",
       "      <td>30.0</td>\n",
       "      <td>3829</td>\n",
       "      <td>30.0</td>\n",
       "      <td>2005-07-08</td>\n",
       "      <td>2005</td>\n",
       "      <td>2005-07-08</td>\n",
       "      <td>2005</td>\n",
       "      <td>inactive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>80</th>\n",
       "      <td>343</td>\n",
       "      <td>7</td>\n",
       "      <td>71.428571</td>\n",
       "      <td>80.0</td>\n",
       "      <td>3751</td>\n",
       "      <td>500.0</td>\n",
       "      <td>2005-09-24</td>\n",
       "      <td>2005</td>\n",
       "      <td>2015-01-23</td>\n",
       "      <td>2015</td>\n",
       "      <td>active low</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>758</td>\n",
       "      <td>10</td>\n",
       "      <td>115.800000</td>\n",
       "      <td>153.0</td>\n",
       "      <td>3783</td>\n",
       "      <td>1158.0</td>\n",
       "      <td>2005-08-23</td>\n",
       "      <td>2005</td>\n",
       "      <td>2013-12-04</td>\n",
       "      <td>2013</td>\n",
       "      <td>cold</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>120</th>\n",
       "      <td>1401</td>\n",
       "      <td>1</td>\n",
       "      <td>20.000000</td>\n",
       "      <td>20.0</td>\n",
       "      <td>1401</td>\n",
       "      <td>20.0</td>\n",
       "      <td>2012-03-01</td>\n",
       "      <td>2012</td>\n",
       "      <td>2012-03-01</td>\n",
       "      <td>2012</td>\n",
       "      <td>inactive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>130</th>\n",
       "      <td>2970</td>\n",
       "      <td>2</td>\n",
       "      <td>50.000000</td>\n",
       "      <td>60.0</td>\n",
       "      <td>3710</td>\n",
       "      <td>100.0</td>\n",
       "      <td>2005-11-04</td>\n",
       "      <td>2005</td>\n",
       "      <td>2007-11-14</td>\n",
       "      <td>2007</td>\n",
       "      <td>inactive</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             recency  frequency    monetary  max_purchase_amount  \\\n",
       "consumer_id                                                        \n",
       "10              3829          1   30.000000                 30.0   \n",
       "80               343          7   71.428571                 80.0   \n",
       "90               758         10  115.800000                153.0   \n",
       "120             1401          1   20.000000                 20.0   \n",
       "130             2970          2   50.000000                 60.0   \n",
       "\n",
       "             first_purchase  revenue_till_date first_purchase_date  \\\n",
       "consumer_id                                                          \n",
       "10                     3829               30.0          2005-07-08   \n",
       "80                     3751              500.0          2005-09-24   \n",
       "90                     3783             1158.0          2005-08-23   \n",
       "120                    1401               20.0          2012-03-01   \n",
       "130                    3710              100.0          2005-11-04   \n",
       "\n",
       "             first_purchase_year last_purchase_date  last_purchase_year  \\\n",
       "consumer_id                                                               \n",
       "10                          2005         2005-07-08                2005   \n",
       "80                          2005         2015-01-23                2015   \n",
       "90                          2005         2013-12-04                2013   \n",
       "120                         2012         2012-03-01                2012   \n",
       "130                         2005         2007-11-14                2007   \n",
       "\n",
       "                segment  \n",
       "consumer_id              \n",
       "10             inactive  \n",
       "80           active low  \n",
       "90                 cold  \n",
       "120            inactive  \n",
       "130            inactive  "
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cons_df_2014 = get_consumer_df(df, offset=365); cons_df_2014.head()\n",
    "cons_df_2015 = get_consumer_df(df, offset=0); cons_df_2015.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "493a9128-f717-47c2-b817-426486c6c11f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>size</th>\n",
       "      <th>revenue_till_date</th>\n",
       "      <th>avg_revenue_till_date</th>\n",
       "      <th>recency_min</th>\n",
       "      <th>recency</th>\n",
       "      <th>recency_max</th>\n",
       "      <th>frequency_min</th>\n",
       "      <th>frequency</th>\n",
       "      <th>frequency_max</th>\n",
       "      <th>monetary_min</th>\n",
       "      <th>monetary</th>\n",
       "      <th>monetary_max</th>\n",
       "      <th>first_purchase</th>\n",
       "      <th>first_purchase_date</th>\n",
       "      <th>first_purchase_year</th>\n",
       "      <th>last_purchase_date</th>\n",
       "      <th>last_purchase_year</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>segment</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>new active</th>\n",
       "      <td>1512</td>\n",
       "      <td>119699.21</td>\n",
       "      <td>79.166144</td>\n",
       "      <td>1</td>\n",
       "      <td>84.990741</td>\n",
       "      <td>360</td>\n",
       "      <td>1</td>\n",
       "      <td>1.045635</td>\n",
       "      <td>11</td>\n",
       "      <td>5.0</td>\n",
       "      <td>77.133847</td>\n",
       "      <td>4500.000000</td>\n",
       "      <td>360</td>\n",
       "      <td>2015-01-06</td>\n",
       "      <td>2015</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active high</th>\n",
       "      <td>573</td>\n",
       "      <td>788194.95</td>\n",
       "      <td>1375.558377</td>\n",
       "      <td>1</td>\n",
       "      <td>88.820244</td>\n",
       "      <td>344</td>\n",
       "      <td>2</td>\n",
       "      <td>5.888307</td>\n",
       "      <td>19</td>\n",
       "      <td>100.0</td>\n",
       "      <td>240.045740</td>\n",
       "      <td>4500.000000</td>\n",
       "      <td>4010</td>\n",
       "      <td>2005-01-08</td>\n",
       "      <td>2005</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active low</th>\n",
       "      <td>3313</td>\n",
       "      <td>810481.97</td>\n",
       "      <td>244.636876</td>\n",
       "      <td>1</td>\n",
       "      <td>108.361002</td>\n",
       "      <td>359</td>\n",
       "      <td>2</td>\n",
       "      <td>5.935406</td>\n",
       "      <td>45</td>\n",
       "      <td>5.0</td>\n",
       "      <td>40.724525</td>\n",
       "      <td>99.000000</td>\n",
       "      <td>4012</td>\n",
       "      <td>2005-01-06</td>\n",
       "      <td>2005</td>\n",
       "      <td>2015-12-31</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm high</th>\n",
       "      <td>116</td>\n",
       "      <td>36166.22</td>\n",
       "      <td>311.777759</td>\n",
       "      <td>366</td>\n",
       "      <td>476.060345</td>\n",
       "      <td>701</td>\n",
       "      <td>1</td>\n",
       "      <td>1.060345</td>\n",
       "      <td>4</td>\n",
       "      <td>100.0</td>\n",
       "      <td>302.036379</td>\n",
       "      <td>4000.000000</td>\n",
       "      <td>707</td>\n",
       "      <td>2014-01-24</td>\n",
       "      <td>2014</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm low</th>\n",
       "      <td>822</td>\n",
       "      <td>28854.50</td>\n",
       "      <td>35.102798</td>\n",
       "      <td>366</td>\n",
       "      <td>513.996350</td>\n",
       "      <td>720</td>\n",
       "      <td>1</td>\n",
       "      <td>1.042579</td>\n",
       "      <td>3</td>\n",
       "      <td>5.0</td>\n",
       "      <td>33.374290</td>\n",
       "      <td>90.000000</td>\n",
       "      <td>720</td>\n",
       "      <td>2014-01-11</td>\n",
       "      <td>2014</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm high</th>\n",
       "      <td>119</td>\n",
       "      <td>168498.98</td>\n",
       "      <td>1415.957815</td>\n",
       "      <td>366</td>\n",
       "      <td>455.126050</td>\n",
       "      <td>665</td>\n",
       "      <td>2</td>\n",
       "      <td>4.714286</td>\n",
       "      <td>14</td>\n",
       "      <td>100.0</td>\n",
       "      <td>327.407457</td>\n",
       "      <td>4000.000000</td>\n",
       "      <td>4004</td>\n",
       "      <td>2005-01-14</td>\n",
       "      <td>2005</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm low</th>\n",
       "      <td>901</td>\n",
       "      <td>159740.22</td>\n",
       "      <td>177.292142</td>\n",
       "      <td>366</td>\n",
       "      <td>474.377358</td>\n",
       "      <td>707</td>\n",
       "      <td>2</td>\n",
       "      <td>4.531632</td>\n",
       "      <td>20</td>\n",
       "      <td>5.0</td>\n",
       "      <td>38.591926</td>\n",
       "      <td>96.428571</td>\n",
       "      <td>4011</td>\n",
       "      <td>2005-01-07</td>\n",
       "      <td>2005</td>\n",
       "      <td>2014-12-31</td>\n",
       "      <td>2014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cold</th>\n",
       "      <td>1903</td>\n",
       "      <td>232402.29</td>\n",
       "      <td>122.124167</td>\n",
       "      <td>731</td>\n",
       "      <td>857.781398</td>\n",
       "      <td>1087</td>\n",
       "      <td>1</td>\n",
       "      <td>2.303205</td>\n",
       "      <td>20</td>\n",
       "      <td>5.0</td>\n",
       "      <td>51.739893</td>\n",
       "      <td>2000.000000</td>\n",
       "      <td>4016</td>\n",
       "      <td>2005-01-02</td>\n",
       "      <td>2005</td>\n",
       "      <td>2013-12-31</td>\n",
       "      <td>2013</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>inactive</th>\n",
       "      <td>9158</td>\n",
       "      <td>850306.56</td>\n",
       "      <td>92.848500</td>\n",
       "      <td>1096</td>\n",
       "      <td>2178.110832</td>\n",
       "      <td>4014</td>\n",
       "      <td>1</td>\n",
       "      <td>1.814479</td>\n",
       "      <td>23</td>\n",
       "      <td>5.0</td>\n",
       "      <td>48.112771</td>\n",
       "      <td>3043.750000</td>\n",
       "      <td>4016</td>\n",
       "      <td>2005-01-02</td>\n",
       "      <td>2005</td>\n",
       "      <td>2012-12-31</td>\n",
       "      <td>2012</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               size  revenue_till_date  avg_revenue_till_date  recency_min  \\\n",
       "segment                                                                      \n",
       "new active     1512          119699.21              79.166144            1   \n",
       "active high     573          788194.95            1375.558377            1   \n",
       "active low     3313          810481.97             244.636876            1   \n",
       "new warm high   116           36166.22             311.777759          366   \n",
       "new warm low    822           28854.50              35.102798          366   \n",
       "warm high       119          168498.98            1415.957815          366   \n",
       "warm low        901          159740.22             177.292142          366   \n",
       "cold           1903          232402.29             122.124167          731   \n",
       "inactive       9158          850306.56              92.848500         1096   \n",
       "\n",
       "                   recency  recency_max  frequency_min  frequency  \\\n",
       "segment                                                             \n",
       "new active       84.990741          360              1   1.045635   \n",
       "active high      88.820244          344              2   5.888307   \n",
       "active low      108.361002          359              2   5.935406   \n",
       "new warm high   476.060345          701              1   1.060345   \n",
       "new warm low    513.996350          720              1   1.042579   \n",
       "warm high       455.126050          665              2   4.714286   \n",
       "warm low        474.377358          707              2   4.531632   \n",
       "cold            857.781398         1087              1   2.303205   \n",
       "inactive       2178.110832         4014              1   1.814479   \n",
       "\n",
       "               frequency_max  monetary_min    monetary  monetary_max  \\\n",
       "segment                                                                \n",
       "new active                11           5.0   77.133847   4500.000000   \n",
       "active high               19         100.0  240.045740   4500.000000   \n",
       "active low                45           5.0   40.724525     99.000000   \n",
       "new warm high              4         100.0  302.036379   4000.000000   \n",
       "new warm low               3           5.0   33.374290     90.000000   \n",
       "warm high                 14         100.0  327.407457   4000.000000   \n",
       "warm low                  20           5.0   38.591926     96.428571   \n",
       "cold                      20           5.0   51.739893   2000.000000   \n",
       "inactive                  23           5.0   48.112771   3043.750000   \n",
       "\n",
       "               first_purchase first_purchase_date  first_purchase_year  \\\n",
       "segment                                                                  \n",
       "new active                360          2015-01-06                 2015   \n",
       "active high              4010          2005-01-08                 2005   \n",
       "active low               4012          2005-01-06                 2005   \n",
       "new warm high             707          2014-01-24                 2014   \n",
       "new warm low              720          2014-01-11                 2014   \n",
       "warm high                4004          2005-01-14                 2005   \n",
       "warm low                 4011          2005-01-07                 2005   \n",
       "cold                     4016          2005-01-02                 2005   \n",
       "inactive                 4016          2005-01-02                 2005   \n",
       "\n",
       "              last_purchase_date  last_purchase_year  \n",
       "segment                                               \n",
       "new active            2015-12-31                2015  \n",
       "active high           2015-12-31                2015  \n",
       "active low            2015-12-31                2015  \n",
       "new warm high         2014-12-31                2014  \n",
       "new warm low          2014-12-31                2014  \n",
       "warm high             2014-12-31                2014  \n",
       "warm low              2014-12-31                2014  \n",
       "cold                  2013-12-31                2013  \n",
       "inactive              2012-12-31                2012  "
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "get_segment_summary(cons_df_2015)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c920355a-80a0-408e-b290-8c206c4e78ef",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['new active',\n",
       " 'active high',\n",
       " 'active low',\n",
       " 'new warm high',\n",
       " 'new warm low',\n",
       " 'warm high',\n",
       " 'warm low',\n",
       " 'cold',\n",
       " 'inactive']"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "segment_order = get_segment_summary(cons_df_2015).index.tolist(); segment_order"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a093bc5e-f062-4680-b8e4-cfd42be2cf87",
   "metadata": {},
   "source": [
    "### Calculate Transition Matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7b9a7f3-f354-404b-ab87-bac19e96dd8b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>segment_x</th>\n",
       "      <th>segment_y</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>consumer_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>inactive</td>\n",
       "      <td>inactive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>80</th>\n",
       "      <td>active low</td>\n",
       "      <td>active low</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>warm high</td>\n",
       "      <td>cold</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>120</th>\n",
       "      <td>cold</td>\n",
       "      <td>inactive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>130</th>\n",
       "      <td>inactive</td>\n",
       "      <td>inactive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>160</th>\n",
       "      <td>inactive</td>\n",
       "      <td>inactive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>190</th>\n",
       "      <td>inactive</td>\n",
       "      <td>inactive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>220</th>\n",
       "      <td>inactive</td>\n",
       "      <td>inactive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>230</th>\n",
       "      <td>inactive</td>\n",
       "      <td>inactive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>240</th>\n",
       "      <td>active low</td>\n",
       "      <td>warm low</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              segment_x   segment_y\n",
       "consumer_id                        \n",
       "10             inactive    inactive\n",
       "80           active low  active low\n",
       "90            warm high        cold\n",
       "120                cold    inactive\n",
       "130            inactive    inactive\n",
       "160            inactive    inactive\n",
       "190            inactive    inactive\n",
       "220            inactive    inactive\n",
       "230            inactive    inactive\n",
       "240          active low    warm low"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_transition = pd.merge(cons_df_2014['segment'] , cons_df_2015['segment'], left_index=True, right_index=True, how='left')\n",
    "df_transition.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d0eb718-1ccd-482c-ac53-d6f91f054f45",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>segment_y</th>\n",
       "      <th>active high</th>\n",
       "      <th>active low</th>\n",
       "      <th>cold</th>\n",
       "      <th>inactive</th>\n",
       "      <th>new warm high</th>\n",
       "      <th>new warm low</th>\n",
       "      <th>warm high</th>\n",
       "      <th>warm low</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>segment_x</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>active high</th>\n",
       "      <td>354</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>119</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active low</th>\n",
       "      <td>22</td>\n",
       "      <td>2088</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>901</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cold</th>\n",
       "      <td>22</td>\n",
       "      <td>200</td>\n",
       "      <td>0</td>\n",
       "      <td>1931</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>inactive</th>\n",
       "      <td>35</td>\n",
       "      <td>250</td>\n",
       "      <td>0</td>\n",
       "      <td>7227</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new active</th>\n",
       "      <td>89</td>\n",
       "      <td>410</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>116</td>\n",
       "      <td>822</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm high</th>\n",
       "      <td>15</td>\n",
       "      <td>0</td>\n",
       "      <td>112</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm low</th>\n",
       "      <td>0</td>\n",
       "      <td>96</td>\n",
       "      <td>1027</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm high</th>\n",
       "      <td>35</td>\n",
       "      <td>1</td>\n",
       "      <td>75</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm low</th>\n",
       "      <td>1</td>\n",
       "      <td>266</td>\n",
       "      <td>689</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "segment_y      active high  active low  cold  inactive  new warm high  \\\n",
       "segment_x                                                               \n",
       "active high            354           2     0         0              0   \n",
       "active low              22        2088     0         0              0   \n",
       "cold                    22         200     0      1931              0   \n",
       "inactive                35         250     0      7227              0   \n",
       "new active              89         410     0         0            116   \n",
       "new warm high           15           0   112         0              0   \n",
       "new warm low             0          96  1027         0              0   \n",
       "warm high               35           1    75         0              0   \n",
       "warm low                 1         266   689         0              0   \n",
       "\n",
       "segment_y      new warm low  warm high  warm low  \n",
       "segment_x                                         \n",
       "active high               0        119         0  \n",
       "active low                0          0       901  \n",
       "cold                      0          0         0  \n",
       "inactive                  0          0         0  \n",
       "new active              822          0         0  \n",
       "new warm high             0          0         0  \n",
       "new warm low              0          0         0  \n",
       "warm high                 0          0         0  \n",
       "warm low                  0          0         0  "
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transition_count = pd.crosstab(df_transition.segment_x, df_transition.segment_y); transition_count"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "91d89b50-5c23-4821-a114-ce4f4490a51a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "segment_x\n",
       "active high       475\n",
       "active low       3011\n",
       "cold             2153\n",
       "inactive         7512\n",
       "new active       1437\n",
       "new warm high     127\n",
       "new warm low     1123\n",
       "warm high         111\n",
       "warm low          956\n",
       "dtype: int64"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transition_count.sum(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7371191b-89bd-4214-9bf8-87fc7a76bf79",
   "metadata": {},
   "outputs": [],
   "source": [
    "transition_matrix = (transition_count.T/ transition_count.sum(axis=1)).T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "abc843fc-6f12-49f6-b815-641251fcf4f2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['new active'], dtype='object')"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transition_matrix.index.difference(transition_matrix.columns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89309567-ce18-4018-9c42-c85b84d7875b",
   "metadata": {},
   "outputs": [],
   "source": [
    "transition_matrix[transition_matrix.index.difference(transition_matrix.columns)] = np.nan"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7716edfd-6828-4462-9337-c0e53b8c3aca",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>segment_y</th>\n",
       "      <th>new active</th>\n",
       "      <th>active high</th>\n",
       "      <th>active low</th>\n",
       "      <th>new warm high</th>\n",
       "      <th>new warm low</th>\n",
       "      <th>warm high</th>\n",
       "      <th>warm low</th>\n",
       "      <th>cold</th>\n",
       "      <th>inactive</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>segment_x</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>new active</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.061935</td>\n",
       "      <td>0.285317</td>\n",
       "      <td>0.080724</td>\n",
       "      <td>0.572025</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active high</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.745263</td>\n",
       "      <td>0.004211</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.250526</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active low</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.007307</td>\n",
       "      <td>0.693457</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.299236</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm high</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.118110</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.881890</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm low</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.085485</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.914515</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm high</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.315315</td>\n",
       "      <td>0.009009</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.675676</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm low</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.001046</td>\n",
       "      <td>0.278243</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.720711</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cold</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.010218</td>\n",
       "      <td>0.092894</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.896888</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>inactive</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.004659</td>\n",
       "      <td>0.033280</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.962061</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "segment_y      new active  active high  active low  new warm high  \\\n",
       "segment_x                                                           \n",
       "new active            0.0     0.061935    0.285317       0.080724   \n",
       "active high           0.0     0.745263    0.004211       0.000000   \n",
       "active low            0.0     0.007307    0.693457       0.000000   \n",
       "new warm high         0.0     0.118110    0.000000       0.000000   \n",
       "new warm low          0.0     0.000000    0.085485       0.000000   \n",
       "warm high             0.0     0.315315    0.009009       0.000000   \n",
       "warm low              0.0     0.001046    0.278243       0.000000   \n",
       "cold                  0.0     0.010218    0.092894       0.000000   \n",
       "inactive              0.0     0.004659    0.033280       0.000000   \n",
       "\n",
       "segment_y      new warm low  warm high  warm low      cold  inactive  \n",
       "segment_x                                                             \n",
       "new active         0.572025   0.000000  0.000000  0.000000  0.000000  \n",
       "active high        0.000000   0.250526  0.000000  0.000000  0.000000  \n",
       "active low         0.000000   0.000000  0.299236  0.000000  0.000000  \n",
       "new warm high      0.000000   0.000000  0.000000  0.881890  0.000000  \n",
       "new warm low       0.000000   0.000000  0.000000  0.914515  0.000000  \n",
       "warm high          0.000000   0.000000  0.000000  0.675676  0.000000  \n",
       "warm low           0.000000   0.000000  0.000000  0.720711  0.000000  \n",
       "cold               0.000000   0.000000  0.000000  0.000000  0.896888  \n",
       "inactive           0.000000   0.000000  0.000000  0.000000  0.962061  "
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def get_transition_matrix(cons_df_pre, cons_df_curr, order=None):\n",
    "    df_transition = pd.merge(cons_df_pre['segment'] , cons_df_curr['segment'], left_index=True, right_index=True, how='left')\n",
    "    transition_count = pd.crosstab(df_transition.segment_x, df_transition.segment_y)\n",
    "    transition_matrix = (transition_count.T/ transition_count.sum(axis=1)).T\n",
    "    transition_matrix[transition_matrix.index.difference(transition_matrix.columns)] = 0.0\n",
    "    if order: return transition_matrix.loc[order, order]\n",
    "    return transition_matrix\n",
    "    \n",
    "    \n",
    "transition_matrix = get_transition_matrix(cons_df_2014, cons_df_2015, order=segment_order); transition_matrix"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7b0b15b-88c8-48e7-bf35-0341b57e6f97",
   "metadata": {},
   "source": [
    "### Calculate Segment Customer Matrix "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5dfff67b-7c81-4c84-814a-446e5add14cd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['new active', 'active high', 'active low', 'new warm high',\n",
       "       'new warm low', 'warm high', 'warm low', 'cold', 'inactive'],\n",
       "      dtype='object', name='segment_y')"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transition_matrix.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cac0171a-4d70-4d40-bfe7-5ff9f149b8e3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "segment\n",
       "new active       1512\n",
       "active high       573\n",
       "active low       3313\n",
       "new warm high     116\n",
       "new warm low      822\n",
       "warm high         119\n",
       "warm low          901\n",
       "cold             1903\n",
       "inactive         9158\n",
       "Name: size, dtype: int64"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "segment_vector = get_segment_summary(cons_df_2015)['size']; segment_vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3e1a1d43-169d-4d09-acc0-5bf2c24390cb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "segment\n",
       "new active       1000.0\n",
       "active high         0.0\n",
       "active low          0.0\n",
       "new warm high       0.0\n",
       "new warm low        0.0\n",
       "warm high           0.0\n",
       "warm low            0.0\n",
       "cold                0.0\n",
       "inactive            0.0\n",
       "Name: size, dtype: float64"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "new_customer_vector = segment_vector.copy()*0.0\n",
    "new_customer_vector.loc['new active'] = 1000\n",
    "new_customer_vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "99f56cba-6c8a-44c7-911c-1b7afaf64450",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5cb1d955-0956-4d53-9e96-5e5e5181d1ee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "duration = 10\n",
    "list(range(10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3c45c0f4-4494-486d-8076-fd23d1d9342e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>segment_y</th>\n",
       "      <th>new active</th>\n",
       "      <th>active high</th>\n",
       "      <th>active low</th>\n",
       "      <th>new warm high</th>\n",
       "      <th>new warm low</th>\n",
       "      <th>warm high</th>\n",
       "      <th>warm low</th>\n",
       "      <th>cold</th>\n",
       "      <th>inactive</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>segment_x</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>new active</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.061935</td>\n",
       "      <td>0.285317</td>\n",
       "      <td>0.080724</td>\n",
       "      <td>0.572025</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active high</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.745263</td>\n",
       "      <td>0.004211</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.250526</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active low</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.007307</td>\n",
       "      <td>0.693457</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.299236</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm high</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.118110</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.881890</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm low</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.085485</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.914515</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm high</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.315315</td>\n",
       "      <td>0.009009</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.675676</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm low</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.001046</td>\n",
       "      <td>0.278243</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.720711</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cold</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.010218</td>\n",
       "      <td>0.092894</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.896888</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>inactive</th>\n",
       "      <td>0.0</td>\n",
       "      <td>0.004659</td>\n",
       "      <td>0.033280</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.962061</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "segment_y      new active  active high  active low  new warm high  \\\n",
       "segment_x                                                           \n",
       "new active            0.0     0.061935    0.285317       0.080724   \n",
       "active high           0.0     0.745263    0.004211       0.000000   \n",
       "active low            0.0     0.007307    0.693457       0.000000   \n",
       "new warm high         0.0     0.118110    0.000000       0.000000   \n",
       "new warm low          0.0     0.000000    0.085485       0.000000   \n",
       "warm high             0.0     0.315315    0.009009       0.000000   \n",
       "warm low              0.0     0.001046    0.278243       0.000000   \n",
       "cold                  0.0     0.010218    0.092894       0.000000   \n",
       "inactive              0.0     0.004659    0.033280       0.000000   \n",
       "\n",
       "segment_y      new warm low  warm high  warm low      cold  inactive  \n",
       "segment_x                                                             \n",
       "new active         0.572025   0.000000  0.000000  0.000000  0.000000  \n",
       "active high        0.000000   0.250526  0.000000  0.000000  0.000000  \n",
       "active low         0.000000   0.000000  0.299236  0.000000  0.000000  \n",
       "new warm high      0.000000   0.000000  0.000000  0.881890  0.000000  \n",
       "new warm low       0.000000   0.000000  0.000000  0.914515  0.000000  \n",
       "warm high          0.000000   0.000000  0.000000  0.675676  0.000000  \n",
       "warm low           0.000000   0.000000  0.000000  0.720711  0.000000  \n",
       "cold               0.000000   0.000000  0.000000  0.000000  0.896888  \n",
       "inactive           0.000000   0.000000  0.000000  0.000000  0.962061  "
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transition_matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7840a1c0-bff2-4dee-aaa4-483e8c18d16a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "segment\n",
       "new active       1512\n",
       "active high       573\n",
       "active low       3313\n",
       "new warm high     116\n",
       "new warm low      822\n",
       "warm high         119\n",
       "warm low          901\n",
       "cold             1903\n",
       "inactive         9158\n",
       "Name: size, dtype: int64"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "segment_vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12234532-a122-4888-a06c-4d768eaf52d2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "segment_y\n",
       "new active           0.000000\n",
       "active high        659.167726\n",
       "active low        3534.828749\n",
       "new warm high      122.054280\n",
       "new warm low       864.901879\n",
       "warm high          143.551579\n",
       "warm low           991.369313\n",
       "cold              1583.796574\n",
       "inactive         10517.329901\n",
       "dtype: float64"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "transition_matrix.multiply(segment_vector, axis='index').sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "26c2ff76-df1d-4c9f-8149-c474ee245fb3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2015</th>\n",
       "      <th>2016</th>\n",
       "      <th>2017</th>\n",
       "      <th>2018</th>\n",
       "      <th>2019</th>\n",
       "      <th>2020</th>\n",
       "      <th>2021</th>\n",
       "      <th>2022</th>\n",
       "      <th>2023</th>\n",
       "      <th>2024</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>segment</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>new active</th>\n",
       "      <td>1512</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active high</th>\n",
       "      <td>573</td>\n",
       "      <td>659.167726</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active low</th>\n",
       "      <td>3313</td>\n",
       "      <td>3534.828749</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm high</th>\n",
       "      <td>116</td>\n",
       "      <td>122.054280</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm low</th>\n",
       "      <td>822</td>\n",
       "      <td>864.901879</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm high</th>\n",
       "      <td>119</td>\n",
       "      <td>143.551579</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm low</th>\n",
       "      <td>901</td>\n",
       "      <td>991.369313</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cold</th>\n",
       "      <td>1903</td>\n",
       "      <td>1583.796574</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>inactive</th>\n",
       "      <td>9158</td>\n",
       "      <td>10517.329901</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               2015          2016  2017  2018  2019  2020  2021  2022  2023  \\\n",
       "segment                                                                       \n",
       "new active     1512      0.000000   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "active high     573    659.167726   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "active low     3313   3534.828749   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "new warm high   116    122.054280   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "new warm low    822    864.901879   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "warm high       119    143.551579   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "warm low        901    991.369313   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "cold           1903   1583.796574   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "inactive       9158  10517.329901   0.0   0.0   0.0   0.0   0.0   0.0   0.0   \n",
       "\n",
       "               2024  \n",
       "segment              \n",
       "new active      0.0  \n",
       "active high     0.0  \n",
       "active low      0.0  \n",
       "new warm high   0.0  \n",
       "new warm low    0.0  \n",
       "warm high       0.0  \n",
       "warm low        0.0  \n",
       "cold            0.0  \n",
       "inactive        0.0  "
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "base=2015\n",
    "segment_matrix = pd.DataFrame(np.zeros([segment_vector.shape[0], duration]), index= segment_vector.index, columns=range(base, duration+base))\n",
    "segment_matrix.iloc[:, 0] = segment_vector\n",
    "segment_matrix.iloc[:, 1] = transition_matrix.multiply(segment_vector, axis='index').sum()\n",
    "segment_matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9d3a180-611b-414f-9d53-8a2c7e03cbca",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>2015</th>\n",
       "      <th>2016</th>\n",
       "      <th>2017</th>\n",
       "      <th>2018</th>\n",
       "      <th>2019</th>\n",
       "      <th>2020</th>\n",
       "      <th>2021</th>\n",
       "      <th>2022</th>\n",
       "      <th>2023</th>\n",
       "      <th>2024</th>\n",
       "      <th>2025</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>segment</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>new active</th>\n",
       "      <td>1512</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active high</th>\n",
       "      <td>573</td>\n",
       "      <td>659.0</td>\n",
       "      <td>643.0</td>\n",
       "      <td>628.0</td>\n",
       "      <td>610.0</td>\n",
       "      <td>596.0</td>\n",
       "      <td>583.0</td>\n",
       "      <td>572.0</td>\n",
       "      <td>563.0</td>\n",
       "      <td>555.0</td>\n",
       "      <td>548.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active low</th>\n",
       "      <td>3313</td>\n",
       "      <td>3535.0</td>\n",
       "      <td>3302.0</td>\n",
       "      <td>3131.0</td>\n",
       "      <td>2952.0</td>\n",
       "      <td>2819.0</td>\n",
       "      <td>2716.0</td>\n",
       "      <td>2636.0</td>\n",
       "      <td>2574.0</td>\n",
       "      <td>2527.0</td>\n",
       "      <td>2490.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm high</th>\n",
       "      <td>116</td>\n",
       "      <td>122.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm low</th>\n",
       "      <td>822</td>\n",
       "      <td>865.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm high</th>\n",
       "      <td>119</td>\n",
       "      <td>144.0</td>\n",
       "      <td>165.0</td>\n",
       "      <td>161.0</td>\n",
       "      <td>157.0</td>\n",
       "      <td>153.0</td>\n",
       "      <td>149.0</td>\n",
       "      <td>146.0</td>\n",
       "      <td>143.0</td>\n",
       "      <td>141.0</td>\n",
       "      <td>139.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm low</th>\n",
       "      <td>901</td>\n",
       "      <td>991.0</td>\n",
       "      <td>1058.0</td>\n",
       "      <td>988.0</td>\n",
       "      <td>937.0</td>\n",
       "      <td>883.0</td>\n",
       "      <td>843.0</td>\n",
       "      <td>813.0</td>\n",
       "      <td>789.0</td>\n",
       "      <td>770.0</td>\n",
       "      <td>756.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cold</th>\n",
       "      <td>1903</td>\n",
       "      <td>1584.0</td>\n",
       "      <td>1710.0</td>\n",
       "      <td>874.0</td>\n",
       "      <td>821.0</td>\n",
       "      <td>782.0</td>\n",
       "      <td>740.0</td>\n",
       "      <td>709.0</td>\n",
       "      <td>684.0</td>\n",
       "      <td>665.0</td>\n",
       "      <td>650.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>inactive</th>\n",
       "      <td>9158</td>\n",
       "      <td>10517.0</td>\n",
       "      <td>11539.0</td>\n",
       "      <td>12635.0</td>\n",
       "      <td>12939.0</td>\n",
       "      <td>13185.0</td>\n",
       "      <td>13385.0</td>\n",
       "      <td>13541.0</td>\n",
       "      <td>13663.0</td>\n",
       "      <td>13759.0</td>\n",
       "      <td>13833.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               2015     2016     2017     2018     2019     2020     2021  \\\n",
       "segment                                                                     \n",
       "new active     1512      0.0      0.0      0.0      0.0      0.0      0.0   \n",
       "active high     573    659.0    643.0    628.0    610.0    596.0    583.0   \n",
       "active low     3313   3535.0   3302.0   3131.0   2952.0   2819.0   2716.0   \n",
       "new warm high   116    122.0      0.0      0.0      0.0      0.0      0.0   \n",
       "new warm low    822    865.0      0.0      0.0      0.0      0.0      0.0   \n",
       "warm high       119    144.0    165.0    161.0    157.0    153.0    149.0   \n",
       "warm low        901    991.0   1058.0    988.0    937.0    883.0    843.0   \n",
       "cold           1903   1584.0   1710.0    874.0    821.0    782.0    740.0   \n",
       "inactive       9158  10517.0  11539.0  12635.0  12939.0  13185.0  13385.0   \n",
       "\n",
       "                  2022     2023     2024     2025  \n",
       "segment                                            \n",
       "new active         0.0      0.0      0.0      0.0  \n",
       "active high      572.0    563.0    555.0    548.0  \n",
       "active low      2636.0   2574.0   2527.0   2490.0  \n",
       "new warm high      0.0      0.0      0.0      0.0  \n",
       "new warm low       0.0      0.0      0.0      0.0  \n",
       "warm high        146.0    143.0    141.0    139.0  \n",
       "warm low         813.0    789.0    770.0    756.0  \n",
       "cold             709.0    684.0    665.0    650.0  \n",
       "inactive       13541.0  13663.0  13759.0  13833.0  "
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def get_segment_customer_matrix(duration, transition_matrix, segment_vector, new_customer_vector=None,  base=2015):\n",
    "    if new_customer_vector is None: new_customer_vector = segment_vector.copy()*0\n",
    "    \n",
    "    segment_matrix = pd.DataFrame(np.zeros([segment_vector.shape[0], duration]), index= segment_vector.index, columns=range(base, duration+base))\n",
    "    segment_matrix.iloc[:, 0] = segment_vector\n",
    "    for i in range(1, duration):\n",
    "        segment_matrix.iloc[:, i] = transition_matrix.multiply(segment_matrix.iloc[:, i-1], axis='index').sum()+new_customer_vector\n",
    "    return round(segment_matrix)\n",
    "\n",
    "\n",
    "segment_customer_matrix = get_segment_customer_matrix(11, transition_matrix, segment_vector, new_customer_vector=None) \n",
    "segment_customer_matrix"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0ac95ab0-adcc-43fb-8525-30e7e293a2cc",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2015    18417.0\n",
       "2016    18417.0\n",
       "2017    18417.0\n",
       "2018    18417.0\n",
       "2019    18416.0\n",
       "2020    18418.0\n",
       "2021    18416.0\n",
       "2022    18417.0\n",
       "2023    18416.0\n",
       "2024    18417.0\n",
       "2025    18416.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "segment_customer_matrix.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "302542ba-4d44-4f84-8367-9b7aea524502",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>revenue_2015</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>consumer_id</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>80</th>\n",
       "      <td>80.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>480</th>\n",
       "      <td>45.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>830</th>\n",
       "      <td>50.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>850</th>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>860</th>\n",
       "      <td>60.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             revenue_2015\n",
       "consumer_id              \n",
       "80                   80.0\n",
       "480                  45.0\n",
       "830                  50.0\n",
       "850                  60.0\n",
       "860                  60.0"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "revenue_df = df[df['year_of_purchase']==2015].groupby('consumer_id').agg(revenue_2015=('purchase_amount', 'sum')); revenue_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b98cb62f-8e19-4cbd-a6f2-f063906b9c1a",
   "metadata": {},
   "outputs": [],
   "source": [
    "segment_revenue_vector=pd.merge(cons_df_2015, revenue_df, left_index=True,right_index=True, how='left')[['segment','revenue_2015']].fillna(0)\\\n",
    "                         .groupby('segment').mean().loc[segment_order]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e4a7a758-a19d-4d19-9354-7f583f3b24c1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>revenue_2015</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>segment</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>new active</th>\n",
       "      <td>79.166144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active high</th>\n",
       "      <td>323.568935</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>active low</th>\n",
       "      <td>52.306043</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm high</th>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>new warm low</th>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm high</th>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>warm low</th>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cold</th>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>inactive</th>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               revenue_2015\n",
       "segment                    \n",
       "new active        79.166144\n",
       "active high      323.568935\n",
       "active low        52.306043\n",
       "new warm high      0.000000\n",
       "new warm low       0.000000\n",
       "warm high          0.000000\n",
       "warm low           0.000000\n",
       "cold               0.000000\n",
       "inactive           0.000000"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "segment_revenue_vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a93aef7c-74bf-4f92-80d1-6dee6ab99b1e",
   "metadata": {},
   "outputs": [],
   "source": [
    "segment_customer_revenue = segment_customer_matrix.multiply(segment_revenue_vector.values, axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb2840a2-c3fa-4050-934b-2c71fa0af1aa",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2015    478394.130000\n",
       "2016    398133.789962\n",
       "2017    380769.379008\n",
       "2018    366971.511648\n",
       "2019    351784.489138\n",
       "2020    340297.820341\n",
       "2021    330703.901766\n",
       "2022    322960.160047\n",
       "2023    316805.064971\n",
       "2024    311758.129473\n",
       "2025    307557.823339\n",
       "dtype: float64"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "yearly_revenue = segment_customer_revenue.sum(); yearly_revenue"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22b4d7e1-f158-48af-bf69-8c0c93954293",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1.0,\n",
       " 0.9090909090909091,\n",
       " 0.8264462809917354,\n",
       " 0.7513148009015775,\n",
       " 0.6830134553650705,\n",
       " 0.6209213230591549,\n",
       " 0.5644739300537772,\n",
       " 0.5131581182307065,\n",
       " 0.4665073802097331,\n",
       " 0.42409761837248455,\n",
       " 0.3855432894295314]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "discount_rate = 0.1\n",
    "discount_factor = [1/(1+discount_rate)**i for i in range(0,duration+1)]\n",
    "discount_factor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e55f6d2c-3093-4643-a990-bc665eeeea2e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2015    478394.130000\n",
       "2016    361939.809056\n",
       "2017    314685.437197\n",
       "2018    275711.128210\n",
       "2019    240273.539470\n",
       "2020    211298.172840\n",
       "2021    186673.731114\n",
       "2022    165729.627993\n",
       "2023    147791.900897\n",
       "2024    132215.880218\n",
       "2025    118576.854900\n",
       "dtype: float64"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(yearly_revenue*discount_factor)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e7494ceb-93ee-49a0-aac1-24be94f02c39",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2633290.2118960177"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(yearly_revenue*discount_factor).sum() # Value of Database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5e335f8-74c3-49dd-acaf-635bd1b6c2e7",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c4e0d41-f3d4-416b-9ded-4f6118d3e701",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c6511aaa-8a90-4f64-802c-c3d84269c391",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62f411ff-a0ae-4880-a0f8-15a72f8d2024",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}