{ "cells": [ { "cell_type": "markdown", "id": "55d50fe5-130b-455c-8e12-568b08c4b849", "metadata": {}, "source": [ "# Customer Lifetime Value " ] }, { "cell_type": "markdown", "id": "e03e9518-688c-46dc-8064-69b877cd8dbb", "metadata": {}, "source": [ "```{admonition} Methodology\n", "- Do managerial segmentation \n", "- Identify behaviour for the customer for 1 year. [ Assume current year is N, Look at period from N-1 to N to evaluate transition matrix(TM)]\n", " - Calculate Year N-1 & Year N customer matrix\n", " - Join them on customer_id\n", " - Tabulate to calculate Matrix \n", " - Divide by row_sum to calculate transition probablities\n", "- Assume the behaviour displayed by user remains the same per segment over the years.\n", "- Assume Year N as 1st column in Segment_customer matrix \n", "- segment_customer(j+1) = segment_customer(j)*TM + New Acquisition(j) where j goes from 0-> M where years are N-> N+M\n", "- Assume avg/revenue for year N as revenue for first column \n", "- Evaluated yearly_revenue vector by multiplying with segment_customer matrix [ Assumption: Avg revenue per customer is same]\n", "- Multiply by discount_factor matrix with yearly_revenue vector to calculate net present which is your CLV\n", "- What is the value of your database?=> CLV - year1 revenue. \n", "```" ] }, { "cell_type": "markdown", "id": "f7a40229-8f76-4a56-aab1-4fcd73bc740f", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "id": "2dfec938-6cbe-4871-8e24-7bce9eec06b5", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import scipy as sp\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import datetime as dt\n", "from sklearn.cluster import AgglomerativeClustering\n", "from scipy.cluster.hierarchy import dendrogram, linkage\n", "from sklearn.pipeline import Pipeline, make_pipeline\n", "from sklearn.preprocessing import StandardScaler, RobustScaler, FunctionTransformer\n", "from sklearn.linear_model import LogisticRegression, LinearRegression\n", "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf" ] }, { "cell_type": "code", "execution_count": null, "id": "97979947-3f9e-46f3-acee-ed58c507bdae", "metadata": {}, "outputs": [], "source": [ "sns.set()" ] }, { "cell_type": "markdown", "id": "f6a929ac-dcc4-472d-815c-e4aac5df7491", "metadata": {}, "source": [ "## Data Collection & Feature Engineering" ] }, { "cell_type": "code", "execution_count": null, "id": "1b44d915-9f79-4e15-b36d-5b2aea83d20a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>consumer_id</th>\n", " <th>purchase_amount</th>\n", " <th>date_of_purchase</th>\n", " <th>year_of_purchase</th>\n", " <th>days_since</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>760</td>\n", " <td>25.0</td>\n", " <td>2009-11-06</td>\n", " <td>2009</td>\n", " <td>2247</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>860</td>\n", " <td>50.0</td>\n", " <td>2012-09-28</td>\n", " <td>2012</td>\n", " <td>1190</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>1200</td>\n", " <td>100.0</td>\n", " <td>2005-10-25</td>\n", " <td>2005</td>\n", " <td>3720</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>1420</td>\n", " <td>50.0</td>\n", " <td>2009-07-09</td>\n", " <td>2009</td>\n", " <td>2367</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>1940</td>\n", " <td>70.0</td>\n", " <td>2013-01-25</td>\n", " <td>2013</td>\n", " <td>1071</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " consumer_id purchase_amount date_of_purchase year_of_purchase days_since\n", "0 760 25.0 2009-11-06 2009 2247\n", "1 860 50.0 2012-09-28 2012 1190\n", "2 1200 100.0 2005-10-25 2005 3720\n", "3 1420 50.0 2009-07-09 2009 2367\n", "4 1940 70.0 2013-01-25 2013 1071" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(\"W106_purchases.txt\", sep='\\t', names=['consumer_id','purchase_amount', 'date_of_purchase'], parse_dates=['date_of_purchase'] )\n", "df['year_of_purchase'] = df['date_of_purchase'].dt.year\n", "max_date = df['date_of_purchase'].max() + dt.timedelta(days=1)\n", "df['days_since'] = (max_date - df['date_of_purchase']).dt.days\n", "df.head()" ] }, { "cell_type": "markdown", "id": "025c3128-ac32-4f2e-a8a3-86e83ced6c93", "metadata": {}, "source": [ "## Managerial Segmentation " ] }, { "cell_type": "markdown", "id": "56d08072-e81a-4d4c-a749-9791fe6b5e26", "metadata": {}, "source": [ "### Calculate Customer and segment summary" ] }, { "cell_type": "code", "execution_count": null, "id": "d63777e5-e603-4666-9937-d196e10a79ea", "metadata": {}, "outputs": [], "source": [ "def get_consumer_df(df, offset=0):\n", " df = df.copy()\n", " df = df[df['days_since']>offset]\n", " consumer_df = df.groupby('consumer_id').agg(recency=('days_since', 'min'),\n", " frequency=('date_of_purchase', 'count'),\n", " monetary=('purchase_amount', 'mean'),\n", " # avg_purchase_amount=('purchase_amount', 'mean'),\n", " max_purchase_amount=('purchase_amount', 'max'),\n", " first_purchase=('days_since', 'max'), \n", " revenue_till_date=('purchase_amount', 'sum'),\n", " first_purchase_date=('date_of_purchase', 'min'),\n", " first_purchase_year=('year_of_purchase', 'min'),\n", " last_purchase_date=('date_of_purchase', 'max'),\n", " last_purchase_year=('year_of_purchase', 'max'),\n", " ); consumer_df.head()\n", " consumer_df['recency'] = consumer_df['recency'] - offset\n", " consumer_df['first_purchase'] = consumer_df['first_purchase'] - offset\n", " consumer_df['segment'] = np.nan\n", " rec_filter = consumer_df['recency'] \n", " first_purchase_filter = consumer_df['first_purchase']\n", " consumer_df.loc[(rec_filter <=365) & (consumer_df['monetary'] >= 100), 'segment'] = 'active high'\n", " consumer_df.loc[(rec_filter <=365) & (consumer_df['monetary'] < 100), 'segment'] = 'active low'\"\"\n", " consumer_df.loc[(rec_filter <=365) & (first_purchase_filter <=365) , 'segment'] = 'new active'\n", " # consumer_df.loc[(rec_filter <=365) & (first_purchase_filter <=365) & (consumer_df['monetary'] < 100) , 'segment'] = 'new active low'\n", " consumer_df.loc[(rec_filter>365) & (rec_filter<=2*365) & (consumer_df['monetary'] >= 100), 'segment'] = 'warm high'\n", " consumer_df.loc[(rec_filter>365) & (rec_filter<=2*365) & (consumer_df['monetary'] < 100), 'segment'] = 'warm low'\n", " consumer_df.loc[(rec_filter>365) & (rec_filter<=2*365) & (first_purchase_filter <=2*365) & (consumer_df['monetary'] >= 100) , 'segment'] = 'new warm high'\n", " consumer_df.loc[(rec_filter>365) & (rec_filter<=2*365) & (first_purchase_filter <=2*365) & (consumer_df['monetary'] < 100) , 'segment'] = 'new warm low'\n", " consumer_df.loc[(rec_filter>2*365) & (rec_filter<=3*365), 'segment'] = 'cold'\n", " consumer_df.loc[(rec_filter>3*365), 'segment'] = 'inactive'\n", " return consumer_df" ] }, { "cell_type": "code", "execution_count": null, "id": "0a502267-00fa-41fa-8b20-289198d62c92", "metadata": {}, "outputs": [], "source": [ "def get_segment_summary(consumer_df):\n", " if 'revenue' in consumer_df.columns:\n", " segment_df = consumer_df.groupby('segment').agg(size=('recency', 'count'),\n", " revenue_today=('revenue', 'sum'),\n", " avg_revenue_today=('revenue', 'mean'),\n", " revenue_till_date=('revenue_till_date', 'sum'),\n", " avg_revenue_till_date=('revenue_till_date', 'mean'),\n", " recency_min=('recency', 'min'),\n", " recency=('recency', 'mean'),\n", " recency_max=('recency', 'max'),\n", " frequency_min=('frequency', 'min'),\n", " frequency=('frequency', 'mean'),\n", " frequency_max=('frequency', 'max'),\n", " monetary_min=('monetary', 'min'),\n", " monetary=('monetary', 'mean'),\n", " monetary_max=('monetary', 'max'),\n", " first_purchase=('first_purchase', 'max'),\n", " first_purchase_date=('first_purchase_date', 'min'),\n", " first_purchase_year=('first_purchase_year', 'min'),\n", " last_purchase_date=('last_purchase_date', 'max'),\n", " last_purchase_year=('last_purchase_year', 'max')\n", " ).sort_values(by=['last_purchase_year', 'first_purchase_year','monetary'], ascending=[False, False, False])\n", " else:\n", " segment_df = consumer_df.groupby('segment').agg(size=('recency', 'count'),\n", " revenue_till_date=('revenue_till_date', 'sum'),\n", " avg_revenue_till_date=('revenue_till_date', 'mean'),\n", " recency_min=('recency', 'min'),\n", " recency=('recency', 'mean'),\n", " recency_max=('recency', 'max'),\n", " frequency_min=('frequency', 'min'),\n", " frequency=('frequency', 'mean'),\n", " frequency_max=('frequency', 'max'),\n", " monetary_min=('monetary', 'min'),\n", " monetary=('monetary', 'mean'),\n", " monetary_max=('monetary', 'max'),\n", " first_purchase=('first_purchase', 'max'),\n", " first_purchase_date=('first_purchase_date', 'min'),\n", " first_purchase_year=('first_purchase_year', 'min'),\n", " last_purchase_date=('last_purchase_date', 'max'),\n", " last_purchase_year=('last_purchase_year', 'max')\n", " ).sort_values(by=['last_purchase_year', 'first_purchase_year','monetary'], ascending=[False, False, False]) \n", " return segment_df" ] }, { "cell_type": "code", "execution_count": null, "id": "2317eda2-df27-49b4-b2a8-69026090c906", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>recency</th>\n", " <th>frequency</th>\n", " <th>monetary</th>\n", " <th>max_purchase_amount</th>\n", " <th>first_purchase</th>\n", " <th>revenue_till_date</th>\n", " <th>first_purchase_date</th>\n", " <th>first_purchase_year</th>\n", " <th>last_purchase_date</th>\n", " <th>last_purchase_year</th>\n", " <th>segment</th>\n", " </tr>\n", " <tr>\n", " <th>consumer_id</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>10</th>\n", " <td>3829</td>\n", " <td>1</td>\n", " <td>30.000000</td>\n", " <td>30.0</td>\n", " <td>3829</td>\n", " <td>30.0</td>\n", " <td>2005-07-08</td>\n", " <td>2005</td>\n", " <td>2005-07-08</td>\n", " <td>2005</td>\n", " <td>inactive</td>\n", " </tr>\n", " <tr>\n", " <th>80</th>\n", " <td>343</td>\n", " <td>7</td>\n", " <td>71.428571</td>\n", " <td>80.0</td>\n", " <td>3751</td>\n", " <td>500.0</td>\n", " <td>2005-09-24</td>\n", " <td>2005</td>\n", " <td>2015-01-23</td>\n", " <td>2015</td>\n", " <td>active low</td>\n", " </tr>\n", " <tr>\n", " <th>90</th>\n", " <td>758</td>\n", " <td>10</td>\n", " <td>115.800000</td>\n", " <td>153.0</td>\n", " <td>3783</td>\n", " <td>1158.0</td>\n", " <td>2005-08-23</td>\n", " <td>2005</td>\n", " <td>2013-12-04</td>\n", " <td>2013</td>\n", " <td>cold</td>\n", " </tr>\n", " <tr>\n", " <th>120</th>\n", " <td>1401</td>\n", " <td>1</td>\n", " <td>20.000000</td>\n", " <td>20.0</td>\n", " <td>1401</td>\n", " <td>20.0</td>\n", " <td>2012-03-01</td>\n", " <td>2012</td>\n", " <td>2012-03-01</td>\n", " <td>2012</td>\n", " <td>inactive</td>\n", " </tr>\n", " <tr>\n", " <th>130</th>\n", " <td>2970</td>\n", " <td>2</td>\n", " <td>50.000000</td>\n", " <td>60.0</td>\n", " <td>3710</td>\n", " <td>100.0</td>\n", " <td>2005-11-04</td>\n", " <td>2005</td>\n", " <td>2007-11-14</td>\n", " <td>2007</td>\n", " <td>inactive</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " recency frequency monetary max_purchase_amount \\\n", "consumer_id \n", "10 3829 1 30.000000 30.0 \n", "80 343 7 71.428571 80.0 \n", "90 758 10 115.800000 153.0 \n", "120 1401 1 20.000000 20.0 \n", "130 2970 2 50.000000 60.0 \n", "\n", " first_purchase revenue_till_date first_purchase_date \\\n", "consumer_id \n", "10 3829 30.0 2005-07-08 \n", "80 3751 500.0 2005-09-24 \n", "90 3783 1158.0 2005-08-23 \n", "120 1401 20.0 2012-03-01 \n", "130 3710 100.0 2005-11-04 \n", "\n", " first_purchase_year last_purchase_date last_purchase_year \\\n", "consumer_id \n", "10 2005 2005-07-08 2005 \n", "80 2005 2015-01-23 2015 \n", "90 2005 2013-12-04 2013 \n", "120 2012 2012-03-01 2012 \n", "130 2005 2007-11-14 2007 \n", "\n", " segment \n", "consumer_id \n", "10 inactive \n", "80 active low \n", "90 cold \n", "120 inactive \n", "130 inactive " ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cons_df_2014 = get_consumer_df(df, offset=365); cons_df_2014.head()\n", "cons_df_2015 = get_consumer_df(df, offset=0); cons_df_2015.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "493a9128-f717-47c2-b817-426486c6c11f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>size</th>\n", " <th>revenue_till_date</th>\n", " <th>avg_revenue_till_date</th>\n", " <th>recency_min</th>\n", " <th>recency</th>\n", " <th>recency_max</th>\n", " <th>frequency_min</th>\n", " <th>frequency</th>\n", " <th>frequency_max</th>\n", " <th>monetary_min</th>\n", " <th>monetary</th>\n", " <th>monetary_max</th>\n", " <th>first_purchase</th>\n", " <th>first_purchase_date</th>\n", " <th>first_purchase_year</th>\n", " <th>last_purchase_date</th>\n", " <th>last_purchase_year</th>\n", " </tr>\n", " <tr>\n", " <th>segment</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>new active</th>\n", " <td>1512</td>\n", " <td>119699.21</td>\n", " <td>79.166144</td>\n", " <td>1</td>\n", " <td>84.990741</td>\n", " <td>360</td>\n", " <td>1</td>\n", " <td>1.045635</td>\n", " <td>11</td>\n", " <td>5.0</td>\n", " <td>77.133847</td>\n", " <td>4500.000000</td>\n", " <td>360</td>\n", " <td>2015-01-06</td>\n", " <td>2015</td>\n", " <td>2015-12-31</td>\n", " <td>2015</td>\n", " </tr>\n", " <tr>\n", " <th>active high</th>\n", " <td>573</td>\n", " <td>788194.95</td>\n", " <td>1375.558377</td>\n", " <td>1</td>\n", " <td>88.820244</td>\n", " <td>344</td>\n", " <td>2</td>\n", " <td>5.888307</td>\n", " <td>19</td>\n", " <td>100.0</td>\n", " <td>240.045740</td>\n", " <td>4500.000000</td>\n", " <td>4010</td>\n", " <td>2005-01-08</td>\n", " <td>2005</td>\n", " <td>2015-12-31</td>\n", " <td>2015</td>\n", " </tr>\n", " <tr>\n", " <th>active low</th>\n", " <td>3313</td>\n", " <td>810481.97</td>\n", " <td>244.636876</td>\n", " <td>1</td>\n", " <td>108.361002</td>\n", " <td>359</td>\n", " <td>2</td>\n", " <td>5.935406</td>\n", " <td>45</td>\n", " <td>5.0</td>\n", " <td>40.724525</td>\n", " <td>99.000000</td>\n", " <td>4012</td>\n", " <td>2005-01-06</td>\n", " <td>2005</td>\n", " <td>2015-12-31</td>\n", " <td>2015</td>\n", " </tr>\n", " <tr>\n", " <th>new warm high</th>\n", " <td>116</td>\n", " <td>36166.22</td>\n", " <td>311.777759</td>\n", " <td>366</td>\n", " <td>476.060345</td>\n", " <td>701</td>\n", " <td>1</td>\n", " <td>1.060345</td>\n", " <td>4</td>\n", " <td>100.0</td>\n", " <td>302.036379</td>\n", " <td>4000.000000</td>\n", " <td>707</td>\n", " <td>2014-01-24</td>\n", " <td>2014</td>\n", " <td>2014-12-31</td>\n", " <td>2014</td>\n", " </tr>\n", " <tr>\n", " <th>new warm low</th>\n", " <td>822</td>\n", " <td>28854.50</td>\n", " <td>35.102798</td>\n", " <td>366</td>\n", " <td>513.996350</td>\n", " <td>720</td>\n", " <td>1</td>\n", " <td>1.042579</td>\n", " <td>3</td>\n", " <td>5.0</td>\n", " <td>33.374290</td>\n", " <td>90.000000</td>\n", " <td>720</td>\n", " <td>2014-01-11</td>\n", " <td>2014</td>\n", " <td>2014-12-31</td>\n", " <td>2014</td>\n", " </tr>\n", " <tr>\n", " <th>warm high</th>\n", " <td>119</td>\n", " <td>168498.98</td>\n", " <td>1415.957815</td>\n", " <td>366</td>\n", " <td>455.126050</td>\n", " <td>665</td>\n", " <td>2</td>\n", " <td>4.714286</td>\n", " <td>14</td>\n", " <td>100.0</td>\n", " <td>327.407457</td>\n", " <td>4000.000000</td>\n", " <td>4004</td>\n", " <td>2005-01-14</td>\n", " <td>2005</td>\n", " <td>2014-12-31</td>\n", " <td>2014</td>\n", " </tr>\n", " <tr>\n", " <th>warm low</th>\n", " <td>901</td>\n", " <td>159740.22</td>\n", " <td>177.292142</td>\n", " <td>366</td>\n", " <td>474.377358</td>\n", " <td>707</td>\n", " <td>2</td>\n", " <td>4.531632</td>\n", " <td>20</td>\n", " <td>5.0</td>\n", " <td>38.591926</td>\n", " <td>96.428571</td>\n", " <td>4011</td>\n", " <td>2005-01-07</td>\n", " <td>2005</td>\n", " <td>2014-12-31</td>\n", " <td>2014</td>\n", " </tr>\n", " <tr>\n", " <th>cold</th>\n", " <td>1903</td>\n", " <td>232402.29</td>\n", " <td>122.124167</td>\n", " <td>731</td>\n", " <td>857.781398</td>\n", " <td>1087</td>\n", " <td>1</td>\n", " <td>2.303205</td>\n", " <td>20</td>\n", " <td>5.0</td>\n", " <td>51.739893</td>\n", " <td>2000.000000</td>\n", " <td>4016</td>\n", " <td>2005-01-02</td>\n", " <td>2005</td>\n", " <td>2013-12-31</td>\n", " <td>2013</td>\n", " </tr>\n", " <tr>\n", " <th>inactive</th>\n", " <td>9158</td>\n", " <td>850306.56</td>\n", " <td>92.848500</td>\n", " <td>1096</td>\n", " <td>2178.110832</td>\n", " <td>4014</td>\n", " <td>1</td>\n", " <td>1.814479</td>\n", " <td>23</td>\n", " <td>5.0</td>\n", " <td>48.112771</td>\n", " <td>3043.750000</td>\n", " <td>4016</td>\n", " <td>2005-01-02</td>\n", " <td>2005</td>\n", " <td>2012-12-31</td>\n", " <td>2012</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " size revenue_till_date avg_revenue_till_date recency_min \\\n", "segment \n", "new active 1512 119699.21 79.166144 1 \n", "active high 573 788194.95 1375.558377 1 \n", "active low 3313 810481.97 244.636876 1 \n", "new warm high 116 36166.22 311.777759 366 \n", "new warm low 822 28854.50 35.102798 366 \n", "warm high 119 168498.98 1415.957815 366 \n", "warm low 901 159740.22 177.292142 366 \n", "cold 1903 232402.29 122.124167 731 \n", "inactive 9158 850306.56 92.848500 1096 \n", "\n", " recency recency_max frequency_min frequency \\\n", "segment \n", "new active 84.990741 360 1 1.045635 \n", "active high 88.820244 344 2 5.888307 \n", "active low 108.361002 359 2 5.935406 \n", "new warm high 476.060345 701 1 1.060345 \n", "new warm low 513.996350 720 1 1.042579 \n", "warm high 455.126050 665 2 4.714286 \n", "warm low 474.377358 707 2 4.531632 \n", "cold 857.781398 1087 1 2.303205 \n", "inactive 2178.110832 4014 1 1.814479 \n", "\n", " frequency_max monetary_min monetary monetary_max \\\n", "segment \n", "new active 11 5.0 77.133847 4500.000000 \n", "active high 19 100.0 240.045740 4500.000000 \n", "active low 45 5.0 40.724525 99.000000 \n", "new warm high 4 100.0 302.036379 4000.000000 \n", "new warm low 3 5.0 33.374290 90.000000 \n", "warm high 14 100.0 327.407457 4000.000000 \n", "warm low 20 5.0 38.591926 96.428571 \n", "cold 20 5.0 51.739893 2000.000000 \n", "inactive 23 5.0 48.112771 3043.750000 \n", "\n", " first_purchase first_purchase_date first_purchase_year \\\n", "segment \n", "new active 360 2015-01-06 2015 \n", "active high 4010 2005-01-08 2005 \n", "active low 4012 2005-01-06 2005 \n", "new warm high 707 2014-01-24 2014 \n", "new warm low 720 2014-01-11 2014 \n", "warm high 4004 2005-01-14 2005 \n", "warm low 4011 2005-01-07 2005 \n", "cold 4016 2005-01-02 2005 \n", "inactive 4016 2005-01-02 2005 \n", "\n", " last_purchase_date last_purchase_year \n", "segment \n", "new active 2015-12-31 2015 \n", "active high 2015-12-31 2015 \n", "active low 2015-12-31 2015 \n", "new warm high 2014-12-31 2014 \n", "new warm low 2014-12-31 2014 \n", "warm high 2014-12-31 2014 \n", "warm low 2014-12-31 2014 \n", "cold 2013-12-31 2013 \n", "inactive 2012-12-31 2012 " ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "get_segment_summary(cons_df_2015)" ] }, { "cell_type": "code", "execution_count": null, "id": "c920355a-80a0-408e-b290-8c206c4e78ef", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['new active',\n", " 'active high',\n", " 'active low',\n", " 'new warm high',\n", " 'new warm low',\n", " 'warm high',\n", " 'warm low',\n", " 'cold',\n", " 'inactive']" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "segment_order = get_segment_summary(cons_df_2015).index.tolist(); segment_order" ] }, { "cell_type": "markdown", "id": "a093bc5e-f062-4680-b8e4-cfd42be2cf87", "metadata": {}, "source": [ "### Calculate Transition Matrix" ] }, { "cell_type": "code", "execution_count": null, "id": "f7b9a7f3-f354-404b-ab87-bac19e96dd8b", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>segment_x</th>\n", " <th>segment_y</th>\n", " </tr>\n", " <tr>\n", " <th>consumer_id</th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>10</th>\n", " <td>inactive</td>\n", " <td>inactive</td>\n", " </tr>\n", " <tr>\n", " <th>80</th>\n", " <td>active low</td>\n", " <td>active low</td>\n", " </tr>\n", " <tr>\n", " <th>90</th>\n", " <td>warm high</td>\n", " <td>cold</td>\n", " </tr>\n", " <tr>\n", " <th>120</th>\n", " <td>cold</td>\n", " <td>inactive</td>\n", " </tr>\n", " <tr>\n", " <th>130</th>\n", " <td>inactive</td>\n", " <td>inactive</td>\n", " </tr>\n", " <tr>\n", " <th>160</th>\n", " <td>inactive</td>\n", " <td>inactive</td>\n", " </tr>\n", " <tr>\n", " <th>190</th>\n", " <td>inactive</td>\n", " <td>inactive</td>\n", " </tr>\n", " <tr>\n", " <th>220</th>\n", " <td>inactive</td>\n", " <td>inactive</td>\n", " </tr>\n", " <tr>\n", " <th>230</th>\n", " <td>inactive</td>\n", " <td>inactive</td>\n", " </tr>\n", " <tr>\n", " <th>240</th>\n", " <td>active low</td>\n", " <td>warm low</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " segment_x segment_y\n", "consumer_id \n", "10 inactive inactive\n", "80 active low active low\n", "90 warm high cold\n", "120 cold inactive\n", "130 inactive inactive\n", "160 inactive inactive\n", "190 inactive inactive\n", "220 inactive inactive\n", "230 inactive inactive\n", "240 active low warm low" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_transition = pd.merge(cons_df_2014['segment'] , cons_df_2015['segment'], left_index=True, right_index=True, how='left')\n", "df_transition.head(10)" ] }, { "cell_type": "code", "execution_count": null, "id": "1d0eb718-1ccd-482c-ac53-d6f91f054f45", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th>segment_y</th>\n", " <th>active high</th>\n", " <th>active low</th>\n", " <th>cold</th>\n", " <th>inactive</th>\n", " <th>new warm high</th>\n", " <th>new warm low</th>\n", " <th>warm high</th>\n", " <th>warm low</th>\n", " </tr>\n", " <tr>\n", " <th>segment_x</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>active high</th>\n", " <td>354</td>\n", " <td>2</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>119</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>active low</th>\n", " <td>22</td>\n", " <td>2088</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>901</td>\n", " </tr>\n", " <tr>\n", " <th>cold</th>\n", " <td>22</td>\n", " <td>200</td>\n", " <td>0</td>\n", " <td>1931</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>inactive</th>\n", " <td>35</td>\n", " <td>250</td>\n", " <td>0</td>\n", " <td>7227</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>new active</th>\n", " <td>89</td>\n", " <td>410</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>116</td>\n", " <td>822</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>new warm high</th>\n", " <td>15</td>\n", " <td>0</td>\n", " <td>112</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>new warm low</th>\n", " <td>0</td>\n", " <td>96</td>\n", " <td>1027</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>warm high</th>\n", " <td>35</td>\n", " <td>1</td>\n", " <td>75</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " <tr>\n", " <th>warm low</th>\n", " <td>1</td>\n", " <td>266</td>\n", " <td>689</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " <td>0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "segment_y active high active low cold inactive new warm high \\\n", "segment_x \n", "active high 354 2 0 0 0 \n", "active low 22 2088 0 0 0 \n", "cold 22 200 0 1931 0 \n", "inactive 35 250 0 7227 0 \n", "new active 89 410 0 0 116 \n", "new warm high 15 0 112 0 0 \n", "new warm low 0 96 1027 0 0 \n", "warm high 35 1 75 0 0 \n", "warm low 1 266 689 0 0 \n", "\n", "segment_y new warm low warm high warm low \n", "segment_x \n", "active high 0 119 0 \n", "active low 0 0 901 \n", "cold 0 0 0 \n", "inactive 0 0 0 \n", "new active 822 0 0 \n", "new warm high 0 0 0 \n", "new warm low 0 0 0 \n", "warm high 0 0 0 \n", "warm low 0 0 0 " ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transition_count = pd.crosstab(df_transition.segment_x, df_transition.segment_y); transition_count" ] }, { "cell_type": "code", "execution_count": null, "id": "91d89b50-5c23-4821-a114-ce4f4490a51a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "segment_x\n", "active high 475\n", "active low 3011\n", "cold 2153\n", "inactive 7512\n", "new active 1437\n", "new warm high 127\n", "new warm low 1123\n", "warm high 111\n", "warm low 956\n", "dtype: int64" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transition_count.sum(axis=1)" ] }, { "cell_type": "code", "execution_count": null, "id": "7371191b-89bd-4214-9bf8-87fc7a76bf79", "metadata": {}, "outputs": [], "source": [ "transition_matrix = (transition_count.T/ transition_count.sum(axis=1)).T" ] }, { "cell_type": "code", "execution_count": null, "id": "abc843fc-6f12-49f6-b815-641251fcf4f2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['new active'], dtype='object')" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transition_matrix.index.difference(transition_matrix.columns)" ] }, { "cell_type": "code", "execution_count": null, "id": "89309567-ce18-4018-9c42-c85b84d7875b", "metadata": {}, "outputs": [], "source": [ "transition_matrix[transition_matrix.index.difference(transition_matrix.columns)] = np.nan" ] }, { "cell_type": "code", "execution_count": null, "id": "7716edfd-6828-4462-9337-c0e53b8c3aca", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th>segment_y</th>\n", " <th>new active</th>\n", " <th>active high</th>\n", " <th>active low</th>\n", " <th>new warm high</th>\n", " <th>new warm low</th>\n", " <th>warm high</th>\n", " <th>warm low</th>\n", " <th>cold</th>\n", " <th>inactive</th>\n", " </tr>\n", " <tr>\n", " <th>segment_x</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>new active</th>\n", " <td>0.0</td>\n", " <td>0.061935</td>\n", " <td>0.285317</td>\n", " <td>0.080724</td>\n", " <td>0.572025</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>active high</th>\n", " <td>0.0</td>\n", " <td>0.745263</td>\n", " <td>0.004211</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.250526</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>active low</th>\n", " <td>0.0</td>\n", " <td>0.007307</td>\n", " <td>0.693457</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.299236</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>new warm high</th>\n", " <td>0.0</td>\n", " <td>0.118110</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.881890</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>new warm low</th>\n", " <td>0.0</td>\n", " <td>0.000000</td>\n", " <td>0.085485</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.914515</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>warm high</th>\n", " <td>0.0</td>\n", " <td>0.315315</td>\n", " <td>0.009009</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.675676</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>warm low</th>\n", " <td>0.0</td>\n", " <td>0.001046</td>\n", " <td>0.278243</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.720711</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>cold</th>\n", " <td>0.0</td>\n", " <td>0.010218</td>\n", " <td>0.092894</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.896888</td>\n", " </tr>\n", " <tr>\n", " <th>inactive</th>\n", " <td>0.0</td>\n", " <td>0.004659</td>\n", " <td>0.033280</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.962061</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "segment_y new active active high active low new warm high \\\n", "segment_x \n", "new active 0.0 0.061935 0.285317 0.080724 \n", "active high 0.0 0.745263 0.004211 0.000000 \n", "active low 0.0 0.007307 0.693457 0.000000 \n", "new warm high 0.0 0.118110 0.000000 0.000000 \n", "new warm low 0.0 0.000000 0.085485 0.000000 \n", "warm high 0.0 0.315315 0.009009 0.000000 \n", "warm low 0.0 0.001046 0.278243 0.000000 \n", "cold 0.0 0.010218 0.092894 0.000000 \n", "inactive 0.0 0.004659 0.033280 0.000000 \n", "\n", "segment_y new warm low warm high warm low cold inactive \n", "segment_x \n", "new active 0.572025 0.000000 0.000000 0.000000 0.000000 \n", "active high 0.000000 0.250526 0.000000 0.000000 0.000000 \n", "active low 0.000000 0.000000 0.299236 0.000000 0.000000 \n", "new warm high 0.000000 0.000000 0.000000 0.881890 0.000000 \n", "new warm low 0.000000 0.000000 0.000000 0.914515 0.000000 \n", "warm high 0.000000 0.000000 0.000000 0.675676 0.000000 \n", "warm low 0.000000 0.000000 0.000000 0.720711 0.000000 \n", "cold 0.000000 0.000000 0.000000 0.000000 0.896888 \n", "inactive 0.000000 0.000000 0.000000 0.000000 0.962061 " ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def get_transition_matrix(cons_df_pre, cons_df_curr, order=None):\n", " df_transition = pd.merge(cons_df_pre['segment'] , cons_df_curr['segment'], left_index=True, right_index=True, how='left')\n", " transition_count = pd.crosstab(df_transition.segment_x, df_transition.segment_y)\n", " transition_matrix = (transition_count.T/ transition_count.sum(axis=1)).T\n", " transition_matrix[transition_matrix.index.difference(transition_matrix.columns)] = 0.0\n", " if order: return transition_matrix.loc[order, order]\n", " return transition_matrix\n", " \n", " \n", "transition_matrix = get_transition_matrix(cons_df_2014, cons_df_2015, order=segment_order); transition_matrix" ] }, { "cell_type": "markdown", "id": "a7b0b15b-88c8-48e7-bf35-0341b57e6f97", "metadata": {}, "source": [ "### Calculate Segment Customer Matrix " ] }, { "cell_type": "code", "execution_count": null, "id": "5dfff67b-7c81-4c84-814a-446e5add14cd", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['new active', 'active high', 'active low', 'new warm high',\n", " 'new warm low', 'warm high', 'warm low', 'cold', 'inactive'],\n", " dtype='object', name='segment_y')" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transition_matrix.columns" ] }, { "cell_type": "code", "execution_count": null, "id": "cac0171a-4d70-4d40-bfe7-5ff9f149b8e3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "segment\n", "new active 1512\n", "active high 573\n", "active low 3313\n", "new warm high 116\n", "new warm low 822\n", "warm high 119\n", "warm low 901\n", "cold 1903\n", "inactive 9158\n", "Name: size, dtype: int64" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "segment_vector = get_segment_summary(cons_df_2015)['size']; segment_vector" ] }, { "cell_type": "code", "execution_count": null, "id": "3e1a1d43-169d-4d09-acc0-5bf2c24390cb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "segment\n", "new active 1000.0\n", "active high 0.0\n", "active low 0.0\n", "new warm high 0.0\n", "new warm low 0.0\n", "warm high 0.0\n", "warm low 0.0\n", "cold 0.0\n", "inactive 0.0\n", "Name: size, dtype: float64" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_customer_vector = segment_vector.copy()*0.0\n", "new_customer_vector.loc['new active'] = 1000\n", "new_customer_vector" ] }, { "cell_type": "code", "execution_count": null, "id": "99f56cba-6c8a-44c7-911c-1b7afaf64450", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5cb1d955-0956-4d53-9e96-5e5e5181d1ee", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "duration = 10\n", "list(range(10))" ] }, { "cell_type": "code", "execution_count": null, "id": "3c45c0f4-4494-486d-8076-fd23d1d9342e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th>segment_y</th>\n", " <th>new active</th>\n", " <th>active high</th>\n", " <th>active low</th>\n", " <th>new warm high</th>\n", " <th>new warm low</th>\n", " <th>warm high</th>\n", " <th>warm low</th>\n", " <th>cold</th>\n", " <th>inactive</th>\n", " </tr>\n", " <tr>\n", " <th>segment_x</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>new active</th>\n", " <td>0.0</td>\n", " <td>0.061935</td>\n", " <td>0.285317</td>\n", " <td>0.080724</td>\n", " <td>0.572025</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>active high</th>\n", " <td>0.0</td>\n", " <td>0.745263</td>\n", " <td>0.004211</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.250526</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>active low</th>\n", " <td>0.0</td>\n", " <td>0.007307</td>\n", " <td>0.693457</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.299236</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>new warm high</th>\n", " <td>0.0</td>\n", " <td>0.118110</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.881890</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>new warm low</th>\n", " <td>0.0</td>\n", " <td>0.000000</td>\n", " <td>0.085485</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.914515</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>warm high</th>\n", " <td>0.0</td>\n", " <td>0.315315</td>\n", " <td>0.009009</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.675676</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>warm low</th>\n", " <td>0.0</td>\n", " <td>0.001046</td>\n", " <td>0.278243</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.720711</td>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>cold</th>\n", " <td>0.0</td>\n", " <td>0.010218</td>\n", " <td>0.092894</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.896888</td>\n", " </tr>\n", " <tr>\n", " <th>inactive</th>\n", " <td>0.0</td>\n", " <td>0.004659</td>\n", " <td>0.033280</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.000000</td>\n", " <td>0.962061</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "segment_y new active active high active low new warm high \\\n", "segment_x \n", "new active 0.0 0.061935 0.285317 0.080724 \n", "active high 0.0 0.745263 0.004211 0.000000 \n", "active low 0.0 0.007307 0.693457 0.000000 \n", "new warm high 0.0 0.118110 0.000000 0.000000 \n", "new warm low 0.0 0.000000 0.085485 0.000000 \n", "warm high 0.0 0.315315 0.009009 0.000000 \n", "warm low 0.0 0.001046 0.278243 0.000000 \n", "cold 0.0 0.010218 0.092894 0.000000 \n", "inactive 0.0 0.004659 0.033280 0.000000 \n", "\n", "segment_y new warm low warm high warm low cold inactive \n", "segment_x \n", "new active 0.572025 0.000000 0.000000 0.000000 0.000000 \n", "active high 0.000000 0.250526 0.000000 0.000000 0.000000 \n", "active low 0.000000 0.000000 0.299236 0.000000 0.000000 \n", "new warm high 0.000000 0.000000 0.000000 0.881890 0.000000 \n", "new warm low 0.000000 0.000000 0.000000 0.914515 0.000000 \n", "warm high 0.000000 0.000000 0.000000 0.675676 0.000000 \n", "warm low 0.000000 0.000000 0.000000 0.720711 0.000000 \n", "cold 0.000000 0.000000 0.000000 0.000000 0.896888 \n", "inactive 0.000000 0.000000 0.000000 0.000000 0.962061 " ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transition_matrix" ] }, { "cell_type": "code", "execution_count": null, "id": "7840a1c0-bff2-4dee-aaa4-483e8c18d16a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "segment\n", "new active 1512\n", "active high 573\n", "active low 3313\n", "new warm high 116\n", "new warm low 822\n", "warm high 119\n", "warm low 901\n", "cold 1903\n", "inactive 9158\n", "Name: size, dtype: int64" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "segment_vector" ] }, { "cell_type": "code", "execution_count": null, "id": "12234532-a122-4888-a06c-4d768eaf52d2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "segment_y\n", "new active 0.000000\n", "active high 659.167726\n", "active low 3534.828749\n", "new warm high 122.054280\n", "new warm low 864.901879\n", "warm high 143.551579\n", "warm low 991.369313\n", "cold 1583.796574\n", "inactive 10517.329901\n", "dtype: float64" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "transition_matrix.multiply(segment_vector, axis='index').sum()" ] }, { "cell_type": "code", "execution_count": null, "id": "26c2ff76-df1d-4c9f-8149-c474ee245fb3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>2015</th>\n", " <th>2016</th>\n", " <th>2017</th>\n", " <th>2018</th>\n", " <th>2019</th>\n", " <th>2020</th>\n", " <th>2021</th>\n", " <th>2022</th>\n", " <th>2023</th>\n", " <th>2024</th>\n", " </tr>\n", " <tr>\n", " <th>segment</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>new active</th>\n", " <td>1512</td>\n", " <td>0.000000</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " <tr>\n", " <th>active high</th>\n", " <td>573</td>\n", " <td>659.167726</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " <tr>\n", " <th>active low</th>\n", " <td>3313</td>\n", " <td>3534.828749</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " <tr>\n", " <th>new warm high</th>\n", " <td>116</td>\n", " <td>122.054280</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " <tr>\n", " <th>new warm low</th>\n", " <td>822</td>\n", " <td>864.901879</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " <tr>\n", " <th>warm high</th>\n", " <td>119</td>\n", " <td>143.551579</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " <tr>\n", " <th>warm low</th>\n", " <td>901</td>\n", " <td>991.369313</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " <tr>\n", " <th>cold</th>\n", " <td>1903</td>\n", " <td>1583.796574</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " <tr>\n", " <th>inactive</th>\n", " <td>9158</td>\n", " <td>10517.329901</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " 2015 2016 2017 2018 2019 2020 2021 2022 2023 \\\n", "segment \n", "new active 1512 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "active high 573 659.167726 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "active low 3313 3534.828749 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "new warm high 116 122.054280 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "new warm low 822 864.901879 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "warm high 119 143.551579 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "warm low 901 991.369313 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "cold 1903 1583.796574 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "inactive 9158 10517.329901 0.0 0.0 0.0 0.0 0.0 0.0 0.0 \n", "\n", " 2024 \n", "segment \n", "new active 0.0 \n", "active high 0.0 \n", "active low 0.0 \n", "new warm high 0.0 \n", "new warm low 0.0 \n", "warm high 0.0 \n", "warm low 0.0 \n", "cold 0.0 \n", "inactive 0.0 " ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "base=2015\n", "segment_matrix = pd.DataFrame(np.zeros([segment_vector.shape[0], duration]), index= segment_vector.index, columns=range(base, duration+base))\n", "segment_matrix.iloc[:, 0] = segment_vector\n", "segment_matrix.iloc[:, 1] = transition_matrix.multiply(segment_vector, axis='index').sum()\n", "segment_matrix" ] }, { "cell_type": "code", "execution_count": null, "id": "c9d3a180-611b-414f-9d53-8a2c7e03cbca", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>2015</th>\n", " <th>2016</th>\n", " <th>2017</th>\n", " <th>2018</th>\n", " <th>2019</th>\n", " <th>2020</th>\n", " <th>2021</th>\n", " <th>2022</th>\n", " <th>2023</th>\n", " <th>2024</th>\n", " <th>2025</th>\n", " </tr>\n", " <tr>\n", " <th>segment</th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>new active</th>\n", " <td>1512</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " <tr>\n", " <th>active high</th>\n", " <td>573</td>\n", " <td>659.0</td>\n", " <td>643.0</td>\n", " <td>628.0</td>\n", " <td>610.0</td>\n", " <td>596.0</td>\n", " <td>583.0</td>\n", " <td>572.0</td>\n", " <td>563.0</td>\n", " <td>555.0</td>\n", " <td>548.0</td>\n", " </tr>\n", " <tr>\n", " <th>active low</th>\n", " <td>3313</td>\n", " <td>3535.0</td>\n", " <td>3302.0</td>\n", " <td>3131.0</td>\n", " <td>2952.0</td>\n", " <td>2819.0</td>\n", " <td>2716.0</td>\n", " <td>2636.0</td>\n", " <td>2574.0</td>\n", " <td>2527.0</td>\n", " <td>2490.0</td>\n", " </tr>\n", " <tr>\n", " <th>new warm high</th>\n", " <td>116</td>\n", " <td>122.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " <tr>\n", " <th>new warm low</th>\n", " <td>822</td>\n", " <td>865.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " <td>0.0</td>\n", " </tr>\n", " <tr>\n", " <th>warm high</th>\n", " <td>119</td>\n", " <td>144.0</td>\n", " <td>165.0</td>\n", " <td>161.0</td>\n", " <td>157.0</td>\n", " <td>153.0</td>\n", " <td>149.0</td>\n", " <td>146.0</td>\n", " <td>143.0</td>\n", " <td>141.0</td>\n", " <td>139.0</td>\n", " </tr>\n", " <tr>\n", " <th>warm low</th>\n", " <td>901</td>\n", " <td>991.0</td>\n", " <td>1058.0</td>\n", " <td>988.0</td>\n", " <td>937.0</td>\n", " <td>883.0</td>\n", " <td>843.0</td>\n", " <td>813.0</td>\n", " <td>789.0</td>\n", " <td>770.0</td>\n", " <td>756.0</td>\n", " </tr>\n", " <tr>\n", " <th>cold</th>\n", " <td>1903</td>\n", " <td>1584.0</td>\n", " <td>1710.0</td>\n", " <td>874.0</td>\n", " <td>821.0</td>\n", " <td>782.0</td>\n", " <td>740.0</td>\n", " <td>709.0</td>\n", " <td>684.0</td>\n", " <td>665.0</td>\n", " <td>650.0</td>\n", " </tr>\n", " <tr>\n", " <th>inactive</th>\n", " <td>9158</td>\n", " <td>10517.0</td>\n", " <td>11539.0</td>\n", " <td>12635.0</td>\n", " <td>12939.0</td>\n", " <td>13185.0</td>\n", " <td>13385.0</td>\n", " <td>13541.0</td>\n", " <td>13663.0</td>\n", " <td>13759.0</td>\n", " <td>13833.0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " 2015 2016 2017 2018 2019 2020 2021 \\\n", "segment \n", "new active 1512 0.0 0.0 0.0 0.0 0.0 0.0 \n", "active high 573 659.0 643.0 628.0 610.0 596.0 583.0 \n", "active low 3313 3535.0 3302.0 3131.0 2952.0 2819.0 2716.0 \n", "new warm high 116 122.0 0.0 0.0 0.0 0.0 0.0 \n", "new warm low 822 865.0 0.0 0.0 0.0 0.0 0.0 \n", "warm high 119 144.0 165.0 161.0 157.0 153.0 149.0 \n", "warm low 901 991.0 1058.0 988.0 937.0 883.0 843.0 \n", "cold 1903 1584.0 1710.0 874.0 821.0 782.0 740.0 \n", "inactive 9158 10517.0 11539.0 12635.0 12939.0 13185.0 13385.0 \n", "\n", " 2022 2023 2024 2025 \n", "segment \n", "new active 0.0 0.0 0.0 0.0 \n", "active high 572.0 563.0 555.0 548.0 \n", "active low 2636.0 2574.0 2527.0 2490.0 \n", "new warm high 0.0 0.0 0.0 0.0 \n", "new warm low 0.0 0.0 0.0 0.0 \n", "warm high 146.0 143.0 141.0 139.0 \n", "warm low 813.0 789.0 770.0 756.0 \n", "cold 709.0 684.0 665.0 650.0 \n", "inactive 13541.0 13663.0 13759.0 13833.0 " ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def get_segment_customer_matrix(duration, transition_matrix, segment_vector, new_customer_vector=None, base=2015):\n", " if new_customer_vector is None: new_customer_vector = segment_vector.copy()*0\n", " \n", " segment_matrix = pd.DataFrame(np.zeros([segment_vector.shape[0], duration]), index= segment_vector.index, columns=range(base, duration+base))\n", " segment_matrix.iloc[:, 0] = segment_vector\n", " for i in range(1, duration):\n", " segment_matrix.iloc[:, i] = transition_matrix.multiply(segment_matrix.iloc[:, i-1], axis='index').sum()+new_customer_vector\n", " return round(segment_matrix)\n", "\n", "\n", "segment_customer_matrix = get_segment_customer_matrix(11, transition_matrix, segment_vector, new_customer_vector=None) \n", "segment_customer_matrix" ] }, { "cell_type": "code", "execution_count": null, "id": "0ac95ab0-adcc-43fb-8525-30e7e293a2cc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2015 18417.0\n", "2016 18417.0\n", "2017 18417.0\n", "2018 18417.0\n", "2019 18416.0\n", "2020 18418.0\n", "2021 18416.0\n", "2022 18417.0\n", "2023 18416.0\n", "2024 18417.0\n", "2025 18416.0\n", "dtype: float64" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "segment_customer_matrix.sum()" ] }, { "cell_type": "code", "execution_count": null, "id": "302542ba-4d44-4f84-8367-9b7aea524502", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>revenue_2015</th>\n", " </tr>\n", " <tr>\n", " <th>consumer_id</th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>80</th>\n", " <td>80.0</td>\n", " </tr>\n", " <tr>\n", " <th>480</th>\n", " <td>45.0</td>\n", " </tr>\n", " <tr>\n", " <th>830</th>\n", " <td>50.0</td>\n", " </tr>\n", " <tr>\n", " <th>850</th>\n", " <td>60.0</td>\n", " </tr>\n", " <tr>\n", " <th>860</th>\n", " <td>60.0</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " revenue_2015\n", "consumer_id \n", "80 80.0\n", "480 45.0\n", "830 50.0\n", "850 60.0\n", "860 60.0" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "revenue_df = df[df['year_of_purchase']==2015].groupby('consumer_id').agg(revenue_2015=('purchase_amount', 'sum')); revenue_df.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "b98cb62f-8e19-4cbd-a6f2-f063906b9c1a", "metadata": {}, "outputs": [], "source": [ "segment_revenue_vector=pd.merge(cons_df_2015, revenue_df, left_index=True,right_index=True, how='left')[['segment','revenue_2015']].fillna(0)\\\n", " .groupby('segment').mean().loc[segment_order]" ] }, { "cell_type": "code", "execution_count": null, "id": "e4a7a758-a19d-4d19-9354-7f583f3b24c1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>revenue_2015</th>\n", " </tr>\n", " <tr>\n", " <th>segment</th>\n", " <th></th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>new active</th>\n", " <td>79.166144</td>\n", " </tr>\n", " <tr>\n", " <th>active high</th>\n", " <td>323.568935</td>\n", " </tr>\n", " <tr>\n", " <th>active low</th>\n", " <td>52.306043</td>\n", " </tr>\n", " <tr>\n", " <th>new warm high</th>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>new warm low</th>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>warm high</th>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>warm low</th>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>cold</th>\n", " <td>0.000000</td>\n", " </tr>\n", " <tr>\n", " <th>inactive</th>\n", " <td>0.000000</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " revenue_2015\n", "segment \n", "new active 79.166144\n", "active high 323.568935\n", "active low 52.306043\n", "new warm high 0.000000\n", "new warm low 0.000000\n", "warm high 0.000000\n", "warm low 0.000000\n", "cold 0.000000\n", "inactive 0.000000" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "segment_revenue_vector" ] }, { "cell_type": "code", "execution_count": null, "id": "a93aef7c-74bf-4f92-80d1-6dee6ab99b1e", "metadata": {}, "outputs": [], "source": [ "segment_customer_revenue = segment_customer_matrix.multiply(segment_revenue_vector.values, axis=1)" ] }, { "cell_type": "code", "execution_count": null, "id": "eb2840a2-c3fa-4050-934b-2c71fa0af1aa", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2015 478394.130000\n", "2016 398133.789962\n", "2017 380769.379008\n", "2018 366971.511648\n", "2019 351784.489138\n", "2020 340297.820341\n", "2021 330703.901766\n", "2022 322960.160047\n", "2023 316805.064971\n", "2024 311758.129473\n", "2025 307557.823339\n", "dtype: float64" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "yearly_revenue = segment_customer_revenue.sum(); yearly_revenue" ] }, { "cell_type": "code", "execution_count": null, "id": "22b4d7e1-f158-48af-bf69-8c0c93954293", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1.0,\n", " 0.9090909090909091,\n", " 0.8264462809917354,\n", " 0.7513148009015775,\n", " 0.6830134553650705,\n", " 0.6209213230591549,\n", " 0.5644739300537772,\n", " 0.5131581182307065,\n", " 0.4665073802097331,\n", " 0.42409761837248455,\n", " 0.3855432894295314]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "discount_rate = 0.1\n", "discount_factor = [1/(1+discount_rate)**i for i in range(0,duration+1)]\n", "discount_factor" ] }, { "cell_type": "code", "execution_count": null, "id": "e55f6d2c-3093-4643-a990-bc665eeeea2e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2015 478394.130000\n", "2016 361939.809056\n", "2017 314685.437197\n", "2018 275711.128210\n", "2019 240273.539470\n", "2020 211298.172840\n", "2021 186673.731114\n", "2022 165729.627993\n", "2023 147791.900897\n", "2024 132215.880218\n", "2025 118576.854900\n", "dtype: float64" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(yearly_revenue*discount_factor)" ] }, { "cell_type": "code", "execution_count": null, "id": "e7494ceb-93ee-49a0-aac1-24be94f02c39", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2633290.2118960177" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(yearly_revenue*discount_factor).sum() # Value of Database" ] }, { "cell_type": "code", "execution_count": null, "id": "c5e335f8-74c3-49dd-acaf-635bd1b6c2e7", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "5c4e0d41-f3d4-416b-9ded-4f6118d3e701", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "c6511aaa-8a90-4f64-802c-c3d84269c391", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "62f411ff-a0ae-4880-a0f8-15a72f8d2024", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 5 }