Data Science for Marketing Analytics: A practical guide to forming a killer marketing strategy through data analysis with Python
Ebook, 991 pages, 6 hours

Language: English
Release date: Sep 7, 2021
ISBN: 9781800563889
    Data Science for Marketing Analytics

    second edition

    A practical guide to forming a killer marketing strategy through data analysis with Python

    Mirza Rahim Baig, Gururajan Govindan, and Vishwesh Ravi Shrimali

    Copyright © 2021 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Authors: Mirza Rahim Baig, Gururajan Govindan, and Vishwesh Ravi Shrimali

    Reviewers: Cara Davies and Subhranil Roy

    Managing Editors: Prachi Jain and Abhishek Rane

    Acquisitions Editors: Royluis Rodrigues, Kunal Sawant, and Sneha Shinde

    Production Editor: Salma Patel

    Editorial Board: Megan Carlisle, Mahesh Dhyani, Heather Gopsill, Manasa Kumar, Alex Mazonowicz, Monesh Mirpuri, Bridget Neale, Abhishek Rane, Brendan Rodrigues, Ankita Thakur, Nitesh Thakur, and Jonathan Wray

    First published: March 2019

    First edition authors: Tommy Blanchard, Debasish Behera, and Pranshu Bhatnagar

    Second edition: September 2021

    Production reference: 1060921

    ISBN: 978-1-80056-047-5

    Published by Packt Publishing Ltd.

    Livery Place, 35 Livery Street

    Birmingham B3 2PB, UK

    Table of Contents

    Preface

    1. Data Preparation and Cleaning

    Introduction

    Data Models and Structured Data

    pandas

    Importing and Exporting Data with pandas DataFrames

    Viewing and Inspecting Data in DataFrames

    Exercise 1.01: Loading Data Stored in a JSON File

    Exercise 1.02: Loading Data from Multiple Sources

    Structure of a pandas DataFrame and Series

    Data Manipulation

    Selecting and Filtering in pandas

    Creating DataFrames in Python

    Adding and Removing Attributes and Observations

    Combining Data

    Handling Missing Data

    Exercise 1.03: Combining DataFrames and Handling Missing Values

    Applying Functions and Operations on DataFrames

    Grouping Data

    Exercise 1.04: Applying Data Transformations

    Activity 1.01: Addressing Data Spilling

    Summary

    2. Data Exploration and Visualization

    Introduction

    Identifying and Focusing on the Right Attributes

    The groupby() Function

    The unique() Function

    The value_counts() Function

    Exercise 2.01: Exploring the Attributes in Sales Data

    Fine Tuning Generated Insights

    Selecting and Renaming Attributes

    Reshaping the Data

    Exercise 2.02: Calculating Conversion Ratios for Website Ads

    Pivot Tables

    Visualizing Data

    Exercise 2.03: Visualizing Data With pandas

    Visualization through Seaborn

    Visualization with Matplotlib

    Activity 2.01: Analyzing Advertisements

    Summary

    3. Unsupervised Learning and Customer Segmentation

    Introduction

    Segmentation

    Exercise 3.01: Mall Customer Segmentation – Understanding the Data

    Approaches to Segmentation

    Traditional Segmentation Methods

    Exercise 3.02: Traditional Segmentation of Mall Customers

    Unsupervised Learning (Clustering) for Customer Segmentation

    Choosing Relevant Attributes (Segmentation Criteria)

    Standardizing Data

    Exercise 3.03: Standardizing Customer Data

    Calculating Distance

    Exercise 3.04: Calculating the Distance between Customers

    K-Means Clustering

    Exercise 3.05: K-Means Clustering on Mall Customers

    Understanding and Describing the Clusters

    Activity 3.01: Bank Customer Segmentation for Loan Campaign

    Clustering with High-Dimensional Data

    Exercise 3.06: Dealing with High-Dimensional Data

    Activity 3.02: Bank Customer Segmentation with Multiple Features

    Summary

    4. Evaluating and Choosing the Best Segmentation Approach

    Introduction

    Choosing the Number of Clusters

    Exercise 4.01: Data Staging and Visualization

    Simple Visual Inspection to Choose the Optimal Number of Clusters

    Exercise 4.02: Choosing the Number of Clusters Based on Visual Inspection

    The Elbow Method with Sum of Squared Errors

    Exercise 4.03: Determining the Number of Clusters Using the Elbow Method

    Activity 4.01: Optimizing a Luxury Clothing Brand's Marketing Campaign Using Clustering

    More Clustering Techniques

    Mean-Shift Clustering

    Exercise 4.04: Mean-Shift Clustering on Mall Customers

    Benefits and Drawbacks of the Mean-Shift Technique

    k-modes and k-prototypes Clustering

    Exercise 4.05: Clustering Data Using the k-prototypes Method

    Evaluating Clustering

    Silhouette Score

    Exercise 4.06: Using Silhouette Score to Pick Optimal Number of Clusters

    Train and Test Split

    Exercise 4.07: Using a Train-Test Split to Evaluate Clustering Performance

    Activity 4.02: Evaluating Clustering on Customer Data

    The Role of Business in Cluster Evaluation

    Summary

    5. Predicting Customer Revenue Using Linear Regression

    Introduction

    Regression Problems

    Exercise 5.01: Predicting Sales from Advertising Spend Using Linear Regression

    Feature Engineering for Regression

    Feature Creation

    Data Cleaning

    Exercise 5.02: Creating Features for Customer Revenue Prediction

    Assessing Features Using Visualizations and Correlations

    Exercise 5.03: Examining Relationships between Predictors and the Outcome

    Activity 5.01: Examining the Relationship between Store Location and Revenue

    Performing and Interpreting Linear Regression

    Exercise 5.04: Building a Linear Model Predicting Customer Spend

    Activity 5.02: Predicting Store Revenue Using Linear Regression

    Summary

    6. More Tools and Techniques for Evaluating Regression Models

    Introduction

    Evaluating the Accuracy of a Regression Model

    Residuals and Errors

    Mean Absolute Error

    Root Mean Squared Error

    Exercise 6.01: Evaluating Regression Models of Location Revenue Using the MAE and RMSE

    Activity 6.01: Finding Important Variables for Predicting Responses to a Marketing Offer

    Using Recursive Feature Selection for Feature Elimination

    Exercise 6.02: Using RFE for Feature Selection

    Activity 6.02: Using RFE to Choose Features for Predicting Customer Spend

    Tree-Based Regression Models

    Random Forests

    Exercise 6.03: Using Tree-Based Regression Models to Capture Non-Linear Trends

    Activity 6.03: Building the Best Regression Model for Customer Spend Based on Demographic Data

    Summary

    7. Supervised Learning: Predicting Customer Churn

    Introduction

    Classification Problems

    Understanding Logistic Regression

    Revisiting Linear Regression

    Logistic Regression

    Cost Function for Logistic Regression

    Assumptions of Logistic Regression

    Exercise 7.01: Comparing Predictions by Linear and Logistic Regression on the Shill Bidding Dataset

    Creating a Data Science Pipeline

    Churn Prediction Case Study

    Obtaining the Data

    Exercise 7.02: Obtaining the Data

    Scrubbing the Data

    Exercise 7.03: Imputing Missing Values

    Exercise 7.04: Renaming Columns and Changing the Data Type

    Exploring the Data

    Exercise 7.05: Obtaining the Statistical Overview and Correlation Plot

    Visualizing the Data

    Exercise 7.06: Performing Exploratory Data Analysis (EDA)

    Activity 7.01: Performing the OSE technique from OSEMN

    Modeling the Data

    Feature Selection

    Exercise 7.07: Performing Feature Selection

    Model Building

    Exercise 7.08: Building a Logistic Regression Model

    Interpreting the Data

    Activity 7.02: Performing the MN technique from OSEMN

    Summary

    8. Fine-Tuning Classification Algorithms

    Introduction

    Support Vector Machines

    Intuition behind Maximum Margin

    Linearly Inseparable Cases

    Linearly Inseparable Cases Using the Kernel

    Exercise 8.01: Training an SVM Algorithm Over a Dataset

    Decision Trees

    Exercise 8.02: Implementing a Decision Tree Algorithm over a Dataset

    Important Terminology for Decision Trees

    Decision Tree Algorithm Formulation

    Random Forest

    Exercise 8.03: Implementing a Random Forest Model over a Dataset

    Classical Algorithms – Accuracy Compared

    Activity 8.01: Implementing Different Classification Algorithms

    Preprocessing Data for Machine Learning Models

    Standardization

    Exercise 8.04: Standardizing Data

    Scaling

    Exercise 8.05: Scaling Data After Feature Selection

    Normalization

    Exercise 8.06: Performing Normalization on Data

    Model Evaluation

    Exercise 8.07: Stratified K-fold

    Fine-Tuning of the Model

    Exercise 8.08: Fine-Tuning a Model

    Activity 8.02: Tuning and Optimizing the Model

    Performance Metrics

    Precision

    Recall

    F1 Score

    Exercise 8.09: Evaluating the Performance Metrics for a Model

    ROC Curve

    Exercise 8.10: Plotting the ROC Curve

    Activity 8.03: Comparison of the Models

    Summary

    9. Multiclass Classification Algorithms

    Introduction

    Understanding Multiclass Classification

    Classifiers in Multiclass Classification

    Exercise 9.01: Implementing a Multiclass Classification Algorithm on a Dataset

    Performance Metrics

    Exercise 9.02: Evaluating Performance Using Multiclass Performance Metrics

    Activity 9.01: Performing Multiclass Classification and Evaluating Performance

    Class-Imbalanced Data

    Exercise 9.03: Performing Classification on Imbalanced Data

    Dealing with Class-Imbalanced Data

    Exercise 9.04: Fixing the Imbalance of a Dataset Using SMOTE

    Activity 9.02: Dealing with Imbalanced Data Using scikit-learn

    Summary

    Appendix

    Preface

    About the Book

    Unleash the power of data to reach your marketing goals with this practical guide to data science for business.

    This book will help you get started on your journey to becoming a master of marketing analytics with Python. You'll work with relevant datasets and build your practical skills by tackling engaging exercises and activities that simulate real-world market analysis projects.

    You'll learn to think like a data scientist, build your problem-solving skills, and discover how to look at data in new ways to deliver business insights and make intelligent data-driven decisions.

    As well as learning how to clean, explore, and visualize data, you'll implement machine learning algorithms and build models to make predictions. As you work through the book, you'll use Python tools to analyze sales, visualize advertising data, predict revenue, address customer churn, and implement customer segmentation to understand behavior.

    This second edition has been updated to include new case studies that bring a more application-oriented approach to your marketing analytics journey. The code has also been updated to support the latest versions of Python and the popular data science libraries that have been used in the book. The practical exercises and activities have been revamped to prepare you for the real-world problems that marketing analysts need to solve. This will show you how to create a measurable impact on businesses large and small.

    By the end of this book, you'll have the knowledge, skills, and confidence to implement data science and machine learning techniques to better understand your marketing data and improve your decision-making.

    About the Authors

    Mirza Rahim Baig is an avid problem solver who uses deep learning and artificial intelligence to solve complex business problems. He has more than a decade of experience in creating value from data, harnessing the power of the latest in machine learning and AI with proficiency in using unstructured and structured data across areas like marketing, customer experience, catalog, supply chain, and other e-commerce sub-domains. Rahim is also a teacher who designs, creates, and teaches data science content for various learning platforms. He loves making the complex easy to understand. He is also an author of The Deep Learning Workshop, a hands-on guide to starting your deep learning journey and building your own next-generation deep learning models.

    Gururajan Govindan is a data scientist, intrapreneur, and trainer with more than seven years of experience working across domains such as finance and insurance. He is also an author of The Data Analysis Workshop, a book focusing on data analytics. He is well known for his expertise in data-driven decision making and machine learning with Python.

    Vishwesh Ravi Shrimali graduated from BITS Pilani, where he studied mechanical engineering. He has a keen interest in programming and AI and has applied that interest to mechanical engineering projects. He has also written multiple blogs on OpenCV, deep learning, and computer vision. When he is not writing blogs or working on projects, he likes to go on long walks or play his acoustic guitar. He is also an author of The Computer Vision Workshop, a book focusing on OpenCV and its applications in real-world scenarios, and of Machine Learning for OpenCV (2nd Edition), which introduces how to use OpenCV for machine learning applications.

    Who This Book Is For

    This marketing book is for anyone who wants to learn how to use Python for cutting-edge marketing analytics. Whether you're a developer who wants to move into marketing, or a marketing analyst who wants to learn more sophisticated tools and techniques, this book will get you on the right path. Basic prior knowledge of Python is required to work through the exercises and activities provided in this book.

    About the Chapters

    Chapter 1, Data Preparation and Cleaning, teaches you skills related to data cleaning along with various data preprocessing techniques using real-world examples.

    Chapter 2, Data Exploration and Visualization, teaches you how to explore and analyze data with the help of various aggregation techniques and visualizations using Matplotlib and Seaborn.

    Chapter 3, Unsupervised Learning and Customer Segmentation, teaches you customer segmentation, one of the most important skills for a data science professional in marketing. You will learn how to use machine learning to perform customer segmentation with the help of scikit-learn. You will also learn to evaluate segments from a business perspective.

    Chapter 4, Evaluating and Choosing the Best Segmentation Approach, expands your repertoire to various advanced clustering techniques and teaches principled numerical methods of evaluating clustering performance.

    Chapter 5, Predicting Customer Revenue Using Linear Regression, gets you started on predictive modeling of quantities by introducing you to regression and teaching simple linear regression in a hands-on manner using scikit-learn.

    Chapter 6, More Tools and Techniques for Evaluating Regression Models, goes into more detail on regression techniques, along with different regularization methods available to prevent overfitting. You will also discover the various evaluation metrics available to assess model performance.

    Chapter 7, Supervised Learning: Predicting Customer Churn, uses a churn prediction problem as the central problem statement throughout the chapter to cover different classification algorithms and their implementation using scikit-learn.

    Chapter 8, Fine-Tuning Classification Algorithms, introduces support vector machines and tree-based classifiers along with the evaluation metrics for classification algorithms. You will also learn about the process of hyperparameter tuning, which will help you obtain better results using these algorithms.

    Chapter 9, Multiclass Classification Algorithms, introduces a multiclass classification problem statement and the classifiers that can be used to solve such problems. You will learn about imbalanced datasets and their treatment in detail. You will also discover the micro- and macro-evaluation metrics available in scikit-learn for these classifiers.

    Conventions

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, and user input are shown as follows:

    df.head(n) will return the first n rows of the DataFrame. If n is not passed, it defaults to 5.
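The default is easy to confirm on a small DataFrame. The sketch below uses hypothetical order data made up purely for illustration; only the standard pandas `DataFrame.head` method is assumed.

```python
import pandas as pd

# A small hypothetical DataFrame with 8 rows (values are illustrative only)
df = pd.DataFrame({"order_id": range(1, 9),
                   "revenue": [120, 85, 40, 310, 95, 60, 150, 75]})

first_five = df.head()    # n is not passed, so it defaults to 5
first_three = df.head(3)  # explicitly request the first 3 rows

print(len(first_five), len(first_three))  # 5 3
```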

    Words that you see on the screen, for example, in menus or dialog boxes, also appear in the same format.

    A block of code is set as follows:

    import pandas as pd
    sales = pd.read_csv('sales.csv')
    sales.head()

    New important words are shown like this: a box plot is used to depict the distribution of numerical data and is primarily used for comparisons.

    Key parts of code snippets are emboldened as follows:

    df1 = pd.read_csv('timeSpent.csv')

    Code Presentation

    Lines of code that span multiple lines are split using a backslash (\). When the code is executed, Python ignores the backslash and treats the code on the next line as a direct continuation of the current line.

    For example,

    import pandas as pd
    # Both statements build the same DataFrame of currency values in INR
    df = pd.DataFrame({'Currency': pd.Series(['USD','EUR','GBP']),\
                      'ValueInINR': pd.Series([70, 89, 99])})
    df = pd.DataFrame.from_dict({'Currency': ['USD','EUR','GBP'],\
                                'ValueInINR':[70, 89, 99]})
    df.head()
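The continuation rule can be checked directly. This small sketch shows that a backslash joins two physical lines into one logical line, and that inside parentheses, brackets, or braces Python continues the line implicitly, so the backslash is optional there. It also confirms that the two DataFrame constructions in the example produce identical DataFrames; the variable names are illustrative.

```python
import pandas as pd

# Backslash continuation: Python joins the two physical lines into one logical line
total = 70 + 89 + \
        99

# Inside brackets or braces the backslash is optional, because Python keeps
# reading the expression until the matching bracket closes
df_series = pd.DataFrame({'Currency': pd.Series(['USD', 'EUR', 'GBP']),
                          'ValueInINR': pd.Series([70, 89, 99])})
df_dict = pd.DataFrame.from_dict({'Currency': ['USD', 'EUR', 'GBP'],
                                  'ValueInINR': [70, 89, 99]})

print(total)                      # 258
print(df_series.equals(df_dict))  # True
```

The book keeps the backslash inside brackets as a visual convention; both styles run the same way.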

    Comments are added into code to help explain specific bits of logic. Single-line comments are denoted using the # symbol, as follows:

    # Importing the matplotlib library
    import matplotlib.pyplot as plt
    # Declaring the color of the plot as gray
    plt.bar(sales['Product line'], sales['Revenue'], color='gray')
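The bar chart call above assumes a sales DataFrame that was loaded earlier. A self-contained version, using a small hypothetical stand-in for sales.csv (the product lines and revenue figures are invented), might look like this. It uses Matplotlib's object-oriented ax.bar, which draws the same chart as plt.bar. The Agg line only keeps the sketch runnable outside a notebook; in Jupyter you can leave it out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs as a plain script
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the sales DataFrame loaded from sales.csv
sales = pd.DataFrame({'Product line': ['Camping Equipment', 'Golf Equipment', 'Outdoor Protection'],
                      'Revenue': [500.0, 320.0, 80.0]})

fig, ax = plt.subplots()
# Declaring the color of the plot as gray; one bar is drawn per product line
bars = ax.bar(sales['Product line'], sales['Revenue'], color='gray')
ax.set_ylabel('Revenue')

print(len(bars))  # 3
```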

    Multi-line comments are enclosed in triple quotes, as follows:

    """
    Importing classification_report and precision_recall_fscore_support
    from sklearn.metrics
    """
    from sklearn.metrics import classification_report
    from sklearn.metrics import precision_recall_fscore_support
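To see what the two imported functions return, here is a small sketch on hypothetical churn labels (1 = churned, 0 = stayed), invented for illustration. With the default average=None, precision_recall_fscore_support returns one value per class, ordered by sorted label (class 0, then class 1). classification_report prints the same metrics as a text table.

```python
from sklearn.metrics import classification_report
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical true and predicted churn labels (1 = churned, 0 = stayed)
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# average=None (the default) gives per-class arrays: index 0 is class 0, index 1 is class 1
precision, recall, fscore, support = precision_recall_fscore_support(y_true, y_pred)
print("precision:", precision)
print("recall:   ", recall)
print("support:  ", support)

# The same per-class metrics, summarized as a text table
report = classification_report(y_true, y_pred)
print(report)
```

In this toy case, class 1 has a precision of 1.0 because no stayer was flagged as a churner. Its recall is 2/3 because one of the three churners was missed.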

    Minimum Hardware Requirements

    For an optimal experience, we recommend the following hardware configuration:

    Processor: Dual Core or better

    Memory: 4 GB RAM

    Storage: 10 GB available space

    Downloading the Code Bundle

    Download the code files from GitHub at https://packt.link/59F3X. Refer to these code files for the complete code bundle. The files here contain the exercises, activities, and some intermediate code for each chapter, which can be a useful reference when you get stuck.

    On the GitHub repo's page, you can click the green Code button and then click the Download ZIP option to download the complete code as a ZIP file to your disk (refer to Figure 0.1). You can then extract these code files to a folder of your choice, for example, C:\Code.

    Figure 0.1: Download ZIP option on GitHub

    On your system, the extracted ZIP file should contain all the files present in the GitHub repository:

    Figure 0.2: GitHub code directory structure (Windows Explorer)

    Setting Up Your Environment

    Before you explore the book in detail, you need to set up specific software and tools. The following section shows you how to do that.

    Installing Anaconda on Your System

    The code for all the exercises and activities in this book can be executed using Jupyter Notebooks. You'll first need to install the Anaconda Navigator, which is an interface through which you can access your Jupyter Notebooks. Anaconda Navigator will be installed as a part of Anaconda Individual Edition, which is an open-source Python distribution platform available for Windows, macOS, and Linux. Installing Anaconda will also install Python. Head to https://www.anaconda.com/distribution/.

    From the page that opens, click the Download button (annotated by 1). Make sure you are downloading the Individual Edition.

    Figure 0.3: Anaconda homepage

    The installer should start downloading immediately. The website will, by default, choose an installer based on your system configuration. If you prefer to download Anaconda for a different operating system (Windows, macOS, or Linux) and system configuration (32- or 64-bit), click the Get Additional Installers link at the bottom of the box (refer to Figure 0.3). The page should scroll down to a section (refer to Figure 0.4) that lets you choose from various options based on the operating system and configuration you desire. For this book, it is recommended that you use the latest version of Python (3.8 or higher).

    Figure 0.4: Downloading Anaconda Installers based on the OS

    Follow the installation steps presented on the screen.

    Figure 0.5: Anaconda setup

    On Windows, if you've never installed Python on your system before, you can select the checkbox that prompts you to add Anaconda to your PATH. This will let you run Anaconda-specific commands (like conda) from the default command prompt. If you already have Python installed, or have installed an earlier version of Anaconda in the past, it is recommended that you leave it unchecked (you can run Anaconda commands from the Anaconda Prompt application instead). The installation may take a while depending on your system configuration.

    Figure 0.6: Anaconda installation steps

    For more detailed instructions, you may refer to the official documentation for Linux by clicking this link (https://docs.anaconda.com/anaconda/install/linux/), macOS using this link (https://docs.anaconda.com/anaconda/install/mac-os/), and Windows using this link (https://docs.anaconda.com/anaconda/install/windows/).

    To check whether Anaconda Navigator is correctly installed, look for it among your applications; it has the following icon. Depending on your operating system, the icon's appearance may vary slightly.

    Figure 0.7: Anaconda Navigator icon

    You can also search for the application using your operating system's search functionality. For example, on Windows 10, you can use the Windows Key + S combination and type in Anaconda Navigator. On macOS, you can use Spotlight search. On Linux, you can open the terminal and type the anaconda-navigator command and press the return key.

    Figure 0.8: Searching for Anaconda Navigator on Windows 10

    For detailed steps on how to verify if Anaconda Navigator is installed, refer to the following link: https://docs.anaconda.com/anaconda/install/verify-install/.

    Click the icon to open Anaconda Navigator. It may take a while to load the first time, but upon successful installation, you should see a screen similar to the following:

    Figure 0.9: Anaconda Navigator screen

    If you have more questions about the installation process, you may refer to the list of frequently asked questions from the Anaconda documentation: https://docs.anaconda.com/anaconda/user-guide/faq/.
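Once Anaconda is installed, you can confirm from Python itself that the interpreter meets the recommended version (3.8 or higher). Run the check below in a python session from the Anaconda Prompt or terminal, or later in a notebook. The helper name python_is_recent_enough is hypothetical, introduced only for this illustration; sys.version_info is the standard-library way to read the interpreter version.

```python
import sys

def python_is_recent_enough(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum` (hypothetical helper).

    sys.version_info[:2] is a (major, minor) tuple, and tuples compare element by element.
    """
    return sys.version_info[:2] >= minimum

print(sys.version)                # full version string of the running interpreter
print(python_is_recent_enough())  # True on Python 3.8 or later
```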

    Launching Jupyter Notebook

    Once the Anaconda Navigator is open, you can launch the Jupyter Notebook interface from this screen. The following steps will show you how to do that:

    Open Anaconda Navigator. You should see the following screen:

    Figure 0.10: Anaconda Navigator screen

    Now, click Launch under the Jupyter Notebook panel to start the notebook interface on your local system.

    Figure 0.11: Jupyter notebook launch option

    On clicking the Launch button, you'll notice that even though nothing changes in the window shown in the preceding screenshot, a new tab opens up in your default browser. This is known as the Notebook Dashboard. It will, by default, open to your root folder. For Windows users, this path would be something similar to C:\Users\. On macOS and Linux, it will be /home//.

    Figure 0.12: Notebook dashboard

    Note that you can also open Jupyter Notebook by running the command jupyter notebook in the terminal or command prompt, or by searching for Jupyter Notebook in your applications, just as you searched for Anaconda Navigator in Figure 0.8.

    You can use this Dashboard as a file explorer to navigate to the directory where you have downloaded or stored the code files for the book (refer to the Downloading the Code Bundle section on how to download the files from GitHub). Once you have navigated to your desired directory, you can start by creating a new Notebook. Alternatively, if you've downloaded the code from our repository, you can open an existing Notebook as well (Notebook files have a .ipynb extension). The menus here are quite simple to use:

    Figure 0.13: Jupyter notebook navigator menu options walkthrough

    If you make any changes to the directory using your operating system's file explorer and the changed file isn't showing up in the Jupyter Notebook Navigator, click the Refresh Notebook List button (annotated as 1). To quit, click the Quit button (annotated as 2). To create a new file (a new Jupyter Notebook), you can click the New button (annotated as 3).

    Clicking the New button will open a dropdown menu as follows:

    Figure 0.14: Creating a new Jupyter notebook

    Note

    A detailed tutorial on the interface and the keyboard shortcuts for Jupyter Notebooks can be found here: https://jupyter-notebook.readthedocs.io/en/stable/notebook.html.

    You can get started and create your first notebook by selecting Python 3; however, it is recommended that you also set up the virtual environment we've provided. Installing the environment will also install all the packages required for running the code in this book. The following section will show you how to do that.

    Installing the ds-marketing Virtual Environment

    As you run the code for the exercises and activities, you'll notice that even after installing Anaconda, certain libraries, such as kmodes, need to be installed separately as you progress through the book. Alternatively, you may already have these libraries installed, but their versions may differ from the ones we've used, which may lead to different results. That's why we've provided an environment.yml file with this book that will:

    Install all the packages and libraries required for this book at once.

    Make sure that the version numbers of your libraries match the ones we've used to write the code for this book.

    Make sure that the code you write based on this book remains separate from any other coding environment you may have.
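If you want to see which versions of the main libraries you already have, a short script using importlib.metadata (part of the standard library since Python 3.8) can list them. You can then compare them with environment.yml. The package list below is an assumption drawn from the libraries this book mentions, not a copy of environment.yml, and installed_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names mentioned in this book; an illustrative assumption,
# not the actual contents of environment.yml
PACKAGES = ["pandas", "matplotlib", "seaborn", "scikit-learn", "kmodes"]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:15} {ver or 'not installed'}")
```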

    You can download the environment.yml file by clicking the following link: http://packt.link/dBv1k.

    Save this file, ideally in the same folder where you'll be running the code for this book. If you've downloaded the code from GitHub as detailed in the Downloading the Code Bundle section, this file should already be present in the parent directory, and you won't need to download it separately.

    To set up the environment, follow these steps:

    On macOS, open Terminal from the Launchpad (you can find more information about Terminal here: https://support.apple.com/en-in/guide/terminal/apd5265185d-f365-44cb-8b09-71a064a42125/mac). On Linux, open the Terminal application that's native to your distribution. On Windows, open the Anaconda Prompt instead; you can find it by opening the Start menu and searching for Anaconda Prompt.

    Figure 0.15: Searching for Anaconda Prompt on Windows

    A new terminal like the following should open. By default, it will start in your home directory:

    Figure 0.16: Anaconda terminal prompt

    In the case of Linux, it would look like the following:

    Figure 0.17: Terminal in Linux

    In the terminal, navigate to the directory where you've saved the environment.yml file on your computer using the cd command. Say you've saved the file in Documents\Data-Science-for-Marketing-Analytics-Second-Edition. In that case, you'll type the following command in the prompt and press Enter:

    cd Documents\Data-Science-for-Marketing-Analytics-Second-Edition

    Note that the command may vary slightly based on your directory structure and your operating system.

    Now that you've navigated to the correct folder, create a new conda environment by typing or pasting the following command in the terminal. Press Enter to run the command.

    conda env create -f environment.yml

    This will create the ds-marketing conda environment and install the libraries required to run the code in this book. If you see a prompt asking you to confirm before proceeding, type y and press Enter to continue. Depending on your system configuration, the process may take a while to complete.

    Note

    For a complete list of conda commands, visit the following link: https://conda.io/projects/conda/en/latest/index.html. For a detailed guide on how to manage conda environments, please visit the following link: https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html.

    Once complete, type or paste the following command in the shell to activate the newly installed environment, ds-marketing.

    conda activate ds-marketing

    If the activation is successful, you'll see the environment name in parentheses at the start of the prompt change from base to ds-marketing:

    Figure 0.18: Environment name showing up in the shell

    Run the following command to install ipykernel in the newly activated conda environment:

    pip install ipykernel

    Note

    On macOS and Linux, you may need to use pip3 instead of pip.

    In the same environment, run the following command to register the ds-marketing environment as a Jupyter kernel:

    python -m ipykernel install --user --name=ds-marketing

    Windows only: If you're on Windows, type or paste the following command. Otherwise, you may skip this step and exit the terminal.

    conda install pywin32

    Select the created kernel ds-marketing when you start your Jupyter notebook.

    Figure 0.19: Selecting the ds-marketing kernel

    A new tab will open with a fresh untitled Jupyter notebook where you can start writing your code:

    Figure 0.20: A new Jupyter notebook
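    Once the notebook is open, you may want to confirm that it is really running inside the ds-marketing environment and that the kernel was registered. The following is a minimal sketch under two assumptions: that a named conda environment lives in a folder named after it (typically <conda-root>/envs/ds-marketing), so the last component of sys.prefix matches the environment name, and that jupyter_client, which is installed alongside Jupyter and ipykernel, is available to list registered kernels through KernelSpecManager.find_kernel_specs(). The helper names running_in_env and kernel_registered are hypothetical. From a terminal, jupyter kernelspec list shows the same kernel listing.

```python
import sys
from pathlib import Path


def running_in_env(env_name="ds-marketing", prefix=None):
    """Return True if the interpreter's prefix folder is named `env_name`.

    Assumes a named conda environment stored in <conda-root>/envs/<env_name>.
    """
    prefix = sys.prefix if prefix is None else prefix
    return Path(prefix).name == env_name


def kernel_registered(name="ds-marketing", specs=None):
    """Return True if a Jupyter kernel called `name` is registered.

    `specs` maps kernel names to their folders; when omitted, it is read
    from the local Jupyter installation through jupyter_client.
    """
    if specs is None:
        # Imported lazily so the function also works with a prepared dict.
        from jupyter_client.kernelspec import KernelSpecManager

        specs = KernelSpecManager().find_kernel_specs()
    return name in specs


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside ds-marketing:", running_in_env())
    try:
        print("Kernel registered:", kernel_registered())
    except ImportError:
        print("jupyter_client is not installed in this interpreter.")
```

    If running_in_env() returns False in a notebook, the most likely cause is that a different kernel (for example, the default Python 3 kernel) is selected.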

    Running the Code Online Using Binder

    You can also run the code files for this book in a completely online environment through Binder, a service that provides an interactive Jupyter Notebook interface in your browser. Along with the individual code files that can be downloaded locally, we have provided a link to the Binder version of the book's GitHub repository. Using this link, you can run any of the .ipynb code files for this book in a cloud-based interactive environment. Click the following link to open the online Binder version of the book's repository: https://packt.link/GdQOp. We recommend saving the link in your browser bookmarks for future reference (you can also use the launch binder link provided in the README section of the book's GitHub page).

    Depending on your internet connection, it may take a while to load, but once loaded, you'll get the same interface as you would when running the code in a local Jupyter Notebook (all your shortcuts should work as well):

    Figure 0.21: Binder lets you run Jupyter Notebooks in a browser

    Binder is an online service that lets you read and execute Jupyter Notebook files (.ipynb) from any public GitHub repository in a cloud-based environment. However, note that Binder has certain memory constraints. Running multiple Jupyter Notebook instances at the same time, or running memory-intensive processes (such as model training), can cause the kernel to crash or reset. Moreover, any changes you make in these online notebooks are not saved: the notebooks reset to the latest version in the repository whenever you close and reopen the Binder link. A stable internet connection is required to use Binder. You can find out more about the Binder Project here: https://jupyter.org/binder.

    This option is recommended for readers who want a quick look at the code and want to experiment with it without downloading the entire repository to their local machine.
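    Binder can launch a public GitHub repository from a URL of the form https://mybinder.org/v2/gh/<owner>/<repo>/<ref>, which is handy if you want to bookmark a specific branch or notebook. The helper below is a sketch: binder_url is a hypothetical name, the owner and repository in the usage line are assumptions rather than the confirmed location of the book's code, and the optional labpath query parameter is the one mybinder.org uses to open a given file in JupyterLab.

```python
from urllib.parse import quote


def binder_url(owner, repo, ref="HEAD", notebook=None):
    """Build a mybinder.org launch URL for a public GitHub repository.

    If `notebook` is given (a path relative to the repository root),
    the labpath query parameter asks Binder to open that file in JupyterLab.
    """
    url = f"https://mybinder.org/v2/gh/{owner}/{repo}/{ref}"
    if notebook:
        # quote() percent-encodes spaces and similar characters but keeps "/".
        url += "?labpath=" + quote(notebook)
    return url


# Hypothetical owner and repository names, for illustration only.
print(binder_url("PacktPublishing", "Data-Science-for-Marketing-Analytics-Second-Edition"))
```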

    Get in Touch

    Feedback from our readers is always welcome.

    General feedback: If you have any questions about this book, please mention the book title in the subject of your message and email us at [email protected].

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you could report this to us. Please visit www.packtpub.com/support/errata and complete the form.

    Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you could provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit https://authors.packtpub.com/.

    Please Leave a Review

    Let us know what you think by leaving a detailed, impartial review on Amazon. We appreciate all feedback – it helps us continue to make great products and help aspiring developers build their skills. Please spare a few minutes to give your thoughts – it makes a big difference to us. You can leave a review by clicking the following link: https://packt.link/r/1800560478.

    To Azra, Aiza, Duha and Aidama - you inspire courage, strength, and grace.

    - Mirza Rahim Baig

    To Appa, Amma, Vindhya, Madhu, and Ishan - The Five Pillars of my life.

    - Gururajan Govindan

    To Nanaji, Dadaji, and Appa - for their wisdom, inspiration, and unconditional love.

    - Vishwesh Ravi Shrimali

    1. Data Preparation and Cleaning

    Overview

    In this chapter, you'll learn the skills required to process and clean data so that it's ready for further analysis. Using the pandas library in Python, you will learn how to read and import data from various file formats, including JSON and CSV, into a DataFrame. You'll then learn how to perform slicing, aggregation, and filtering on DataFrames. By the end of the chapter, you will consolidate your data cleaning skills by learning how to join DataFrames, handle missing values, and combine data from various sources.
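    As a preview of the operations listed above, the following minimal sketch reads a CSV source and a JSON source into DataFrames, filters and slices one of them, fills a missing value, joins the two, and aggregates the result. The column names and values are made up for illustration, and the data is kept inline with io.StringIO so the snippet runs without any files; pd.read_csv and pd.read_json accept file paths in the same way.

```python
from io import StringIO

import pandas as pd

# Hypothetical inline data standing in for a CSV file and a JSON file.
sales_csv = StringIO(
    "customer_id,region,revenue\n"
    "1,North,120.0\n"
    "2,South,\n"
    "3,North,80.0\n"
)
customers_json = StringIO(
    '[{"customer_id": 1, "name": "Ana"},'
    ' {"customer_id": 2, "name": "Ben"},'
    ' {"customer_id": 3, "name": "Chen"}]'
)

sales = pd.read_csv(sales_csv)        # the empty revenue cell becomes NaN
customers = pd.read_json(customers_json)

# Slicing and filtering: keep North rows and two of the columns.
north = sales.loc[sales["region"] == "North", ["customer_id", "revenue"]]

# Handling missing values: treat the missing revenue as 0.
sales["revenue"] = sales["revenue"].fillna(0)

# Joining: attach customer names using the shared customer_id key.
combined = sales.merge(customers, on="customer_id", how="left")

# Aggregation: total revenue per region.
by_region = combined.groupby("region")["revenue"].sum()
print(by_region)
```

    Whether a missing revenue should become 0 or be dropped is a business decision rather than a technical one; filling with 0 here is just one choice for the sketch.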

    Introduction

    "Since you liked this artist, you'll also like their new album," "Customers who bought bread also bought butter," and "1,000 people near you have also ordered this item." Every day, recommendations like these influence customers' shopping decisions and help them discover new products. Such recommendations are possible thanks to data science techniques that use data to build complex models, perform sophisticated tasks, and derive valuable customer insights with great precision. While applying data science principles to marketing analytics is a proven, cost-effective, and efficient strategy, many companies are still not using these techniques to their full potential, and there is a wide gap between how these techniques could be used and how they actually are used.

    This book is designed to teach you skills that will help you contribute toward bridging that gap. It covers a wide range of useful techniques that will allow you to leverage everything data science can do in terms of strategies and decision-making in the marketing domain. By the end of the book, you should be able to successfully create and manage an end-to-end marketing analytics solution in Python, segment customers based on the data provided, predict their lifetime value, and model their decision-making behavior using data science techniques.

    You will start your journey by first learning how to clean and prepare data. Raw data from external sources cannot be used directly; it needs to be analyzed, structured, and filtered before it can be used any further. In this chapter, you will learn how to manipulate rows and columns and apply transformations to data to ensure you have the right data with the right attributes. This is an essential skill in a data analyst's arsenal because, otherwise, the outcome of your analysis will be based on incorrect data, thereby making it a classic example of garbage in, garbage out. But before you start working with the data, it is important to understand its nature - in other words, the different types of data you'll be working with.
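    A tiny, hypothetical example shows how garbage in, garbage out can happen silently. Suppose a raw export stores revenue as text with currency symbols: pandas will happily "sum" the column, but for text values that means concatenating strings. The sketch below uses made-up order data and cleans the column with str.replace and pd.to_numeric before aggregating.

```python
import pandas as pd

# Hypothetical raw export: revenue arrives as text with "$" and "," in it.
raw = pd.DataFrame({"order_id": [1, 2, 3], "revenue": ["$1,200", "$300", "$45"]})

# Summing a text column concatenates the strings instead of adding numbers.
print(raw["revenue"].sum())  # prints $1,200$300$45

# Clean first: strip "$" and "," and convert the result to a numeric type.
clean = raw.assign(
    revenue=pd.to_numeric(raw["revenue"].str.replace(r"[$,]", "", regex=True))
)
print(clean["revenue"].sum())  # prints 1545
```

    No error is raised in the first case, which is exactly why checking column types is part of data preparation.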

    Data Models and Structured Data

    When you build an analytical solution, the first thing that you need to do is to build a data model. A data model is an overview of the data sources that you will be using, their relationships with other data sources, where exactly the data from a specific source is going to be fetched, and in what form (such as an Excel file, a database, or a JSON from an internet
