
Python Feature Engineering Cookbook: Over 70 recipes for creating, engineering, and transforming features to build machine learning models
Ebook · 686 pages · 4 hours


About this ebook

Extract accurate information from data to train and improve machine learning models using NumPy, SciPy, pandas, and scikit-learn libraries




Key Features



  • Discover solutions for feature generation, feature extraction, and feature selection


  • Uncover the end-to-end feature engineering process across continuous, discrete, and unstructured datasets


  • Implement modern feature extraction techniques using Python's pandas, scikit-learn, SciPy and NumPy libraries



Book Description



Feature engineering is invaluable for developing and enriching your machine learning models. In this cookbook, you will work with the best tools to streamline your feature engineering pipelines and to simplify and improve the quality of your code.

Using Python libraries such as pandas, scikit-learn, Featuretools, and Feature-engine, you'll learn how to work with both continuous and discrete datasets and be able to transform features from unstructured datasets. You will develop the skills necessary to select the best features as well as the most suitable extraction techniques. This book will cover Python recipes that will help you automate feature engineering to simplify complex processes. You'll also get to grips with different feature engineering strategies, such as the Box-Cox, power, and log transforms, across machine learning, reinforcement learning, and natural language processing (NLP) domains.

By the end of this book, you'll have discovered tips and practical solutions to all of your feature engineering problems.




What you will learn



  • Simplify your feature engineering pipelines with powerful Python packages


  • Get to grips with imputing missing values


  • Encode categorical variables with a wide set of techniques


  • Extract insights from text quickly and effortlessly


  • Develop features from transactional data and time series data


  • Derive new features by combining existing variables


  • Understand how to transform, discretize, and scale your variables


  • Create informative variables from date and time



Who this book is for



This book is for machine learning professionals, AI engineers, data scientists, and NLP and reinforcement learning engineers who want to optimize and enrich their machine learning models with the best features. Knowledge of machine learning and Python coding will assist you with understanding the concepts covered in this book.

Language: English
Release date: Jan 22, 2020
ISBN: 9781789807820



    Python Feature Engineering Cookbook


    Over 70 recipes for creating, engineering, and transforming features to build machine learning models

    Soledad Galli

    BIRMINGHAM - MUMBAI

    Python Feature Engineering Cookbook

    Copyright © 2020 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Commissioning Editor: Pravin Dhandre

    Acquisition Editor: Devika Battike

    Content Development Editor: Nathanya Dias

    Senior Editor: Ayaan Hoda

    Technical Editor: Manikandan Kurup

    Copy Editor: Safis Editing

    Project Coordinator: Aishwarya Mohan

    Proofreader: Safis Editing

    Indexer: Manju Arasan

    Production Designer: Aparna Bhagat

    First published: January 2020

    Production reference: 1210120

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham

    B3 2PB, UK.

    ISBN 978-1-78980-631-1

    www.packt.com

    Packt.com

    Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

    Why subscribe?

    Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

    Improve your learning with Skill Plans built especially for you

    Get a free eBook or video every month

    Fully searchable for easy access to vital information

    Copy and paste, print, and bookmark content

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

    At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

    Contributors

    About the author

    Soledad Galli is a lead data scientist with more than 10 years of experience in world-class academic institutions and renowned businesses. She has researched, developed, and put into production machine learning models for insurance claims, credit risk assessment, and fraud prevention. Soledad received a Data Science Leaders' award in 2018 and was named one of LinkedIn's voices in data science and analytics in 2019. She is passionate about enabling people to step into and excel in data science, which is why she mentors data scientists and speaks at data science meetings regularly. She also teaches online courses on machine learning in a prestigious Massive Open Online Course platform, which have reached more than 10,000 students worldwide.

    About the reviewer

    Greg Walters has been involved with computers and computer programming since 1972. He is well versed in Visual Basic, Visual Basic .NET, Python, and SQL, and is an accomplished user of MySQL, SQLite, Microsoft SQL Server, Oracle, C++, Delphi, Modula-2, Pascal, C, 80x86 Assembler, COBOL, and Fortran. He is a programming trainer and has trained numerous people on many pieces of computer software, including MySQL, Open Database Connectivity, Quattro Pro, Corel Draw!, Paradox, Microsoft Word, Excel, DOS, Windows 3.11, Windows for Workgroups, Windows 95, Windows NT, Windows 2000, Windows XP, and Linux. He is semi-retired and has written over 100 articles for the Full Circle magazine. He is open to working as a freelancer on various projects.

    Packt is searching for authors like you

    If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

    Table of Contents

    Title Page

    Copyright and Credits

    Python Feature Engineering Cookbook

    About Packt

    Why subscribe?

    Contributors

    About the author

    About the reviewer

    Packt is searching for authors like you

    Preface

    Who this book is for

    What this book covers

    To get the most out of this book

    Download the example code files

    Download the color images

    Conventions used

    Sections

    Getting ready

    How to do it…

    How it works…

    There's more…

    See also

    Get in touch

    Reviews

    Foreseeing Variable Problems When Building ML Models

    Technical requirements

    Identifying numerical and categorical variables

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Quantifying missing data

    Getting ready

    How to do it...

    How it works...

    Determining cardinality in categorical variables

    Getting ready

    How to do it...

    How it works...

    There's more...

    Pinpointing rare categories in categorical variables

    Getting ready

    How to do it...

    How it works...

    Identifying a linear relationship

    How to do it...

    How it works...

    There's more...

    See also

    Identifying a normal distribution

    How to do it...

    How it works...

    There's more...

    See also

    Distinguishing variable distribution

    Getting ready

    How to do it...

    How it works...

    See also

    Highlighting outliers

    Getting ready

    How to do it...

    How it works...

    Comparing feature magnitude

    Getting ready

    How to do it...

    How it works...

    Imputing Missing Data

    Technical requirements

    Removing observations with missing data

    How to do it...

    How it works...

    See also

    Performing mean or median imputation

    How to do it...

    How it works...

    There's more...

    See also

    Implementing mode or frequent category imputation

    How to do it...

    How it works...

    See also

    Replacing missing values with an arbitrary number

    How to do it...

    How it works...

    There's more...

    See also

    Capturing missing values in a bespoke category

    How to do it...

    How it works...

    See also

    Replacing missing values with a value at the end of the distribution

    How to do it...

    How it works...

    See also

    Implementing random sample imputation

    How to do it...

    How it works...

    See also

    Adding a missing value indicator variable

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Performing multivariate imputation by chained equations

    Getting ready

    How to do it...

    How it works...

    There's more...

    Assembling an imputation pipeline with scikit-learn

    How to do it...

    How it works...

    See also

    Assembling an imputation pipeline with Feature-engine

    How to do it...

    How it works...

    See also

    Encoding Categorical Variables

    Technical requirements

    Creating binary variables through one-hot encoding

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Performing one-hot encoding of frequent categories

    Getting ready

    How to do it...

    How it works...

    There's more...

    Replacing categories with ordinal numbers

    How to do it...

    How it works...

    There's more...

    See also

    Replacing categories with counts or frequency of observations

    How to do it...

    How it works...

    There's more...

    Encoding with integers in an ordered manner

    How to do it...

    How it works...

    See also

    Encoding with the mean of the target

    How to do it...

    How it works...

    See also

    Encoding with the Weight of Evidence

    How to do it...

    How it works...

    See also

    Grouping rare or infrequent categories

    How to do it...

    How it works...

    See also

    Performing binary encoding

    Getting ready

    How to do it...

    How it works...

    See also

    Performing feature hashing

    Getting ready

    How to do it...

    How it works...

    See also

    Transforming Numerical Variables

    Technical requirements

    Transforming variables with the logarithm

    How to do it...

    How it works...

    See also

    Transforming variables with the reciprocal function

    How to do it...

    How it works...

    See also

    Using square and cube root to transform variables

    How to do it...

    How it works...

    There's more...

    Using power transformations on numerical variables

    How to do it...

    How it works...

    There's more...

    See also

    Performing Box-Cox transformation on numerical variables

    How to do it...

    How it works...

    See also

    Performing Yeo-Johnson transformation on numerical variables

    How to do it...

    How it works...

    See also

    Performing Variable Discretization

    Technical requirements

    Dividing the variable into intervals of equal width

    How to do it...

    How it works...

    See also

    Sorting the variable values in intervals of equal frequency

    How to do it...

    How it works...

    Performing discretization followed by categorical encoding

    How to do it...

    How it works...

    See also

    Allocating the variable values in arbitrary intervals

    How to do it...

    How it works...

    Performing discretization with k-means clustering

    How to do it...

    How it works...

    Using decision trees for discretization

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Working with Outliers

    Technical requirements

    Trimming outliers from the dataset

    How to do it...

    How it works...

    There's more...

    Performing winsorization

    How to do it...

    How it works...

    There's more...

    See also

    Capping the variable at arbitrary maximum and minimum values

    How to do it...

    How it works...

    There's more...

    See also

    Performing zero-coding – capping the variable at zero

    How to do it...

    How it works...

    There's more...

    See also

    Deriving Features from Dates and Time Variables

    Technical requirements

    Extracting date and time parts from a datetime variable

    How to do it...

    How it works...

    See also

    Deriving representations of the year and month

    How to do it...

    How it works...

    See also

    Creating representations of day and week

    How to do it...

    How it works...

    See also

    Extracting time parts from a time variable

    How to do it...

    How it works...

    Capturing the elapsed time between datetime variables

    How to do it...

    How it works...

    See also

    Working with time in different time zones

    How to do it...

    How it works...

    See also

    Performing Feature Scaling

    Technical requirements

    Standardizing the features

    How to do it...

    How it works...

    See also

    Performing mean normalization

    How to do it...

    How it works...

    There's more...

    See also

    Scaling to the maximum and minimum values

    How to do it...

    How it works...

    See also

    Implementing maximum absolute scaling

    How to do it...

    How it works...

    There's more...

    See also

    Scaling with the median and quantiles

    How to do it...

    How it works...

    See also

    Scaling to vector unit length

    How to do it...

    How it works...

    See also

    Applying Mathematical Computations to Features

    Technical requirements

    Combining multiple features with statistical operations

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Combining pairs of features with mathematical functions

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Performing polynomial expansion

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Deriving new features with decision trees

    Getting ready

    How to do it...

    How it works...

    There's more...

    Carrying out PCA

    Getting ready

    How to do it...

    How it works...

    See also

    Creating Features with Transactional and Time Series Data

    Technical requirements

    Aggregating transactions with mathematical operations

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Aggregating transactions in a time window

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Determining the number of local maxima and minima

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Deriving time elapsed between time-stamped events

    How to do it...

    How it works...

    There's more...

    See also

    Creating features from transactions with Featuretools

    How to do it...

    How it works...

    There's more...

    See also

    Extracting Features from Text Variables

    Technical requirements

    Counting characters, words, and vocabulary

    Getting ready

    How to do it...

    How it works...

    There's more...

    See also

    Estimating text complexity by counting sentences

    Getting ready

    How to do it...

    How it works...

    There's more...

    Creating features with bag-of-words and n-grams

    Getting ready

    How to do it...

    How it works...

    See also

    Implementing term frequency-inverse document frequency

    Getting ready

    How to do it...

    How it works...

    See also

    Cleaning and stemming text variables

    Getting ready

    How to do it...

    How it works...

    Other Books You May Enjoy

    Leave a review - let other readers know what you think

    Preface

    Python Feature Engineering Cookbook covers well-demonstrated recipes focused on solutions that will assist machine learning teams in identifying and extracting features to develop highly optimized and enriched machine learning models. This book includes recipes to extract and transform features from structured datasets, time series, transactional data, and text. It includes recipes concerned with automating the feature engineering process, along with a wide arsenal of tools for categorical variable encoding, missing data imputation, and variable discretization. Further, it provides different strategies for feature transformation, such as the Box-Cox transform and other mathematical operations, and covers the use of decision trees to combine existing features into new ones. Each of these recipes is demonstrated in practical terms with the help of NumPy, SciPy, pandas, scikit-learn, Featuretools, and Feature-engine in Python.

    Throughout this book, you will practice feature generation, feature extraction, and transformation, leveraging scikit-learn's feature engineering arsenal, Featuretools, and Feature-engine, using Python and its powerful libraries.

    Who this book is for

    This book is intended for machine learning professionals, AI engineers, and data scientists who want to optimize and enrich their machine learning models with the best features. Prior knowledge of machine learning and Python coding is expected.

    What this book covers

    Chapter 1, Foreseeing Variable Problems When Building ML Models, covers how to identify the different problems that variables may present and that challenge machine learning algorithm performance. We'll learn how to identify missing data in variables, quantify the cardinality of a variable, and much more besides.
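As a taste of the kind of checks this chapter covers, quantifying missing data and cardinality takes only a couple of pandas calls. The toy DataFrame below is made up for illustration and is not one of the book's datasets:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a missing value in each variable
df = pd.DataFrame({"city": ["London", "Paris", "London", np.nan],
                   "age": [25, np.nan, 40, 31]})

# Fraction of missing values per variable
missing_pct = df.isnull().mean()

# Cardinality: number of unique categories (nunique() ignores NaN)
cardinality = df["city"].nunique()
```

Here `missing_pct` reports 0.25 for each column, and `cardinality` is 2, since `nunique()` skips the missing entry.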

    Chapter 2, Imputing Missing Data, explains how to engineer variables that show missing information for some observations. In a typical dataset, variables will display values for certain observations, while values will be missing for others. We'll introduce various techniques to fill in those missing values, along with the code to execute each technique.
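As a rough sketch of one such recipe, median imputation plus a missing indicator can be written with plain pandas; the values below are made up for illustration. The book also assembles the same ideas into pipelines with scikit-learn and Feature-engine:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a gap in the "age" variable
df = pd.DataFrame({"age": [25.0, np.nan, 40.0, 31.0]})

# Median imputation: replace NaN with the variable's median
df["age_imputed"] = df["age"].fillna(df["age"].median())

# A missing indicator preserves the fact that the value was absent
df["age_was_missing"] = df["age"].isnull().astype(int)
```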

    Chapter 3, Encoding Categorical Variables, introduces various classical and widely used techniques to transform categorical variables into numerical variables, demonstrates a technique for reducing the dimensionality of variables with high cardinality, and shows how to tackle infrequent values. This chapter also includes more complex techniques for encoding categorical variables, as described and used in the 2009 KDD competition.
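For a flavor of two of the simpler encodings covered, the sketch below shows one-hot encoding and count encoding with plain pandas, on a made-up variable:

```python
import pandas as pd

df = pd.DataFrame({"city": ["London", "Paris", "London", "Rome"]})

# One-hot encoding: one binary variable per category
onehot = pd.get_dummies(df["city"], prefix="city")

# Count encoding: replace each category with its number of observations
df["city_count"] = df["city"].map(df["city"].value_counts())
```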

    Chapter 4, Transforming Numerical Variables, uses various recipes to transform numerical variables, typically non-Gaussian, into variables that follow a more Gaussian-like distribution by applying multiple mathematical functions.
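As a quick sketch of what such transformations look like with NumPy (illustrative values only; the chapter also covers the Box-Cox and Yeo-Johnson transformations):

```python
import numpy as np

# A right-skewed variable (hypothetical values)
skewed = np.array([1.0, 4.0, 9.0, 100.0, 10000.0])

log_t = np.log(skewed)    # logarithm (positive values only)
sqrt_t = np.sqrt(skewed)  # square root
recip_t = 1.0 / skewed    # reciprocal (non-zero values only)
```

Each transformation compresses the long right tail, pulling the distribution closer to a Gaussian-like shape.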

    Chapter 5, Performing Variable Discretization, covers how to create bins and distribute the values of a variable across them. The aim of this technique is to improve the spread of values across a range. It includes well-established and frequently used techniques, such as equal-width and equal-frequency discretization, as well as more complex processes, such as discretization with decision trees.
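A minimal sketch of equal-width versus equal-frequency discretization using pandas (the 1–100 series is a made-up example):

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(1, 101))  # integers 1..100

equal_width = pd.cut(values, bins=5)   # 5 intervals of equal width
equal_freq = pd.qcut(values, q=5)      # 5 intervals of ~equal frequency
```

With a uniform series the two agree, but on skewed data equal-frequency bins adapt their widths so each bin holds roughly the same number of observations.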

    Chapter 6, Working with Outliers, teaches a few mainstream techniques to remove outliers from the variables in the dataset. We'll also learn how to cap outliers at a given arbitrary minimum/maximum value.
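For illustration, one common capping rule is the inter-quartile range (IQR) proximity rule; the sketch below, on made-up numbers, caps an outlier with plain pandas:

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 11, 95])  # 95 is an outlier

q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1

# Cap values beyond the IQR proximity rule boundaries
upper = q3 + 1.5 * iqr
lower = q1 - 1.5 * iqr
capped = s.clip(lower=lower, upper=upper)
```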

    Chapter 7, Deriving Features from Dates and Time Variables, describes how to create features from dates and time variables. Date variables can't be used as-is to build machine learning models, for multiple reasons. We'll learn how to combine information from multiple time variables, such as calculating the time elapsed between them, and, importantly, how to work with variables in different time zones.
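A small, illustrative sketch with the pandas `.dt` accessor, on hypothetical timestamps:

```python
import pandas as pd

df = pd.DataFrame({
    "purchase": pd.to_datetime(["2020-01-22 14:30", "2020-03-05 09:15"])
})

# Extract date and time parts from the datetime variable
df["year"] = df["purchase"].dt.year
df["month"] = df["purchase"].dt.month
df["hour"] = df["purchase"].dt.hour

# Elapsed time between two datetime variables, expressed in days
df["delivered"] = df["purchase"] + pd.Timedelta(days=3)
df["days_to_delivery"] = (df["delivered"] - df["purchase"]).dt.days
```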

    Chapter 8, Performing Feature Scaling, covers the methods that we can use to put variables on the same scale. We'll learn how to standardize variables, how to scale them to the minimum and maximum values, and how to perform mean normalization or scale to the vector norm, among other techniques.
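As a hedged sketch, standardization and min-max scaling can be written directly with NumPy (the chapter itself leans on scikit-learn's scalers):

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])

# Standardization: zero mean, unit variance
standardized = (x - x.mean()) / x.std()

# Min-max scaling: values mapped onto the [0, 1] interval
minmax = (x - x.min()) / (x.max() - x.min())
```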

    Chapter 9, Applying Mathematical Computations to Features, explains how to create new variables from existing ones by utilizing different mathematical computations. We'll learn how to create new features through the addition/difference/multiplication/division of existing variables and more. We will also learn how to expand the feature space with polynomial expansion and how to combine features using decision trees.
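A minimal sketch of combining existing variables arithmetically, on made-up financial figures (the variable names are hypothetical, not from the book's datasets):

```python
import pandas as pd

df = pd.DataFrame({"debt": [100.0, 200.0], "income": [1000.0, 500.0]})

# New features from arithmetic between existing variables
df["total"] = df["debt"] + df["income"]
df["debt_to_income"] = df["debt"] / df["income"]

# Statistics computed across a group of features
df["mean_of_pair"] = df[["debt", "income"]].mean(axis=1)
```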

    Chapter 10, Creating Features with Transactional and Time Series Data, covers how to create static features from transactional information, so that we obtain a static view of a customer, or client, at any point in time. We'll learn how to combine features using math operations, across transactions, in specific time windows and capture time between transactions. We'll also discuss how to determine time between special events. We'll briefly dive into signal processing and learn how to determine and quantify local maxima and local minima.
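The core idea of collapsing transactions into a static, one-row-per-customer view can be sketched with a pandas groupby (toy transactions, for illustration only):

```python
import pandas as pd

tx = pd.DataFrame({
    "customer": ["A", "A", "B", "A", "B"],
    "amount": [10.0, 20.0, 5.0, 30.0, 15.0],
})

# Aggregate each customer's transactions with mathematical operations
features = tx.groupby("customer")["amount"].agg(["sum", "mean", "max", "count"])
```

The resulting table has one row per customer, ready to join onto a modeling dataset.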

    Chapter 11, Extracting Features from Text Variables, explains how to derive features from text variables. We will learn how to capture the complexity of a text by counting the number of characters, words, and sentences, as well as the vocabulary and lexical variety. We will also learn how to create bag-of-words representations and how to implement TF-IDF, with and without n-grams.
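The simplest of these text features are one-liners in pandas; the two documents below are invented examples:

```python
import pandas as pd

docs = pd.Series(["Feature engineering is fun",
                  "Text features from raw text"])

n_chars = docs.str.len()                     # characters per document
n_words = docs.str.split().str.len()         # words per document
vocab = set(" ".join(docs).lower().split())  # unique words in the corpus
```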

    To get the most out of this book

    Python Feature Engineering Cookbook will help machine learning practitioners improve their data preprocessing and manipulation skills, empowering them to modify existing variables or create new features from existing data. You will learn how to implement many feature engineering techniques with multiple open source tools, streamlining and simplifying code while adhering to coding best practices. To make the most of this book, you are expected to have an understanding of machine learning and machine learning algorithms, some previous experience with data processing, and a degree of familiarity with datasets. In addition, working knowledge of Python and some familiarity with Python numerical computing libraries such as NumPy, pandas, Matplotlib, and scikit-learn will be beneficial. You should be experienced in using Python through Jupyter Notebooks or interactively through a Python console or Command Prompt, or have experience with a dedicated Python IDE, such as PyCharm or Spyder.

    Download the example code files

    You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

    You can download the code files by following these steps:

    Log in or register at www.packt.com.

    Select the Support tab.

    Click on Code Downloads.

    Enter the name of the book in the Search box and follow the onscreen instructions.

    Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

    WinRAR/7-Zip for Windows

    Zipeg/iZip/UnRarX for Mac

    7-Zip/PeaZip for Linux

    The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Python-Feature-Engineering-Cookbook. In case there's an update to the code, it will be updated on the existing GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Download the color images

    We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781789806311_ColorImages.pdf.

    Conventions used

    There are a number of text conventions used throughout this book.

    CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: The nunique() method ignores missing values by default.

    A block of code is set as follows:

    import pandas as pd
    from sklearn.datasets import load_boston
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import PolynomialFeatures

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    X_train['A7'] = np.where(X_train['A7'].isin(frequent_cat), X_train['A7'], 'Rare')
    X_test['A7'] = np.where(X_test['A7'].isin(frequent_cat), X_test['A7'], 'Rare')

    Any command-line input or output is written as follows:

    $ pip install feature-engine

    Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: Click the Download button.

    Warnings or important notes appear like this.

    Tips and tricks appear like this.

    Sections

    In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).

    To give clear instructions on how to complete a recipe, we use these sections as follows:

    Getting ready

    This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

    How to do it…

    This section contains the steps required to follow the recipe.

    How it works…

    This section usually consists of a detailed explanation of what happened in the previous section.

    There's more…

    This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.

    See also

    This section provides helpful links to other useful information for the recipe.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Reviews

    Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

    For more information about Packt, please visit packt.com.

    Foreseeing Variable Problems When Building ML Models

    A variable is a characteristic, number, or quantity that can be measured or counted. Most variables in a dataset are either numerical or categorical. Numerical variables take numbers as values and can be discrete or continuous, whereas for categorical variables, the values are selected from a group of categories.
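A first pass at telling the two apart can rely on pandas dtypes; the DataFrame below is a made-up example, not one of the book's datasets:

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 40, 31],
                   "city": ["London", "Paris", "Rome"]})

# Numeric dtypes suggest numerical variables; object/category dtypes
# suggest categorical ones (though dtypes alone can mislead, e.g.
# numeric codes that really label categories)
numerical = df.select_dtypes(include="number").columns.tolist()
categorical = df.select_dtypes(include=["object", "category"]).columns.tolist()
```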
