George Liu

Greater Houston
17K followers 500+ connections

About

I solve business problems using data science. Combining a unique skillset of machine…

Experience

  • Sense

    Greater Houston

  • -

    Greater Toronto Area, Canada

  • -

    Greater Toronto Area, Canada

  • -

    Greater Toronto Area, Canada

  • -

  • -

    Greater Toronto Area, Canada

  • -

    Greater Toronto Area, Canada

  • -

    Greater Toronto Area, Canada

  • -

    Greater Toronto Area, Canada

Education

Licenses & Certifications

Publications

Courses

  • A/B Testing

    -

  • Bayesian Statistics

    -

  • Convolutional Neural Networks

    -

  • Data Analysis with R

    -

  • Data Science Capstone

    -

  • Data Visualization and D3.js

    -

  • Data Wrangling with MongoDB

    -

  • Deep Learning

    -

  • Descriptive Statistics

    -

  • Developing Data Products

    -

  • Exploratory Data Analysis

    -

  • Getting and Cleaning Data

    -

  • Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization

    -

  • Inferential Statistics

    -

  • Intro to Data Analysis

    -

  • Intro to HTML and CSS

    -

  • Intro to Hadoop and MapReduce

    -

  • Intro to Machine Learning

    -

  • JavaScript Basics

    -

  • Model Evaluation & Validation

    -

  • Neural Networks and Deep Learning

    -

  • Practical Machine Learning

    -

  • R Programming

    -

  • Regression Models

    -

  • Reinforcement Learning

    -

  • Reproducible Research

    -

  • Sequence Models

    -

  • Statistical Inference

    -

  • Structuring Machine Learning Projects

    -

  • Supervised Learning

    -

  • The Data Scientist’s Toolbox

    -

  • Unsupervised Learning

    -

Projects

  • Face Generation with Generative Adversarial Networks (GAN)

    Using state-of-the-art GANs to generate human face images based on the CelebFaces Attributes (CelebA) dataset.

  • Language Translation with Recurrent Neural Networks (RNN)

    Training a sequence-to-sequence model on a dataset of English and French sentences so that it can translate new sentences from English to French.

  • TV Script Generation with Recurrent Neural Networks (RNN)

    In this project, we generate Simpsons TV scripts using RNNs. Part of the Simpsons dataset of scripts from 27 seasons is used. The neural network generates a new TV script for a scene at Moe's Tavern.

  • Classifying Images with Convolutional Neural Network (CNN)

    In this project, we classify images from the CIFAR-10 dataset. The dataset consists of airplanes, dogs, cats, and other objects. After preprocessing the images, we train a convolutional neural network on all the samples.

  • Using Neural Network to Predict Bike Ridership

    In this project, we build a neural network with NumPy and use it to predict daily bike rental ridership.

  • Build an Answer Classifier for Quora

    Quora uses a combination of machine learning algorithms and moderation to ensure high-quality content on the site. High answer quality has helped Quora distinguish itself from other Q&A sites on the web. In this project, we devise a classifier that can tell good answers from bad answers as well as humans can.

  • Identifying Dissatisfied Customers for Santander Bank with Machine Learning

    --> Technology Used: Python, Scikit Learn, xgboost.
    From frontline support teams to C-suites, customer satisfaction is a key measure of success. Unhappy customers don't stick around. What's more, unhappy customers rarely voice their dissatisfaction before leaving. In a recent Kaggle competition, Santander Bank asks Kagglers to help them identify dissatisfied customers early in their relationship. Doing so would allow Santander to take proactive steps to improve a customer's happiness before it's too late. In this project, we'll tackle this competition and try to predict whether a customer is satisfied or not using customer data.

  • Creating Customer Segmentations with Unsupervised Learning Techniques

    --> Technology Used: Python, Scikit Learn.
    In this project, we use unsupervised learning techniques to help our client, a wholesale grocery distributor, identify different customer segments, in order to better understand customer behavior and devise corresponding marketing strategies and operational plans that better meet customers’ needs. Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Gaussian Mixture Model (GMM) clustering techniques are explored, and final customer segmentations are recommended.

  • Train a Smart Cab to Drive with Reinforcement Learning

    --> Technology Used: Python, Scikit Learn, Numpy.
    A smart cab is a self-driving car from the not-so-distant future that ferries people from one arbitrary location to another. In this project, we use reinforcement learning to train a smart cab to drive. In particular, the Q-learning algorithm is implemented in a simulated environment to learn the optimal actions for the cab to take to reach a target location.

    The simulated environment is a grid world comprising a primary driving agent (the smart cab) and several dummy agents acting as traffic. There are traffic lights at all intersections, and the primary driving agent needs to learn to drive according to the traffic and light conditions.

  • Building a Student Intervention System

    --> Technology Used: Python, Scikit Learn.
    With the advent of the data analytics era, student learning data is becoming increasingly
    available through online education tools such as Edmodo. The availability of this
    data, coupled with powerful machine learning techniques, can provide insights
    that were not possible before. The aim of this project is to build a student intervention
    system that uses high school student data to predict each student's likelihood of passing
    the final exam, so that appropriate early intervention can be made to ensure student success.

  • Predicting Boston Housing Prices

    --> Technology Used: Python, Scikit Learn.
    Using historical data to predict Boston house prices.

  • Wrangle OpenStreetMap Data

    --> Technology Used: Python, MongoDB.
    OpenStreetMap (OSM) is an open map database built by volunteers around the world. The data are available for free use under the Open Database License. Because the data are crowd-sourced, they can be presented inconsistently. In this project, we improve the data quality through data wrangling and store the cleaned data in MongoDB. In particular, XML map data are parsed, audited, and corrected for street type inconsistencies before being reshaped, converted to JSON, and imported into the MongoDB database.

  • Investigate Titanic Dataset

    --> Technology Used: Python, Pandas, Numpy.
    RMS Titanic's sinking was one of the worst maritime disasters in modern history. With the dataset obtained from Kaggle, we can now garner some insights about the passengers on board the ship. In this project, we will examine the Titanic dataset and try to answer the following questions: Were all passengers on board equally likely to survive? If not, what were some characteristics for people who survived compared with people who didn't?

  • A Study on Relationship between Abortion Attitude and Education Level

    --> Technology Used: R, dplyr.
    In this project, we will research the 2012 General Social Survey data. Specifically, we’ll study the relationship between the “degree” and the “abnomore” variables to answer the question: “Is there a relationship between people’s education and whether they support abortions for women who are married and don’t want any more children?”

    The General Social Survey (GSS) is a national research project intended to identify attributes and attitudes of the American society and to facilitate comparison between the US and other countries. The research scope includes respondents’ background, personal and family information, societal concerns, workplace and economic concerns etc. The survey covered the period from 1972 to 2012 (in this data set).

  • Test a Perceptual Phenomenon - Stroop Effect

    In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition and an incongruent words condition. In the congruent words condition, the words displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally sized lists. Each participant goes through both conditions and records a time for each.

  • Human Activity Recognition Using Machine Learning

    --> Technology Used: R, dplyr, caret.
    Using devices such as Misfit and Fitbit, it is now possible to collect a large amount of data about personal activity relatively inexpensively. People regularly quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, we try to identify human activity using data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants.

  • Storm Data Analysis

    --> Technology Used: R, dplyr.
    In this project, we look at the national storm data produced by the National Weather Service. The data cover the period from 1950 to 2011. In particular, the data are used to identify the events that are most harmful to population health and the events that have the greatest economic consequences.

  • Flight Delay Data Visualization

    -

    --> Technology Used: HTML, CSS, Javascript, D3.js, Dimple.js.
    This visualization provides a quick overview of United States air carriers' flight delay situation. It highlights the percentages of delays and average delay time across different carriers. The visualization is based on RITA's 2008 flight delays data (the data is first transformed and summarized using R). UniqueCarrier (carrier name) and ArrDelay (arrival delay time in minutes) are the two variables that are examined for insights in this project.

  • Free Trial Screener A/B Test

    -

    A/B testing is an effective and powerful tool in website optimization and app development. It is widely used by data scientists in the technology industry. In this project, we go through the full process of A/B testing and design an A/B test for Udacity in order to make a decision about the launch of a Free Trial Screener.

  • Identify Fraud from Enron Dataset

    -

    --> Technology Used: Python, Scikit Learn.
    Once a gigantic corporation claiming revenues of 111 billion dollars, Enron saw its share price fall from 90 dollars in mid-2000 to just pennies before it filed for bankruptcy in 2001. The Enron scandal was a result of “creative accounting” and corporate fraud, and this fraud was linked to a small number of employees out of Enron’s 20,000 staff. Given data, is it possible to detect fraud and identify “persons of interest” for investigation? In this project, we use the publicly available Enron dataset, which contains financial and email data for 146 people, to identify fraud using machine learning techniques.

  • Understanding Wine Quality through the Lens of Data Analysis Using R

    -

    --> Technology Used: R, ggplot2.
    We have always relied on wine experts who use their esoteric jargon to rate wine qualities for us. But what exactly is wine quality based on? What are the criteria? In this project, we look at the Wine Quality dataset and use data analysis methods with R to explore the relationship between wine quality and various attributes such as acidity, sugar and alcohol.

  • Mining Customer Reviews for Business Opportunities

    -

    --> Technology Used: R, tm, dplyr.
    Yelp.com is available in 15 languages and has 142 million unique visitors per month. As a result, a huge amount of customer review data is generated on a regular basis. How can we turn the huge amount of data generated by Yelp into insights that benefit the business community? In this project, we try to answer this question by examining data provided by the Yelp Data Set Challenge.

  • Lansing Dental Growth Strategy Project

    -

    The project aimed to identify the optimal growth strategy for Lansing Dental Center.

  • Call Center Operations Innovation Project

    -

    The project aimed to identify improvement opportunities in the MCIS call center operations.

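As an illustration of the Q-learning approach mentioned in the smart cab project above, here is a minimal sketch in Python. It is not the project's code. The 4x4 grid, the reward values, the hyperparameters, and the function names (`step`, `train`, `greedy_path`) are all assumptions made for the example, and the traffic lights and dummy agents from the project description are left out to keep it short.

```python
import random

# Hypothetical action set: (row, col) deltas for right, left, down, up.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]


def step(state, action, size, goal):
    """Apply one move, clamped to the grid. Assumed rewards: +10 at the goal, -1 otherwise."""
    row = min(max(state[0] + action[0], 0), size - 1)
    col = min(max(state[1] + action[1], 0), size - 1)
    next_state = (row, col)
    if next_state == goal:
        return next_state, 10.0, True
    return next_state, -1.0, False


def train(size=4, goal=(3, 3), episodes=2000, alpha=0.5, gamma=0.9,
          epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    n_actions = len(ACTIONS)
    # Q-table: one value per (state, action index) pair, initialized to zero.
    q = {((r, c), a): 0.0
         for r in range(size) for c in range(size) for a in range(n_actions)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)  # explore
            else:
                a = max(range(n_actions), key=lambda i: q[(state, i)])  # exploit
            next_state, reward, done = step(state, ACTIONS[a], size, goal)
            # Q-learning target: reward plus discounted best value of the next state.
            best_next = 0.0 if done else max(
                q[(next_state, i)] for i in range(n_actions))
            q[(state, a)] += alpha * (reward + gamma * best_next - q[(state, a)])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, size=4, goal=(3, 3), start=(0, 0), max_steps=50):
    """Follow the highest-valued action from start until the goal or max_steps."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            break
        a = max(range(len(ACTIONS)), key=lambda i: q[(state, i)])
        state, _, _ = step(state, ACTIONS[a], size, goal)
        path.append(state)
    return path
```

Calling `greedy_path(train())` returns the sequence of grid cells the learned policy visits from the start corner. A fuller version along the lines of the project description would presumably add the light state and nearby traffic to the state tuple.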
