George Liu

Greater Houston

17K followers 500+ connections

University of Colorado Boulder

About

I solve business problems using data science. Combining a unique skillset of machine…

Articles by George

What Does an Ideal Data Scientist’s Profile Look Like? — Findings from Analyzing 1000 Indeed Job Postings

By George Liu

Nov 27, 2018
Navigating the Data Science Careers Landscape - Identify the Right Role to Jumpstart Your Career in Data

By George Liu

Jul 17, 2018
6 Proven Steps to Land a Job in Data Science

By George Liu

Mar 20, 2018

Contributions

How can you ensure data cleaning enhances the credibility of your data story?

One of the most effective ways to explore data is to review individual samples. In either Python or R, you can draw a random sample of records to examine more closely, and you can repeat the sampling for multiple rounds! The goal is to get familiar with your data and build an intuition for it, which you can then combine with later quantitative exploration to gain the insights that drive your analysis.
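
The repeated random sampling described above could look something like the following minimal sketch. It uses only Python's standard library; the `records` list and the `sample_rounds` helper are hypothetical names introduced for illustration and do not come from the original contribution.

```python
import random

def sample_rounds(records, k=3, rounds=2, seed=0):
    """Return `rounds` independent random samples of `k` records each."""
    rng = random.Random(seed)  # fixed seed so the review can be reproduced
    return [rng.sample(records, k) for _ in range(rounds)]

# Hypothetical toy data standing in for a real dataset.
records = [{"id": i, "text": f"row {i}"} for i in range(100)]

for round_no, batch in enumerate(sample_rounds(records), start=1):
    print(f"--- round {round_no} ---")
    for row in batch:
        print(row)
```

Changing the seed between sessions gives a fresh set of rows to read through, while keeping it fixed lets a colleague look at exactly the same rows.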

George Liu contributed 11 months ago

Activity

This repo covers everything you need to know about MLOps. The goal of the series is to understand the basics of MLOps like model building…

Liked by George Liu
The most comprehensive overview of LLM-as-a-Judge! READ IT! 🚨 "Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)” summarizes and…

Liked by George Liu
🚗 𝟮𝟭 𝗛𝗼𝘂𝗿𝘀, 𝗖𝗼𝘂𝗻𝘁𝗹𝗲𝘀𝘀 𝗠𝗶𝗹𝗲𝘀, 𝗮𝗻𝗱 𝗮 𝗦𝘂𝗺𝗺𝗲𝗿 𝘁𝗼 𝗥𝗲𝗺𝗲𝗺𝗯𝗲𝗿 Imagine packing up your life, driving 21 hours from…

Liked by George Liu

Experience

Sense

Greater Houston
-

Greater Toronto Area, Canada
-

Greater Toronto Area, Canada
-

Greater Toronto Area, Canada
-
-

Greater Toronto Area, Canada
-

Greater Toronto Area, Canada
-

Greater Toronto Area, Canada
-

Greater Toronto Area, Canada

Education

University of Colorado Boulder

Licenses & Certifications

Finetuning Large Language Models

DeepLearning.AI

Issued Aug 2024

See credential
Evaluating and Debugging Generative AI

DeepLearning.AI

Issued Jun 2024

See credential
Generative AI with Large Language Models

Coursera

Issued Jun 2024

See credential
Machine Learning Engineering for Production (MLOps) Specialization

Coursera

Issued Jan 2022

Credential ID 8ZA4CLLAJK8B

See credential
AWS Cloud Technical Essentials

Coursera

Issued Jul 2021

Credential ID HBHN5DG9EKKD

See credential
Natural Language Processing Specialization

Coursera

Issued Apr 2021

Credential ID J3P4XGPMBEEG

See credential
C++ Tutorial course

SoloLearn

Issued Dec 2018

Credential ID 1051-361187

See credential
Deep Learning Specialization

Coursera

Issued Dec 2018

Credential ID TZSD7YS85F8L

See credential
Deep Learning Nanodegree

Udacity

Issued Jan 2017

See credential
Tableau 9 Advanced Training: Master Tableau for Data Science

Udemy

Issued Sep 2016

Credential ID UC-8E2QRWJP

See credential
Bayesian Statistics

Coursera

Issued Aug 2016

Credential ID XWS8NFWX4HGY

See credential
Machine Learning Engineer Nanodegree

Udacity

Issued May 2016

See credential
Data Analyst Nanodegree

Udacity

Issued Mar 2016

See credential
Data Science Specialization

Coursera

Issued Dec 2015

Credential ID BZFMACEHYNJW

See credential
Data Analysis and Statistical Inference

Coursera

Issued Nov 2015

Credential ID BVXSJVACK4

See credential

Publications

Building an End-To-End Data Science Project

Towards Data Science December 7, 2018

Documenting various learnings from my end-to-end Ideal Profiles data science project including the process, iterative thinking, modularization and reproducibility.

See publication
Scraping Job Posting Data from Indeed using Selenium and BeautifulSoup

Towards Data Science December 2, 2018

Finally, a reproducible solution for gathering data from Indeed that actually works on your computer. A walkthrough of web scraping that requires mimicking clicking behaviour using Selenium.

See publication
What Does an Ideal Data Scientist’s Profile Look Like?

Towards Data Science November 24, 2018

A quantitative analysis of skill requirements based on Indeed job posting data for various Data Science roles

See publication

Courses

A/B Testing
Bayesian Statistics
Convolutional Neural Networks
Data Analysis with R
Data Science Capstone
Data Visualization and D3.js
Data Wrangling with MongoDB
Deep Learning
Descriptive Statistics
Developing Data Products
Exploratory Data Analysis
Getting and Cleaning Data
Improving Deep Neural Networks: Hyperparameter tuning, Regularization and Optimization
Inferential Statistics
Intro to Data Analysis
Intro to HTML and CSS
Intro to Hadoop and MapReduce
Intro to Machine Learning
JavaScript Basics
Model Evaluation & Validation
Neural Networks and Deep Learning
Practical Machine Learning
R Programming
Regression Models
Reinforcement Learning
Reproducible Research
Sequence Models
Statistical Inference
Structuring Machine Learning Projects
Supervised Learning
The Data Scientist’s Toolbox
Unsupervised Learning

Projects

Face Generation with Generative Adversarial Networks (GAN)

Jun 2017

Using state-of-the-art GANs to generate human face images based on the CelebFaces Attributes (CelebA) dataset.

See project
Language Translation with Recurrent Neural Networks (RNN)

May 2017

Training a sequence-to-sequence model on a dataset of English and French sentences so that it can translate new sentences from English to French.

See project
TV Script Generation with Recurrent Neural Networks (RNN)

Apr 2017

In this project, we generate Simpsons TV scripts using RNNs. Part of the Simpsons dataset of scripts from 27 seasons is used. The neural network generates a new TV script for a scene at Moe's Tavern.

See project
Classifying Images with Convolutional Neural Network (CNN)

Mar 2017

In this project, we classify images from the CIFAR-10 dataset. The dataset consists of airplanes, dogs, cats, and other objects. After preprocessing the images, we train a convolutional neural network on all the samples.

See project
Using Neural Network to Predict Bike Ridership

Feb 2017

In this project, we build a neural network with NumPy and use it to predict daily bike rental ridership.

See project
Build an Answer Classifier for Quora

Jun 2016

Quora uses a combination of machine learning algorithms and moderation to ensure high-quality content on the site. High answer quality has helped Quora distinguish itself from other Q&A sites on the web. In this project, we will devise a classifier that can tell good answers from bad answers as well as humans can.

See project
Identifying Dissatisfied Customers for Santander Bank with Machine Learning

May 2016

--> Technology Used: Python, Scikit Learn, xgboost.
From frontline support teams to C-suites, customer satisfaction is a key measure of success. Unhappy customers don't stick around. What's more, unhappy customers rarely voice their dissatisfaction before leaving. In a recent Kaggle competition, Santander Bank asks Kagglers to help them identify dissatisfied customers early in their relationship. Doing so would allow Santander to take proactive steps to improve a customer's happiness before it's too late. In this project, we'll tackle this competition and try to predict whether a customer is satisfied or not using customer data.

See project
Creating Customer Segmentations with Unsupervised Learning Techniques

Apr 2016

--> Technology Used: Python, Scikit Learn.
In this project, we use unsupervised learning techniques to help our client, a wholesale grocery distributor, identify different customer segments in order to better understand customer behaviors and devise a corresponding marketing strategy and operational plan that better meet customers’ needs. Principal Component Analysis (PCA), Independent Component Analysis (ICA) and Gaussian Mixture Model (GMM) clustering techniques are explored, and final customer segmentations are recommended.

See project
Train a Smart Cab to Drive with Reinforcement Learning

Apr 2016

--> Technology Used: Python, Scikit Learn, Numpy.
A smart cab is a self-driving car from the not-so-distant future that ferries people from one arbitrary location to another. In this project, we will use reinforcement learning to train a smart cab to drive. In particular, the Q-learning algorithm will be implemented in a simulated environment to learn the optimal actions for the cab to take to reach a given target location.

The simulated environment is a grid world comprising a primary driving agent (the smart cab) and several dummy agents acting as traffic. There are traffic lights at all intersections, and the primary driving agent needs to learn to drive according to traffic and light conditions.
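
As a rough illustration of the Q-learning idea described above, here is a minimal tabular sketch in plain Python. It is a simplification and not the project's code: the grid has no traffic lights or dummy agents, and the grid size, rewards, hyperparameters and the names `step`, `train` and `greedy_path` are all assumptions made for this example.

```python
import random

# Each action moves the cab one cell: (dx, dy).
ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(state, action, size, goal):
    """Move within the grid bounds; reward 10 at the goal, -1 otherwise."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), size - 1)
    y = min(max(state[1] + dy, 0), size - 1)
    nxt = (x, y)
    return nxt, (10.0 if nxt == goal else -1.0), nxt == goal

def train(size=4, goal=(3, 3), episodes=2000, alpha=0.5, gamma=0.9,
          epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {((x, y), a): 0.0
         for x in range(size) for y in range(size) for a in ACTIONS}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            nxt, reward, done = step(state, action, size, goal)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward the bootstrapped target.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = nxt
            if done:
                break
    return q

def greedy_path(q, size=4, goal=(3, 3), start=(0, 0), max_steps=20):
    """Follow the highest-valued action from `start` until the goal."""
    state, path = start, [start]
    while state != goal and len(path) <= max_steps:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, _ = step(state, action, size, goal)
        path.append(state)
    return path

q_table = train()
print(greedy_path(q_table))
```

In the fuller setting the state would also need to encode the light colour and nearby traffic, which enlarges the Q-table but leaves the update rule unchanged.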

See project
Building a Student Intervention System

Mar 2016

--> Technology Used: Python, Scikit Learn.
With the advent of the data analytics era, student learning data is becoming increasingly
available through online education tools such as Edmodo. The availability of this
data, when coupled with powerful machine learning techniques, will provide invaluable insights
that were never possible before. The aim of this project is to build a student intervention
system that uses high school student data to predict each student's likelihood of passing
the final exam, so that appropriate early intervention can be made to ensure student success.

See project
Predicting Boston Housing Prices

Mar 2016

--> Technology Used: Python, Scikit Learn.
Using historical data to predict Boston house prices.

See project
Wrangle OpenStreetMap Data

Dec 2015

--> Technology Used: Python, MongoDB.
OpenStreetMap (OSM) is an open map database built by volunteers around the world. The data created is available for free use under the Open Database License. Because the data is crowd-sourced, it can be presented inconsistently. In this project, we will improve the data quality with data wrangling and store the cleaned data in MongoDB. In particular, XML map data will be parsed, audited and corrected for street type inconsistencies before being reshaped and converted into JSON, which is then imported into the MongoDB database.
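
The street type audit mentioned above might be sketched as follows. This is an illustrative assumption rather than the project's actual code: the regular expression, the `MAPPING` table and the `audit` and `normalize` helpers are hypothetical, and a real run would read street names from the parsed XML rather than from a hard-coded list.

```python
import re
from collections import Counter

# Matches the last word of a street name, e.g. "St." in "Main St."
STREET_TYPE_RE = re.compile(r"\b\S+\.?$")

# Hypothetical mapping from abbreviations to canonical street types.
MAPPING = {
    "St": "Street", "St.": "Street",
    "Ave": "Avenue", "Ave.": "Avenue",
    "Rd": "Road", "Rd.": "Road",
}

def audit(street_names):
    """Count how often each trailing street type occurs."""
    counts = Counter()
    for name in street_names:
        match = STREET_TYPE_RE.search(name.strip())
        if match:
            counts[match.group()] += 1
    return counts

def normalize(name):
    """Replace a known abbreviation at the end of the name."""
    name = name.strip()
    match = STREET_TYPE_RE.search(name)
    if match and match.group() in MAPPING:
        return name[: match.start()] + MAPPING[match.group()]
    return name

streets = ["Main St.", "Oak St.", "5th Ave", "Elm Street"]
print(audit(streets))
print([normalize(s) for s in streets])
```

Auditing first shows which variants actually occur, so the mapping only needs entries for abbreviations that appear in the data.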

See project
Investigate Titanic Dataset

Nov 2015

--> Technology Used: Python, Pandas, Numpy.
RMS Titanic's sinking was one of the worst maritime disasters in modern history. With the dataset obtained from Kaggle, we can now garner some insights about the passengers on board the ship. In this project, we will examine the Titanic dataset and try to answer the following questions: Were all passengers on board equally likely to survive? If not, what were some characteristics for people who survived compared with people who didn't?

See project
A Study on Relationship between Abortion Attitude and Education Level

Oct 2015

--> Technology Used: R, dplyr.
In this project, we will research the 2012 General Social Survey data. Specifically, we’ll study the relationship between the “degree” and the “abnomore” variables to answer the question: “Is there a relationship between people’s education and whether they support abortions for women who are married and don’t want any more children?”

The General Social Survey (GSS) is a national research project intended to identify attributes and attitudes of American society and to facilitate comparison between the US and other countries. The research scope includes respondents’ background, personal and family information, societal concerns, and workplace and economic concerns. The survey covers the period from 1972 to 2012 (in this data set).

See project
Test a Perceptual Phenomenon - Stroop Effect

Oct 2015

In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition, and an incongruent words condition. In the congruent words condition, the words being displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally-sized lists. Each participant will go through and record a time from each condition.
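
Because each participant records a time under both conditions, one natural way to compare them is a paired t-statistic on the per-participant differences. The sketch below assumes that approach, which the description above does not name, and the timings are made-up illustrative numbers, not results from the project.

```python
import math
import statistics

def paired_t(congruent, incongruent):
    """Return (mean difference, t statistic, degrees of freedom)."""
    diffs = [b - a for a, b in zip(congruent, incongruent)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, t_stat, n - 1

# Hypothetical naming times in seconds, one pair per participant.
congruent = [10.0, 12.0, 14.0, 16.0]
incongruent = [12.0, 15.0, 16.0, 19.0]

mean_diff, t_stat, dof = paired_t(congruent, incongruent)
print(f"mean difference = {mean_diff:.2f}s, t = {t_stat:.3f}, df = {dof}")
```

The t statistic would then be compared against the critical value for the chosen significance level at `n - 1` degrees of freedom.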

See project
Human Activity Recognition Using Machine Learning

Sep 2015

--> Technology Used: R, dplyr, caret.
Using devices such as Misfit and Fitbit, it is now possible to collect a large amount of data about personal activity relatively inexpensively. One thing that people regularly do is to quantify how much of a particular activity they do, but they rarely quantify how well they do it. In this project, we try to identify human activity using data from accelerometers on the belt, forearm, arm, and dumbbell of 6 participants.

See project
Storm Data Analysis

Aug 2015

--> Technology Used: R, dplyr.
In this project, we look at the national storm data produced by the National Weather Service. The data covers the period from 1950 to 2011. In particular, the data is used to identify the events that are most harmful to population health and the events that have the greatest economic consequences.

See project
Flight Delay Data Visualization

Feb 2016 - Mar 2016

--> Technology Used: HTML, CSS, JavaScript, D3.js, Dimple.js.
This visualization provides a quick overview of United States air carriers' flight delay situation. It highlights the percentages of delays and average delay time across different carriers. The visualization is based on RITA's 2008 flight delays data (the data is first transformed and summarized using R). UniqueCarrier (carrier name) and ArrDelay (arrival delay time in minutes) are the two variables that are examined for insights in this project.

See project
Free Trial Screener A/B Test

Jan 2016 - Feb 2016

A/B testing is an effective and powerful tool in website optimization and app development. It is widely used by data scientists in the technology industry. In this project, we will go over the full process of A/B testing and design an A/B test for Udacity in order to make a decision about the launch of a Free Trial Screener.
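
One step in evaluating an A/B test like this is checking whether a conversion rate differs between the control and experiment groups. The sketch below assumes a pooled two-proportion z-test for that step; the function name and the counts are hypothetical and are not taken from the project.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (difference in rates, z statistic, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Made-up counts: conversions and visitors in control (A) and experiment (B).
diff, z, p_value = two_proportion_z(100, 1000, 150, 1000)
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

A launch decision would normally also weigh practical significance, that is, whether the observed difference is large enough to matter, and not only the p-value.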

See project
Identify Fraud from Enron Dataset

Dec 2015 - Jan 2016

--> Technology Used: Python, Scikit Learn.
Once a gigantic corporation claiming revenues of 111 billion dollars, Enron saw its share price fall from about 90 dollars in mid-2000 to just pennies before it filed for bankruptcy in 2001. The Enron scandal was a result of “creative accounting” and corporate fraud, and this fraud was linked to a small number of employees out of Enron’s 20,000 staff. Given the data, is it possible to detect fraud and identify “persons of interest” for investigation? In this project, we will use the publicly available Enron dataset, which contains financial and email data of 146 people, to identify fraud using machine learning techniques.

See project
Understanding Wine Quality through the Lens of Data Analysis Using R

Dec 2015 - Jan 2016

--> Technology Used: R, ggplot2.
We have always relied on wine experts who use their esoteric jargon to rate wine qualities for us. But what exactly is wine quality based on? What are the criteria? In this project, we look at the Wine Quality dataset and use data analysis methods with R to explore the relationship between wine quality and various attributes such as acidity, sugar and alcohol.

See project
Mining Customer Reviews for Business Opportunities

Oct 2015 - Nov 2015

--> Technology Used: R, tm, dplyr.
Yelp.com is available in 15 languages and has 142 million unique visitors per month. As a result, a huge amount of customer review data is generated on a regular basis. How can we turn the huge amount of data generated by Yelp into business insights that benefit the business community? In this project, we try to answer this question by examining data provided by the Yelp Data Set Challenge.

See project
Lansing Dental Growth Strategy Project

Jul 2013 - Aug 2013
The project aimed at identifying the optimal growth strategy for Lansing Dental Center.

Call Center Operations Innovation Project

Jun 2013 - Aug 2013
The project aimed to identify improvement opportunities in the MCIS call center operations.


Recommendations received

42 people have recommended George

More activity by George

🚀 The Future of Talent Acquisition is Here! The way we recruit is evolving with the rise of intelligent recruitment automation - many…

Liked by George Liu
I'm sharing a free Gemma-2 2b finetuning & a chat UI notebook! Google released a free model which beats ChatGPT 3.5! & finetuning with 🦥Unsloth AI…

Liked by George Liu
If you're looking to finetune Llama 3.1, did you know Kaggle offers 30 hours of GPUs for free per week? Also with Unsloth AI, get 2x faster and 60%…

Liked by George Liu
A new version of 🤗 transformers has landed 🛬 v4.44 ended up being a performance-oriented upgrade for LLM users: faster compiled models, lower GPU…

Liked by George Liu
📢 Full FP8 Llama 3.1 405B Now Available! 📢 Exciting news from Neural Magic! Our research team has successfully compressed the largest model from…

Liked by George Liu
Microsoft just open-sourced GraphRAG. It might be the best Python library to extract insights from unstructured text. It uses LLMs to automate the…

Liked by George Liu
Evaluating LLMs are very crucial for generative models. Do you know how to evaluate such models? Here are few popular metrics. 📍…

Liked by George Liu
Super cool just got my first Kaggle Silver Medal for our Mistral notebook! Unsloth AI makes finetuning LLMs like Mistral, Llama-3, Phi-3 2x faster &…

Liked by George Liu
We're sharing 2 free Colab notebooks for continued pretraining with QLoRA! Our new 🦥Unsloth AI release allows you to easily continually pretrain…

Liked by George Liu
From where I'm standing, LLM tool use is likely one of the most promising paradigms for the future of computing. Software engineers should start to…

Liked by George Liu
Despite all the initial excitement around GenAI, people’s expectations of it are starting to evolve. For those in the talent acquisition and HR…

Liked by George Liu
