
INDUSTRIAL INTERNSHIP

Weekly Performance Report (WPR)

Student Name: Vikas Gupta


Head-Coordinator Name:
Team Leader: Namira Rangrej
Organization: CureYa
Hours Worked: Monday-1 hour, Tuesday-30 minutes, Wednesday-1 hour, Thursday-30 minutes,
Friday- 5 hours

Summarize your thoughts regarding your internship this week. Include duties you have performed,
facts and procedures you have learned, skills you have mastered, and observations you have made.

Monday:
Machine Learning fundamentals explained with diagrams and graphs (StatQuest), with the yam example
using training and testing data. Machine Learning is all about making predictions and classifications.
We use testing data to evaluate ML methods. Bias-Variance Tradeoff: a model can fit the training data
well yet still make poor predictions on new data. Cross Validation: it allows us to compare different
learning methods and get a sense of how well they will work in practice. In ML, estimating the
parameters is called “Training the Algorithm” and evaluating a method is called “Testing the
Algorithm.” 4-fold cross validation: divide the data into 4 blocks. Leave-one-out cross validation:
each sample is tested individually. 10-fold cross validation: divide the data into 10 blocks.
Confusion Matrix: applying methods such as random forest and logistic regression to the same data
leads to confusion matrices; we then use sensitivity, specificity, ROC and AUC for better decision
making. The size of a confusion matrix is determined by the number of categories we want to predict.
A confusion matrix tells you what your machine learning algorithm did right and what it did wrong.
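
To make the cross-validation comparison concrete, here is a minimal sketch in Python using scikit-learn (my choice of library, not one named in the session), comparing logistic regression and a random forest with 4-fold cross validation on toy data:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy data standing in for a real training set
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 4-fold cross validation: the data is split into 4 blocks; each block takes a
# turn as the testing data while the other 3 blocks are used for training.
for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(random_state=0))]:
    scores = cross_val_score(model, X, y, cv=4)
    print(name, round(scores.mean(), 3))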

Tuesday:
Sensitivity = True Positives / (True Positives + False Negatives)
Specificity = True Negatives / (True Negatives + False Positives)

Calculation of sensitivity and specificity for 2×2 and 3×3 confusion matrices.
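
A small worked example of these formulas in Python (scikit-learn is my assumed library) for a 2×2 confusion matrix:

from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels (1 = positive, 0 = negative)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

# For binary labels 0/1, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # true positive rate
specificity = tn / (tn + fp)   # true negative rate
print(sensitivity, specificity)  # 0.75 and about 0.67 for this toy data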

Wednesday:
Python:- Python basics, Python Data Types, Python Functions, Data Structures, Python Libraries for
data science.
Git fundamentals

Thursday:
Online Expert Webinar: Data and ML Learning Part 1: Introduction to Google Cloud Platform and the 4
characteristics of a cloud platform. You will not be hosting any infrastructure inside your premises;
rather, there will be a vendor hosting huge amounts of resources for you, such as compute power,
storage, networking, etc. As a user, you use these resources from the vendor for your work and you get
charged for them. You demand these resources dynamically and the vendor provides them in an elastic
way. There is an explosion of data in the present world and only 1% of that data is actually analyzed.
Google Cloud Platform (GCP): it helps explore this data and generate insights from it. Data engineers
play one of the most important roles and there is a great demand for data skills. There are 4 major
challenges of Big Data: a) migrating existing workloads, b) analyzing large datasets at scale to
determine which products are suitable, c) building streaming data pipelines (real-time analysis),
d) applying ML to your data. Product recommendations using Cloud SQL and Spark; classifying returning
customers with BigQuery ML. BigQuery: a petabyte-scale data warehouse on GCP which helps us generate
insights. Real-time dashboards with Pub/Sub, Dataflow and Data Studio. Creating a pipeline on GCP,
and classifying images with ML in 2 ways using pre-built models.

Google’s mission: organize the world’s information and make it universally accessible and useful. In
order to organize this large amount of data, you need a very strong storage infrastructure with which
you can organize this type of information. In order to make this data accessible and useful, you need
a strong networking and serving layer through which it can be accessed, and then you require data
analysis products which can analyze the data for the user and make it useful. The fundamental layers
of GCP are: compute power to process the data; storage to store this data; networking, as we require
this data to be available across the globe and to take good advantage of it; and security, which is
the most important part of data storage. Compute Power: ML models require significant compute
resources, e.g., automatic video stabilization for Google Photos. Data sources: image frames (stills
from video), phone gyroscope, lens motion.
What makes Google Google: its physical network, its thousands of fiber miles and the many thousands of
servers that, in aggregate, add up to the mother of all clouds. Tensor Processing Units (TPUs) are
specialized ML hardware. TPUs enable faster models and more iterations. Cloud TPU pods have
transformed the approach to visual shopping by delivering a 10X speedup over the previous
infrastructure. TPUs have specialized cooling technology and an application-specific IC design.

Creating a Virtual Machine (VM) on Google Cloud Platform: log in to Google Cloud Platform → go to
Compute from the navigation menu → click Virtual Machine → click VM Instances → Create VM → fill in
the parameters → click Create; the machine becomes available to you within 90 seconds of launch and a
green tick verifies it.
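
The same VM creation can be scripted; below is a minimal sketch using the google-cloud-compute Python client (the library choice, project, zone and machine names are my assumptions, not part of the lab):

from google.cloud import compute_v1

def create_vm(project: str, zone: str, name: str) -> None:
    # Boot disk built from a public Debian image
    disk = compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-11",
            disk_size_gb=10,
        ),
    )
    instance = compute_v1.Instance(
        name=name,
        machine_type=f"zones/{zone}/machineTypes/e2-medium",
        disks=[disk],
        network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
    )
    # Insert the instance and wait for the operation to finish
    operation = compute_v1.InstancesClient().insert(
        project=project, zone=zone, instance_resource=instance
    )
    operation.result()

# create_vm("my-project", "us-central1-a", "demo-vm")  # placeholder values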

Creating a cloud storage bucket for your data: Navigation Menu → Storage → create a bucket (a storage
area is called a bucket).
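
A quick sketch of the same step with the google-cloud-storage Python client (the bucket and file names are placeholders; default credentials and project are assumed):

from google.cloud import storage

client = storage.Client()
# Bucket names must be globally unique across all of Cloud Storage
bucket = client.create_bucket("my-example-data-bucket")

# Upload a local file into the bucket
bucket.blob("data/sample.csv").upload_from_filename("sample.csv")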

Networking: Google’s private network carries as much as 40% of the world’s internet traffic every day.
Google’s data center network speed enables the separation of compute and storage.

Security: All of the world’s data becomes encrypted as soon as it enters Google’s network.

Big Data and ML products:


Query 2 billion lines of code in less than 30 seconds: Navigation Menu→ Big Data→ BigQuery
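
For a sense of what such a query looks like, here is a minimal sketch with the BigQuery Python client against the public GitHub dataset (the table and query are my stand-ins, not the exact lab query):

from google.cloud import bigquery

client = bigquery.Client()  # assumes default credentials and project

# Count the files in the public GitHub sample dataset
query = """
    SELECT COUNT(*) AS file_count
    FROM `bigquery-public-data.github_repos.sample_files`
"""
for row in client.query(query).result():
    print(row.file_count)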

Case Study: GoJek, an Indonesian company. Completed the hands-on labs on Qwiklabs.

Friday:
Data and ML Learning Part II: Module: Recommendations and Predictions:
GCP Fundamentals: Big Data and Machine Learning: product recommendations using Cloud SQL and Spark.
Recommendation systems require data, a model and training/serving infrastructure to train the model.
Use Case Study: recommending house rental options. Train machine learning models on data, not on
rules. 1) Learn from the history of all houses liked by that user, 2) choose a new house from the
inventory to recommend, 3) predict its rating.
There are 2 types of methods to predict ratings: streaming and batch. In streaming, it is real-time,
meaning you compute the rating instantaneously and present it. In batch, we compute the rating once
per day: ratings are computed quickly by running a Hadoop cluster once each day. Setting up an
on-premise Hadoop cluster is a tedious task, so we use Cloud Dataproc, as it makes hosting Hadoop- and
Spark-related workloads very easy on GCP.
Cloud Dataproc: it builds on the rich open-source ecosystem for big data and manages the Hadoop
implementation on GCP. Creating a cluster: it takes 2 minutes for the cluster to go from the
provisioning state to the running state.
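
To make the Spark side of the recommendation pipeline concrete, here is a minimal PySpark sketch using Spark MLlib's ALS recommender, the kind of job that could run on a Dataproc cluster (the ratings file path and column names are my assumptions, not from the lab):

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("house-recommendations").getOrCreate()

# Hypothetical ratings table: one row per (userId, houseId, rating)
ratings = spark.read.csv("gs://my-bucket/house_ratings.csv",
                         header=True, inferSchema=True)

# Collaborative filtering with alternating least squares
als = ALS(userCol="userId", itemCol="houseId", ratingCol="rating",
          coldStartStrategy="drop")
model = als.fit(ratings)

# Top 5 house recommendations per user (the batch style, run once per day)
model.recommendForAllUsers(5).show(truncate=False)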

Dataproc clusters are advantageous compared to on-premise clusters because you are not required to
keep your Dataproc clusters running forever, unlike on-premise clusters. This is because you use Cloud
Storage, which is persistent data storage, to store your input and output data. These clusters can
also be resized on the fly. Cloud Dataproc is serverless and easy to resize.
Cloud SQL helps in managing an RDBMS on GCP; Cloud SQL is a fully managed RDBMS. Creating instances
and importing data. Predicting visitor purchases using BigQuery ML. BigQuery is 2 services in one:
1) a fast SQL query engine, 2) managed storage for datasets. The only difference between datasets and
databases is that there is no concept of indexes, primary keys or foreign keys in datasets. In
BigQuery, the data is physically located in a distributed file system called Colossus.

Managed storage for datasets: BigQuery can query external data sources in Cloud Storage (GCS) and
Google Drive directly.
Querying Google Sheets from BigQuery: go to Google Drive → create a Google Sheet → fill in data → go
to the Google Cloud console → BigQuery → create a table in a dataset.
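
The same Google Sheets setup can be sketched with the BigQuery Python client (the sheet URL, dataset and table names are placeholders, and the credentials must have Drive access, which is an assumption about the environment):

from google.cloud import bigquery

client = bigquery.Client()

# External table definition that points BigQuery at a Google Sheet
external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
external_config.source_uris = ["https://docs.google.com/spreadsheets/d/PLACEHOLDER_SHEET_ID"]
external_config.autodetect = True                # infer the schema from the sheet
external_config.options.skip_leading_rows = 1    # first row holds column names

table = bigquery.Table("my_project.my_dataset.sheet_table")
table.external_data_configuration = external_config
client.create_table(table)
# Queries against my_dataset.sheet_table now read the sheet directly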
Geographic data and its advantages: 1) a compact representation of geographic data, 2) it enables
creating visualizations from this geographic data more smoothly.

Create a BQ model with SQL: BQML (BigQuery ML) is a way to easily build machine learning models. The 4
steps are 1) create a dataset, 2) create/train the model, 3) evaluate the model, 4) predict/classify
with the model.
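
A minimal sketch of these 4 steps, driving BigQuery ML from the Python client (the dataset, table and column names are hypothetical; the lab's actual query is not reproduced here):

from google.cloud import bigquery

client = bigquery.Client()  # assumes a dataset named my_dataset already exists (step 1)

# Step 2: create/train a logistic regression model in SQL
client.query("""
    CREATE OR REPLACE MODEL `my_dataset.purchase_model`
    OPTIONS(model_type='logistic_reg', input_label_cols=['will_purchase']) AS
    SELECT * FROM `my_dataset.visitor_features`
""").result()

# Step 3: evaluate the model
for row in client.query(
        "SELECT * FROM ML.EVALUATE(MODEL `my_dataset.purchase_model`)").result():
    print(dict(row.items()))

# Step 4: predict/classify new visitors
predictions = client.query(
    "SELECT * FROM ML.PREDICT(MODEL `my_dataset.purchase_model`, "
    "TABLE `my_dataset.new_visitors`)").result()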

Creating a machine learning model with SQL, then running and ending the lab.

Student Signature: Vikas Gupta Date: 7/5/2021

Head Co-ordinator Signature: Date:
