Ajay Kadiyala

Bengaluru, Karnataka, India

108K followers 500+ connections

View mutual connections with Ajay

Welcome back

Email or phone

Password

Forgot password?

or

New to LinkedIn? Join now

or

New to LinkedIn? Join now

Join to follow

Brillio

Siddhartha Institute of Engineering & Technology.

About

As a Lead Data Engineer with 6.5+ years of experience, At Brillio, my team and I tackle…

Contributions

How do you document data lineage for stakeholders?

as per my experience, to document data lineage for stakeholders, we need to consider, Use flowcharts or data lineage tools Maintain a catalog with detailed metadata Develop dictionaries explaining data definitions Leverage specialized tools to automate lineage documentation. Establish clear documentation standards Collaborate with stakeholders to validate and refine the lineage documentation. Ensure easy access to lineage information for all relevant parties. Implement version control to track changes in data lineage over time. Provide training to stakeholders on how to interpret and use data lineage information. Enforce data governance practices to maintain data lineage accuracy and consistency.

Ajay Kadiyala contributed 11 months ago 1 Upvote
What are the best ways to stay current in data engineering and science?

To stay current in data engineering and data science is very hard, but following this approach definitely put you on top. Continuous Learning Industry Forums Hands-On Projects Stay Inquisitive Read and Research Join Communities Follow Thought Leaders Build a Portfolio Experiment and Innovate Collaborate

Ajay Kadiyala contributed 11 months ago Upvote
What factors should you consider when choosing a data extraction tool?

When choosing a data extraction tool, several critical factors they are: Firstly, compatibility with your data sources and formats is crucial. Ensure that the tool supports the data types and systems you need to extract from. Secondly, consider scalability and performance; the tool should handle your data volume and extraction frequency efficiently. Data security and compliance are paramount, so choose a tool that offers robust encryption and data protection features. Ease of use and integration capabilities with your existing tech stack are also essential to streamline workflows. Lastly, evaluate pricing models and ongoing support to ensure the tool aligns with your budget and provides reliable customer assistance when needed.

Ajay Kadiyala contributed 11 months ago Upvote
How do you manage data growth and scalability in your storage system?

Use following manage data growth and scalability effectively. 1. Cloud Storage 2. Data Compression 3. Distributed File Systems 4. Data Lifecycle Management 5. Horizontal Scaling 6. Data Partitioning 7. Data Deduplication 8. Monitoring and Optimization 9. Cloud Data Warehouses 10. NoSQL Databases

Ajay Kadiyala contributed 11 months ago 1 Upvote
How can businesses maximize the benefits of big data analytics?

big data analytics has the potential to revolutionize how businesses operate and compete. By setting clear objectives, investing in data quality and talent, and embracing advanced analytics techniques, companies can maximize the benefits of big data analytics. Additionally, a commitment to data security, ethics, and continuous improvement is crucial for long-term success in the data-driven landscape. 1. Define Clear Objectives 2. Collect Relevant Data 3. Invest in Data Quality 4. Utilize Advanced Analytics 5. Embrace Real-Time Analytics 6. Scalability 7. Data Security 8. Data Integration 9. Data Governance 10. Invest in Talent

Ajay Kadiyala contributed 11 months ago Upvote
How can you use animation to tell your data's story?

Animation is a powerful technique in data visualization that involves the use of moving elements, transitions, and dynamic effects to enhance the understanding and engagement of data. It can be used in various data presentation contexts, including reports, presentations, and interactive dashboards. Animation helps in highlighting key insights, simplifying complex data, and guiding the viewer's focus. In today's data-driven world, effectively conveying complex information is crucial. Data visualization has become the go-to method for making data understandable and actionable, but to truly make an impact, consider incorporating animation into your data storytelling toolkit.

Ajay Kadiyala contributed 11 months ago Upvote

Join now to see all contributions

Activity

Want to be a data engineer? Here are the 5 must-have skills: [1] SQL: • Learn to write complex queries to extract, transform, and load…

Want to be a data engineer? Here are the 5 must-have skills: [1] SQL: • Learn to write complex queries to extract, transform, and load…

Liked by Ajay Kadiyala
India to build tunnels worth 1 lakh crore in the next few years! Ministry of roads transport and highways, Nitin Gadkari is set to revolutionize…

India to build tunnels worth 1 lakh crore in the next few years! Ministry of roads transport and highways, Nitin Gadkari is set to revolutionize…

Liked by Ajay Kadiyala
𝗧𝘂𝗻𝗲 𝗢𝘂𝘁 𝗡𝗼𝗶𝘀𝗲 𝗪𝗶𝘁𝗵 𝗖𝗵𝗮𝗶𝗻-𝗼𝗳-𝗧𝗵𝗼𝘂𝗴𝗵𝘁 𝗣𝗿𝗼𝗺𝗽𝘁𝗶𝗻𝗴 🍊 Chain-of-Thought (CoT) Prompting is a strategy that…

𝗧𝘂𝗻𝗲 𝗢𝘂𝘁 𝗡𝗼𝗶𝘀𝗲 𝗪𝗶𝘁𝗵 𝗖𝗵𝗮𝗶𝗻-𝗼𝗳-𝗧𝗵𝗼𝘂𝗴𝗵𝘁 𝗣𝗿𝗼𝗺𝗽𝘁𝗶𝗻𝗴 🍊 Chain-of-Thought (CoT) Prompting is a strategy that…

Liked by Ajay Kadiyala

Join now to see all activity

Experience

Brillio

Bengaluru, Karnataka, India
-

Bengaluru, Karnataka, India
-

Bengaluru, Karnataka, India
-

Bangalore Urban, Karnataka, India
-

Bengaluru Area, India

Education

Siddhartha Institute of Engineering & Technology.

2014 - 2017

Activities and Societies: Qualified for project at Central Tool Design organization Hyderabad. Represented District Level Cricket, Kabaddi Teams Volunteered at social activities through NGC(National Green Corps)
2011 - 2014

Activities and Societies: Volunteered and certified at Indian Armed forces. Proficient at playing Volleyball. Certified Intern at National atmospheric research laboratory( ISRO affiliated.)

Licenses & Certifications

Project Management Foundations

LinkedIn

Issued May 2022

See credential
Hands On Essentials - Data Warehouse

Snowflake

Issued Mar 2022

See credential
Microsoft Certified: Azure Data Scientist Associate

Microsoft

Issued Jan 2022

Credential ID 990420255

See credential
Microsoft Certified: Azure Data Fundamentals

Microsoft

Issued Apr 2021

Credential ID H764-3371

See credential
MySQL

Udemy

Issued Apr 2021
Microsoft Certified: Azure Fundamentals

Microsoft

Issued Oct 2020

Credential ID 990420255

See credential
Data Analysis with Python

Coursera

Issued Sep 2020

Credential ID 5ZKVLNBUHC8F

See credential
Data Science Methodology

Coursera

Issued Sep 2020

Credential ID K6VA4KZKTJQP

See credential
Databases and SQL for Data Science

Coursera

Issued Sep 2020

Credential ID FGBBB72RVN63

See credential
Python for Data Science and AI

Coursera

Issued Sep 2020

Credential ID 8C7587SYY7T8

See credential
Data Fluency: Exploring and Describing Data

LinkedIn

Issued Aug 2020

See credential
Data Science Orientation

Coursera

Issued Aug 2020

Credential ID [email protected]

See credential
Excel Statistics Essential Training: 1

LinkedIn

Issued Aug 2020

See credential
Learning Data Analytics

LinkedIn

Issued Aug 2020

See credential
Tools for Data Science

Coursera

Issued Aug 2020

Credential ID SLCFNFXWCNQT

See credential
What is Data Science?

Coursera

Issued Aug 2020

Credential ID RNYPDHNG5J7F

See credential
Game development using Pyga

GUVI Geek Networks, IITM Research Park

Issued Jul 2020

Credential ID 5u5P6b916N46144av9

See credential
Introduction to Programming Using Python

Udemy

Issued Jul 2020
Introduction to Programming Using Python

Udemy

Issued Jul 2020
SnowPro Core Certification

Snowflake

Issued Oct 2023 Expires Oct 2025

See credential
Microsoft Certified: Azure Developer Associate

Microsoft

Issued Sep 2023 Expires Sep 2024

Credential ID CD67C36CBD25D72F

See credential
Microsoft Certified: Azure Data Engineer Associate

Microsoft

Issued Jun 2022 Expires Jun 2023

Credential ID 990420255

See credential
Microsoft Certified: Power BI Data Analyst Associate

Microsoft

Issued Mar 2022 Expires Mar 2023

Credential ID I199-6036

See credential
ccna routing&switching

Cisco

Projects

Log Analytics Project with Spark Streaming and Kafka

Jan 2023 - Jan 2023

What is Log Processing?
The process of evaluating, understanding, and comprehending computer-generated documents known as logs is known as log analysis. A wide range of programmable technologies, including networking devices, operating systems, apps, and more, produce logs. A log is a collection of messages in chronological order that describe what is going on in a system. Regardless, log analysis is the subtle technique of evaluating and interpreting these messages in order to get insights…

What is Log Processing?
The process of evaluating, understanding, and comprehending computer-generated documents known as logs is known as log analysis. A wide range of programmable technologies, including networking devices, operating systems, apps, and more, produce logs. A log is a collection of messages in chronological order that describe what is going on in a system. Regardless, log analysis is the subtle technique of evaluating and interpreting these messages in order to get insights into any system's underlying functioning. Web server log analysis can offer important insights on everything from security to customer service to SEO. The information collected in web server logs can help you with:
Network troubleshooting efforts
Development and quality assurance
Identifying and understanding security issues
Customer service
Maintaining compliance with both government and corporate policies
The common log-file format is as follows:
remotehost rfc931 authuser [date] "request" status bytes

Key Takeaways:
● Understanding the project and how to use AWS EC2 Instance
● Understanding the basics of Containers, log analysis, and their application & Port Forwarding
● Visualizing the complete Architecture of the system
● Introduction to Docker
● Usage of docker-composer and starting all tools
● Exploring dataset and common log format
● Understanding Lambda Architecture.
● Installing NiFi and using it for data ingestion
● Installing Kafka and using it for creating topics
● Publishing logs using NiFi
● Integration of NiFi and Kafka
● Installing Spark and using it for data processing and cleaning
● Integration of Kafka and Spark
● Reading data from Kafka via Spark structured streaming API
● Installing and creating namespace and table in Cassandra
● Integration of Spark and Cassandra
● Continuously loading data in Cassandra for aggregated results.
● Integrating Cassandra and Plotly and Dash
● Displaying live stream, Hourly and Daily results using Python Plotly and Dash

See project
Retail Analytics Project Example using Sqoop, HDFS, and Hive

Jan 2023 - Jan 2023

Business Overview:
Retail analytics is the process of delivering analytical data on inventory levels, supply chain movement, customer demand, sales, and other important factors for marketing and procurement choices. Demand and supply data analytics may be utilized to manage procurement levels as well as make marketing decisions. Retail analytics provides us with precise consumer insights and insights into the organization's business and procedures, as well as the scope and need for…

Business Overview:
Retail analytics is the process of delivering analytical data on inventory levels, supply chain movement, customer demand, sales, and other important factors for marketing and procurement choices. Demand and supply data analytics may be utilized to manage procurement levels as well as make marketing decisions. Retail analytics provides us with precise consumer insights and insights into the organization's business and procedures, as well as the scope and need for development.
Companies may use retail analytics to strengthen their marketing strategies by better grasping individual preferences and gaining more detailed data.They may design strategies that focus on people and have a success rate by combining demographic data.

Here, we will be using Walmart store sales data to perform analysis and answer the following questions:
Which store has a minimum and maximum sales?
Which store has a maximum standard deviation?
Which store/s has an excellent quarterly growth rate in Q3'2012?
Find out holidays which have higher sales than the mean sales in non-holiday season for all stores together.

Approach
● Containers for all the services are spun up using Docker.
● Setup for MySQL is performed for Table creation using the dataset.
● Data is imported using Sqoop into Hive.
● Data transformation is performed for analysis and reporting.

Key Takeaways
● Understanding the project and how to use AWS EC2 Instance
● Introduction to Docker
● Visualizing the complete Architecture of the system
● Usage of docker-composer and starting all tools
● Understanding HFDS and various file formats
● Understanding the use of different HDFS command
● Understanding Sqoop Jobs and valuable tools
● Introduction to Hive architecture
● Understanding Hive Joins and Views
● Performing various transformation tasks in Hive
● Setting up MySQL for table creation
● Migrating from RDBMS to Hive warehouse

Tech Stack
➔Language: SQL, Bash
➔Services: AWS EC2, Docker, MySQL, Sqoop, Hive, HDFS

See project
Data Processing and Transformation in Hive using Azure VM

Nov 2022 - Nov 2022

Business Overview
Big Data is a collection of massive quantities of semi-structured and unstructured data created by a heterogeneous group of high-performance devices spanning from social networks to scientific computing applications. Companies have the ability to collect massive amounts of data, and they must ensure that the data is in highly usable condition by the time it reaches data scientists and analysts. The profession of data engineering involves designing and constructing systems…

Business Overview
Big Data is a collection of massive quantities of semi-structured and unstructured data created by a heterogeneous group of high-performance devices spanning from social networks to scientific computing applications. Companies have the ability to collect massive amounts of data, and they must ensure that the data is in highly usable condition by the time it reaches data scientists and analysts. The profession of data engineering involves designing and constructing systems for acquiring, storing, and analyzing vast volumes of data. It is a broad field with applications in nearly every industry.

Apache Hadoop is a Big Data solution that allows for the distributed processing of enormous data volumes across computer clusters by employing basic programming techniques. It is meant to scale from a single server to thousands of computers, each of which will provide local computation and storage.

Apache Hive is a fault-tolerant distributed data warehouse system that allows large-scale analytics. Hive allows users to access, write, and manage petabytes of data using SQL. It is built on Apache Hadoop, and as a result, it is tightly integrated with Hadoop and is designed to manage petabytes of data quickly. Hive is distinguished by its ability to query enormous datasets utilizing a SQL-like interface and an Apache Tez, MapReduce, or Spark engine.

Dataset Description
In this project, we will use the Airlines dataset to demonstrate the issues related to massive amounts of data and how various Hive components can be used to tackle them. Following are the files used in this project, along with a few of their fields :

airlines.csv - IATA_code, airport_name, city, state, country
carrier.csv - code, description
plane-data.csv - tail_number, type, manufacturer, model, engine_type
Flights data (yearly) - flight_num, departure, arrival, origin, destination, distance

Tech Stack
➔ Language: HQL
➔ Services: Azure VM, Hive, Hadoop

See project
Learn Data Processing with Spark SQL using Scala on AWS

Nov 2022 - Nov 2022

Agenda:
Apache Spark is an open-source distributed processing solution for substantial data workloads. It combines in-memory caching and rapid query execution for quick analytic queries on any amount of data. It includes development APIs in Java, Scala, Python, and R. It allows code reuse across various workloads, including batch processing, interactive queries, real-time analytics, machine learning, and graph processing.
Scala is a multi-paradigm, general-purpose, high-level programming…

Agenda:
Apache Spark is an open-source distributed processing solution for substantial data workloads. It combines in-memory caching and rapid query execution for quick analytic queries on any amount of data. It includes development APIs in Java, Scala, Python, and R. It allows code reuse across various workloads, including batch processing, interactive queries, real-time analytics, machine learning, and graph processing.
Scala is a multi-paradigm, general-purpose, high-level programming language. It's an object-oriented programming language that also supports functional programming. Scala applications can be converted to byte-codes and run on the Java Virtual Machine (JVM). Scala is a scalable programming language, and JavaScript run-times are also available. This project presents the fundamentals of Scala in an easy-to-understand manner.

Aim:
This project involves in understanding basics of scala RDD, performing transformations and actions, also involves analyzing the Movies dataset using RDD and Spark SQL.

Data Description:
In the project, we will use Movies and Rating datasets. The Movies dataset contains movie id, title, release date, etc. The Rating dataset contains customer id, movie id, ratings, and timestamp information.

Approach:
● Create an AWS EC2 instance and launch it.
● Create docker images using docker-compose file on EC2 machine via ssh.
● Load data from local machine into Spark container via EC2 machine.
● Perform analysis on Movie and Ratings data.

Project Takeaways:
● Understanding various services provided by AWS
● Creating an AWS EC2 instance and launching it
● Connecting to an AWS EC2 instance via SSH
● Dockerization.
● Copying a file from a local machine to an EC2 machine
● Understanding fundamentals of Scala.
● Creating RDDs
● Applying Transformation operations on RDDs
● Difference between RDDs and Dataframes
● Perform analysis using RDDs

Tech Stack
➔ Language: SQL, Scala
➔ Services: AWS EC2, Docker, Hive, HDFS, Spark

See project
Streaming Data Pipeline using Spark, HBase and Phoenix

Aug 2022 - Aug 2022

Business objective:
Sensor networks clearly focus on a wide range of practice-oriented research as individual sensors and related networks for structural monitoring and highly efficient monitoring. The sensor network is designed to respond to emergencies in real-time while observing the radiation field. An intelligent system-based sensor network is presented and used to monitor the condition of remote wells. The network consists of three sensors used to collect oil field data, including…

Business objective:
Sensor networks clearly focus on a wide range of practice-oriented research as individual sensors and related networks for structural monitoring and highly efficient monitoring. The sensor network is designed to respond to emergencies in real-time while observing the radiation field. An intelligent system-based sensor network is presented and used to monitor the condition of remote wells. The network consists of three sensors used to collect oil field data, including temperature, pressure, and gas. Intelligent microprocessor sensors are generally designed for oil well data processing, critical failure alarms or signals, traditional data storage or signals, and data/status connections.

Aim:
To build an application that monitors oil wells. Sensors in oil rigs generate streaming data processed by Spark and stored in HBase for use by various analytical and reporting tools.

Approach
Create an AWS EC2 instance and launch it.
Create docker images using docker-compose file on EC2 machine via ssh.
Download the dataset and load it into HDFS storage.
Read data from HDFS storage and write into HBase table using Spark.
Create Phoenix view on top of HBase table to analyze data using SQL queries.

Tech Stack
● AWS EC2
● Docker
● Scala
● HBase
● Apache Spark SQL
● Spark Structured Streaming
● HDFS
● Apache Phoenix
● SBT

Project Takeaways:
● Understanding various services provided by AWS
● Creating an AWS EC2 instance and launching it
● Connecting to an AWS EC2 instance via SSH
● Copying a file from a local machine to an EC2 machine
● Dockerization
● Download the dataset and load it into HDFS
● Difference between RDBMS and HBase
● SBT packaging
● Read data from HDFS and write into HBase tables
● Understanding of Apache Phoenix
● Create a Phoenix view on top of the HBase table

See project
Hive Mini Project to Build a Data Warehouse for e-Commerce

Jul 2022 - Jul 2022

Agenda:
For hive project Using SQL is still highly popular, and it will be for the foreseeable future. Most big data technologies have been modified to allow users to interact with them using SQL.

This big data project will look at Hive's capabilities to run analytical queries on massive datasets. We will use the Adventure works dataset in a MySQL dataset for this project, and we'll need to ingest and modify the data. We'll use Adventure works sales and Customer demographics data to…

Agenda:
For hive project Using SQL is still highly popular, and it will be for the foreseeable future. Most big data technologies have been modified to allow users to interact with them using SQL.

This big data project will look at Hive's capabilities to run analytical queries on massive datasets. We will use the Adventure works dataset in a MySQL dataset for this project, and we'll need to ingest and modify the data. We'll use Adventure works sales and Customer demographics data to perform analysis.

Approach
● Create an AWS EC2 instance and launch it.
● Create docker images using docker-compose file on EC2 machine via ssh.
● Create tables in MySQL.
● Load data from MySQL into HDFS storage using Sqoop commands.
● Move data from HDFS to Hive.
● Integrate Hive into Spark.
● Using Scala, extract Customer demographics information from data and store it as parquet files.
● Move parquet files from Spark to Hive.
● Create tables in Hive and load data from Parquet files into tables.
● Perform Hive analytics on Sales and Customer demographics data.

Project Takeaways
● Understanding various services provided by AWS
● Creating an AWS EC2 instance
● Connecting to an AWS EC2 instance via SSH
● Introduction to Docker
● Visualizing the complete Architecture of the system
● Usage of docker-composer and starting all tools
● Copying a file from a local machine to an EC2 machine
● Understanding the schema of the dataset
● Data ingestion/transformation using Sqoop, Spark, and Hive
● Moving the data from MySQL to HDFS
● Creating Hive table and troubleshooting it
● Using Parquet and Xpath to access schema
● Understanding the use of GROUP BY, GROUPING SETS, ROLL-UP, CUBE clauses
● Understanding different analytic functions in Hive

Tech Stack
Language: SQL, Scala
Services: AWS EC2, Docker, MySQL, Sqoop, Hive, HDFS, Spark

See project
SQL Project for Data Analysis using Oracle Database-Part 7

Jul 2022 - Jul 2022

Agenda of the project:
This is the seventh project in the SQL project series. This project involves the understanding of the Online Shopping Database, and using this database to perform the following Data Wrangling activities

1. Split full name into the first name and last name
2. Correct phone numbers and emails which are not in a proper format
3. Correct contact number and remove full name
4. Read BLOB column and fetch attribute details from the regular tag
5. Read BLOB…

Agenda of the project:
This is the seventh project in the SQL project series. This project involves the understanding of the Online Shopping Database, and using this database to perform the following Data Wrangling activities

1. Split full name into the first name and last name
2. Correct phone numbers and emails which are not in a proper format
3. Correct contact number and remove full name
4. Read BLOB column and fetch attribute details from the regular tag
5. Read BLOB column and fetch attribute details from nested columns
6. Read BLOB column and fetch attribute details from nested columns
7. Create separate tables for blob attributes
8. Remove invalid records from order_items where shipment_id is not mapped
9. Map missing first name and last name with email id credentials

Key Takeaways:
● Understanding the project and how to use Oracle SQL Developer.
● Understanding the basics of data analysis, SQL commands, and their application.
● Understanding the use of Oracle SQL Developer.
● Understanding the concept of Data Wrangling.
● Understanding the Online Shopping database.
● Perform Data Wrangling activities on the data.

Tech stack:
● SQL Programming language
● Oracle SQL Developer

See project
Analyse Yelp Dataset with Spark & Parquet Format on Azure Databricks

Jun 2022 - Jun 2022

Business Overview
The project involves Analyzing Yelp Dataset with Spark & Parquet Format on Azure Databricks. We download the Yelp dataset from the Yelp website and understand the problem.Then a solution is designed which defines ingestion of data, preparation of data and publishing it on Databricks. Then subscription is set up for using Microsoft Azure and categorisation of resources are done into a resource group. A storage account is set up to store all the data required for Analyzing…

Business Overview
The project involves Analyzing Yelp Dataset with Spark & Parquet Format on Azure Databricks. We download the Yelp dataset from the Yelp website and understand the problem.Then a solution is designed which defines ingestion of data, preparation of data and publishing it on Databricks. Then subscription is set up for using Microsoft Azure and categorisation of resources are done into a resource group. A storage account is set up to store all the data required for Analyzing Yelp Dataset with spark & Parquet format on Azure Databricks. Creation of containers in a storage account and uploading of the Yelp dataset in it. Creating an Azure data factory, a copy data pipeline and starting link storage for standard storage account in Azure datafactory. Copying data from Azure storage to Azure data lake storage using a copy data pipeline in Azure data factory followed by the conversion of Yelp dataset from JSON to Parquet file format and JSON to Delta format. Then performing partitioning, repartitioning and coalesce on the dataset in Databricks. Performing data analysis on the repartitioned dataset and finally deducing the recommendations

Approach
Read yelp datasets in ADLS and convert JSON to parquet for better performance.
Convert JSON to Delta Format.
Total records in each dataset.
Partition tip dataset tip by a date column.
repartition() vs coalesce()
Find the top 3 users based on their total number of reviews.
Find the top 10 users with the most fans
Analyse the top 10 categories by a number of reviews.
Analyse top businesses which have over 1000 reviews.
Analyse Business Data: Number of restaurants per state.
Analyze the top 3 restaurants in each state.
List the top restaurants in a state by the number of reviews.
Numbers of restaurants in Arizona state per city.
Broadcast Join: restaurants as per review ratings in Pheonix city.
Most rated Italian restaurant in Pheonix.

Tech Stack
Language: Python3
Services: Azure Data factory, Azure Databricks, ADLS

See project
Build an Azure Recommendation Engine on Movielens Dataset

Jun 2022 - Jun 2022

Agenda of the project?
The Project involves deriving Movie Recommendations using Python and Spark on Microsoft Azure. We understand the problem and download the Movielens dataset from the grouplens website. Then a subscription is set up for Azure, and created a resource group. A storage account is a setup to store all the data required for serving movie recommendations using Python and Spark on Azure, followed by creating a storage blob account in the same resource group. Firstly, we make…

Agenda of the project?
The Project involves deriving Movie Recommendations using Python and Spark on Microsoft Azure. We understand the problem and download the Movielens dataset from the grouplens website. Then a subscription is set up for Azure, and created a resource group. A storage account is a setup to store all the data required for serving movie recommendations using Python and Spark on Azure, followed by creating a storage blob account in the same resource group. Firstly, we make containers in a storage account and standard storage blob account and upload the movielens zip file dataset in its standard storage blob account. Then we create an Azure data factory, a copy data pipeline, and start link storage for standard blob storage account in the Azure data factory. We are copying data from Azure blob storage to Azure data lake storage using a copy data pipeline in the Azure data factory. It is followed by creating the databricks workspace, cluster on databricks, and accessing Azure data lake storage from databricks. We are creating mount points and extracting the zip file to get CSV files. Finally, we upload files into databricks, read the datasets into Spark dataframes in databricks, and analyze the dataset to get the movie recommendations.

Data Analysis:
Data is downloaded from grouplens website containing
Resource manager is created in Azure & Storage account.
Pipeline is created to copy the data from Azure blob storage to Azure data lake storage in the Azure data factory.
The Databricks workspace created and accessed Azure data lake storage from databricks followed by the creation of Mount pairs.
extracted the Movielens data zip file to get the CSV files out of it using the Databricks file system(DFS) and ADF.
In Transformation & load process, the dataset in Spark is read into Spark dataframes. Data tags are read into Spark in Databricks.
Finally, data is analyzed into Spark in Databricks using mount points, and data is visualized using bar charts

See project
End-to-End Big Data Project to Learn PySpark SQL Functions

Jun 2022 - Jun 2022

Business Overview
Apache Spark is a distributed processing engine that is open source and used for large data applications. It uses in-memory caching and efficient query execution for quick analytic queries against any quantity of data. It offers code reuse across many workloads such as batch processing, interactive queries, real-time analytics, machine learning, and graph processing. It provides development APIs in Java, Scala, Python, and R.

Agenda
This is the fifth project in…

Business Overview
Apache Spark is a distributed processing engine that is open source and used for large data applications. It uses in-memory caching and efficient query execution for quick analytic queries against any quantity of data. It offers code reuse across many workloads such as batch processing, interactive queries, real-time analytics, machine learning, and graph processing. It provides development APIs in Java, Scala, Python, and R.

Agenda
This is the fifth project in the Pyspark series. The fourth project involves advanced functionalities of Dataframes with the help of business case study, also the use of Spark submit command. This project mainly focuses on PySpark SQL, SQL function, and various joins available in PySpark SQL with the help of business case study.

Key Takeaways:
● Understanding the project overview
● Introduction to PySpark
● Introduction to SQL
● Features and benefits of SQL
● Understanding Spark SQL
● Understanding the business case study
● Understanding business requirements
● Converting PySpark Dataframes into SQL tables
● Different types of Joins in PySpark SQL
● Implementing PySpark SQL code

Tech stack:
➔Language: Python
➔Package: Pyspark

See project
GCP Project to Explore Cloud Functions using Python Part 1

Jun 2022 - Jun 2022

Business Overview:
Google Cloud is a collection of physical assets, such as computers and hard disk drives, and virtual resources, such as virtual machines (VMs), housed in Google data centers worldwide. This resource distribution has various advantages, including redundancy in a failure and decreased latency by putting resources closer to customers. This release also presents some guidelines for combining resources.

GCP offers a web-based graphical user interface for managing Google…

Business Overview:
Google Cloud is a collection of physical assets, such as computers and hard disk drives, and virtual resources, such as virtual machines (VMs), housed in Google data centers worldwide. This resource distribution has various advantages, including redundancy in a failure and decreased latency by putting resources closer to customers. This release also presents some guidelines for combining resources.

GCP offers a web-based graphical user interface for managing Google Cloud projects and resources. If a user prefers to work at the command line, the gcloud command-line tool can handle most Google Cloud activities.

We will explore the following services of GCP:
Cloud Storage
Compute Engine
PubSub

Key Takeaways
● Introduction to the Google Cloud Console
● Understanding Cloud Storage concepts and classes
● Creating a Service Account
● Setting up Gcloud SDK
● Installing Python and other dependencies
● Understanding retention policies and holds
● Setting up GCP Virtual Machine and SSH configuration
● Understanding Pub/Sub Architecture
● Creating a Pub/Sub Topic and implementing message flow
● Implementing Pub/Sub notification using GCS

Tech Stack:
Language: Python3
Services: Cloud Storage, Compute Engine, Pub/Sub
Hadoop Project-Analysis of Yelp Dataset using Hadoop Hive

Jun 2022 - Jun 2022

Business Overview
Big Data is the collection of huge datasets of semi-structured and unstructured data, generated by the high-performance heterogeneous group of devices ranging from social networks to scientific computing applications. Companies have the potential to gather large volumes of data, and they must guarantee that the data is in a highly useable shape by the time it reaches data scientists and analysts. Data engineering is the profession of creating and constructing systems for…

Business Overview
Big Data is the collection of huge datasets of semi-structured and unstructured data, generated by the high-performance heterogeneous group of devices ranging from social networks to scientific computing applications. Companies have the potential to gather large volumes of data, and they must guarantee that the data is in a highly useable shape by the time it reaches data scientists and analysts. Data engineering is the profession of creating and constructing systems for gathering, storing, and analyzing
large amounts of data. It is a vast field with applications in almost every sector. Apache Hadoop is a Big Data technology that enables the distributed processing of massive data volumes across computer clusters using simple programming concepts. It is intended to grow from a single server to thousands of computers, each supplying local computing and storage. Yelp is a community review site and an American multinational firm based in San Francisco, California. It publishes crowd-sourced reviews of local businesses as well as the online reservation service Yelp Reservations. Yelp has made a portion of their data available in order to launch a new activity called the Yelp Dataset Challenge, which allows anyone to do research or analysis to find what insights are buried in their data. Due to the bulk of the data, this project only selects a subset of Yelp data. User and Review dataset is considered for this session.

Key Takeaways
Understanding Project overview
Introduction to Big Data
Overview of Hadoop ecosystem
Understanding Hive concepts
Understanding the dataset
Implementing Hive table operations
Creating static and dynamic Partitioning
Creating Hive Buckets
Understanding different file formats in Hive
Using Complex Hive Functions in Hive
Launching EMR cluster in AWS

Tech Stack
Language: HQL
Services: AWS EMR, Hive, HDFS, AWS S3

See project
PySpark Project to Learn Advanced DataFrame Concepts

Jun 2022 - Jun 2022

Business Overview
Apache Spark is a distributed processing engine that is open source and used for large data applications. It uses in-memory caching and efficient query execution for quick analytic queries against any quantity of data. It offers code reuse across many workloads such as batch processing, interactive queries, real-time analytics, machine learning, and graph processing. It provides development APIs in Java, Scala, Python, and R.

Agenda:
This is the fourth project in…

Business Overview
Apache Spark is a distributed processing engine that is open source and used for large data applications. It uses in-memory caching and efficient query execution for quick analytic queries against any quantity of data. It offers code reuse across many workloads such as batch processing, interactive queries, real-time analytics, machine learning, and graph processing. It provides development APIs in Java, Scala, Python, and R.

Agenda:
This is the fourth project in the Pyspark series. The third project involves in-depth knowledge of Dataframes, different types of Dataframe operations, also implementation of transformation and action functions on spark dataframe. This project involves advanced functionalities of Dataframes with the help of a business case study, and the use of the Spark submit command.

Key Takeaways:
● Understanding the project overview
● Introduction to PySpark
● Understanding Spark Architecture and Lifecycle
● Introduction to Spark Operations
● Understanding the business case study
● Understanding business requirements
● Understanding Resilient Distributed Data (RDD)
● Difference between Transformation and Action
● Methods of creating Dataframes in pyspark
● Implementation of spark submit command

Tech stack:
➔Language: Python
➔Package: Pyspark

See project
Snowflake Azure Project to build real-time Twitter feed dashboard

Jun 2022 - Jun 2022

Business Overview
Companies miss opportunities and are exposed to risk as a result of delays in company operations and decision-making. Organizations can move rapidly based on real-time data since it reveals issues and opportunities. Data that is gathered, processed, and evaluated in real-time is referred to as real-time data, and it comprises data that is ready to utilize as soon as it is created. A snapshot of historical data is what near real-time data is. When speed is critical, near…

Business Overview
Companies miss opportunities and are exposed to risk as a result of delays in company operations and decision-making. Organizations can move rapidly based on real-time data since it reveals issues and opportunities. Data that is gathered, processed, and evaluated in real-time is referred to as real-time data, and it comprises data that is ready to utilize as soon as it is created. A snapshot of historical data is what near real-time data is. When speed is critical, near real-time processing is preferred, although
processing time in minutes rather than seconds is acceptable. Batch data that has been previously stored is considerably slower, and by the time it is ready to use, it might be days old.

Dataset Description
We will use the Twitter API and fetch tweets and their metadata(re-tweets, comments,likes) using Python.

Approach
● We write API calls to fetch Twitter insights in real-time via Python and this code can be run in a local machine every day once
● We create a Snowpipe component in Snowflakes by using Azure IAM integration(cross-account access) as Snowflakes hosted on the Azure account is different from the Azure account we own. This in turn uses Azure EventGrid and Function App in the backend to automate the file load
● As soon as the script generates files in Azure Blob storage, Snowpipe recognizes the file arrival and loads the snowflakes table with file data automatically
● We create a dashboard in Snowflakes that is scheduled to refresh every 30 mins to show actual feed data from Twitter, Eg. No of likes and comments/feed to understand popular feed and their sentiment.

Tech Stack
Language: Python
Services: Azure Storage Account, Azure Queue, Snowpipe, Snowflake, Azure Resource Group

See project
Hands-On Real Time PySpark Project for Beginners

May 2022 - May 2022

Business Overview
Apache Spark is a distributed processing engine that is open source and used for large data applications. For quick analytic queries against any quantity of data, it uses in-memory caching and efficient query execution. It offers code reuse across many workloads such as batch processing, interactive queries, real-time analytics, machine learning, and graph processing. It provides development APIs in Java, Scala, Python, and R.

Data Pipeline:
A data pipeline is a…

Business Overview
Apache Spark is a distributed processing engine that is open source and used for large data applications. For quick analytic queries against any quantity of data, it uses in-memory caching and efficient query execution. It offers code reuse across many workloads such as batch processing, interactive queries, real-time analytics, machine learning, and graph processing. It provides development APIs in Java, Scala, Python, and R.

Data Pipeline:
A data pipeline is a technique for transferring data from one system to another. The data may or may not be updated, and it may be handled in real-time (or streaming) rather than in batches. The data pipeline encompasses everything from harvesting or acquiring data using various methods to storing raw data, cleaning, validating, and transforming data into a query-worthy format, displaying KPIs, and managing the above process.

PySpark:
PySpark is a Python interface for Apache Spark. It not only lets you develop Spark applications using Python APIs, but it also includes the PySpark shell for interactively examining data in a distributed context. PySpark supports most of Spark's capabilities, including Spark SQL, DataFrame, Streaming,MLlib, and Spark Core. In this project, you will learn about core Spark architecture, Spark Sessions, Transformation, Actions, and Optimization Techniques using PySpark.

Key Takeaways:
● Understanding the project overview
● Introduction to PySpark
● Understanding Spark Architecture and Lifecycle
● Introduction to Spark Operations
● Understanding the components of Spark Apache
● Understanding Resilient Distributed Data (RDD)
● Difference between Transformation and Action
● Understanding Interactive Spark Shell
● Understanding the concept of Directed Acyclic Graph(DAG)
● Features of Spark
● Applications of Spark

Tech stack:
➔Language: Python
➔Package: Pyspark

See project
PySpark Big Data Project to Learn RDD Operations

May 2022 - May 2022

Business Overview
Apache Spark is a distributed processing engine that is open source and used for large data applications. It uses in-memory caching and efficient query execution for quick analytic queries against any quantity of data. It offers code reuse across many workloads such as batch processing, interactive queries, real-time analytics, machine learning, and graph processing. It provides development APIs in Java, Scala, Python, and R.

Agenda:
This is the second project in…

Business Overview
Apache Spark is a distributed processing engine that is open source and used for large data applications. It uses in-memory caching and efficient query execution for quick analytic queries against any quantity of data. It offers code reuse across many workloads such as batch processing, interactive queries, real-time analytics, machine learning, and graph processing. It provides development APIs in Java, Scala, Python, and R.

Agenda:
This is the second project in the Pyspark series. The first project involves the Pyspark introduction, Spark component and architecture, and basic introduction about RDD and DAG. This project involves in-depth knowledge of RDD, different types of RDD operations, the difference between transformation and action, and the various functions available in transformation and action with their execution.

Key Takeaways:
● Understanding the project overview
● Introduction to PySpark
● Understanding Spark Architecture and Lifecycle
● Introduction to Spark Operations
● Understanding the components of Spark Apache
● Understanding Resilient Distributed Data (RDD)
● Difference between Transformation and Action
● Understanding Interactive Spark Shell
● Understanding the concept of Directed Acyclic Graph(DAG)
● Understanding different Transformation functions
● Execute different Transformation functions
● Understanding different Action functions
● Execute different Action functions

Tech stack:
➔Language: Python
➔Package: Pyspark

See project
PySpark Project for Beginners to Learn DataFrame Operations

May 2022 - May 2022

Business Overview
Apache Spark is a distributed processing engine that is open source and used for large data applications. It uses in-memory caching and efficient query execution for quick analytic queries against any quantity of data. It offers code reuse across many workloads such as batch processing, interactive queries, real-time analytics, machine learning, and graph processing. It provides development APIs in Java, Scala, Python, and R

Agenda:
This is the third project in…

Business Overview
Apache Spark is a distributed processing engine that is open source and used for large data applications. It uses in-memory caching and efficient query execution for quick analytic queries against any quantity of data. It offers code reuse across many workloads such as batch processing, interactive queries, real-time analytics, machine learning, and graph processing. It provides development APIs in Java, Scala, Python, and R

Agenda:
This is the third project in the Pyspark series. The second project involves in-depth knowledge of RDD, different types of RDD operations, the difference between transformation and action, and the various functions available in transformation and action with their execution. This project involves in-depth knowledge of Dataframes, different types of Dataframe operations, also implementation of transformation and action functions on spark dataframe.

Key Takeaways:
● Understanding the project overview
● Introduction to PySpark
● Understanding Spark Architecture and Lifecycle
● Introduction to Spark Operations
● Understanding the components of Spark Apache
● Understanding Resilient Distributed Data (RDD)
● Difference between Transformation and Action
● Understanding about datasets
● Understanding spark Dataframes
● Difference between Structured and Semi-structured data
● Methods of creating Dataframes in pyspark
● Understanding UDF in spark
● Implementation of Transformation and Action functions on spark Dataframe

Tech stack:
➔Language: Python
➔Package: Pyspark

See project
SQL Project for Data Analysis using Oracle Database-Part 4

May 2022 - May 2022

● Understanding the project and how to use Oracle SQL Developer
● Understanding the basics of data analysis, SQL commands, and their application
● Understanding the use of Oracle SQL Developer
● Understanding the difference between COUNT(*) and COUNT(column_name).
● Data analysis using WITH clause.
● Categorization using CASE statement.
● Understanding the inline view.
● Simplify query with WITH clause and View.
● Understanding the use of the ROWNUM clause.

Tech…

● Understanding the project and how to use Oracle SQL Developer
● Understanding the basics of data analysis, SQL commands, and their application
● Understanding the use of Oracle SQL Developer
● Understanding the difference between COUNT(*) and COUNT(column_name).
● Data analysis using WITH clause.
● Categorization using CASE statement.
● Understanding the inline view.
● Simplify query with WITH clause and View.
● Understanding the use of the ROWNUM clause.

Tech stack:
● SQL Programming language
● Oracle SQL Developer

See project
SQL Project for Data Analysis using Oracle Database-Part 5

May 2022 - May 2022

Key Takeaways:
● Understanding the project and how to use Oracle SQL Developer
● Understanding the basics of data analysis, SQL commands, and their application
● Understanding the use of Oracle SQL Developer
● Understanding the ROW_NUMBER function
● Data analysis using the RANK function
● Difference between RANK and DENSE_RANK functions
● Understanding the use of SUBSTR and INSTR functions
● Data analysis using the built-in functions
● Deal with NULL values using the…

Key Takeaways:
● Understanding the project and how to use Oracle SQL Developer
● Understanding the basics of data analysis, SQL commands, and their application
● Understanding the use of Oracle SQL Developer
● Understanding the ROW_NUMBER function
● Data analysis using the RANK function
● Difference between RANK and DENSE_RANK functions
● Understanding the use of SUBSTR and INSTR functions
● Data analysis using the built-in functions
● Deal with NULL values using the NVL function
● Understanding the use of COALESCE function
● Change the date format

Tech stack:
SQL Programming language
Oracle SQL Developer

See project
SQL Project for Data Analysis using Oracle Database-Part 6

May 2022 - May 2022

Key Takeaways:
● Understanding the project and how to use Oracle SQL Developer.
● Understanding the basics of data analysis, SQL commands, and their application.
● Understanding the use of Oracle SQL Developer.
● Understanding the concept of Data Wrangling.
● Remove unwanted features from data using SQL queries.
● Deal with missing data.
● How to remove missing data using SQL queries.
● How to impute missing data using SQL queries.
● Understanding Pivot and Unpivot…

Key Takeaways:
● Understanding the project and how to use Oracle SQL Developer.
● Understanding the basics of data analysis, SQL commands, and their application.
● Understanding the use of Oracle SQL Developer.
● Understanding the concept of Data Wrangling.
● Remove unwanted features from data using SQL queries.
● Deal with missing data.
● How to remove missing data using SQL queries.
● How to impute missing data using SQL queries.
● Understanding Pivot and Unpivot functions in SQL.
● Pivoting rows to columns using SQL queries.
● Pivoting rows to columns with joins using SQL queries

Tech stack:
● SQL Programming language
● Oracle SQL Developer

See project
SQL Project for Data Analysis using Oracle Database-Part 1

Apr 2022 - Apr 2022

Data Analysis:
● The Oracle Database 21c is downloaded from the Oracle website for SQL Data analysis.
● The SQL Developer is downloaded for working in Oracle databases, connecting it to the
“SYSTEM” username and creating tables in the database.
● Data is inserted into the tables, followed by the exploration of the tables, including a
walkthrough of columns and seeing comments.
● The listing of the Employees and Departments is done based on some conditions using the
SQL…

Data Analysis:
● The Oracle Database 21c is downloaded from the Oracle website for SQL Data analysis.
● The SQL Developer is downloaded for working in Oracle databases, connecting it to the
“SYSTEM” username and creating tables in the database.
● Data is inserted into the tables, followed by the exploration of the tables, including a
walkthrough of columns and seeing comments.
● The listing of the Employees and Departments is done based on some conditions using the
SQL commands followed by displaying the records in an ordered manner and handling of the
NULL values.
● The selection of the records is made based on some patterns like Wildcard, Operators, etc
followed by implementation of the Data Manipulation commands(DML) like Add, Update and
Delete for the Data Analysis.
● The backup of the table where migration is going on is taken, followed by COMMIT and
ROLLBACK commands. Then the listing of distinct records is done, and further renaming of
the columns.
● Finally, a listing of the employee details based on the complex nested conditions is done.

Tech stack:
● SQL Programming language
● Oracle SQL Developer

See project
SQL Project for Data Analysis using Oracle Database-Part 2

Apr 2022 - Apr 2022

This is the second project in the SQL project series. This project’s Agenda involves Analyzing the data using SQL on the Oracle Database Software. Understanding different types of Joins(Inner join, Left outer join, Right outer join, Full outer join, Self join), different types of Operators(Minus, Union, Union all, Intersect) and Resolve the column ambiguously defined error.

Tech stack:
● SQL Programming language
● Oracle SQL Developer

See project
SQL Project for Data Analysis using Oracle Database-Part 3

Apr 2022 - Apr 2022

This is the third project in the SQL project series; the second project involved analyzing the data using SQL on the Oracle Database Software. Understanding different types of Joins(Inner join, Left outer join, Right outer join, Full outer join, Self join), different types of Operators(Minus, Union, Union all, Intersect). This project involves the data analysis using Sub-query, Group-by clause and Exists clause. It also consists of using inline view and aggregate functions(Min, Max, Count, Avg)…

This is the third project in the SQL project series; the second project involved analyzing the data using SQL on the Oracle Database Software. Understanding different types of Joins(Inner join, Left outer join, Right outer join, Full outer join, Self join), different types of Operators(Minus, Union, Union all, Intersect). This project involves the data analysis using Sub-query, Group-by clause and Exists clause. It also consists of using inline view and aggregate functions(Min, Max, Count, Avg) to perform better analysis on data.

Tech stack:
SQL Programming language
Oracle SQL Developer

See project

Recommendations received

6 people have recommended Ajay

Join now to view

More activity by Ajay

Four years ago, I transitioned from Data Analyst to Data Engineer has been challenging, but it was absolutely worth it. From mastering ETL…

Four years ago, I transitioned from Data Analyst to Data Engineer has been challenging, but it was absolutely worth it. From mastering ETL…

Liked by Ajay Kadiyala
𝐌𝐲 𝐭𝐨𝐩 𝟓 𝐝𝐚𝐭𝐚 𝐞𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠 𝐛𝐨𝐨𝐤𝐬 that every data engineer should read. 𝐖𝐡𝐚𝐭 𝐚𝐫𝐞 𝐲𝐨𝐮𝐫𝐬? 📚 📚 1. Designing…

𝐌𝐲 𝐭𝐨𝐩 𝟓 𝐝𝐚𝐭𝐚 𝐞𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠 𝐛𝐨𝐨𝐤𝐬 that every data engineer should read. 𝐖𝐡𝐚𝐭 𝐚𝐫𝐞 𝐲𝐨𝐮𝐫𝐬? 📚 📚 1. Designing…

Liked by Ajay Kadiyala
SQL is here to stay. Invest time in learning SQL, your future self will thank you. And don't just learn SQL, try to master it. By understanding…

SQL is here to stay. Invest time in learning SQL, your future self will thank you. And don't just learn SQL, try to master it. By understanding…

Liked by Ajay Kadiyala
i am looking for one data engineer(part time) for below skills... 5+ years in data engineering. 4+ years in python. 4+ years in data modelling, data…

i am looking for one data engineer(part time) for below skills... 5+ years in data engineering. 4+ years in python. 4+ years in data modelling, data…

Liked by Ajay Kadiyala
End to End Big Data Pipeline - From Ingestion to Reporting . . So, How a Big Data pipeline looks like? Collect -> Ingest -> Store -> Process ->…

End to End Big Data Pipeline - From Ingestion to Reporting . . So, How a Big Data pipeline looks like? Collect -> Ingest -> Store -> Process ->…

Liked by Ajay Kadiyala
i am looking for one data engineer(part time) for below skills... 5+ years in data engineering. 4+ years in python. 4+ years in data modelling, data…

i am looking for one data engineer(part time) for below skills... 5+ years in data engineering. 4+ years in python. 4+ years in data modelling, data…

Posted by Ajay Kadiyala
My employer Globant is hiring for Azure Data Engineers !!! Role : Data Engineer Tech Stack : Spark/PySpark, Databricks, ADF, ADLS, Synapse and Azure…

My employer Globant is hiring for Azure Data Engineers !!! Role : Data Engineer Tech Stack : Spark/PySpark, Databricks, ADF, ADLS, Synapse and Azure…

Liked by Ajay Kadiyala
9 SQL repositories on GitHub to improve your SQL skills and your portfolio🎯........ 1. Sequelize Sequelize is a promise-based Postgres, MySQL…

9 SQL repositories on GitHub to improve your SQL skills and your portfolio🎯........ 1. Sequelize Sequelize is a promise-based Postgres, MySQL…

Liked by Ajay Kadiyala
If you are interested in learning about Software Architectural Patterns, this post is for you. Give it a read. 👇 In this post, I am going to talk…

If you are interested in learning about Software Architectural Patterns, this post is for you. Give it a read. 👇 In this post, I am going to talk…

Liked by Ajay Kadiyala
Navya Nanda might be the granddaughter of Amitabh Bachchan She might be inheriting a business worth 7000+ crore She might have 1 million+ followers…

Navya Nanda might be the granddaughter of Amitabh Bachchan She might be inheriting a business worth 7000+ crore She might have 1 million+ followers…

Liked by Ajay Kadiyala
Data Analysis is not just SQL. Data Analysis is not just PowerBI/Tableau. Data Analysis is not just Python. Data Analysis is not just…

Data Analysis is not just SQL. Data Analysis is not just PowerBI/Tableau. Data Analysis is not just Python. Data Analysis is not just…

Liked by Ajay Kadiyala
Thoughtworks hiring for the below roles, dm your resume if interested. 1. Senior Consultant - Developer 2. Lead Consultant:Infrastructure Consultant…

Thoughtworks hiring for the below roles, dm your resume if interested. 1. Senior Consultant - Developer 2. Lead Consultant:Infrastructure Consultant…

Liked by Ajay Kadiyala

View Ajay’s full profile

See who you know in common
Get introduced
Contact Ajay directly

Join to view full profile

Other similar profiles

Explore collaborative articles

We’re unlocking community knowledge in a new way. Experts add insights directly into each article, started with the help of AI.

Explore More

Others named Ajay Kadiyala in India

12 others named Ajay Kadiyala in India are on LinkedIn

See others named Ajay Kadiyala

Add new skills with these courses

See all courses

Ajay Kadiyala

Bengaluru, Karnataka, India 108K followers 500+ connections

See your mutual connections View mutual connections with Ajay Sign in Welcome back Email or phone Password Show Forgot password? Sign in or New to LinkedIn? Join now or New to LinkedIn? Join now

About

Contributions

Activity

Want to be a data engineer? Here are the 5 must-have skills: [1] SQL: • Learn to write complex queries to extract, transform, and load…

Liked by Ajay Kadiyala

India to build tunnels worth 1 lakh crore in the next few years! Ministry of roads transport and highways, Nitin Gadkari is set to revolutionize…

Liked by Ajay Kadiyala

𝗧𝘂𝗻𝗲 𝗢𝘂𝘁 𝗡𝗼𝗶𝘀𝗲 𝗪𝗶𝘁𝗵 𝗖𝗵𝗮𝗶𝗻-𝗼𝗳-𝗧𝗵𝗼𝘂𝗴𝗵𝘁 𝗣𝗿𝗼𝗺𝗽𝘁𝗶𝗻𝗴 🍊 Chain-of-Thought (CoT) Prompting is a strategy that…

Liked by Ajay Kadiyala

Experience

-

-

-

-

Education

Licenses & Certifications

MySQL

Introduction to Programming Using Python

Introduction to Programming Using Python

ccna routing&switching

Projects

Jan 2023 - Jan 2023

Jan 2023 - Jan 2023

Nov 2022 - Nov 2022

Nov 2022 - Nov 2022

Aug 2022 - Aug 2022

Jul 2022 - Jul 2022

Jul 2022 - Jul 2022

Jun 2022 - Jun 2022

Jun 2022 - Jun 2022

Jun 2022 - Jun 2022

GCP Project to Explore Cloud Functions using Python Part 1

Jun 2022 - Jun 2022

Jun 2022 - Jun 2022

Jun 2022 - Jun 2022

Jun 2022 - Jun 2022

May 2022 - May 2022

May 2022 - May 2022

May 2022 - May 2022

May 2022 - May 2022

May 2022 - May 2022

May 2022 - May 2022

Apr 2022 - Apr 2022

Apr 2022 - Apr 2022

Apr 2022 - Apr 2022

Recommendations received

Sirish Peddinti

Jyothirmayi Madicherla

More activity by Ajay

Four years ago, I transitioned from Data Analyst to Data Engineer has been challenging, but it was absolutely worth it. From mastering ETL…

Liked by Ajay Kadiyala

𝐌𝐲 𝐭𝐨𝐩 𝟓 𝐝𝐚𝐭𝐚 𝐞𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐢𝐧𝐠 𝐛𝐨𝐨𝐤𝐬 that every data engineer should read. 𝐖𝐡𝐚𝐭 𝐚𝐫𝐞 𝐲𝐨𝐮𝐫𝐬? 📚 📚 1. Designing…

Liked by Ajay Kadiyala

SQL is here to stay. Invest time in learning SQL, your future self will thank you. And don't just learn SQL, try to master it. By understanding…

Liked by Ajay Kadiyala

i am looking for one data engineer(part time) for below skills... 5+ years in data engineering. 4+ years in python. 4+ years in data modelling, data…

Liked by Ajay Kadiyala

End to End Big Data Pipeline - From Ingestion to Reporting . . So, How a Big Data pipeline looks like? Collect -> Ingest -> Store -> Process ->…

Liked by Ajay Kadiyala

i am looking for one data engineer(part time) for below skills... 5+ years in data engineering. 4+ years in python. 4+ years in data modelling, data…

Posted by Ajay Kadiyala

My employer Globant is hiring for Azure Data Engineers !!! Role : Data Engineer Tech Stack : Spark/PySpark, Databricks, ADF, ADLS, Synapse and Azure…

Liked by Ajay Kadiyala

Bengaluru, Karnataka, India

108K followers 500+ connections

View mutual connections with Ajay

Welcome back

Email or phone

Password

Forgot password?

or

New to LinkedIn? Join now

or

New to LinkedIn? Join now