Mohit Motwani

Data Engineer @ CoinDCX | xCiti | NIT Trichy | Python, Spark, SQL, Azure, Databricks, Big Data, Data Lake, ETL and SAS

Bengaluru, Karnataka, India
6K followers 500+ connections

About

Hello, I'm Mohit, a seasoned, goal-driven, and detail-oriented Data Engineer working at Citi, skilled in Python, PySpark, SQL, Databricks, Airflow, Azure, Azure Data Factory, Azure Synapse, Excel, SAS, and more.

I have a Master's degree in Computer Science from the National Institute of Technology, Trichy. In my free time, I enjoy coaching and mentoring.

In my current role, I work as a Data Engineer in a dynamic team within a migration unit, specializing in Python, SQL, Apache Spark, and Spark SQL, with a focus on Spark optimization and performance tuning. We migrate large-scale data systems with the goal of optimal performance and efficiency. Collaborating closely with stakeholders, we gather requirements, assess data complexities, and execute migration strategies that minimize disruption to operations.

Throughout my tenure, I've championed innovation by introducing new approaches to data processing and analysis. By automating cumbersome ETL processes and introducing more efficient techniques, I've contributed to significant time and cost savings, reducing turnaround time from 6 days to 3 days and delivering efficiency gains of over 1.5 full-time equivalents (FTEs).

In my toolkit, I bring expertise in Data Analysis, Data Modeling, Distributed Computing, Data Forecasting, Data Warehousing, Data Cleansing, ETL and ELT Process Optimization, Transformation, and Automation.

Let's connect and explore opportunities together.
You can reach me via LinkedIn DM, email, or phone.
Email: [email protected] 📩
Phone: +91-9509038128

Experience

  • Data Engineer

    CoinDCX

    - Present 1 month

    Bengaluru, Karnataka, India

  • Citi

    2 years 3 months

    • Big Data Engineer

      - 1 year 3 months

      Chennai, Tamil Nadu, India

      • Responsible for building scalable distributed data solutions for data migration using Spark.

      • Developed Spark Jobs to ingest data into HDFS data lakes.

      • Worked with various file formats, including CSV, JSON, ORC, Avro, and Parquet.

      • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.

      • Collaborated with cross-functional teams to implement value stream mapping (VSM) strategies, achieving process improvement and waste elimination goals.

      • Contributed to the automation of data profiling and quality checks, reducing waiting times and fostering cross-functional collaboration for a more streamlined workflow.

      • Tools: Python, PySpark, Spark SQL, SQL Server, Teradata, Cloudera, Hadoop, HDFS, Unravel Data platform, Git & GitHub, Impala, SAS
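
The data profiling and quality-check automation mentioned above could look roughly like the following minimal sketch. It is written in plain Python so it runs standalone, rather than in PySpark as the role's tooling suggests; the function names, the row-as-dict layout, and the two checks shown (nulls in required columns, duplicate keys) are illustrative assumptions, not the actual implementation used at Citi.

```python
from collections import Counter


def profile_column(values):
    """Hypothetical helper: basic profiling stats for one column of values."""
    # Treat None and empty strings as missing.
    non_null = [v for v in values if v is not None and v != ""]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }


def run_quality_checks(rows, required, unique_key):
    """Hypothetical helper: return human-readable failures for the given rows."""
    failures = []
    # Check 1: required columns must not contain missing values.
    for col in required:
        stats = profile_column([r.get(col) for r in rows])
        if stats["nulls"]:
            failures.append(f"{col}: {stats['nulls']} null value(s)")
    # Check 2: the key column must be unique.
    keys = [r.get(unique_key) for r in rows]
    dupes = [k for k, n in Counter(keys).items() if n > 1]
    if dupes:
        failures.append(f"{unique_key}: duplicate keys {sorted(dupes)}")
    return failures
```

For example, run_quality_checks([{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}, {"id": 2, "amount": 5.0}], ["id", "amount"], "id") reports one null in amount and a duplicate id of 2. At Spark scale, the same checks would typically be expressed as DataFrame aggregations instead of Python loops.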

    • Data Analyst

      - 1 year 2 months

      Chennai, Tamil Nadu, India

      • Realized a 40% reduction in average turnaround time (TAT) by automating key data analysis tasks, and achieved significant resource optimization by saving ~1.5 FTE.

      • Achieved cost savings of $50,000 by implementing data-driven decision-making strategies to reduce over-remediation. Remediated 20,000+ customers, recovering ~$1 million.

      • Collaborated with business partners from different lines of business to gather requirements.

      • Tools: Python, PySpark, Spark SQL, Teradata, Cloudera, SAS

  • Subject Matter Expert (Statistics)

    TutorBro Ltd.

    - 1 year 3 months

    New Delhi, Delhi, India

    - Mentored clients on statistical concepts, resulting in a 15% increase in client satisfaction.

    - Answered customer queries on topics such as probability and probability distributions, parametric and non-parametric hypothesis testing, regression, ANOVA, and related concepts.

Education

  • National Institute of Technology, Tiruchirappalli

    Master's degree, Computer Science (grade: 8.9)

    -

    Activities and Societies: Member of the Parallel Computing Club (2020-2022)

    - Organised and coordinated the club's annual intra-collegiate event, INNOVAC 2021, and its special events: https://innovac21.org/

  • University of Rajasthan

    Bachelor's degree, Mathematics and Statistics

    -

Projects

  • Cricket Stadium Data Pipeline

    -

    o Engineered a data pipeline orchestrated with Apache Airflow that scrapes Wikipedia data and loads it into Google Cloud Storage in a columnar file format.
    o Used Google Cloud Dataproc for data cleaning and transformations and BigQuery for scalable, efficient storage, then integrated Tableau with BigQuery for dynamic, interactive visualizations.

  • End-to-End Data Modernization

    -

    o Developed a data ingestion pipeline from on-premises SQL Server to a data lake using Azure Data Factory.
    o Architected a data lakehouse with three data zones in the data lake using Azure Databricks.
    o Developed a pipeline to load data into Azure Synapse for data analysis and reporting in Tableau.

  • Lane Detection Using OpenCV

    -

    This project uses the image and video processing modules of the OpenCV library to detect road lanes. The aim is to help prevent fatal accidents and improve road safety through advanced driver-assistance systems.

  • Uber Cluster Analysis Using PySpark

    -

    Clustered Uber pickup data so that the cluster centroids can serve as Uber zones, minimizing the distance and time drivers need to reach assigned customers.

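The clustering idea behind this project can be illustrated with plain k-means (Lloyd's algorithm). This is a minimal, standalone Python sketch rather than the project's PySpark code (PySpark offers pyspark.ml.clustering.KMeans for that); the sample coordinates, the first-k-points initialization, and the use of straight-line distance on latitude/longitude (a reasonable approximation only over small areas) are all assumptions.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on (lat, lon) pairs.

    Centroids start at the first k points for reproducibility; real
    pipelines usually prefer k-means++ or several random restarts.
    """
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each pickup to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                lat = sum(m[0] for m in members) / len(members)
                lon = sum(m[1] for m in members) / len(members)
                new_centroids.append((lat, lon))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids


# Hypothetical pickups forming two spatial groups; each centroid is a candidate "zone".
pickups = [(40.75, -73.99), (40.76, -73.98), (40.64, -73.78), (40.65, -73.79)]
zones = kmeans(pickups, 2)
```

Each returned centroid sits at the mean position of its pickups, which is what makes it a natural candidate for a zone center that keeps average driver travel distance low.
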
  • Hate Speech Detection

    -

    Built a sentiment analysis model, preprocessing the training samples with natural language processing techniques such as tokenization, bag of words, and stemming.
    Trained machine learning algorithms including Logistic Regression, Decision Tree, Random Forest, Extremely Randomized Trees, and XGBoost.
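
The preprocessing steps named above (tokenization, stemming, bag of words) can be sketched as follows. The crude suffix-stripping stem function is a hypothetical stand-in for a real stemmer such as NLTK's PorterStemmer, and the sample sentences are made up; the resulting count vectors are the kind of features that classifiers like logistic regression or random forests would be trained on.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude suffix list (not the Porter algorithm).
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least a 3-letter stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def build_vocabulary(docs):
    """Map each distinct stem in the corpus to a column index (sorted order)."""
    vocab = sorted({stem(t) for d in docs for t in tokenize(d)})
    return {term: i for i, term in enumerate(vocab)}


def bag_of_words(doc, vocab):
    """Count vector for one document; stems outside the vocabulary are ignored."""
    counts = Counter(stem(t) for t in tokenize(doc))
    return [counts.get(term, 0) for term in vocab]


docs = ["I hate waiting", "Waited and waited", "Kind words"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
```

Stemming lets "waiting" and "waited" share the single feature "wait", which shrinks the vocabulary and pools evidence across word forms.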

  • OCR Browser Extension

    -

    Designed a browser extension for text recognition that captures and extracts text from images, videos, and PDFs. It works in popular browsers such as Chrome, Firefox, and Edge.

