Python For Data Analytics



Implement Python code for each of the following problems.

1) Handle the missing data in the “Data.csv” file using the Python module
“sklearn.impute” and its methods. Then collect all rows whose “Salary” is less than
70,000.

import pandas as pd

from sklearn.impute import SimpleImputer

# Load the CSV file into a DataFrame

data = pd.read_csv(r'D:\Data.csv')  # raw string so the backslash is not read as an escape

# Initialize the SimpleImputer with mean strategy (you can choose other strategies)

imputer = SimpleImputer(strategy='mean')

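# Impute missing values in the 'Salaries' column (use 'Salary' if that is the header in your file)

# fit_transform returns a 2-D array, so flatten it before assigning it back to one column

data['Salaries'] = imputer.fit_transform(data[['Salaries']]).ravel()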

# Filter data where 'Salaries' is less than 70000

filtered_data = data[data['Salaries'] < 70000]

print(filtered_data)
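
The solution above imputes only one column. As a rough sketch of handling every numeric column at once, the example below builds a small in-memory stand-in for “Data.csv”. The column names ("Country", "Age", "Salary") and all values are hypothetical, chosen only so the example runs without the file.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for Data.csv; the real column names and values may differ.
data = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country C", "Country B"],
    "Age": [44.0, 27.0, np.nan, 38.0],
    "Salary": [72000.0, 48000.0, 54000.0, np.nan],
})

# Pick out the numeric columns and replace each missing value with its column mean.
numeric_cols = data.select_dtypes(include="number").columns
imputer = SimpleImputer(strategy="mean")
data[numeric_cols] = imputer.fit_transform(data[numeric_cols])

# Keep only the rows whose (imputed) salary is below 70,000.
low_salary = data[data["Salary"] < 70000]
print(low_salary)
```

Mean imputation is only one option; SimpleImputer also accepts strategies such as 'median' and 'most_frequent', and the better choice depends on how the data is distributed.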

2) Subtracting dates:
Python date objects let us treat calendar dates as something similar to numbers: we can
compare them, sort them, add to them, and even subtract them. This lets us do math with dates
that would be a pain to do by hand. The 2007 Florida hurricane season was one of the busiest
on record, with 8 hurricanes in one year. The first one hit on May 9th, 2007, and the last
one hit on December 13th, 2007. How many days elapsed between the first and last
hurricane in 2007?

Instructions:

Import date from datetime.

Create a date object for May 9th, 2007, and assign it to the start variable.
Create a date object for December 13th, 2007, and assign it to the end variable.

Subtract start from end, and print the number of days in the resulting timedelta
object.

from datetime import date

# Define the start and end dates, using the variable names the instructions ask for

start = date(2007, 5, 9)

end = date(2007, 12, 13)

# Subtracting two dates gives a timedelta; its .days attribute is the number of days

days_elapsed = (end - start).days

print(f"Days elapsed between the first and last hurricane in 2007: {days_elapsed} days")

3) Representing dates in different ways

Date objects in Python have a great number of ways they can be printed out as strings. In
some cases, you want to know the date in a clear, language-agnostic format. In other
cases, you want something which can fit into a paragraph and flow naturally.

Store the date August 26, 1992 (the day that Hurricane Andrew made landfall in
Florida) in a variable called “Andrew”, then print it in a number of different ways
using the “.strftime()” method.

Instructions:

Print it in the formats 'YYYY-MM', 'YYYY-DDD' (day of the year) and 'MONTH (YYYY)'.

from datetime import date

# Create a date object for August 26, 1992

Andrew = date(1992, 8, 26)

# Print the date in different formats

print("Different representations of August 26, 1992:")


# Print in 'YYYY-MM' format

print(Andrew.strftime('%Y-%m'))

# Print in 'YYYY-DDD' format

print(Andrew.strftime('%Y-%j'))

# Print in 'MONTH (YYYY)' format

print(Andrew.strftime('%B (%Y)'))

4) For the dataset “Indian_cities”:

a) Find the top 10 states by female-to-male sex ratio.
b) Find the top 10 cities by total number of graduates.
c) Find the top 10 cities, with their locations, by total
effective_literacy_rate.
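
One possible approach to problem 4 is sketched below. The assignment does not list the dataset's columns, so every column name here (name_of_city, state_name, population_male, population_female, total_graduates, effective_literacy_rate_total, location) is an assumption, and the rows are made-up placeholders that stand in for the real file.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed and the values are invented.
cities = pd.DataFrame({
    "name_of_city": ["City A", "City B", "City C", "City D"],
    "state_name": ["State X", "State X", "State Y", "State Z"],
    "population_male": [1000, 2000, 1500, 800],
    "population_female": [900, 1900, 1600, 700],
    "total_graduates": [300, 800, 500, 100],
    "effective_literacy_rate_total": [85.0, 90.5, 78.2, 88.1],
    "location": ["10.0,76.0", "11.0,77.0", "12.0,78.0", "13.0,79.0"],
})

# a) Females per 1000 males, computed from state-level population totals.
state_pop = cities.groupby("state_name")[["population_female", "population_male"]].sum()
state_sex_ratio = state_pop["population_female"] / state_pop["population_male"] * 1000
top_states = state_sex_ratio.nlargest(10)

# b) Cities with the most graduates.
top_graduates = cities.nlargest(10, "total_graduates")[["name_of_city", "total_graduates"]]

# c) Cities with the highest total effective literacy rate, with their locations.
top_literacy = cities.nlargest(10, "effective_literacy_rate_total")[
    ["name_of_city", "location", "effective_literacy_rate_total"]
]

print(top_states)
print(top_graduates)
print(top_literacy)
```

With the real file, the DataFrame literal would be replaced by a pd.read_csv call. If the dataset already carries a per-city sex ratio column, averaging it per state is an alternative, though summing populations first weights each city by its size.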

5) For the dataset “Indian_cities”:

a) Construct a histogram of literates_total and comment on what it shows.
b) Construct a scatter plot of male graduates against female graduates.
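
A minimal sketch for problem 5 follows, assuming the columns are named literates_total, male_graduates and female_graduates. The values are invented placeholders, so any inference drawn from them applies only to this toy data, not to the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows; column names are assumed and the values are invented.
cities = pd.DataFrame({
    "literates_total": [120000, 95000, 430000, 150000, 88000, 1200000],
    "male_graduates": [30000, 22000, 110000, 41000, 20000, 300000],
    "female_graduates": [25000, 18000, 95000, 36000, 15000, 270000],
})

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# a) Histogram of total literates.
ax_hist.hist(cities["literates_total"], bins=5)
ax_hist.set_xlabel("literates_total")
ax_hist.set_ylabel("number of cities")

# b) Scatter plot of male graduates against female graduates.
ax_scatter.scatter(cities["male_graduates"], cities["female_graduates"])
ax_scatter.set_xlabel("male_graduates")
ax_scatter.set_ylabel("female_graduates")

fig.tight_layout()

# Two numbers that support the written inferences.
skew = cities["literates_total"].skew()  # positive means a long right tail
correlation = cities["male_graduates"].corr(cities["female_graduates"])
print(f"skewness: {skew:.2f}, correlation: {correlation:.2f}")
```

To comment on the histogram, look at where most cities fall and whether a few very large cities stretch the right tail; a positive skewness value points to the latter. Call plt.show() or fig.savefig with a file name of your choice to see the figure.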

6) For the dataset “Indian_cities”:

a) Construct a boxplot of the total effective literacy rate and draw inferences.
b) Find the number of null values in each column of the dataset and delete
them.
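
Problem 6 could be approached along these lines. The column names (name_of_city, effective_literacy_rate_total, total_graduates) are assumptions and the rows are invented placeholders. "Delete them" is read here as dropping every row that contains a null, which is one interpretation among several.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample rows; column names are assumed and the values are invented.
cities = pd.DataFrame({
    "name_of_city": ["City A", "City B", "City C", "City D", "City E"],
    "effective_literacy_rate_total": [85.0, 90.5, np.nan, 88.1, 62.0],
    "total_graduates": [300.0, np.nan, 500.0, 100.0, 250.0],
})

# b) Count the nulls in each column, then drop every row that has at least one.
null_counts = cities.isnull().sum()
cleaned = cities.dropna()
print(null_counts)

# a) Boxplot of the literacy rate, drawn from the non-null values only.
rates = cities["effective_literacy_rate_total"].dropna().to_numpy()
fig, ax = plt.subplots()
ax.boxplot(rates)
ax.set_ylabel("effective_literacy_rate_total")

# The five-number summary backs up what the boxplot shows.
print(cleaned["effective_literacy_rate_total"].describe())
```

When drawing inferences, compare the median with the quartiles and check for points beyond the whiskers, which the boxplot marks as potential outliers. If dropping rows loses too much data, cities.dropna(axis=1) removes affected columns instead.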
