Working with Pandas
content of the csv file (file.csv) is as follows:
name,city,happiness(0-10),height(cm),weight(kg)
John,Kolkata,7,180,70.45
Michael,Delhi,6,170,67.45
David,Mumbai,8,160,60.45
Sarah,Chennai,9,150,50.45
Daniel,Kolkata,7,180,70.45
Emily,Delhi,6,170,67.45
Olivia,Mumbai,8,160,60.45
Ethan,Chennai,9,150,50.45
Sophia,Kolkata,7,180,70.45
Matthew,Delhi,6,170,67.45
Karen,Mumbai,8,160,60.45
James,Chennai,9,150,50.45
Zoe,Kolkata,7,180,70.45
Logan,Delhi,6,170,67.45
Hannah,Mumbai,8,160,60.45
Liam,Chennai,9,150,50.45
Emma,Kolkata,7,180,70.45
Ava,Delhi,6,170,67.45
Noah,Mumbai,8,160,60.45
Mia,Chennai,9,150,50.45
Benjamin,Kolkata,7,180,70.45
Aria,Delhi,6,170,67.45
William,Mumbai,8,160,60.45
Grace,Chennai,9,150,50.45
# read csv file
import pandas as pd
df = pd.read_csv('file.csv')
# describe the data
df.describe() # this will describe statistic of all columns with numeric data
happiness(0-10) | height(cm) | weight(kg) | |
---|---|---|---|
count | 24.00000 | 24.000000 | 24.000000 |
mean | 7.50000 | 165.458333 | 62.225000 |
std | 1.14208 | 11.348163 | 7.928773 |
min | 6.00000 | 147.000000 | 48.900000 |
25% | 6.75000 | 157.250000 | 57.500000 |
50% | 7.50000 | 165.500000 | 63.400000 |
75% | 8.25000 | 174.000000 | 67.700000 |
max | 9.00000 | 183.000000 | 73.500000 |
result = df.describe()
dict(result)['happiness(0-10)']['std'] # descriptive statistics of columns can be converted as a dict to access its values
# better way to access the mean, stdev of a column
df['happiness(0-10)'].mean()
7.5
df['happiness(0-10)'].mean()
df['happiness(0-10)'].std()
df['happiness(0-10)'].median()
df['happiness(0-10)'].quantile(0.25) # 25th percentile
df['city'].mode() # return most frequent values as a list, use mode()[0] to most common value of dataset
# min, max, sum, mean, median, std, var, quantile, mode
0 Chennai
1 Delhi
2 Kolkata
3 Mumbai
Name: city, dtype: object
Some Examples
# get average height of people living in both Mumbai and Kolkata
df[(df['city'] == 'Mumbai') | (df['city'] == 'Kolkata')]['height(cm)'].mean()
170.25
# the average height of people living in Chennai with weight above 64kg in 2 decimal places
df[(df['city'] == 'Chennai') & (df['weight(kg)'] < 64)]['height(cm)'].mean().round(2)
150.83
# get the mean and stdev of happiness of people living in Mumbai
df[df['city'] == 'Mumbai']['happiness(0-10)'].mean(), df[df['city'] == 'Mumbai']['happiness(0-10)'].std()
(8.0, 0.0)