Working with Pandas

The content of the CSV file (file.csv) is as follows:

name,city,happiness(0-10),height(cm),weight(kg)
John,Kolkata,7,180,70.45
Michael,Delhi,6,170,67.45
David,Mumbai,8,160,60.45
Sarah,Chennai,9,150,50.45
Daniel,Kolkata,7,180,70.45
Emily,Delhi,6,170,67.45
Olivia,Mumbai,8,160,60.45
Ethan,Chennai,9,150,50.45
Sophia,Kolkata,7,180,70.45
Matthew,Delhi,6,170,67.45
Karen,Mumbai,8,160,60.45
James,Chennai,9,150,50.45
Zoe,Kolkata,7,180,70.45
Logan,Delhi,6,170,67.45
Hannah,Mumbai,8,160,60.45
Liam,Chennai,9,150,50.45
Emma,Kolkata,7,180,70.45
Ava,Delhi,6,170,67.45
Noah,Mumbai,8,160,60.45
Mia,Chennai,9,150,50.45
Benjamin,Kolkata,7,180,70.45
Aria,Delhi,6,170,67.45
William,Mumbai,8,160,60.45
Grace,Chennai,9,150,50.45
# read csv file
import pandas as pd
 
df = pd.read_csv('file.csv')
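If file.csv is not at hand, the same rows can be read from a string. `pd.read_csv` accepts any file-like object, so `io.StringIO` can stand in for the file. This is a minimal sketch: `CSV_TEXT` is an illustrative name, and the string is simply a copy of the file contents listed above.

```python
import io

import pandas as pd

# Same contents as file.csv, held in a string so the example needs no file on disk
CSV_TEXT = """\
name,city,happiness(0-10),height(cm),weight(kg)
John,Kolkata,7,180,70.45
Michael,Delhi,6,170,67.45
David,Mumbai,8,160,60.45
Sarah,Chennai,9,150,50.45
Daniel,Kolkata,7,180,70.45
Emily,Delhi,6,170,67.45
Olivia,Mumbai,8,160,60.45
Ethan,Chennai,9,150,50.45
Sophia,Kolkata,7,180,70.45
Matthew,Delhi,6,170,67.45
Karen,Mumbai,8,160,60.45
James,Chennai,9,150,50.45
Zoe,Kolkata,7,180,70.45
Logan,Delhi,6,170,67.45
Hannah,Mumbai,8,160,60.45
Liam,Chennai,9,150,50.45
Emma,Kolkata,7,180,70.45
Ava,Delhi,6,170,67.45
Noah,Mumbai,8,160,60.45
Mia,Chennai,9,150,50.45
Benjamin,Kolkata,7,180,70.45
Aria,Delhi,6,170,67.45
William,Mumbai,8,160,60.45
Grace,Chennai,9,150,50.45
"""

# read_csv accepts any file-like object, so StringIO can stand in for the file
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.shape)  # (24, 5): 24 people, 5 columns
print(df.head())
```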
# describe the data
df.describe()  # summary statistics (count, mean, std, min, quartiles, max) of every numeric column
       happiness(0-10)  height(cm)  weight(kg)
count        24.000000   24.000000   24.000000
mean          7.500000  165.000000   62.200000
std           1.142080   11.420805    7.858808
min           6.000000  150.000000   50.450000
25%           6.750000  157.500000   57.950000
50%           7.500000  165.000000   63.950000
75%           8.250000  172.500000   68.200000
max           9.000000  180.000000   70.450000
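Two details sit behind the `std` and percentile rows. First, pandas' `std()` is the sample standard deviation: it divides by n - 1 (`ddof=1` by default). Second, `describe()` computes percentiles with linear interpolation between sorted values. The sketch below recomputes both for the happiness column and compares them with `describe()`. So that it runs on its own, it rebuilds the DataFrame in memory from the repeating pattern in file.csv; `NAMES`, `CITY_VALUES` and `rows` are illustrative names, not part of the original code.

```python
import math

import pandas as pd

# Rebuild file.csv in memory: the cities repeat in the order Kolkata, Delhi,
# Mumbai, Chennai, and every person in a city has the same three values
NAMES = ["John", "Michael", "David", "Sarah", "Daniel", "Emily",
         "Olivia", "Ethan", "Sophia", "Matthew", "Karen", "James",
         "Zoe", "Logan", "Hannah", "Liam", "Emma", "Ava",
         "Noah", "Mia", "Benjamin", "Aria", "William", "Grace"]
CITY_VALUES = {  # city: (happiness, height in cm, weight in kg)
    "Kolkata": (7, 180, 70.45),
    "Delhi": (6, 170, 67.45),
    "Mumbai": (8, 160, 60.45),
    "Chennai": (9, 150, 50.45),
}
rows = [(name, city, *CITY_VALUES[city])
        for name, city in zip(NAMES, list(CITY_VALUES) * 6)]
df = pd.DataFrame(rows, columns=["name", "city", "happiness(0-10)",
                                 "height(cm)", "weight(kg)"])

summary = df.describe()
col = df["happiness(0-10)"]
n = len(col)

# Sample standard deviation: squared deviations divided by n - 1 (pandas' default ddof=1)
mean = col.sum() / n
manual_std = math.sqrt(((col - mean) ** 2).sum() / (n - 1))

# 25th percentile by linear interpolation at position 0.25 * (n - 1) of the sorted values
values = sorted(col)
pos = 0.25 * (n - 1)
lower = math.floor(pos)
manual_q1 = values[lower] + (pos - lower) * (values[lower + 1] - values[lower])

print(manual_std, summary.loc["std", "happiness(0-10)"])
print(manual_q1, summary.loc["25%", "happiness(0-10)"])
print(col.std(ddof=0))  # population standard deviation, divides by n instead
```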
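result = df.describe()
# describe() returns a DataFrame indexed by statistic name ('count', 'mean', 'std', ...),
# so a single value can be read directly with .loc[statistic, column]
result.loc['std', 'happiness(0-10)']
# converting to a dict first also works: dict(result)['happiness(0-10)']['std']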
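When an actual Python dictionary is wanted (for example, to serialize the summary), `DataFrame.to_dict()` converts the whole `describe()` table into a nested dict keyed first by column and then by statistic. For a single cell, `.at[row, column]` is the scalar-only accessor. This is a sketch under the same in-memory rebuild of file.csv; the names `NAMES`, `CITY_VALUES`, `rows` and `stats` are illustrative.

```python
import pandas as pd

# Rebuild file.csv in memory (cities repeat Kolkata, Delhi, Mumbai, Chennai)
NAMES = ["John", "Michael", "David", "Sarah", "Daniel", "Emily",
         "Olivia", "Ethan", "Sophia", "Matthew", "Karen", "James",
         "Zoe", "Logan", "Hannah", "Liam", "Emma", "Ava",
         "Noah", "Mia", "Benjamin", "Aria", "William", "Grace"]
CITY_VALUES = {  # city: (happiness, height in cm, weight in kg)
    "Kolkata": (7, 180, 70.45),
    "Delhi": (6, 170, 67.45),
    "Mumbai": (8, 160, 60.45),
    "Chennai": (9, 150, 50.45),
}
rows = [(name, city, *CITY_VALUES[city])
        for name, city in zip(NAMES, list(CITY_VALUES) * 6)]
df = pd.DataFrame(rows, columns=["name", "city", "happiness(0-10)",
                                 "height(cm)", "weight(kg)"])

result = df.describe()

# Nested plain dict: {column: {statistic: value}}
stats = result.to_dict()
print(stats["happiness(0-10)"]["std"])

# .at reads exactly one cell by row label and column label
print(result.at["mean", "height(cm)"])
```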
# a more direct way: call the statistic method (mean, std, ...) on the column itself
df['happiness(0-10)'].mean()
7.5
# other Series methods work the same way: min, max, sum, mean, median, std, var, quantile, mode
df['happiness(0-10)'].std()
df['happiness(0-10)'].median()
df['happiness(0-10)'].quantile(0.25)  # 25th percentile
df['city'].mode()  # returns a Series of the most frequent value(s); on a tie all are returned, use mode()[0] for a single value
0    Chennai
1      Delhi
2    Kolkata
3     Mumbai
Name: city, dtype: object
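Several of the methods listed above can be computed in one call with `Series.agg`, which takes a list of method names and returns a Series indexed by those names. The sketch also shows why `mode()` printed all four cities: each one occurs exactly 6 times, so they tie. It assumes the same in-memory rebuild of file.csv; `happiness` and `stats` are illustrative names.

```python
import pandas as pd

# Rebuild file.csv in memory (cities repeat Kolkata, Delhi, Mumbai, Chennai)
NAMES = ["John", "Michael", "David", "Sarah", "Daniel", "Emily",
         "Olivia", "Ethan", "Sophia", "Matthew", "Karen", "James",
         "Zoe", "Logan", "Hannah", "Liam", "Emma", "Ava",
         "Noah", "Mia", "Benjamin", "Aria", "William", "Grace"]
CITY_VALUES = {  # city: (happiness, height in cm, weight in kg)
    "Kolkata": (7, 180, 70.45),
    "Delhi": (6, 170, 67.45),
    "Mumbai": (8, 160, 60.45),
    "Chennai": (9, 150, 50.45),
}
rows = [(name, city, *CITY_VALUES[city])
        for name, city in zip(NAMES, list(CITY_VALUES) * 6)]
df = pd.DataFrame(rows, columns=["name", "city", "happiness(0-10)",
                                 "height(cm)", "weight(kg)"])

happiness = df["happiness(0-10)"]

# agg() with a list of method names returns one Series indexed by those names
stats = happiness.agg(["min", "max", "sum", "mean", "median", "std", "var"])
print(stats)

# every city occurs the same number of times, so mode() returns all of them
print(df["city"].value_counts())
print(df["city"].mode()[0])  # first of the tied modes, which mode() returns sorted
```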

Some Examples

# get the average height of people living in either Mumbai or Kolkata
df[(df['city'] == 'Mumbai') | (df['city'] == 'Kolkata')]['height(cm)'].mean()
170.0
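For membership in a list of values, `Series.isin` produces the same boolean mask as chaining `==` tests with `|`. `.loc[mask, column]` then selects the rows and the column in one step instead of chained indexing. This sketch assumes the same in-memory rebuild of file.csv, and the variable names are illustrative.

```python
import pandas as pd

# Rebuild file.csv in memory (cities repeat Kolkata, Delhi, Mumbai, Chennai)
NAMES = ["John", "Michael", "David", "Sarah", "Daniel", "Emily",
         "Olivia", "Ethan", "Sophia", "Matthew", "Karen", "James",
         "Zoe", "Logan", "Hannah", "Liam", "Emma", "Ava",
         "Noah", "Mia", "Benjamin", "Aria", "William", "Grace"]
CITY_VALUES = {  # city: (happiness, height in cm, weight in kg)
    "Kolkata": (7, 180, 70.45),
    "Delhi": (6, 170, 67.45),
    "Mumbai": (8, 160, 60.45),
    "Chennai": (9, 150, 50.45),
}
rows = [(name, city, *CITY_VALUES[city])
        for name, city in zip(NAMES, list(CITY_VALUES) * 6)]
df = pd.DataFrame(rows, columns=["name", "city", "happiness(0-10)",
                                 "height(cm)", "weight(kg)"])

# isin() builds the same boolean mask as chaining == checks with |
in_two_cities = df["city"].isin(["Mumbai", "Kolkata"])
with_or = (df["city"] == "Mumbai") | (df["city"] == "Kolkata")

# .loc[mask, column] selects matching rows and one column in a single step
avg_height = df.loc[in_two_cities, "height(cm)"].mean()
print(in_two_cities.equals(with_or), avg_height)
```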
# the average height of people living in Chennai with weight below 64 kg, rounded to 2 decimal places
df[(df['city'] == 'Chennai') & (df['weight(kg)'] < 64)]['height(cm)'].mean().round(2)
150.0
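Each condition needs its own parentheses because Python's `&` and `|` bind more tightly than `==` and `<`. Python's `and`/`or` cannot be used with Series at all: pandas refuses to reduce a whole Series to one truth value and raises `ValueError`. Storing the masks in variables keeps longer filters readable. Flipping the weight test to `> 64` selects no rows here (every Chennai entry in file.csv weighs 50.45 kg), and the mean of an empty selection is NaN. This is a sketch under the same in-memory rebuild of file.csv; the names are illustrative.

```python
import math

import pandas as pd

# Rebuild file.csv in memory (cities repeat Kolkata, Delhi, Mumbai, Chennai)
NAMES = ["John", "Michael", "David", "Sarah", "Daniel", "Emily",
         "Olivia", "Ethan", "Sophia", "Matthew", "Karen", "James",
         "Zoe", "Logan", "Hannah", "Liam", "Emma", "Ava",
         "Noah", "Mia", "Benjamin", "Aria", "William", "Grace"]
CITY_VALUES = {  # city: (happiness, height in cm, weight in kg)
    "Kolkata": (7, 180, 70.45),
    "Delhi": (6, 170, 67.45),
    "Mumbai": (8, 160, 60.45),
    "Chennai": (9, 150, 50.45),
}
rows = [(name, city, *CITY_VALUES[city])
        for name, city in zip(NAMES, list(CITY_VALUES) * 6)]
df = pd.DataFrame(rows, columns=["name", "city", "happiness(0-10)",
                                 "height(cm)", "weight(kg)"])

in_chennai = df["city"] == "Chennai"
light = df["weight(kg)"] < 64

# Masks combine element-wise with &; inline, each comparison needs parentheses
avg_light = round(df.loc[in_chennai & light, "height(cm)"].mean(), 2)
print(avg_light)

# The opposite weight test matches no Chennai row, so the selection is empty
heavy = df.loc[in_chennai & (df["weight(kg)"] > 64), "height(cm)"]
print(heavy.empty, heavy.mean())  # mean of an empty Series is NaN

# Python's `and` would need a single True/False for the whole Series
try:
    in_chennai and light
except ValueError as err:
    print("ValueError:", err)
```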
# get the mean and stdev of happiness of people living in Mumbai
df[df['city'] == 'Mumbai']['happiness(0-10)'].mean(), df[df['city'] == 'Mumbai']['happiness(0-10)'].std()
(8.0, 0.0)
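To get the same numbers for every city at once, instead of filtering one city at a time, `groupby` followed by `agg` returns one row per city. This sketch assumes the in-memory rebuild of file.csv; `per_city` is an illustrative name.

```python
import pandas as pd

# Rebuild file.csv in memory (cities repeat Kolkata, Delhi, Mumbai, Chennai)
NAMES = ["John", "Michael", "David", "Sarah", "Daniel", "Emily",
         "Olivia", "Ethan", "Sophia", "Matthew", "Karen", "James",
         "Zoe", "Logan", "Hannah", "Liam", "Emma", "Ava",
         "Noah", "Mia", "Benjamin", "Aria", "William", "Grace"]
CITY_VALUES = {  # city: (happiness, height in cm, weight in kg)
    "Kolkata": (7, 180, 70.45),
    "Delhi": (6, 170, 67.45),
    "Mumbai": (8, 160, 60.45),
    "Chennai": (9, 150, 50.45),
}
rows = [(name, city, *CITY_VALUES[city])
        for name, city in zip(NAMES, list(CITY_VALUES) * 6)]
df = pd.DataFrame(rows, columns=["name", "city", "happiness(0-10)",
                                 "height(cm)", "weight(kg)"])

# One row per city (sorted by city name), one column per statistic
per_city = df.groupby("city")["happiness(0-10)"].agg(["mean", "std"])
print(per_city)

# Mumbai's row holds the same pair as filtering on Mumbai first
print(tuple(per_city.loc["Mumbai"]))
```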