pandas-stats

Working with Pandas

The content of the CSV file (file.csv) is as follows:

name,city,happiness(0-10),height(cm),weight(kg)
John,Kolkata,7,180,70.45
Michael,Delhi,6,170,67.45
David,Mumbai,8,160,60.45
Sarah,Chennai,9,150,50.45
Daniel,Kolkata,7,180,70.45
Emily,Delhi,6,170,67.45
Olivia,Mumbai,8,160,60.45
Ethan,Chennai,9,150,50.45
Sophia,Kolkata,7,180,70.45
Matthew,Delhi,6,170,67.45
Karen,Mumbai,8,160,60.45
James,Chennai,9,150,50.45
Zoe,Kolkata,7,180,70.45
Logan,Delhi,6,170,67.45
Hannah,Mumbai,8,160,60.45
Liam,Chennai,9,150,50.45
Emma,Kolkata,7,180,70.45
Ava,Delhi,6,170,67.45
Noah,Mumbai,8,160,60.45
Mia,Chennai,9,150,50.45
Benjamin,Kolkata,7,180,70.45
Aria,Delhi,6,170,67.45
William,Mumbai,8,160,60.45
Grace,Chennai,9,150,50.45

# read csv file
import pandas as pd
 
df = pd.read_csv('file.csv')

# describe the data
df.describe() # summary statistics for every numeric column

	happiness(0-10)	height(cm)	weight(kg)
count	24.00000	24.000000	24.000000
mean	7.50000	165.458333	62.225000
std	1.14208	11.348163	7.928773
min	6.00000	147.000000	48.900000
25%	6.75000	157.250000	57.500000
50%	7.50000	165.500000	63.400000
75%	8.25000	174.000000	67.700000
max	9.00000	183.000000	73.500000

result = df.describe()
dict(result)['happiness(0-10)']['std'] # the describe() result can be converted to a dict of columns to look up individual values
# a more direct way to get the mean or stdev of a column
df['happiness(0-10)'].mean()

7.5
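The result of describe() is itself a DataFrame, so a single statistic can also be read with .loc, without going through dict(). A minimal sketch, assuming a four-row excerpt of file.csv inlined as text (the names csv_text, sample and summary are only for illustration):

```python
import io
import pandas as pd

# Assumption: a four-row excerpt of file.csv, inlined so the snippet runs on its own.
csv_text = """name,city,happiness(0-10),height(cm),weight(kg)
John,Kolkata,7,180,70.45
Michael,Delhi,6,170,67.45
David,Mumbai,8,160,60.45
Sarah,Chennai,9,150,50.45
"""
sample = pd.read_csv(io.StringIO(csv_text))
summary = sample.describe()

# .loc takes the row label (the statistic) first, then the column label.
happiness_mean = summary.loc['mean', 'happiness(0-10)']
happiness_std = summary.loc['std', 'happiness(0-10)']
print(happiness_mean, happiness_std)
```

The numbers here come from the four-row excerpt only, so they differ from the full-file output above.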

df['happiness(0-10)'].mean()
df['happiness(0-10)'].std()
df['happiness(0-10)'].median()
df['happiness(0-10)'].quantile(0.25) # 25th percentile
df['city'].mode() # returns the most frequent value(s) as a Series; use mode()[0] to get a single most common value
# available methods include: min, max, sum, mean, median, std, var, quantile, mode

0    Chennai
1      Delhi
2    Kolkata
3     Mumbai
Name: city, dtype: object
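When several values tie for most frequent, as the cities do here, mode() returns all of them in sorted order, so mode()[0] simply picks the first one. Several of the statistics listed above can also be requested in one call with agg(). A small sketch, assuming the same kind of four-row excerpt of file.csv (variable names are illustrative):

```python
import io
import pandas as pd

# Assumption: a four-row excerpt of file.csv, inlined so the snippet runs on its own.
csv_text = """name,city,happiness(0-10),height(cm),weight(kg)
John,Kolkata,7,180,70.45
Michael,Delhi,6,170,67.45
David,Mumbai,8,160,60.45
Sarah,Chennai,9,150,50.45
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Every city appears once, so all four tie and mode() returns them sorted.
modes = sample['city'].mode()
first_mode = modes[0]

# agg() with a list of method names returns a Series indexed by those names.
height_stats = sample['height(cm)'].agg(['min', 'max', 'mean'])
print(list(modes), first_mode)
print(height_stats)
```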

Some Examples

# get the average height of people living in either Mumbai or Kolkata
df[(df['city'] == 'Mumbai') | (df['city'] == 'Kolkata')]['height(cm)'].mean()

170.25
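The same "city is one of these" filter can be written with isin(), which tends to stay readable as the list of cities grows. A sketch on a four-row excerpt of file.csv (inlined here as an assumption, so the result differs from the full-file output above):

```python
import io
import pandas as pd

# Assumption: a four-row excerpt of file.csv, inlined so the snippet runs on its own.
csv_text = """name,city,happiness(0-10),height(cm),weight(kg)
John,Kolkata,7,180,70.45
Michael,Delhi,6,170,67.45
David,Mumbai,8,160,60.45
Sarah,Chennai,9,150,50.45
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Original style: two comparisons joined with | (element-wise OR).
with_or = sample[(sample['city'] == 'Mumbai') | (sample['city'] == 'Kolkata')]['height(cm)'].mean()

# Equivalent filter: a boolean mask from isin(), then row and column selection via .loc.
in_cities = sample['city'].isin(['Mumbai', 'Kolkata'])
with_isin = sample.loc[in_cities, 'height(cm)'].mean()
print(with_or, with_isin)
```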

# get the average height of people living in Chennai who weigh less than 64 kg, rounded to 2 decimal places
df[(df['city'] == 'Chennai') & (df['weight(kg)'] < 64)]['height(cm)'].mean().round(2)

150.83

# get the mean and stdev of happiness of people living in Mumbai
df[df['city'] == 'Mumbai']['happiness(0-10)'].mean(), df[df['city'] == 'Mumbai']['happiness(0-10)'].std()

(8.0, 0.0)
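To get the same two statistics for every city at once, rather than filtering one city at a time, groupby() with agg() is one option. A minimal sketch, assuming the first eight rows of file.csv inlined as text (per_city is an illustrative name):

```python
import io
import pandas as pd

# Assumption: the first eight rows of file.csv, inlined so the snippet runs on its own.
csv_text = """name,city,happiness(0-10),height(cm),weight(kg)
John,Kolkata,7,180,70.45
Michael,Delhi,6,170,67.45
David,Mumbai,8,160,60.45
Sarah,Chennai,9,150,50.45
Daniel,Kolkata,7,180,70.45
Emily,Delhi,6,170,67.45
Olivia,Mumbai,8,160,60.45
Ethan,Chennai,9,150,50.45
"""
sample = pd.read_csv(io.StringIO(csv_text))

# One row per city, one column per statistic.
per_city = sample.groupby('city')['happiness(0-10)'].agg(['mean', 'std'])
print(per_city)

# Look up a single city the same way as any other DataFrame row.
mumbai_mean = per_city.loc['Mumbai', 'mean']
mumbai_std = per_city.loc['Mumbai', 'std']
```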
