
📘 Data Wrangling with Pandas – Summary
This notebook demonstrates step-by-step data wrangling techniques using the Titanic dataset.
The main tasks include exploring, cleaning, transforming, grouping, and merging data.
🔗 Useful Links
- 👉 Open in Google Colab or open in GitHub
- 👉 GitHub Titanic Dataset
1. Getting Information about the Data
- Load the dataset (`read_csv`).
- Preview data using `head()`.
- Get dataset dimensions with `shape`.
- Generate descriptive statistics with `describe()`.
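The four steps above can be sketched as follows. This is a minimal illustration, not the notebook's own code: the three-row CSV text and its column names are hypothetical stand-ins for the Titanic file, loaded from an in-memory string so the example runs without a download.

```python
import io
import pandas as pd

# Hypothetical stand-in for the Titanic CSV; the real file has more rows and columns.
csv_text = """Name,Age,Sex,Survived
Allen,29.0,female,1
Allison,2.0,female,0
Anderson,47.0,male,1
"""

# read_csv also accepts a file path or URL instead of a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))     # first two rows
print(df.shape)       # (number of rows, number of columns)
print(df.describe())  # summary statistics for the numeric columns
```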
2. Slicing a DataFrame
- Select rows by position (`iloc`).
- Set an index (`set_index`).
- Select rows by label (`loc`).
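A short sketch of the difference between position-based and label-based selection, using a small hypothetical DataFrame (the names and ages are illustrative, not taken from the notebook):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Allen", "Allison", "Anderson"],
    "Age": [29.0, 2.0, 47.0],
})

first_row = df.iloc[0]    # one row by integer position (returned as a Series)
first_two = df.iloc[0:2]  # positions 0 and 1; the end position is excluded

by_name = df.set_index("Name")    # returns a new DataFrame indexed by Name
allison = by_name.loc["Allison"]  # one row by index label
```

Note that `set_index` returns a new DataFrame rather than modifying `df` in place.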
3. Selecting Rows Based on Conditionals
- Filter rows using conditions (`df[df['Sex']=='female']`).
- Combine multiple conditions with `&` (e.g., female passengers aged ≥65).
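The filters above might look like this on a small hypothetical DataFrame. Each condition needs its own parentheses, because `&` binds more tightly than `==` and `>=` in Python.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Allen", "Barber", "Crosby", "Davies"],
    "Sex": ["female", "male", "female", "female"],
    "Age": [29.0, 70.0, 69.0, 65.0],
})

# Single condition: a boolean mask selects the matching rows.
females = df[df["Sex"] == "female"]

# Two conditions combined with &; each one is wrapped in parentheses.
older_females = df[(df["Sex"] == "female") & (df["Age"] >= 65)]
```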
4. Sorting Values
- Sort rows using `sort_values()`. Example: sorting passengers by age.
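A minimal sketch of sorting by age in both directions, on hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Allen", "Allison", "Anderson"],
    "Age": [29.0, 2.0, 47.0],
})

youngest_first = df.sort_values(by="Age")                 # ascending by default
oldest_first = df.sort_values(by="Age", ascending=False)  # descending
```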
5. Replacing Values
- Replace specific values (`replace({'male':'M'})`).
- Use regex replacement (`replace(r"male","third",regex=True)`).
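The sketch below contrasts the two forms on a hypothetical `Sex` column. One thing worth noticing: with `regex=True` the pattern matches substrings, so `male` also matches inside `female`. The anchored pattern `^male$` is an addition for illustration and does not come from the notebook.

```python
import pandas as pd

# Hypothetical sample data.
sex = pd.Series(["female", "male"])

# Dictionary form: replaces whole values that match exactly.
exact = sex.replace({"male": "M", "female": "F"})

# Regex form: matches substrings, so "female" is affected too.
regex_any = sex.replace(r"male", "third", regex=True)

# Anchoring the pattern restricts the match to the whole value.
regex_anchored = sex.replace(r"^male$", "third", regex=True)
```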
6. Renaming Columns
- Rename columns using `rename()`. Example: renaming `Sex` → `Gender`.
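As a brief sketch on hypothetical data, the rename might be written like this:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Name": ["Allen", "Anderson"], "Sex": ["female", "male"]})

# The columns mapping is {old_name: new_name}; a new DataFrame is returned.
renamed = df.rename(columns={"Sex": "Gender"})
```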
7. Finding Maximum, Minimum, Sum, Average, Count
- Use aggregation functions like `max()`, `min()`, `mean()`, `sum()`, and `count()` on a column.
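A small sketch of these aggregations on a hypothetical `Age` column. One value is deliberately missing to show that these methods skip `NaN` by default and that `count()` reports only non-missing values.

```python
import pandas as pd

# Hypothetical ages; the last value is missing on purpose.
ages = pd.Series([29.0, 2.0, 47.0, float("nan")], name="Age")

age_max = ages.max()
age_min = ages.min()
age_mean = ages.mean()     # NaN is skipped by default
age_sum = ages.sum()
age_count = ages.count()   # number of non-missing values
```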
8. Finding Unique Values
- Get unique values in a column (`unique()`).
- Count occurrences (`value_counts()`).
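Both calls, sketched on a hypothetical `Sex` column:

```python
import pandas as pd

# Hypothetical sample data.
sex = pd.Series(["female", "male", "female"])

distinct = sex.unique()      # distinct values in order of first appearance
counts = sex.value_counts()  # occurrences per value, most frequent first
```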
9. Handling Missing Data
- Identify missing values (`isnull()` / `isna()`).
- Replace values with `NaN` (`replace` + `numpy.nan`).
- Fill missing values with statistics (e.g., mean age using `fillna()`).
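The three steps could be sketched as below. The data and the `"unknown"` placeholder string are hypothetical, chosen only to give `replace` something to turn into `NaN`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing age and one placeholder string.
df = pd.DataFrame({
    "Sex": ["female", "male", "unknown"],
    "Age": [29.0, np.nan, 47.0],
})

# Boolean mask marking missing ages; isna() is an alias of isnull().
missing_age = df["Age"].isnull()

# Turn the placeholder into a real missing value.
df["Sex"] = df["Sex"].replace("unknown", np.nan)

# Fill missing ages with the mean of the observed ages.
df["Age"] = df["Age"].fillna(df["Age"].mean())
```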
10. Deleting Columns
- Drop columns with `drop(axis=1)`.
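A brief sketch on hypothetical data. The `columns=` keyword shown second is an equivalent spelling of `axis=1`.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Allen", "Anderson"],
    "Age": [29.0, 47.0],
    "Sex": ["female", "male"],
})

without_age = df.drop("Age", axis=1)          # drop one column
name_only = df.drop(columns=["Age", "Sex"])   # drop several columns at once
```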
11. Deleting Rows
- Remove rows conditionally (`df[df['Sex']!='male']`).
- Drop specific values (e.g., removing age = 65).
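Deleting rows this way is really filtering: the mask keeps every row that does not match. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Allen", "Barber", "Crosby"],
    "Sex": ["female", "male", "female"],
    "Age": [29.0, 65.0, 65.0],
})

no_males = df[df["Sex"] != "male"]  # keep every row that is not male
not_65 = df[df["Age"] != 65]        # keep every row whose age is not 65
```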
12. Dropping Duplicate Rows
- Remove duplicates using `drop_duplicates()`.
- Count duplicates with `duplicated().sum()`.
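A minimal sketch with one deliberately repeated row (hypothetical data):

```python
import pandas as pd

# Hypothetical sample data; the third row repeats the first.
df = pd.DataFrame({
    "Name": ["Allen", "Allison", "Allen"],
    "Age": [29.0, 2.0, 29.0],
})

# duplicated() marks every repeat of an earlier row as True.
n_duplicates = int(df.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row by default.
deduped = df.drop_duplicates()
```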
13. Group Rows by Values
- Group rows with `groupby()`.
- Apply aggregations (e.g., count per `Sex`).
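Grouping and aggregating, sketched on hypothetical data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Allen", "Allison", "Anderson"],
    "Sex": ["female", "female", "male"],
    "Age": [29.0, 2.0, 47.0],
})

# Number of non-missing names in each group.
count_per_sex = df.groupby("Sex")["Name"].count()

# Mean age in each group.
mean_age_per_sex = df.groupby("Sex")["Age"].mean()
```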
14. Grouping Rows by Time
- Create a time index using `pd.date_range()`.
- Resample by time period (`W` = weekly, `M` = monthly; pandas 2.2 and later prefer `ME` for month-end).
- Perform aggregations (sum, mean) on the grouped data.
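The sketch below builds two weeks of daily data and resamples it weekly. The `Sale_Amount` column and its values are hypothetical. The start date is a Monday so that each default `W` bin, which ends on Sunday, covers exactly seven days.

```python
import pandas as pd

# Fourteen consecutive days starting on Monday, 2024-01-01.
time_index = pd.date_range("2024-01-01", periods=14, freq="D")

# Hypothetical daily sales: 1, 2, ..., 14.
sales = pd.DataFrame({"Sale_Amount": list(range(1, 15))}, index=time_index)

weekly_sum = sales.resample("W").sum()    # total per week
weekly_mean = sales.resample("W").mean()  # average per week
```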
15. Aggregating Operations and Statistics
- Use `aggregate()` with multiple operations. Example: `Age` min & max, `Fare` mean.
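One way to express that example is a dictionary mapping each column to its operations. The data is hypothetical, and `agg` is the short alias of `aggregate`.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Age": [29.0, 2.0, 47.0],
    "Fare": [10.0, 20.0, 30.0],
})

# Rows of the result are the operation names; cells that were not requested are NaN.
summary = df.agg({"Age": ["min", "max"], "Fare": ["mean"]})
```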
16. Looping Over a Column
- Iterate through column values with a loop.
- Example: converting first two names to uppercase.
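A sketch of that loop on hypothetical names, followed by the equivalent list comprehension:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Name": ["Allen", "Allison", "Anderson"]})

# Explicit loop over the first two values of the column.
upper_first_two = []
for name in df["Name"].iloc[:2]:
    upper_first_two.append(name.upper())

# The same result as a list comprehension.
upper_first_two_lc = [name.upper() for name in df["Name"].iloc[:2]]
```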
17. Applying a Function Over a Column
- Apply a custom function with `apply()`. Example: convert names to uppercase.
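A minimal sketch, where `to_upper` is a hypothetical helper name. For simple string operations like this one, the vectorized `.str` accessor gives the same result without a custom function.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"Name": ["Allen", "Allison", "Anderson"]})

def to_upper(value):
    """Return the value in uppercase (hypothetical helper)."""
    return value.upper()

# apply() calls the function once per element of the column.
upper_names = df["Name"].apply(to_upper)

# Vectorized alternative for string columns.
upper_names_vectorized = df["Name"].str.upper()
```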
18. Applying a Function to Groups
- Apply functions to grouped data. Example: `groupby('Sex').apply(lambda x: x.count())`.
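The sketch below is a variation on that example rather than the notebook's exact call: it selects the columns to count explicitly, because how `apply` treats the grouping column has changed across pandas versions. The data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one missing age.
df = pd.DataFrame({
    "Name": ["Allen", "Allison", "Anderson"],
    "Sex": ["female", "female", "male"],
    "Age": [29.0, float("nan"), 47.0],
})

# The lambda receives each group as a DataFrame and returns per-column counts.
counts = df.groupby("Sex")[["Name", "Age"]].apply(lambda group: group.count())
```

The result has one row per group and one column per counted column, with missing values excluded from each count.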
19. Concatenating DataFrames
- Combine DataFrames vertically with `pd.concat()`.
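A short sketch of stacking two hypothetical DataFrames that share the same columns:

```python
import pandas as pd

# Two hypothetical DataFrames with identical columns.
df_a = pd.DataFrame({"id": [1, 2], "first": ["Alex", "Amy"]})
df_b = pd.DataFrame({"id": [3, 4], "first": ["Billy", "Brian"]})

# axis=0 stacks rows; ignore_index=True renumbers the result from 0.
combined = pd.concat([df_a, df_b], axis=0, ignore_index=True)
```

Without `ignore_index=True`, the result would keep each input's original index labels (0, 1, 0, 1).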
20. Merging DataFrames
- Merge DataFrames on a common key using `pd.merge()`. Example: merging employees with sales data.
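A sketch of that merge. The `employee_id`, `name`, and `total_sales` columns and their values are hypothetical. The default join is an inner join, so only keys present in both tables survive; `how="left"` keeps every employee.

```python
import pandas as pd

# Hypothetical employee and sales tables sharing the key column employee_id.
employees = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "name": ["Amy", "Allen", "Alice"],
})
sales = pd.DataFrame({
    "employee_id": [2, 3, 4],
    "total_sales": [100, 200, 300],
})

# Inner join (the default): only employee_id values found in both tables.
inner = pd.merge(employees, sales, on="employee_id")

# Left join: every employee is kept; missing sales become NaN.
left = pd.merge(employees, sales, on="employee_id", how="left")
```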
✅ Final Notes
This notebook demonstrates key pandas techniques for data wrangling:
- Data exploration & cleaning.
- Slicing, filtering, sorting.
- Handling missing data & duplicates.
- Aggregating & grouping.
- Combining multiple DataFrames (`concat` & `merge`).
Together, these operations form the foundation of data preprocessing and analysis in Python.
Jai Hanuman