📘 Data Wrangling with Pandas – Summary
_Ujjwal_

Posted on Aug 18, 2025 | AIML

This notebook demonstrates step-by-step data wrangling techniques using the Titanic dataset.
The main tasks include exploring, cleaning, transforming, grouping, and merging data.

1. Getting Information about the Data

  • Load the dataset with read_csv().
  • Preview the first rows with head().
  • Get the dataset's dimensions (rows, columns) with shape.
  • Generate descriptive statistics with describe().
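
A minimal sketch of these first-look calls. It reads a tiny inline CSV through io.StringIO so it runs without the notebook's Titanic file. The column names (Name, Sex, Age, Fare) and the rows are hypothetical stand-ins.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Titanic CSV; normally you would pass a file path or URL
csv_text = """Name,Sex,Age,Fare
Passenger A,female,29,211.3
Passenger B,male,48,26.5
Passenger C,female,25,151.6
Passenger D,male,60,7.9
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))     # first 2 rows (head() defaults to 5)
print(df.shape)       # (number of rows, number of columns)
print(df.describe())  # count, mean, std, min, quartiles and max of the numeric columns
```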

2. Slicing a DataFrame

  • Select rows by position (iloc).
  • Set an index (set_index).
  • Select rows by label (loc).
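
Positional and label-based selection can be sketched as follows. The sketch assumes a small hypothetical DataFrame and uses Name as the index, purely for illustration.

```python
import pandas as pd

# Hypothetical sample rows with Titanic-like columns
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Sex": ["female", "male", "female"],
    "Age": [29.0, 48.0, 65.0],
})

first_row = df.iloc[0]    # row at position 0, returned as a Series
first_two = df.iloc[:2]   # positions 0 and 1 (the end of the slice is exclusive)

df_by_name = df.set_index("Name")       # use Name as the row labels
row_b = df_by_name.loc["Passenger B"]   # select a row by its label
```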

3. Selecting Rows Based on Conditionals

  • Filter rows using conditions (df[df['Sex']=='female']).
  • Combine multiple conditions with & (e.g., female passengers aged ≥65).
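
A short sketch of boolean filtering on hypothetical sample rows. Each condition is wrapped in parentheses because Python's & binds more tightly than == and >=.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D"],
    "Sex": ["female", "male", "female", "female"],
    "Age": [29.0, 70.0, 65.0, 40.0],
})

women = df[df["Sex"] == "female"]

# Parentheses are required around each condition when combining with &
older_women = df[(df["Sex"] == "female") & (df["Age"] >= 65)]
print(older_women)
```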

4. Sorting Values

  • Sort rows using sort_values().
    Example: sorting passengers by age.
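
A minimal sketch of sorting by age, using hypothetical rows. sort_values() returns a new DataFrame and, by default, places missing values last.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Age": [29.0, 0.92, 48.0],
})

youngest_first = df.sort_values(by="Age")                 # ascending by default
oldest_first = df.sort_values(by="Age", ascending=False)  # descending
```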

5. Replacing Values

  • Replace specific values (replace({'male':'M'})).
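  • Use regex replacement (replace(r"male","third",regex=True)). An unanchored pattern matches substrings, so this also turns "female" into "fethird"; anchor it as r"^male$" to match whole values only.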
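
The difference between exact and regex replacement is easiest to see side by side. This sketch uses a hypothetical Sex column and keeps the summary's "third" replacement value.

```python
import pandas as pd

# Hypothetical sample column
df = pd.DataFrame({"Sex": ["male", "female", "male"]})

# Dictionary form: each key must match a whole cell value exactly
coded = df["Sex"].replace({"male": "M", "female": "F"})

# Regex form: an unanchored pattern also matches the "male" inside "female"
unanchored = df["Sex"].replace(r"male", "third", regex=True)

# Anchoring with ^ and $ limits the match to cells that are exactly "male"
anchored = df["Sex"].replace(r"^male$", "third", regex=True)
```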

6. Renaming Columns

  • Rename columns using rename().
    Example: renaming Sex → Gender.
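
A one-line rename, shown on a hypothetical single-row DataFrame. rename() returns a new DataFrame, and columns missing from the mapping are left unchanged.

```python
import pandas as pd

# Hypothetical sample row
df = pd.DataFrame({"Name": ["Passenger A"], "Sex": ["female"]})

# Map old column names to new ones; unmapped columns keep their names
df = df.rename(columns={"Sex": "Gender"})
```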

7. Finding Maximum, Minimum, Sum, Average, Count

  • Use aggregation functions like max(), min(), mean(), sum(), and count() on a column.
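
These column-level aggregations are sketched below on hypothetical ages that include one missing value. All of them skip NaN by default, and count() reports only the non-missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, one of them missing
df = pd.DataFrame({"Age": [29.0, 48.0, np.nan, 65.0]})

stats = {
    "max": df["Age"].max(),
    "min": df["Age"].min(),
    "mean": df["Age"].mean(),    # NaN is skipped (skipna=True by default)
    "sum": df["Age"].sum(),
    "count": df["Age"].count(),  # counts non-missing values only
}
print(stats)
```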

8. Finding Unique Values

  • Get unique values in a column (unique()).
  • Count occurrences (value_counts()).
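
A small sketch contrasting unique() and value_counts() on a hypothetical Sex column. nunique() is included as a related call that returns only the number of distinct values.

```python
import pandas as pd

# Hypothetical sample column
df = pd.DataFrame({"Sex": ["male", "female", "male", "male"]})

distinct = df["Sex"].unique()      # NumPy array, in order of first appearance
counts = df["Sex"].value_counts()  # Series of counts, most frequent first
n_distinct = df["Sex"].nunique()   # number of distinct values
```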

9. Handling Missing Data

  • Identify missing values (isnull() / isna()).
  • Replace values with NaN (replace + numpy.nan).
  • Fill missing values with statistics (e.g., mean age using fillna()).
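
The three missing-data steps can be sketched together. The "Unknown" placeholder in the Sex column is a hypothetical example of a value you might convert to NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; "Unknown" is an assumed placeholder for a missing value
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Age": [29.0, np.nan, 65.0],
    "Sex": ["female", "male", "Unknown"],
})

missing_per_column = df.isnull().sum()  # number of NaN values in each column

# Turn the placeholder string into a real missing value
df["Sex"] = df["Sex"].replace("Unknown", np.nan)

# Fill missing ages with the column mean (mean() ignores NaN)
df["Age"] = df["Age"].fillna(df["Age"].mean())
```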

10. Deleting Columns

  • Drop columns with drop(axis=1).
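
A brief sketch of dropping columns from a hypothetical DataFrame. It shows drop(axis=1) next to the equivalent columns= keyword form.

```python
import pandas as pd

# Hypothetical sample row
df = pd.DataFrame({"Name": ["Passenger A"], "Age": [29.0], "Fare": [7.9]})

without_age = df.drop("Age", axis=1)             # axis=1 targets columns
without_both = df.drop(columns=["Age", "Fare"])  # equivalent keyword form
```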

11. Deleting Rows

  • Remove rows conditionally (df[df['Sex']!='male']).
  • Drop rows holding a specific value (e.g., passengers whose Age is 65).
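
Row deletion is usually done by keeping the rows you want, as sketched below on hypothetical data. drop(index=...) is included for removing a row by its index label.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C"],
    "Sex": ["female", "male", "female"],
    "Age": [29.0, 48.0, 65.0],
})

no_men = df[df["Sex"] != "male"]  # keep every row that is not male
no_age_65 = df[df["Age"] != 65]   # keep every row whose Age is not 65
no_first_row = df.drop(index=0)   # drop a row by its index label
```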

12. Dropping Duplicate Rows

  • Remove duplicates using drop_duplicates().
  • Count duplicates with duplicated().sum().
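
A sketch of duplicate handling on hypothetical rows where one row repeats exactly. subset= and keep= are shown as ways to deduplicate on selected columns only.

```python
import pandas as pd

# Hypothetical sample rows; row 2 repeats row 0 exactly
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger A"],
    "Age": [29, 48, 29],
})

n_dupes = df.duplicated().sum()  # rows that repeat an earlier row
deduped = df.drop_duplicates()   # keeps the first occurrence by default

# Consider only the Name column and keep the last occurrence instead
by_name = df.drop_duplicates(subset=["Name"], keep="last")
```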

13. Group Rows by Values

  • Group rows with groupby().
  • Apply aggregations (e.g., count per Sex).
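
A minimal groupby sketch on hypothetical rows, computing a count and a mean per Sex value.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Name": ["Passenger A", "Passenger B", "Passenger C", "Passenger D", "Passenger E"],
    "Sex": ["female", "male", "female", "male", "male"],
    "Age": [29.0, 48.0, 25.0, 30.0, 60.0],
})

count_per_sex = df.groupby("Sex")["Name"].count()  # rows per group
mean_age_per_sex = df.groupby("Sex")["Age"].mean()  # average age per group
print(count_per_sex)
print(mean_age_per_sex)
```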

14. Grouping Rows by Time

  • Create a time index using pd.date_range().
  • Resample by time period (W = weekly, M = monthly; pandas 2.2 and later deprecate M in favor of ME for month end).
  • Perform aggregations (sum, mean) on grouped data.
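
A sketch of time-based grouping, assuming 60 days of hypothetical daily values. It uses "MS" (month start) rather than "M" because "MS" is accepted by both older and newer pandas versions. Note that "W" bins end on Sundays.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: values 0..59 starting 2025-01-01
time_index = pd.date_range("2025-01-01", periods=60, freq="D")
df = pd.DataFrame({"Sales": np.arange(60)}, index=time_index)

weekly = df.resample("W").sum()     # weekly bins labeled by their Sunday end date
monthly = df.resample("MS").mean()  # monthly bins labeled by the first of the month
print(monthly)
```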

15. Aggregating Operations and Statistics

  • Use aggregate() with multiple operations.
    Example: Age min & max, Fare mean.
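
The per-column statistics in this example can be requested in one agg() call, sketched here on hypothetical Age and Fare values. Statistics not requested for a column appear as NaN.

```python
import pandas as pd

# Hypothetical sample values
df = pd.DataFrame({
    "Age": [25.0, 40.0, 60.0],
    "Fare": [10.0, 20.0, 30.0],
})

# Different statistics per column in one call; unrequested cells are NaN
summary = df.agg({"Age": ["min", "max"], "Fare": ["mean"]})
print(summary)
```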

16. Looping Over a Column

  • Iterate through column values with a loop.
    Example: converting the first two names to uppercase.
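
A plain loop over part of a column, sketched with hypothetical names. For whole columns, the vectorized df["Name"].str.upper() is usually preferred over an explicit loop.

```python
import pandas as pd

# Hypothetical sample names
df = pd.DataFrame({"Name": ["Passenger A", "Passenger B", "Passenger C"]})

upper_names = []
for name in df["Name"].iloc[:2]:  # iterate over the first two values only
    upper_names.append(name.upper())
print(upper_names)
```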

17. Applying a Function Over a Column

  • Apply a custom function with apply().
    Example: convert names to uppercase.
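
The same uppercase conversion expressed with apply(). The helper name uppercase is hypothetical; apply() calls it once per value and collects the results into a new Series.

```python
import pandas as pd

# Hypothetical sample names
df = pd.DataFrame({"Name": ["Passenger A", "Passenger B"]})


def uppercase(text: str) -> str:
    """Hypothetical helper: return the string in upper case."""
    return text.upper()


df["NameUpper"] = df["Name"].apply(uppercase)  # one call per value
```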

18. Applying a Function to Groups

  • Apply functions to grouped data.
    Example: groupby('Sex').apply(lambda x: x.count()).
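
A variation on this example, assuming hypothetical data: the function computes each group's age range. The Age column is selected before apply() so the function never receives the grouping column, which recent pandas versions warn about. For plain counts, calling .count() on the groupby is more direct.

```python
import pandas as pd

# Hypothetical sample rows
df = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "male"],
    "Age": [29.0, 48.0, 25.0, 30.0, 60.0],
})

# Custom per-group function: spread between oldest and youngest passenger
age_range = df.groupby("Sex")["Age"].apply(lambda ages: ages.max() - ages.min())
print(age_range)
```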

19. Concatenating DataFrames

  • Combine DataFrames vertically with pd.concat().
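
A minimal concatenation sketch with two hypothetical DataFrames. ignore_index=True renumbers the stacked rows, and axis=1 joins the frames side by side instead.

```python
import pandas as pd

# Hypothetical DataFrames with the same columns
df_a = pd.DataFrame({"id": [1, 2], "first": ["Alex", "Amy"]})
df_b = pd.DataFrame({"id": [3, 4], "first": ["Billy", "Brian"]})

stacked = pd.concat([df_a, df_b], ignore_index=True)  # vertical, index 0..3
side_by_side = pd.concat([df_a, df_b], axis=1)        # horizontal, aligned on index
```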

20. Merging DataFrames

  • Merge DataFrames on a common key using pd.merge().
    Example: merging employees with sales data.
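
A sketch of the employees/sales merge, assuming hypothetical tables that share an employee_id key. It shows how the how= argument changes which rows survive: the default inner join keeps only matching ids.

```python
import pandas as pd

# Hypothetical tables sharing the employee_id key
employees = pd.DataFrame({
    "employee_id": [1, 2, 3, 4],
    "name": ["Employee 1", "Employee 2", "Employee 3", "Employee 4"],
})
sales = pd.DataFrame({
    "employee_id": [3, 4, 5, 6],
    "total_sales": [100, 250, 75, 300],
})

inner = pd.merge(employees, sales, on="employee_id")              # default how="inner"
left = pd.merge(employees, sales, on="employee_id", how="left")   # all employees kept
outer = pd.merge(employees, sales, on="employee_id", how="outer") # all ids from both
```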

✅ Final Notes

This notebook demonstrates key pandas techniques for data wrangling:

  • Data exploration & cleaning.
  • Slicing, filtering, sorting.
  • Handling missing data & duplicates.
  • Aggregating & grouping.
  • Combining multiple DataFrames (concat & merge).

Together, these operations form the foundation of data preprocessing and analysis in Python.


Jai Hanuman