```python
# Filter rows where age > 30
filtered_df = df.filter(pl.col('age') > 30)
print("\nFiltered DataFrame (age > 30):")
print(filtered_df)

# Add a new column 'bonus' which is 10% of the salary.
# with_columns returns a new DataFrame, so the result is reassigned to df.
df = df.with_columns((pl.col('salary') * 0.1).alias('bonus'))
print("\nDataFrame with Bonus Column:")
print(df)

# Group by 'department' and calculate the average salary and total bonus
grouped_df = df.group_by('department').agg([
    pl.col('salary').mean().alias('avg_salary'),
    pl.col('bonus').sum().alias('total_bonus'),
])
print("\nGrouped by Department with Aggregations:")
print(grouped_df)

# Sort the DataFrame by 'salary' in descending order
sorted_df = df.sort('salary', descending=True)
print("\nSorted DataFrame by Salary (Descending):")
print(sorted_df)
```
Explanation
Reading CSV:
pl.read_csv('data.csv') reads the CSV file into a Polars DataFrame.
Displaying the DataFrame:
print(df) shows the original DataFrame.
Selecting Specific Columns:
df.select(['name', 'age', 'salary']) selects only the name, age, and salary columns.
Filtering Rows:
df.filter(pl.col('age') > 30) filters the DataFrame to include only rows where the age is greater than 30.
Adding a New Column:
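df.with_columns((pl.col('salary') * 0.1).alias('bonus')) adds a new column, bonus, equal to 10% of the salary. Polars methods return a new DataFrame rather than modifying the existing one in place, so the result is assigned back to df.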
Grouping and Aggregating:
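df.group_by('department').agg([pl.col('salary').mean().alias('avg_salary'), pl.col('bonus').sum().alias('total_bonus')]) groups the DataFrame by department and calculates the average salary and total bonus for each department. The order of the resulting groups is not guaranteed; passing maintain_order=True to group_by keeps the groups in order of first appearance.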
Sorting:
df.sort('salary', descending=True) sorts the DataFrame by salary in descending order.