Table of Contents#
- Pandas: The Swiss Army Knife of Data Manipulation
- NumPy: Foundation for Numerical Computing
- Matplotlib: The Grandfather of Python Visualization
- Seaborn: Statistical Visualization Made Easy
- Scikit-learn: Machine Learning for Everyone
- Plotly: Interactive Visualization for Modern Dashboards
- SQLAlchemy: Bridging Python and Databases
- PySpark: Big Data Processing with Python
- StatsModels: Statistical Modeling and Hypothesis Testing
- Jupyter Notebooks/Lab: The Data Analyst’s Playground
- Conclusion
- References
1. Pandas: The Swiss Army Knife of Data Manipulation#
If data analysis were a kitchen, Pandas would be your chef’s knife—versatile, indispensable, and capable of handling almost any task. Built on NumPy, Pandas is designed for tabular data manipulation (think Excel spreadsheets or SQL tables) and is the foundation of most data analysis workflows.
What It Is#
Pandas introduces two core data structures:
- DataFrame: A 2D table with rows (observations) and columns (features).
- Series: A 1D array (like a single column of a DataFrame) with an index.
These structures let you load, clean, transform, and analyze data with just a few lines of code.
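Both structures can be sketched in a few lines. The product names and values below are invented for illustration:

```python
import pandas as pd

# A Series: one labeled column of values
prices = pd.Series([9.99, 14.50, 3.25], index=["pen", "notebook", "eraser"])

# A DataFrame: a 2D table built from a dict of columns
df = pd.DataFrame({
    "product": ["pen", "notebook", "eraser"],
    "quantity": [12, 5, 30],
})

print(prices["notebook"])  # label-based access -> 14.5
print(df.shape)            # (rows, columns) -> (3, 2)
```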
Key Features#
- Data Loading: Read CSV, Excel, SQL, JSON, and more with `pd.read_*()` functions.
- Data Cleaning: Handle missing values (`dropna()`, `fillna()`), remove duplicates (`drop_duplicates()`), and correct data types (`astype()`).
- Transformation: Merge/join datasets (`merge()`), group data by categories (`groupby()`), and reshape data (`pivot_table()`).
- Time Series: Manipulate dates (`to_datetime()`), resample time-based data (`resample()`), and handle time zones.
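The time-series features are easy to demo. A minimal sketch with invented daily readings, parsed with `to_datetime()` and resampled to monthly means:

```python
import pandas as pd

# Invented daily readings for illustration
ts = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-15", "2024-02-01", "2024-02-20"],
    "value": [10, 20, 30, 50],
})

# Parse strings into real datetimes and make them the index
ts["date"] = pd.to_datetime(ts["date"])
ts = ts.set_index("date")

# Resample to month-start frequency, averaging the values in each month
monthly = ts.resample("MS")["value"].mean()
print(monthly)  # Jan -> 15.0, Feb -> 40.0
```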
Why Data Analysts Need It#
Nearly every data analysis project starts with data wrangling—turning raw, messy data into a structured format. Pandas excels here. It replaces manual Excel work with reproducible code, saving you hours of copy-pasting and reducing human error.
Practical Example: Explore a Dataset#
Let’s load a CSV (e.g., the Iris dataset) and perform basic exploration:
import pandas as pd
# Load data (replace with your file path)
df = pd.read_csv("iris.csv")
# View the first 5 rows
print("First 5 Rows:\n", df.head())
# Get summary statistics (mean, median, std, etc.)
print("\nSummary Statistics:\n", df.describe())
# Group by species and calculate mean sepal length
species_mean = df.groupby("species")["sepal_length"].mean()
print("\nMean Sepal Length by Species:\n", species_mean)
# Filter rows where sepal_length > 5.0 (readable with .query())
filtered_df = df.query("sepal_length > 5.0")
print("\nFiltered Rows (sepal_length > 5.0):\n", filtered_df.head())
Pro Tips#
- Method Chaining: Combine operations for cleaner code (e.g., `df.dropna().groupby("species").mean()`).
- Avoid Loops: Pandas is optimized for vectorized operations—use `apply()` only when necessary.
- Indexing: Use `loc[]` (label-based) and `iloc[]` (position-based) for precise data access (e.g., `df.loc[0:5, ["sepal_length", "species"]]`).
2. NumPy: Foundation for Numerical Computing#
If Pandas is the chef’s knife, NumPy is the cutting board—it’s the underlying framework that makes Pandas (and many other libraries) work. NumPy specializes in multi-dimensional arrays and fast numerical operations, which are critical for handling large datasets efficiently.
What It Is#
NumPy (short for "Numerical Python") provides the ndarray (n-dimensional array) object, which is far more efficient than Python’s built-in lists for numerical computations. It also includes a vast library of mathematical functions (e.g., linear algebra, Fourier transforms) that operate directly on these arrays.
Key Features#
- Vectorization: Perform operations on entire arrays instead of looping through elements (10–100x faster).
- Broadcasting: Combine arrays of different shapes (e.g., add a 1D array to a 2D array).
- Mathematical Functions: Built-in support for trigonometry, exponentials, logarithms, and linear algebra (`numpy.linalg`).
- Random Number Generation: Create reproducible random data for simulations.
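Broadcasting deserves a concrete sketch: a 1D array of per-column offsets is stretched across every row of a 2D array, with no explicit loop:

```python
import numpy as np

data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])      # shape (3, 2)
offsets = np.array([10.0, 100.0])  # shape (2,)

# The (2,) array is broadcast across all three rows of the (3, 2) array
shifted = data + offsets
print(shifted)
# [[ 11. 102.]
#  [ 13. 104.]
#  [ 15. 106.]]
```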
Why Data Analysts Need It#
Pandas relies on NumPy for its core operations—understanding NumPy will help you debug performance issues, write custom functions, and work with raw numerical data (e.g., sensor readings, images).
Practical Example: Array Operations#
Let’s create a NumPy array and perform basic calculations:
import numpy as np
# Create a 2D array (3 rows, 2 columns)
arr = np.array([[1, 2], [3, 4], [5, 6]])
print("Original Array:\n", arr)
# Calculate the mean of each column
col_means = np.mean(arr, axis=0)
print("\nColumn Means:", col_means) # Output: [3. 4.]
# Multiply the array by 2 (vectorized operation)
doubled_arr = arr * 2
print("\nDoubled Array:\n", doubled_arr)
# Matrix multiplication (dot product)
mat1 = np.array([[1, 2], [3, 4]])
mat2 = np.array([[5, 6], [7, 8]])
product = np.dot(mat1, mat2)
print("\nMatrix Product:\n", product)
Pro Tips#
- Use `np.array()` Instead of Lists: For numerical data, always convert lists to NumPy arrays for speed.
- Avoid `for` Loops: Use vectorized operations—e.g., `arr * 2` instead of `[x*2 for x in arr]`.
- Check Array Shape: Use `arr.shape` to verify dimensions before operations (prevents broadcasting errors).
3. Matplotlib: The Grandfather of Python Visualization#
Matplotlib is the original Python plotting library—and it’s still the most flexible. While newer libraries (like Seaborn or Plotly) simplify common tasks, Matplotlib gives you full control over every aspect of your plots (axes, labels, colors, etc.).
What It Is#
Matplotlib is a low-level library for creating static, animated, or interactive plots. It’s inspired by MATLAB’s plotting syntax, making it familiar to users transitioning from MATLAB.
Key Features#
- Plot Types: Line, bar, histogram, scatter, pie, and 3D plots.
- Customization: Adjust axes limits, labels, legends, colors, and fonts.
- Subplots: Create multi-plot layouts (e.g., 2x2 grid of charts).
- Export: Save plots as high-resolution images (PNG, JPG) or vector graphics (SVG, PDF).
Why Data Analysts Need It#
Matplotlib is the foundation of Python visualization—Seaborn and Plotly both build on it. It’s essential for creating publication-quality plots where you need precise control over design (e.g., for reports or academic papers).
Practical Example: Line Plot with Annotations#
Let’s create a line plot showing sales over time:
import matplotlib.pyplot as plt
import numpy as np
# Generate sample data
months = np.arange(1, 13)
sales = [100, 120, 90, 150, 180, 200, 220, 250, 230, 190, 160, 140]
# Create plot
plt.figure(figsize=(10, 6)) # Set plot size
plt.plot(months, sales, marker='o', color='b', label='Monthly Sales') # Line plot with markers
# Add labels and title
plt.xlabel('Month')
plt.ylabel('Sales ($)')
plt.title('2024 Monthly Sales Trend')
# Add legend and grid
plt.legend()
plt.grid(True, linestyle='--', alpha=0.7)
# Annotate peak sales (August)
peak_month = 8
peak_sales = 250
plt.annotate(
f'Peak: {peak_sales}',
xy=(peak_month, peak_sales),
xytext=(peak_month + 1, peak_sales + 10),
arrowprops=dict(facecolor='red', shrink=0.05)
)
# Show plot
plt.show()
Pro Tips#
- Use `plt.subplots()`: For better layout control (instead of `plt.figure()`). Example: `fig, ax = plt.subplots(figsize=(10, 6))`, then `ax.plot(months, sales)` and `ax.set_xlabel('Month')`.
- Save High-Resolution Plots: Use `plt.savefig('sales_plot.png', dpi=300)` for crisp images.
- Avoid Overplotting: Use transparency (`alpha=0.5`) for scatter plots with many points.
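The `plt.subplots()` tip scales naturally to grids. A small sketch reusing the sales data from above (the Agg backend renders off-screen so it also runs headless; drop that line in a notebook):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
sales = np.array([100, 120, 90, 150, 180, 200, 220, 250, 230, 190, 160, 140])

# 1x2 grid: line plot on the left, histogram on the right
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(months, sales, marker="o")
ax1.set_xlabel("Month")
ax1.set_ylabel("Sales ($)")
ax2.hist(sales, bins=6, color="steelblue")
ax2.set_xlabel("Sales ($)")
fig.tight_layout()
fig.savefig("sales_panels.png", dpi=150)
```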
4. Seaborn: Statistical Visualization Made Easy#
Seaborn is a high-level visualization library built on Matplotlib. It simplifies the creation of statistical plots (e.g., heatmaps, boxplots, regression lines) and adds aesthetically pleasing themes by default.
What It Is#
Seaborn’s goal is to make statistical graphics accessible—you can create complex plots (like a correlation heatmap) with 1–2 lines of code instead of 10–20 lines in Matplotlib.
Key Features#
- Statistical Plots: Heatmaps (`heatmap()`), boxplots (`boxplot()`), violin plots (`violinplot()`), pairplots (`pairplot()`), and regression plots (`regplot()`).
- Themes: Predefined styles (`sns.set_style()`) like "darkgrid" or "whitegrid" for consistent visuals.
- Color Palettes: Built-in palettes (`sns.color_palette()`) for categorical or sequential data.
- Integration with Pandas: Works seamlessly with DataFrames—no need to convert to NumPy arrays.
Why Data Analysts Need It#
Seaborn is perfect for exploratory data analysis (EDA)—it lets you quickly visualize relationships between variables (e.g., how age correlates with income) without writing boilerplate code.
Practical Example: Correlation Heatmap#
Let’s visualize correlations between features in the Iris dataset:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
# Load Iris dataset (built into Seaborn)
df = sns.load_dataset('iris')
# Calculate correlation matrix
corr_matrix = df.corr(numeric_only=True)
# Create heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(
corr_matrix,
annot=True, # Show correlation values
cmap='coolwarm', # Color palette
fmt='.2f', # Format values to 2 decimal places
linewidths=0.5 # Add lines between cells
)
plt.title('Iris Feature Correlation Heatmap')
plt.show()
Pro Tips#
- Use `sns.set_style()`: Apply a theme at the start of your script for consistency.
- Combine with Matplotlib: Use Seaborn for the main plot, then Matplotlib for fine-tuning (e.g., adjusting axis labels).
- Try `pairplot()`: For quick exploration of relationships between all numerical features.
5. Scikit-learn: Machine Learning for Everyone#
Scikit-learn (or "sklearn") is the most popular machine learning library for Python. It provides simple, consistent APIs for almost every ML task—from preprocessing data to training models to evaluating performance.
What It Is#
Scikit-learn is built on NumPy, Pandas, and Matplotlib. It focuses on supervised (predictive) and unsupervised (descriptive) learning, with algorithms like linear regression, random forests, and k-means clustering.
Key Features#
- Preprocessing: Scale data (`StandardScaler`), encode categorical variables (`OneHotEncoder`), and split datasets (`train_test_split`).
- Models: Linear regression, logistic regression, decision trees, random forests, SVMs, and more.
- Evaluation: Metrics like accuracy, precision, recall, RMSE, and R².
- Model Selection: Cross-validation (`cross_val_score`), hyperparameter tuning (`GridSearchCV`), and pipeline creation (`Pipeline`).
Why Data Analysts Need It#
Even if you’re not a "machine learning engineer," Scikit-learn lets you build predictive models to answer questions like:
- "Which customers are likely to churn?"
- "What’s the expected sales volume next quarter?"
It’s designed for practicality—you can train a model in 5 lines of code.
Practical Example: Linear Regression#
Let’s predict house prices using the California Housing dataset (scikit-learn removed the classic Boston Housing dataset in version 1.2):
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load dataset (downloaded and cached on first use)
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = pd.Series(housing.target, name='MedHouseVal') # Median house value
# Split into train/test sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate model (square root of MSE gives RMSE)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.2f}") # Lower = better (prediction error)
print(f"R²: {r2:.2f}") # Higher = better (explained variance)
Pro Tips#
- Use `train_test_split`: Always split data into training and test sets to avoid overfitting.
- Scale Numerical Data: Most ML models (e.g., SVMs, neural networks) require scaled data.
- Use Pipelines: Chain preprocessing and modeling steps to avoid data leakage (e.g., fitting a scaler on the full dataset before splitting leaks test-set information into training).
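The pipeline tip can be made concrete. In this sketch (toy data from `make_classification`), the scaler is fit inside the pipeline, so it only ever sees the training fold:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic classification data for illustration
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaler and model are chained: fit() learns the scaling from X_train only,
# so no test-set statistics leak into training
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
pipe.fit(X_train, y_train)
print(f"Test accuracy: {pipe.score(X_test, y_test):.2f}")
```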
6. Plotly: Interactive Visualization for Modern Dashboards#
Plotly is an interactive visualization library that lets you create plots users can zoom, pan, hover over, and click. It’s perfect for building web-based dashboards (with Plotly Dash) or sharing interactive plots in reports.
What It Is#
Plotly has two main components:
- Plotly Express: A high-level API for quick, beautiful plots (like Seaborn).
- Plotly Graph Objects: A low-level API for full customization (like Matplotlib).
It also includes Dash—a framework for building interactive web apps with Python (no JavaScript required).
Key Features#
- Interactive Plots: Zoom, pan, hover tooltips, and clickable legends.
- Plot Types: Scatter, line, bar, histogram, choropleth (maps), and 3D plots.
- Dash: Build dashboards with sliders, dropdowns, and buttons.
- Offline Support: Save plots as HTML files (shareable without a server).
Why Data Analysts Need It#
Static plots are great for reports, but interactive plots let stakeholders explore data on their own (e.g., zoom in on a specific time period or filter by category). Dash takes this further by letting you build full-fledged apps (e.g., a sales dashboard with real-time data).
Practical Example: Interactive Scatter Plot#
Let’s create a scatter plot of Iris flowers with hover tooltips:
import plotly.express as px
import pandas as pd
from sklearn.datasets import load_iris
# Load Iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]
# Create interactive scatter plot
fig = px.scatter(
df,
x='sepal length (cm)',
y='sepal width (cm)',
color='species',
size='petal length (cm)',
hover_data=['petal width (cm)'],
title='Iris Flower Sepal Dimensions'
)
# Show plot (opens in browser)
fig.show()
Pro Tips#
- Use Plotly Express First: It’s faster for common plots—switch to Graph Objects only if you need customization.
- Try Dash: For dashboards, start with the Dash Tutorial.
- Share Plots: Save as HTML with `fig.write_html('iris_plot.html')`—anyone can open it in a browser.
7. SQLAlchemy: Bridging Python and Databases#
Most enterprise data lives in SQL databases (e.g., PostgreSQL, MySQL, SQLite). SQLAlchemy lets you connect Python to these databases, write SQL queries in Python, and even map database tables to Python objects (ORM).
What It Is#
SQLAlchemy has two main components:
- SQLAlchemy Core: A SQL toolkit for writing raw SQL queries in Python (with parameterized queries to prevent SQL injection).
- SQLAlchemy ORM: An Object-Relational Mapper—maps database tables to Python classes (e.g., a `User` class maps to a `users` table).
Key Features#
- Database Support: Works with SQLite, PostgreSQL, MySQL, Oracle, and more.
- Parameterized Queries: Safe, secure way to insert user input into queries.
- ORM: Write Python code instead of SQL (e.g., `session.query(User).filter_by(name='Alice').all()`).
- Integration with Pandas: Load SQL query results directly into a DataFrame (`pd.read_sql()`).
Why Data Analysts Need It#
You’ll often need to pull data from a database into Python for analysis. SQLAlchemy simplifies this process—you don’t have to switch between Python and SQL, and you can use Pandas to analyze the results.
Practical Example: Query a SQLite Database#
Let’s connect to a SQLite database, create a table, insert data, and query it:
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker
import pandas as pd
# Step 1: Connect to database (SQLite in this case)
engine = create_engine('sqlite:///sales.db') # Creates sales.db if it doesn't exist
Base = declarative_base()
# Step 2: Define a table schema (ORM)
class Sale(Base):
__tablename__ = 'sales'
id = Column(Integer, primary_key=True)
product = Column(String)
quantity = Column(Integer)
revenue = Column(Integer)
# Step 3: Create table in database
Base.metadata.create_all(engine)
# Step 4: Insert data
Session = sessionmaker(bind=engine)
session = Session()
# Add a sale
sale1 = Sale(product='Laptop', quantity=2, revenue=2000)
session.add(sale1)
session.commit()
# Step 5: Query data (ORM)
sales = session.query(Sale).filter(Sale.revenue > 1000).all()
for sale in sales:
print(f"Product: {sale.product}, Revenue: {sale.revenue}")
# Step 6: Load query results into Pandas DataFrame
df = pd.read_sql(session.query(Sale).statement, engine)
print("\nPandas DataFrame:\n", df)
Pro Tips#
- Use `create_engine()`: For connecting to databases—replace the URL with your database credentials (e.g., `postgresql://user:password@host:port/dbname`).
- Prefer ORM for Complex Apps: For simple queries, use Core (raw SQL) or Pandas `read_sql()`. For complex applications, ORM is more maintainable.
- Prevent SQL Injection: Always use parameterized queries (e.g., `session.query(Sale).filter(Sale.product == product_name)` instead of string concatenation).
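The same injection-safe style works at the Core level. A minimal sketch using `text()` with named bound parameters against an in-memory SQLite database:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

# engine.begin() opens a transaction and commits on success
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE sales (product TEXT, revenue INTEGER)"))
    conn.execute(
        text("INSERT INTO sales (product, revenue) VALUES (:p, :r)"),
        [{"p": "Laptop", "r": 2000}, {"p": "Mouse", "r": 40}],
    )
    # :min_rev is a bound parameter -- user input never touches the SQL string
    rows = conn.execute(
        text("SELECT product FROM sales WHERE revenue > :min_rev"),
        {"min_rev": 1000},
    ).fetchall()

print([row[0] for row in rows])  # ['Laptop']
```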
8. PySpark: Big Data Processing with Python#
When your data is too large to fit in memory (terabytes or petabytes), Pandas won’t cut it. PySpark—Python’s API for Apache Spark—lets you process big data using distributed computing (running code on multiple servers).
What It Is#
Apache Spark is a framework for distributed data processing. PySpark lets you use Spark’s power with Python, including:
- Spark DataFrames: Similar to Pandas DataFrames but distributed across a cluster.
- Spark SQL: Query data with SQL (like SQLAlchemy).
- MLlib: Spark’s machine learning library (similar to Scikit-learn).
Key Features#
- Distributed Computing: Process data across thousands of servers (handles terabytes of data).
- Spark DataFrames: Same API as Pandas (easy to learn if you know Pandas).
- Fault Tolerance: Spark automatically recovers from node failures.
- Integration: Works with Hadoop, S3, and other big data tools.
Why Data Analysts Need It#
If you work with big data (e.g., clickstream data, sensor data, social media data), PySpark is essential. It lets you perform the same operations as Pandas (filtering, grouping, aggregating) but on datasets that are too large for a single computer.
Practical Example: Spark DataFrame Operations#
Let’s load a CSV into a Spark DataFrame and perform basic transformations:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum
# Step 1: Create a SparkSession (entry point to Spark)
spark = SparkSession.builder \
.appName("BigDataAnalysis") \
.getOrCreate()
# Step 2: Load CSV into Spark DataFrame
df = spark.read.csv("large_sales_data.csv", header=True, inferSchema=True)
# Step 3: Show first 5 rows
df.show(5)
# Step 4: Filter and aggregate data
# Calculate total revenue per product (for quantity > 1)
product_revenue = df \
.filter(col("quantity") > 1) \
.groupBy("product") \
.agg(sum("revenue").alias("total_revenue")) \
.orderBy(col("total_revenue").desc())
# Show results
product_revenue.show()
# Step 5: Stop SparkSession
spark.stop()
Pro Tips#
- Use Spark DataFrames Instead of RDDs: RDDs (Resilient Distributed Datasets) are lower-level—DataFrames are faster and easier to use.
- Avoid UDFs: User-Defined Functions (UDFs) are slow—use built-in Spark functions (e.g., `col()`, `sum()`) instead.
- Cache Data: Use `df.cache()` for repeated computations (speeds up queries).
9. StatsModels: Statistical Modeling and Hypothesis Testing#
Pandas and Scikit-learn are great for descriptive and predictive analytics, but StatsModels is designed for inferential statistics—answering questions like:
- "Is there a significant relationship between advertising spend and sales?"
- "Do men and women have different average incomes?"
What It Is#
StatsModels is a library for statistical modeling and hypothesis testing. It provides implementations of classic statistical methods (e.g., OLS regression, t-tests, ANOVA) with detailed output (like R’s summary() function).
Key Features#
- Regression Models: Ordinary Least Squares (OLS), logistic regression, Poisson regression.
- Hypothesis Testing: t-tests, ANOVA, chi-square tests.
- Time Series: ARIMA, SARIMA, and exponential smoothing.
- Formula API: Use R-style formulas (e.g., `sales ~ ads + price`) for model specification.
Why Data Analysts Need It#
Inferential statistics helps you validate hypotheses and draw conclusions about populations (not just samples). For example, you can use StatsModels to test whether a new marketing campaign significantly increased sales, rather than sales rising by chance.
Practical Example: OLS Regression#
Let’s test the relationship between advertising spend and sales:
import statsmodels.api as sm
import pandas as pd
# Create sample data
data = {
'ads_spend': [100, 200, 300, 400, 500],
'sales': [1500, 2500, 3200, 4000, 4800]
}
df = pd.DataFrame(data)
# Add constant term (required for OLS)
df['const'] = 1
# Fit OLS model
model = sm.OLS(df['sales'], df[['const', 'ads_spend']])
results = model.fit()
# Print summary (includes coefficients, p-values, R²)
print(results.summary())
Pro Tips#
- Use the Formula API: For simpler model specification (requires the `patsy` library): `import statsmodels.formula.api as smf`, then `model = smf.ols('sales ~ ads_spend', data=df).fit()`.
- Check Residuals: Use `results.resid` to verify model assumptions (e.g., residuals should be normally distributed).
- Interpret P-Values: A p-value < 0.05 is conventionally treated as statistically significant.
10. Jupyter Notebooks/Lab: The Data Analyst’s Playground#
Jupyter Notebooks (and Jupyter Lab) are interactive computing environments that let you combine code, text, visuals, and equations in a single document. They’re the default tool for exploratory data analysis (EDA)—where you iterate quickly, test hypotheses, and document your process.
What It Is#
Jupyter Notebooks use cells to separate content:
- Code Cells: Run Python code (outputs appear below).
- Markdown Cells: Write text, headings, bullet points, and LaTeX equations.
- Raw Cells: For plain text (e.g., HTML, SQL).
Jupyter Lab is the next-generation interface—more modern, with better support for multiple files and extensions.
Key Features#
- Interactive Execution: Run cells one at a time (no need to rerun the entire script).
- Documentation: Combine code with explanations (great for sharing work with teammates).
- Integration: Works with all the libraries we’ve covered (Pandas, Matplotlib, etc.).
- Sharing: Save notebooks as HTML, PDF, or Python scripts (or use JupyterHub for collaborative work).
Why Data Analysts Need It#
EDA is all about iteration—you load data, clean it, plot it, test a hypothesis, and repeat. Jupyter Notebooks let you do this without rerunning your entire codebase. They also let you document your thought process (critical for reproducibility and collaboration).
Practical Example: Jupyter Notebook Workflow#
Here’s a typical EDA workflow in Jupyter:
- Markdown Cell: Explain the goal (e.g., "Analyze 2024 sales data to identify trends").
- Code Cell: Load data with Pandas:
import pandas as pd df = pd.read_csv("sales_2024.csv") - Code Cell: Explore data:
df.head() df.describe() - Markdown Cell: Note observations (e.g., "Sales peak in Q3—likely holiday season").
- Code Cell: Visualize with Matplotlib:
import matplotlib.pyplot as plt plt.plot(df['month'], df['sales']) plt.xlabel('Month') plt.ylabel('Sales') plt.title('2024 Sales Trend') - Markdown Cell: Draw conclusions (e.g., "Increase marketing spend in Q2 to capitalize on Q3 growth").
Pro Tips#
- Use Jupyter Lab: It’s more modern and has better features (e.g., drag-and-drop cells).
- Install Extensions: In JupyterLab 3+, most extensions install with `pip install`; older versions use `jupyter labextension install`. Extensions add features like a table of contents or code formatting.
- Save Regularly: Jupyter can crash—save often (or use auto-save).
Conclusion#
Python’s library ecosystem is what makes it the go-to language for data analysis. Each library we covered serves a unique purpose, but together they form an end-to-end workflow:
- Load/Clean Data: Pandas + SQLAlchemy
- Numerical Computations: NumPy
- Visualize Trends: Matplotlib + Seaborn + Plotly
- Statistical Modeling: StatsModels
- Machine Learning: Scikit-learn
- Big Data: PySpark
- Document/Share: Jupyter Notebooks
Learning Path Recommendation#
If you’re new to Python data analysis, start with Pandas and NumPy—mastering these two will make every other library easier. Next, learn Matplotlib and Seaborn for visualization. Then, dive into Scikit-learn for machine learning or PySpark if you work with big data. Finally, add StatsModels for inferential statistics and Plotly for interactive dashboards.
Remember: You don’t need to be an expert in every library—focus on the ones that align with your work. But even a basic understanding of all 10 will make you a more versatile, confident data analyst.
The best way to learn? Practice. Pick a dataset (Kaggle has thousands), and try to answer a question with these libraries. The more you use them, the more intuitive they’ll become.
References#
- Pandas: pandas.pydata.org
- NumPy: numpy.org/doc/
- Matplotlib: matplotlib.org/stable/index.html
- Seaborn: seaborn.pydata.org/
- Scikit-learn: scikit-learn.org/stable/
- Plotly: plotly.com/python/
- SQLAlchemy: sqlalchemy.org/
- PySpark: spark.apache.org/docs/latest/api/python/
- StatsModels: statsmodels.org/stable/index.html
- Jupyter: jupyter.org/documentation
- Additional Resources:
- DataCamp’s Python for Data Analysis Track: datacamp.com/tracks/python-for-data-analysis
- Real Python’s Pandas Tutorial: realpython.com/pandas-tutorial/
- Kaggle Datasets: kaggle.com/datasets (practice with real data!)
Let me know if you’d like to dive deeper into any of these libraries—I’d be happy to share more tips and examples!