2 numpy
This tutorial is designed to take you from a NumPy novice to someone who can confidently manipulate multi-dimensional data. NumPy (Numerical Python) is the foundation of data science in Python: its core routines are implemented in C, so array operations run at compiled speed while you keep writing ordinary Python syntax.
1. Introduction to the NumPy Array
The core of NumPy is the ndarray (n-dimensional array). Unlike Python lists, NumPy arrays must contain elements of the same type, making them significantly faster and more memory-efficient.
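To make the same-type rule concrete, here is a minimal sketch that inspects an array's dtype and shows what happens when a list mixes integers and floats. The variable names are illustrative only, and the default integer type depends on the platform.

```python
import numpy as np

# All elements share one dtype, chosen when the array is created.
ints = np.array([1, 2, 3, 4])
print(ints.dtype)         # platform-dependent integer type, e.g. int64

# Mixing ints and floats upcasts every element to a float type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64

# A fixed element size means memory use is simply size * itemsize.
small = np.array([1, 2, 3, 4], dtype=np.int8)
print(small.itemsize)     # 1 (byte per element)
print(small.nbytes)       # 4 (bytes in total)
```

Choosing a smaller dtype, as in the last array, is one way to reduce memory use when the value range is known in advance.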
Import
import numpy as np
Creating Arrays
You can create arrays from lists or using built-in NumPy functions.
# From a list
arr = np.array([1, 2, 3, 4])
# Arrays of zeros, ones, or random numbers
zeros = np.zeros((2, 3)) # 2 rows, 3 columns of 0.0
ones = np.ones((3, 2)) # 3 rows, 2 columns of 1.0
rand = np.random.rand(2, 2) # Uniform random values in [0, 1)
2. Shape and Reshaping
Understanding dimensions is crucial. An array can be 1D (vector), 2D (matrix), or 3D+ (tensor).
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # Output: (2, 3) - 2 rows, 3 columns
print(arr.ndim) # Output: 2 - Two dimensions
# Reshaping: Changing the structure without changing the data
new_arr = arr.reshape(3, 2) # Changes 2x3 into 3x2
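Two details of reshape are worth seeing in action: a dimension can be given as -1 so NumPy infers it, and the reshaped array usually shares memory with the original. The sketch below assumes a freshly created (contiguous) array, which is the case where reshape returns a view.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# -1 asks NumPy to infer that dimension from the total element count.
inferred = arr.reshape(3, -1)
print(inferred.shape)      # (3, 2)

# Flatten to 1D; the element order is preserved.
flat = arr.reshape(-1)
print(flat)                # [1 2 3 4 5 6]

# For a contiguous array like this one, reshape returns a view,
# so writing through the reshaped array also changes the original.
flat[0] = 99
print(arr[0, 0])           # 99

# The total number of elements must stay the same.
try:
    arr.reshape(4, 2)      # 6 elements cannot fill 8 slots
except ValueError as err:
    print("reshape failed:", err)
```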
3. Indexing and Slicing
Slicing in NumPy is similar to Python lists but extends to multiple dimensions.
arr = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
# Accessing a single element: arr[row, col]
print(arr[0, 1]) # Output: 20
# Slicing: arr[row_start:row_end, col_start:col_end]
print(arr[0:2, 1:3])
# Output:
# [[20 30]
#  [50 60]]
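The same slicing syntax also selects whole rows or columns. One difference from Python lists is that a basic slice is a view of the original data, not a copy. A short sketch, reusing the array from the example above (the other variable names are illustrative):

```python
import numpy as np

arr = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

second_row = arr[1, :]      # [40 50 60]
last_column = arr[:, -1]    # [30 60 90]
every_other_row = arr[::2]  # rows 0 and 2

# Basic slices are views: modifying one modifies the original array.
block = arr[0:2, 1:3]
block[0, 0] = -1
print(arr[0, 1])            # -1

# Use .copy() when an independent array is needed.
safe = arr[0:2, 1:3].copy()
safe[0, 0] = 0
print(arr[0, 1])            # still -1
```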
4. Vectorized Operations
One of NumPy's best features is vectorization. You don't need for loops to perform math on every element; you just apply the math to the array itself.
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Element-wise addition
print(a + b) # [5 7 9]
# Universal Functions (ufuncs)
print(np.sqrt(a)) # Square root of each element
print(np.exp(a)) # Exponential
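To see what vectorization replaces, the following sketch computes the same result twice, once with an explicit loop and once with a single array expression. The formula and variable names are arbitrary, chosen only for illustration.

```python
import numpy as np

values = np.arange(1, 6)            # [1 2 3 4 5]

# Loop version: one Python-level operation per element.
result_loop = np.empty_like(values)
for i in range(len(values)):
    result_loop[i] = values[i] ** 2 + 1

# Vectorized version: the same arithmetic applied to the whole array.
result_vec = values ** 2 + 1

print(result_vec)                               # [ 2  5 10 17 26]
print(np.array_equal(result_loop, result_vec))  # True
```

Both versions give identical values; the vectorized form is shorter and moves the per-element work out of the Python interpreter.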
5. Broadcasting
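Broadcasting allows NumPy to perform operations on arrays of different shapes, provided the shapes are compatible. Shapes are compared from the last dimension backward, and two dimensions are compatible when they are equal or when one of them is 1.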
arr = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 10
# The scalar is "stretched" to match the shape of the array
print(arr + scalar)
# Output:
# [[11 12 13]
#  [14 15 16]]
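Broadcasting is not limited to scalars. As a rough sketch, the example below adds a 1D row and a 2D column to a matrix, then shows a pair of shapes that NumPy rejects. The array names are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
row = np.array([10, 20, 30])                 # shape (3,)
column = np.array([[100], [200]])            # shape (2, 1)

# (2, 3) + (3,): the row is applied to every row of the matrix.
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# (2, 3) + (2, 1): the column is applied to every column.
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]

# (2, 3) + (2,): trailing dimensions 3 and 2 differ and neither is 1.
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("incompatible shapes:", err)
```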
6. Aggregations (Math & Stats)
NumPy provides fast ways to calculate statistics across the entire array or specific axes.
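- Axis 0: Vertical (down the columns); reducing along it collapses the rows and gives one result per column
- Axis 1: Horizontal (across the rows); reducing along it collapses the columns and gives one result per row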
data = np.array([[1, 2], [3, 4]])
print(np.sum(data)) # 10 (total sum)
print(np.mean(data, axis=0)) # [2. 3.] (mean of each column)
print(np.max(data, axis=1)) # [2 4] (max of each row)
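The axis argument is easier to follow on a non-square array, where the result shape shows which dimension was collapsed. The sketch below also uses keepdims=True, a standard option on NumPy reductions, to subtract each row's mean from that row.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])                 # shape (2, 3)

col_sums = data.sum(axis=0)   # collapses the 2 rows    -> shape (3,)
row_sums = data.sum(axis=1)   # collapses the 3 columns -> shape (2,)
print(col_sums)               # [5 7 9]
print(row_sums)               # [ 6 15]

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts cleanly against the original array.
row_means = data.mean(axis=1, keepdims=True)   # shape (2, 1)
centered = data - row_means
print(centered)
# [[-1.  0.  1.]
#  [-1.  0.  1.]]
```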
7. Boolean Indexing (Filtering)
You can filter data using logical conditions. This is the "secret sauce" for data cleaning.
arr = np.array([1, 5, 8, 10, 12])
# Create a mask
mask = arr > 7
print(mask) # [False False  True  True  True]
# Apply the mask to get values
print(arr[mask]) # [ 8 10 12]
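For data cleaning, masks are often combined or used to overwrite values in place. Here is a minimal sketch using made-up sensor readings; the 0 to 100 valid range is an assumption for illustration only.

```python
import numpy as np

readings = np.array([3, -1, 8, 120, 12, -7])

# Combine conditions with & (and) and | (or); each condition needs
# its own parentheses because & and | bind tighter than comparisons.
in_range = (readings >= 0) & (readings <= 100)
print(readings[in_range])          # [ 3  8 12]

# Assigning through a mask modifies only the matching elements.
cleaned = readings.copy()
cleaned[cleaned < 0] = 0
print(cleaned)                     # [  3   0   8 120  12   0]

# np.where picks between two values element by element.
capped = np.where(cleaned > 100, 100, cleaned)
print(capped)                      # [  3   0   8 100  12   0]
```

Note that Python's `and` and `or` keywords do not work element-wise on arrays, which is why the bitwise operators are used here.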
Summary Checklist
- Use np.array() to start.
- Check arr.shape often to avoid dimension errors.
- Avoid for loops; use vectorized math instead.
- Use axis=0 for columns and axis=1 for rows in stats.