2 NumPy

This tutorial takes you from NumPy novice to someone who can confidently manipulate multi-dimensional data. NumPy (Numerical Python) is the foundation of data science in Python: its array operations run in compiled C code, so you get close to C speed while writing ordinary Python syntax.


1. Introduction to the NumPy Array

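The core of NumPy is the ndarray (n-dimensional array). Unlike a Python list, every element of a NumPy array has the same type (its dtype), so the values can be stored compactly instead of as separate Python objects. This makes arrays significantly faster and more memory-efficient for numerical work.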

Import

import numpy as np
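
With NumPy imported, the same-type rule described above can be checked directly. This is only a small sketch: the names ints and mixed are made up for illustration, and the exact integer dtype depends on the platform.

```python
import numpy as np

# Hypothetical example arrays, not part of the tutorial's own code
ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # the integers are upcast to float

print(ints.dtype)      # an integer dtype; the exact width depends on the platform
print(mixed.dtype)     # float64
print(mixed.itemsize)  # 8 bytes per element
print(mixed.nbytes)    # 24 bytes for the whole array (3 * 8)
```

Because one float forces the whole array to float64, it is worth checking dtype whenever the input data mixes integers and decimals.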

Creating Arrays

You can create arrays from lists or using built-in NumPy functions.

# From a list
arr = np.array([1, 2, 3, 4])

# Arrays of zeros, ones, or random numbers
zeros = np.zeros((2, 3))     # 2 rows, 3 columns of 0.0
ones = np.ones((3, 2))       # 3 rows, 2 columns of 1.0
rand = np.random.rand(2, 2)  # 2x2 uniform random values in [0, 1)
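
A few other built-in constructors are worth knowing alongside zeros and ones. The selection below is a sketch with illustrative variable names, not a complete list.

```python
import numpy as np

evens = np.arange(0, 10, 2)    # start, stop (excluded), step -> 0, 2, 4, 6, 8
points = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1, both ends included
sevens = np.full((2, 2), 7)    # 2x2 array filled with the value 7
identity = np.eye(3)           # 3x3 identity matrix

print(evens)
print(points)
print(sevens)
print(identity)
```

Use arange when you know the step size and linspace when you know how many points you want.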

2. Shape and Reshaping

Understanding dimensions is crucial. An array can be 1D (vector), 2D (matrix), or 3D+ (tensor).

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr.shape)  # Output: (2, 3) - 2 rows, 3 columns
print(arr.ndim)   # Output: 2 - Two dimensions

# Reshaping: change the layout without changing the data.
# The total number of elements must stay the same (2 * 3 == 3 * 2).
new_arr = arr.reshape(3, 2)  # Changes 2x3 into 3x2
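
Two reshape details come up often in practice: passing -1 so NumPy infers one dimension, and the error raised when the element count does not fit. The sketch below uses made-up names (flat, grid, back) to show both.

```python
import numpy as np

flat = np.arange(6)         # six elements: 0 through 5
grid = flat.reshape(2, -1)  # -1 asks NumPy to infer the missing dimension
print(grid.shape)           # (2, 3)

back = grid.reshape(-1)     # flatten back to one dimension
print(back.shape)           # (6,)

# Six elements cannot fill a 4x2 layout, so NumPy raises ValueError
try:
    flat.reshape(4, 2)
except ValueError as exc:
    print("reshape failed:", exc)
```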

3. Indexing and Slicing

Slicing in NumPy is similar to Python lists but extends to multiple dimensions.

arr = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

# Accessing a single element: arr[row, col]
print(arr[0, 1])  # Output: 20

# Slicing: arr[row_start:row_end, col_start:col_end] (end indices are excluded)
print(arr[0:2, 1:3])
# Output:
# [[20 30]
#  [50 60]]
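
Beyond rectangular slices, a few more patterns are common: whole rows, whole columns, negative indices, and steps. The sketch below also shows one behavior that differs from Python lists: a basic slice is a view onto the original data, not a copy. The names window and safe are illustrative.

```python
import numpy as np

arr = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

print(arr[1])         # second row: [40 50 60]
print(arr[:, 0])      # first column: [10 40 70]
print(arr[-1, -1])    # last element: 90
print(arr[::2, ::2])  # every other row and column: the corners 10, 30, 70, 90

# Basic slices are views onto the same data, not copies
window = arr[0:2, 1:3]
window[0, 0] = -1
print(arr[0, 1])      # -1, the original array changed too

# Use .copy() when an independent array is needed
safe = arr[0:2, 1:3].copy()
safe[0, 0] = 999
print(arr[0, 1])      # still -1
```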

4. Vectorized Operations

One of NumPy's best features is vectorization. Instead of writing a for loop over the elements, you apply the operation to the whole array and NumPy runs the loop in compiled code.

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition
print(a + b)  # Output: [5 7 9]

# Universal functions (ufuncs) apply a function to every element
print(np.sqrt(a))  # Square root of each element
print(np.exp(a))   # e raised to the power of each element
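
To make the contrast with loops concrete, here is a sketch that computes the same result both ways. The Celsius-to-Fahrenheit conversion and the variable names are assumptions chosen for illustration.

```python
import numpy as np

celsius = np.array([0.0, 21.5, 37.0, 100.0])

# Explicit Python loop: one element at a time
loop_result = np.array([c * 9 / 5 + 32 for c in celsius])

# Vectorized: the same formula applied to the whole array at once
vector_result = celsius * 9 / 5 + 32

print(vector_result)
print(np.allclose(loop_result, vector_result))  # True
```

Both versions give the same numbers; the vectorized form is shorter and, on large arrays, typically much faster.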

5. Broadcasting

Broadcasting lets NumPy perform operations on arrays of different shapes. Shapes are compared dimension by dimension, starting from the right, and two dimensions are compatible when they are equal or when one of them is 1.

arr = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 10

# The scalar is "stretched" to match the shape of the array
print(arr + scalar)
# Output:
# [[11 12 13]
#  [14 15 16]]
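
Broadcasting also works between two arrays, not just an array and a scalar. The sketch below, with illustrative names row and col, shows one shape that stretches across rows, one that stretches across columns, and one that is incompatible.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

row = np.array([10, 20, 30])            # shape (3,): matches the last dimension
print(arr + row)
# [[11 22 33]
#  [14 25 36]]

col = np.array([[100], [200]])          # shape (2, 1): the 1 stretches to 3
print(arr + col)
# [[101 102 103]
#  [204 205 206]]

# Shape (2,) lines up against the last dimension (3), so this fails
try:
    arr + np.array([1, 2])
except ValueError as exc:
    print("broadcast failed:", exc)
```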

6. Aggregations (Math & Stats)

NumPy provides fast ways to calculate statistics across the entire array or specific axes.

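  • Axis 0: Vertical (runs down the rows, so aggregating over it gives one result per column)
  • Axis 1: Horizontal (runs across the columns, so aggregating over it gives one result per row)

data = np.array([[1, 2], [3, 4]])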

print(np.sum(data))           # 10 (Total sum)
print(np.mean(data, axis=0))  # [2. 3.] (Mean of each column)
print(np.max(data, axis=1))   # [2 4] (Max of each row)
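
The axis rule is easier to see on a non-square array, where the result shapes differ. The following sketch uses a 2x3 array and also shows keepdims, which keeps the collapsed axis at length 1 so the result can be broadcast back against the original. The variable names are illustrative.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

col_sums = data.sum(axis=0)  # collapses the 2 rows -> shape (3,)
row_sums = data.sum(axis=1)  # collapses the 3 columns -> shape (2,)
print(col_sums)  # [5 7 9]
print(row_sums)  # [ 6 15]

# keepdims=True keeps the collapsed axis with length 1
row_means = data.mean(axis=1, keepdims=True)  # shape (2, 1)
centered = data - row_means                   # subtract each row's mean
print(centered)
```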

7. Boolean Indexing (Filtering)

You can filter data using logical conditions. This is the "secret sauce" for data cleaning.

arr = np.array([1, 5, 8, 10, 12])

# Create a mask
mask = arr > 7
print(mask)  # [False False  True  True  True]

# Apply the mask to get values
print(arr[mask])  # [ 8 10 12]
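
Masks can be combined and can also be used to assign values, which is where they become useful for cleaning. The sketch below is one possible illustration; the thresholds and names are made up. Note that conditions are combined with & and |, each wrapped in parentheses, rather than with Python's and / or.

```python
import numpy as np

arr = np.array([1, 5, 8, 10, 12])

# Combine conditions with & (and) or | (or); the parentheses are required
in_range = arr[(arr > 4) & (arr < 11)]
print(in_range)  # [ 5  8 10]

# Assign through a mask: cap every value above 9 at 9
capped = arr.copy()
capped[capped > 9] = 9
print(capped)    # [1 5 8 9 9]

# np.where picks between two alternatives element by element
zeroed = np.where(arr > 7, arr, 0)
print(zeroed)    # [ 0  0  8 10 12]
```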

Summary Checklist

  • Create arrays with np.array(), or with helpers such as np.zeros() and np.ones().
  • Check arr.shape often to avoid dimension errors.
  • Avoid for loops; use vectorized math instead.
  • Use axis=0 for one result per column and axis=1 for one result per row in stats.