NumPy notes

NumPy is one of the most important packages for data scientists. Here are some notes from my daily work.

Import package

import numpy as np

Create array

  • create vector and matrix
vector = np.array([1, 2, 3, 4, 5, 6, 7, 8])
matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])
  • zero and empty matrix
## the shape is passed as a tuple
m_zero = np.zeros((3, 3))
## np.empty allocates memory without initializing it, so the values are arbitrary
m_empty = np.empty((3, 3))
  • diagonal matrix
## fills the diagonal in place and returns None
np.fill_diagonal(m_zero, 5)
print(m_zero)

[[ 5.  0.  0.]
 [ 0.  5.  0.]
 [ 0.  0.  5.]]
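Since `np.fill_diagonal` overwrites the array it is given, here is a small sketch of two ways to get the same diagonal matrix while leaving the zero matrix alone. `np.eye` and `np.diag` are standard NumPy functions; the names `m_diag`, `from_eye` and `from_diag` are only illustrative.

```python
import numpy as np

# np.fill_diagonal writes into the array it is given, so work on a copy
# when the original should stay all zeros.
m_zero = np.zeros((3, 3))
m_diag = m_zero.copy()
np.fill_diagonal(m_diag, 5)

# Build the same matrix directly, without mutating anything.
from_eye = 5 * np.eye(3)               # scaled identity matrix
from_diag = np.diag([5.0, 5.0, 5.0])   # diagonal taken from a 1-D sequence

print(np.array_equal(m_diag, from_eye))   # True
print(np.array_equal(m_diag, from_diag))  # True
print(m_zero.sum())                       # 0.0, the original is unchanged
```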

  • sequence
## from 0 up to (but not including) 2, step 0.2
np.arange(0, 2, 0.2)

array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8])

## 10 evenly spaced points from 0 to 2, both ends included
np.linspace(0, 2, 10)

array([ 0. , 0.22222222, 0.44444444, 0.66666667, 0.88888889, 1.11111111, 1.33333333, 1.55555556, 1.77777778, 2. ])
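The two sequence helpers differ mainly in how they treat the stop value. The sketch below puts them side by side. The `endpoint` argument is part of `np.linspace`; the variable names are made up for the example.

```python
import numpy as np

# arange takes a step and excludes the stop value.
by_step = np.arange(0, 2, 0.2)

# linspace takes a number of points and includes the stop value by default.
by_count = np.linspace(0, 2, 10)

# With endpoint=False, linspace reproduces the arange grid from a point count.
same_grid = np.linspace(0, 2, 10, endpoint=False)

print(len(by_step), by_step[-1])        # 10 points, last one close to 1.8
print(len(by_count), by_count[-1])      # 10 points, last one is 2.0
print(np.allclose(by_step, same_grid))  # True
```

With a fractional step, `np.linspace` is usually the safer choice, because the number of points is stated explicitly instead of being derived from floating-point division.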

Basics

  • number of axes (dimensions)
matrix.ndim

2

  • shape
matrix.shape

(2, 4)

  • number of elements
matrix.size

8

  • data type
matrix.dtype

dtype('int64')
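The attributes above are related to each other, and the default integer type can differ between platforms. A minimal sketch, with `as_int32` and `as_float` as illustrative names:

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])

# ndim is the length of shape, and size is the product of its entries.
print(matrix.ndim == len(matrix.shape))                  # True
print(matrix.size == matrix.shape[0] * matrix.shape[1])  # True

# The default integer dtype depends on the platform, so request one
# explicitly when the exact type matters.
as_int32 = np.array([1, 2, 3], dtype=np.int32)
as_float = matrix.astype(np.float64)  # astype returns a converted copy

print(as_int32.dtype)  # int32
print(as_float.dtype)  # float64
```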

Math operations

  • element-wise
a = np.array([[1,1], [0,1]])
b = np.array([[2,0], [3,4]])
a + b

array([[3, 1], [3, 5]])

## arrays have no .add method; use the np.add function
np.add(a, b)

array([[3, 1], [3, 5]])

## element-wise multiplication; the two forms are equivalent
a * b
np.multiply(a, b)

array([[2, 0], [0, 4]])

np.sqrt(b)

array([[ 1.41421356, 0. ], [ 1.73205081, 2. ]])

  • matrix product
np.dot(a, b)

array([[5, 4], [3, 4]])

np.matmul(a, b)

array([[5, 4], [3, 4]])

  • sum, min, max
np.sum(a, axis=1)

array([2, 1])

np.min(a, axis=0)

array([0, 1])
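A short sketch to round off this section: the `@` operator as shorthand for the matrix product, and `np.max`, which the list above names but does not show. It reuses `a` and `b` from above; the result names are illustrative.

```python
import numpy as np

a = np.array([[1, 1], [0, 1]])
b = np.array([[2, 0], [3, 4]])

# For 2-D arrays, the @ operator gives the same matrix product as np.matmul.
product = a @ b
print(np.array_equal(product, np.matmul(a, b)))  # True

# axis=0 reduces down the rows (one result per column),
# axis=1 reduces across the columns (one result per row).
col_max = np.max(b, axis=0)
row_max = np.max(b, axis=1)
total = np.sum(b)   # no axis: reduce over every element

print(col_max)  # [3 4]
print(row_max)  # [2 4]
print(total)    # 9
```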

Select

vector[2:5]

array([3, 4, 5])

## the row slice 1:3 is clipped to row 1, since matrix has only 2 rows
matrix[1:3, 2]

array([7])

matrix[:, 1:3]

array([[2, 3], [6, 7]])
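One thing that is easy to miss with the selections above: a basic slice is a view into the same data, not a copy. The sketch below shows this under the same `matrix` definition; `view` and `safe` are illustrative names.

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])

# A basic slice is a view: writing to it changes the original array.
view = matrix[:, 1:3]
view[0, 0] = 20
print(matrix[0, 1])  # 20

# Take an explicit copy when the original must stay untouched.
safe = matrix[:, 1:3].copy()
safe[0, 0] = -1
print(matrix[0, 1])  # still 20

# An integer index drops an axis, a slice of length one keeps it.
print(matrix[1, 2])            # 7, a scalar
print(matrix[1:3, 2].shape)    # (1,)
print(matrix[1:2, 2:3].shape)  # (1, 1)
```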

Transform

  • transpose
print(matrix.T)

[[1 5]
 [2 6]
 [3 7]
 [4 8]]

  • reshape
print(vector.reshape(2,4))

[[1 2 3 4]
 [5 6 7 8]]

## keep 2D; -1 lets NumPy infer that dimension from the array size
matrix.reshape(-1, 8)

array([[1, 2, 3, 4, 5, 6, 7, 8]])

## convert to 1D
matrix.reshape(8)

array([1, 2, 3, 4, 5, 6, 7, 8])

## same result as above; ravel returns a view when possible, flatten always returns a copy
matrix.ravel()
matrix.flatten()

array([1, 2, 3, 4, 5, 6, 7, 8])
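`ravel` and `flatten` print the same values, but they treat the underlying memory differently. A small sketch of that difference, assuming a contiguous array as created above; `flat_view` and `flat_copy` are illustrative names.

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])

flat_view = matrix.ravel()    # a view here, because matrix is contiguous
flat_copy = matrix.flatten()  # always a new array

print(np.shares_memory(matrix, flat_view))  # True
print(np.shares_memory(matrix, flat_copy))  # False

# Writing through the view reaches the original; the copy is independent.
flat_view[0] = 100
print(matrix[0, 0])  # 100
print(flat_copy[0])  # 1
```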

  • stack
## the arrays are passed together as one sequence, e.g. a tuple
np.hstack((a, b))

array([[1, 1, 2, 0], [0, 1, 3, 4]])

np.vstack((a, b))

array([[1, 1],
       [0, 1],
       [2, 0],
       [3, 4]])
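For 2-D arrays, both stacking helpers can also be written with `np.concatenate` and an explicit axis, which may be easier to read when the axis matters. A sketch reusing `a` and `b`; `rows`, `cols` and `stacked` are illustrative names.

```python
import numpy as np

a = np.array([[1, 1], [0, 1]])
b = np.array([[2, 0], [3, 4]])

# For 2-D inputs, vstack joins along axis 0 and hstack along axis 1.
rows = np.concatenate((a, b), axis=0)
cols = np.concatenate((a, b), axis=1)

print(np.array_equal(rows, np.vstack((a, b))))  # True
print(np.array_equal(cols, np.hstack((a, b))))  # True
print(rows.shape, cols.shape)                   # (4, 2) (2, 4)

# np.stack is different: it adds a new axis instead of extending one.
stacked = np.stack((a, b))
print(stacked.shape)  # (2, 2, 2)
```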