Working with NumPy Arrays

Once we have created an array (or more than one), we will want to work with it in different ways. The simplest ways to manipulate an array are to access the data within it, extract subsets of data, split it, reshape it, or join it to another array.

Thus in this notebook we are going to address the following topics:

  • Attributes of arrays: size, shape, memory consumption

  • Indexing of arrays: Extracting and setting the value of individual array elements

  • Slicing of arrays: Extracting and setting smaller subarrays within a larger array

  • Reshaping of arrays: Changing the shape of a given array

  • Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many

First let’s import the NumPy package:

import numpy as np
np.random.seed(1234567890)  # seed for reproducibility

Array Attributes

Before we start manipulating an array, it’s probably useful to understand the attributes of an array. In the previous notebook we already discussed the data type of an array, but there are other useful attributes we may be interested in.

Let’s start by creating an array:

x1 = np.random.randint(10, size=(3, 4))
x1
array([[2, 6, 3, 1],
       [9, 9, 4, 0],
       [9, 6, 4, 0]])

Every array has attributes that detail information about the dimensionality, shape and size:

print("x1 ndim: ", x1.ndim)
print("x1 shape:", x1.shape)
print("x1 size: ", x1.size)
x1 ndim:  2
x1 shape: (3, 4)
x1 size:  12

and we can extract the data type:

print("dtype:", x1.dtype)
dtype: int64
x1.astype(np.float64).dtype
dtype('float64')

We can also find out the size of individual elements of an array (in bytes), and the size of the entire array:

print("itemsize:", x1.itemsize, "bytes")
print("nbytes:", x1.nbytes, "bytes")
itemsize: 8 bytes
nbytes: 96 bytes

The storage size of an array depends on its data type, although int64 and float64 both happen to use 8 bytes per element:

print("itemsize as float:", x1.astype(np.float64).itemsize, "bytes")
print("nbytes as float:", x1.astype(np.float64).nbytes, "bytes")
itemsize as float: 8 bytes
nbytes as float: 96 bytes
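Because int64 and float64 have the same item size, the cell above does not show a difference in storage. As a minimal sketch, the following converts a small array to a few other data types to make the difference visible. The array a and the choice of data types are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Illustrative array (an assumption for this sketch): 12 small integers.
a = np.arange(12).reshape(3, 4)

# Store the same 12 values with four different data types.
for dtype in (np.int8, np.int32, np.float32, np.float64):
    b = a.astype(dtype)
    # nbytes is the number of elements times the bytes per element.
    assert b.nbytes == b.size * b.itemsize
    print(b.dtype, "itemsize:", b.itemsize, "nbytes:", b.nbytes)
```

The int8 version needs 12 bytes in total, while the float64 version needs 96 bytes for the same values.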

Array Indexing

Similar to lists and other Python objects we discussed in the ‘Basics’ module, NumPy arrays are indexed, and we can access particular elements of an array using the square bracket notation:

x2 = np.random.randint(5, size = 10)
x2
array([3, 3, 1, 4, 0, 3, 1, 2, 3, 1])
x2[1]
3

As with Python lists and other sequences, it is important to remember that indexing starts at zero:

x2[0] # indexing starts at zero
3

And we can access elements by indexing from the back of an array using the -n notation:

x2[-1]
1
x2[-2]
3

Indexing Multidimensional Arrays

We can also index arrays in higher dimensions.

For two-dimensional arrays, providing a single value inside the square brackets will extract the entire row:

x1[0]
array([2, 6, 3, 1])

To extract individual elements we need to provide the complete index, using two dimensions:

x1[0,0]
2

The indexing ‘from the back’ method also works on multidimensional arrays:

x1[-1,-1]
0
x1[-1,-2]
4

Using Indexing to Modify Values

We can use the index notation to modify the values:

x1[0,0] = 5
x1
array([[5, 6, 3, 1],
       [9, 9, 4, 0],
       [9, 6, 4, 0]])

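But we should keep in mind that, unlike a Python list, NumPy requires all elements of an array to share one fixed type. Assigning a float to an integer array silently truncates the value, and assigning a string that cannot be converted raises an error: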

x1[0,0] = 3.14
x1
array([[3, 6, 3, 1],
       [9, 9, 4, 0],
       [9, 6, 4, 0]])
try:
    x1[0,0] = 'hello'
except ValueError:
    print('Elements must have same type')
    
x1
Elements must have same type
array([[3, 6, 3, 1],
       [9, 9, 4, 0],
       [9, 6, 4, 0]])
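If the fractional part matters, one option is to convert the array to a floating-point type before assigning. The sketch below uses two illustrative arrays, ints and floats, which are assumptions for this example and not part of the original notebook.

```python
import numpy as np

ints = np.array([1, 2, 3])        # integer data type
ints[0] = 3.99                    # fractional part is dropped, not rounded
print(ints[0])                    # 3

# astype returns a new array; the original keeps its integer type.
floats = ints.astype(np.float64)
floats[0] = 3.99                  # now the full value is stored
print(floats[0])                  # 3.99
```

Note that the value is truncated toward zero rather than rounded, so 3.99 becomes 3.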

Array Slicing

Instead of only extracting individual array elements, we can use indexing to access sub-arrays through slicing. The standard syntax is

x[start:stop:step]

and one or more of these can be left unspecified. An unspecified value takes its default: start=0, stop=size of the dimension, step=1.

Let’s see this in action:

x = np.arange(10)
x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
x[:5]
array([0, 1, 2, 3, 4])
x[5:]
array([5, 6, 7, 8, 9])
x[2:5]
array([2, 3, 4])
x[::2]
array([0, 2, 4, 6, 8])
x[2::2]
array([2, 4, 6, 8])
x[0:7:2]
array([0, 2, 4, 6])
x[1::2]
array([1, 3, 5, 7, 9])
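The step can also be negative, in which case the array is traversed backwards and the defaults for start and stop are swapped. A short sketch follows; the variable names reversed_x and from_five are illustrative assumptions.

```python
import numpy as np

x = np.arange(10)

# All elements in reverse order.
reversed_x = x[::-1]
print(reversed_x)

# Start at index 5 and step backwards by 2: elements 5, 3, 1.
from_five = x[5::-2]
print(from_five)
```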

Multi-dimensional Arrays

One can access sub-arrays from multiple dimensions. We need to separate the slices we want to extract with a comma like we did when getting individual values:

x1
array([[3, 6, 3, 1],
       [9, 9, 4, 0],
       [9, 6, 4, 0]])
x1[1:]
array([[9, 9, 4, 0],
       [9, 6, 4, 0]])
x1[1:, :2]
array([[9, 9],
       [9, 6]])
x1[1::2, ::2] # odd rows, even columns
array([[9, 4]])

Accessing Specific Rows or Columns

Often we want to extract an entire row or column of a 2-D array:

x1[0, :]
array([3, 6, 3, 1])
# equiv to
x1[0]
array([3, 6, 3, 1])
x1[:,3]
array([1, 0, 0])
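Notice that a column extracted with a single integer index comes back as a one-dimensional array. If the column shape should be kept, a slice of length one can be used instead. This is a small sketch with an illustrative array m, which is an assumption for this example.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)

# An integer index removes that dimension: the result is 1-D.
col_1d = m[:, 3]
print(col_1d.shape)   # (3,)

# A slice of length one keeps the dimension: the result is 2-D.
col_2d = m[:, 3:4]
print(col_2d.shape)   # (3, 1)
```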

No Copy Views

An important fact about array slicing is that it returns a view of the array data, rather than a copy:

x3 = x1[:2, 1:3]
x3
array([[6, 3],
       [9, 4]])
x3[1,0] = 30
x3
array([[ 6,  3],
       [30,  4]])
x1 # yikes!
array([[ 3,  6,  3,  1],
       [ 9, 30,  4,  0],
       [ 9,  6,  4,  0]])

Returning a view when we slice is memory efficient: if we have a large array, we are not storing multiple copies of the same information.

If we do want an explicit copy of the array, we can get it using the copy() method:

x4 = x1[:2, 1:3].copy()
x4
array([[ 6,  3],
       [30,  4]])

Any modifications to the copy then leave the original array untouched:

x4[1,0] = 11
x4
array([[ 6,  3],
       [11,  4]])
x1 ## much more desirable
array([[ 3,  6,  3,  1],
       [ 9, 30,  4,  0],
       [ 9,  6,  4,  0]])
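When it is unclear whether two arrays share data, np.shares_memory can be used to check. The following sketch uses illustrative names (a, view, independent) that are assumptions for this example.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

view = a[:2, 1:3]                  # slicing returns a view
independent = a[:2, 1:3].copy()    # copy() allocates new memory

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False

# Writing through the view changes the original array,
# writing to the copy does not.
view[0, 0] = 100
independent[0, 1] = -1
print(a[0, 1], a[0, 2])            # 100 2
```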

Array Reshaping

An array can be reshaped to take on a different set of dimensions using the reshape method:

grid = np.arange(1,10)
grid
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
matrix = grid.reshape(3,3)
matrix
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

But the initial size of the array must match the size of the new, reshaped array:

grid.reshape(3,4)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_2466/2695741815.py in <module>
----> 1 grid.reshape(3,4)

ValueError: cannot reshape array of size 9 into shape (3,4)
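To avoid working out every dimension by hand, reshape accepts -1 for at most one dimension and infers its length from the total size. A minimal sketch, with the variable name inferred chosen for illustration:

```python
import numpy as np

grid = np.arange(1, 10)

# Ask for 3 rows and let NumPy infer the number of columns.
inferred = grid.reshape(3, -1)
print(inferred.shape)   # (3, 3)

# The total size must still be divisible by the fixed dimensions.
try:
    grid.reshape(4, -1)
except ValueError as err:
    print("reshape failed:", err)
```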

One-dimensional arrays do not possess the shape of a matrix, i.e. there is no row and column information:

x = np.arange(1,4)
x
array([1, 2, 3])
x.shape
(3,)

We can use the reshape method to turn a one-dimensional array into a two-dimensional one:

x.reshape(1,3)
array([[1, 2, 3]])
x.reshape(1,3).shape
(1, 3)

Or, we can use np.newaxis inside an indexing operation:

x[:, np.newaxis]
array([[1],
       [2],
       [3]])
x[:, np.newaxis].shape
(3, 1)
x[np.newaxis]
array([[1, 2, 3]])
x[np.newaxis].shape
(1, 3)

Array Concatenation

Multiple arrays can be combined into one using the functionality provided by np.concatenate, np.vstack and np.hstack:

y1 = np.arange(1,4)
y1
array([1, 2, 3])
y2 = np.arange(4,7)
y2
array([4, 5, 6])
np.concatenate([y1,y2])
array([1, 2, 3, 4, 5, 6])
x1
array([[ 3,  6,  3,  1],
       [ 9, 30,  4,  0],
       [ 9,  6,  4,  0]])
np.concatenate([x1,x1])
array([[ 3,  6,  3,  1],
       [ 9, 30,  4,  0],
       [ 9,  6,  4,  0],
       [ 3,  6,  3,  1],
       [ 9, 30,  4,  0],
       [ 9,  6,  4,  0]])

By default, concatenate joins along the first axis (axis=0), stacking rows on top of each other. If we want column-wise concatenation, we can simply set axis=1:

np.concatenate([x1,x1], axis = 1)
array([[ 3,  6,  3,  1,  3,  6,  3,  1],
       [ 9, 30,  4,  0,  9, 30,  4,  0],
       [ 9,  6,  4,  0,  9,  6,  4,  0]])

We can only concatenate arrays that have the same number of dimensions and matching sizes along every axis except the one being joined:

y3 = np.arange(4,10).reshape(2,3)
y3
array([[4, 5, 6],
       [7, 8, 9]])
y3.shape
(2, 3)
y1.shape
(3,)
np.concatenate([y1,y3])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-558db4f40558> in <module>()
----> 1 np.concatenate([y1,y3])

ValueError: all the input arrays must have same number of dimensions

We can instead use the vstack functionality to get the desired result:

np.vstack([y1,y3])
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
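An alternative to vstack is to give the one-dimensional array a second dimension first, so that np.concatenate accepts it. The sketch below is one way to do this; the name stacked is an illustrative assumption.

```python
import numpy as np

y1 = np.arange(1, 4)                   # shape (3,)
y3 = np.arange(4, 10).reshape(2, 3)    # shape (2, 3)

# Turn y1 into a single row of shape (1, 3), then join along axis 0.
stacked = np.concatenate([y1[np.newaxis, :], y3])
print(stacked)

# The result matches what vstack produces.
print(np.array_equal(stacked, np.vstack([y1, y3])))   # True
```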

Arrays can be stacked along the column/horizontal axis too:

np.hstack([y3, [[99],[99]] ])
array([[ 4,  5,  6, 99],
       [ 7,  8,  9, 99]])

Array Splitting

We can also do the opposite of concatenation, which is known as splitting, using np.split, np.hsplit and np.vsplit:

z = np.arange(15)
z
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
np.split(z, 3) # the array must divide evenly into 3 parts
[array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9]), array([10, 11, 12, 13, 14])]
z2 = z.reshape(3,5)
z2
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

We can put the elements from the split into their own variables using tuple unpacking:

z21, z22, z23 = np.split(z2, 3)

z21
array([[0, 1, 2, 3, 4]])
np.split(z2, 5, axis = 1)
[array([[ 0],
        [ 5],
        [10]]), array([[ 1],
        [ 6],
        [11]]), array([[ 2],
        [ 7],
        [12]]), array([[ 3],
        [ 8],
        [13]]), array([[ 4],
        [ 9],
        [14]])]
np.hsplit(z2, [2])
[array([[ 0,  1],
        [ 5,  6],
        [10, 11]]), array([[ 2,  3,  4],
        [ 7,  8,  9],
        [12, 13, 14]])]
np.vsplit(z2, [2])
[array([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]]), array([[10, 11, 12, 13, 14]])]
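In the hsplit and vsplit calls above, the list [2] gives the index at which to cut rather than a number of parts. The sketch below shows the same idea with two cut points, and also np.array_split, which allows parts of unequal length. The variable names here are illustrative assumptions.

```python
import numpy as np

z = np.arange(10)

# A list of indices marks the cut points: z[:3], z[3:7], z[7:].
first, middle, last = np.split(z, [3, 7])
print(first, middle, last)

# np.split(z, 3) would raise an error because 10 is not divisible by 3.
# np.array_split allows uneven parts instead.
parts = np.array_split(z, 3)
print([len(p) for p in parts])   # [4, 3, 3]
```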