Sorting Values

As a final step in our deep dive into NumPy arrays, we explore how to sort them.

You probably already guessed that NumPy has fast built-in sort functions, and you are correct. So let’s take a quick look at them.

As always we start by loading NumPy:

import numpy as np
rand = np.random.RandomState(1234567890)

Sorting a 1-D array

We can sort by value, either creating a new array:

x = np.array([5, 1, 3, 2, 4])
np.sort(x)
array([1, 2, 3, 4, 5])

or performing the operation in place:

x.sort()
print(x)
[1 2 3 4 5]
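
The difference between the two forms is easy to miss, so here is a minimal sketch that puts them side by side. It reuses the array from above; the variable names y, result and desc are only illustrative.

```python
import numpy as np

x = np.array([5, 1, 3, 2, 4])

# np.sort returns a sorted copy; x itself is left unchanged
y = np.sort(x)
print(x)  # [5 1 3 2 4]
print(y)  # [1 2 3 4 5]

# ndarray.sort sorts in place and returns None
result = x.sort()
print(result)  # None
print(x)       # [1 2 3 4 5]

# One way to get descending order: reverse a sorted copy with a slice
desc = np.sort(x)[::-1]
print(desc)  # [5 4 3 2 1]
```

Because x.sort() returns None, writing x = x.sort() would discard the array, which is a common slip.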

We can also return the indices of the sorted elements:

x = np.array([5, 1, 3, 2, 4])
i = np.argsort(x)
print(i)
[1 3 2 4 0]

which can be useful for indexing:

x[i]
array([1, 2, 3, 4, 5])
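
Indexing an array with its own argsort result just reproduces np.sort. The indices become more useful when they reorder a second, parallel array. The sketch below assumes two made-up arrays, names and scores, that are not part of the original example.

```python
import numpy as np

# Hypothetical parallel arrays: names[j] belongs to scores[j]
names = np.array(["dana", "alex", "cleo", "bo", "eli"])
scores = np.array([5, 1, 3, 2, 4])

# Indices that would sort scores in ascending order
order = np.argsort(scores)

# Apply the same ordering to both arrays so the pairs stay aligned
print(scores[order])  # [1 2 3 4 5]
print(names[order])   # ['alex' 'bo' 'cleo' 'eli' 'dana']
```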

Sorting Multiple Dimensions

Sorting extends to multidimensional arrays:

rand = np.random.RandomState(42)
X = rand.randint(0, 10, (5, 5))
print(X)
[[6 3 7 4 6]
 [9 2 6 7 4]
 [3 7 7 2 5]
 [4 1 7 5 1]
 [4 0 9 5 8]]

If we call np.sort on the whole array without specifying an axis:

np.sort(X)
array([[3, 4, 6, 6, 7],
       [2, 4, 6, 7, 9],
       [2, 3, 5, 7, 7],
       [1, 1, 4, 5, 7],
       [0, 4, 5, 8, 9]])

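NumPy sorts along the last axis by default (axis=-1), so for a 2-D array each row is sorted independently. Passing axis=1 explicitly gives the same result. (To sort all values together, pass axis=None, which returns a flattened, sorted array.)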

# sort each row of X
np.sort(X, axis=1)
array([[3, 4, 6, 6, 7],
       [2, 4, 6, 7, 9],
       [2, 3, 5, 7, 7],
       [1, 1, 4, 5, 7],
       [0, 4, 5, 8, 9]])

We can also sort along each column:

# sort each column of X
np.sort(X, axis=0)
array([[3, 0, 6, 2, 1],
       [4, 1, 7, 4, 4],
       [4, 2, 7, 5, 5],
       [6, 3, 7, 5, 6],
       [9, 7, 9, 7, 8]])
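
Note that sorting along an axis treats every row or column as an independent array, so values that started out in the same row no longer line up afterwards. If we want to keep the rows intact and only reorder them, one option is to combine argsort with indexing. The sketch below rebuilds X from the values printed above and orders the rows by their first column; the name X_by_col0 is only illustrative.

```python
import numpy as np

# Same values as the X printed above
X = np.array([[6, 3, 7, 4, 6],
              [9, 2, 6, 7, 4],
              [3, 7, 7, 2, 5],
              [4, 1, 7, 5, 1],
              [4, 0, 9, 5, 8]])

# Row order that sorts the first column ascending;
# kind="stable" keeps tied rows in their original relative order
order = np.argsort(X[:, 0], kind="stable")

# Reorder whole rows, so each row's values stay together
X_by_col0 = X[order]
print(X_by_col0)
# [[3 7 7 2 5]
#  [4 1 7 5 1]
#  [4 0 9 5 8]
#  [6 3 7 4 6]
#  [9 2 6 7 4]]
```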

Partial Sorts and Partitioning

We don’t always need to sort an entire array. Sometimes we only want the k smallest values, and np.partition finds them without fully sorting the rest:

x = rand.randint(10, size=10)
x
array([0, 9, 6, 4, 0, 7, 3, 5, 1, 6])
np.partition(x, 3)
array([0, 0, 1, 3, 6, 4, 5, 6, 9, 7])

With kth=3, the value at index 3 of the result is the one a full sort would place there. The three values before it are the three smallest in the array, and the positions after it hold the remaining values. Within each of the two partitions, the elements are in arbitrary order.
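
To make that guarantee concrete, here is a small sketch that checks it on the array shown above. The names k, p and smallest are illustrative, and the exact order inside each partition may differ between NumPy versions, so the checks only rely on the documented guarantee.

```python
import numpy as np

# Same values as the x printed above
x = np.array([0, 9, 6, 4, 0, 7, 3, 5, 1, 6])
k = 3
p = np.partition(x, k)

# The element at index k is the one a full sort would put there
assert p[k] == np.sort(x)[k]

# Everything before index k is <= p[k]; everything after is >= p[k]
assert (p[:k] <= p[k]).all()
assert (p[k + 1:] >= p[k]).all()

# The k smallest values; sort this small slice if order matters
smallest = np.sort(p[:k])
print(smallest)  # [0 0 1]
```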

We can partition along an arbitrary axis of a multidimensional array:

X = rand.randint(10, size=25).reshape(5, 5)
X
array([[8, 3, 5, 1, 7],
       [1, 8, 1, 1, 2],
       [5, 8, 7, 8, 9],
       [8, 2, 1, 1, 8],
       [4, 5, 7, 2, 5]])
# partition within each column (axis=0)
np.partition(X, 2, axis=0)
array([[1, 2, 1, 1, 2],
       [4, 3, 1, 1, 5],
       [5, 5, 5, 1, 7],
       [8, 8, 7, 8, 8],
       [8, 8, 7, 2, 9]])
# partition within each row (axis=1)
np.partition(X, 2, axis=1)
array([[1, 3, 5, 8, 7],
       [1, 1, 1, 8, 2],
       [5, 7, 8, 8, 9],
       [1, 1, 2, 8, 8],
       [2, 4, 5, 7, 5]])

Just as np.argsort returns the indices of a sort, np.argpartition returns the indices that would partition the array along the given axis:

np.argpartition(X, 2, axis=1)
array([[3, 1, 2, 0, 4],
       [0, 2, 3, 1, 4],
       [0, 2, 1, 3, 4],
       [2, 3, 1, 0, 4],
       [3, 0, 1, 2, 4]])
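
These indices can then be used to pull out the k smallest entries of every row. The sketch below is one way to do it, assuming NumPy 1.15 or later for np.take_along_axis. It rebuilds X from the values printed above; the names idx, smallest_idx and smallest are illustrative.

```python
import numpy as np

# Same values as the X printed above
X = np.array([[8, 3, 5, 1, 7],
              [1, 8, 1, 1, 2],
              [5, 8, 7, 8, 9],
              [8, 2, 1, 1, 8],
              [4, 5, 7, 2, 5]])

k = 2
idx = np.argpartition(X, k, axis=1)  # index array with the same shape as X

# Column positions of the k smallest entries in each row (in no particular order)
smallest_idx = idx[:, :k]

# Gather the matching values row by row
smallest = np.take_along_axis(X, smallest_idx, axis=1)

# Sort the small result so the printed order is deterministic
print(np.sort(smallest, axis=1))
# [[1 3]
#  [1 1]
#  [5 7]
#  [1 1]
#  [2 4]]
```

Keeping the indices rather than the values is what makes this pattern useful: smallest_idx can also index a second array with the same shape as X.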