Sorting Values¶
As a final step in our deep dive into NumPy arrays, we explore how to sort an array.
You probably already guessed that NumPy has fast built-in sort functions, and you are correct. So let's take a quick look at them.
As always, we start by loading NumPy:
import numpy as np
rand = np.random.RandomState(1234567890)
Sorting a 1-D array¶
We can sort by value, either creating a new array:
x = np.array([5, 1, 3, 2, 4])
np.sort(x)
array([1, 2, 3, 4, 5])
or performing the operation in place:
x.sort()
print(x)
[1 2 3 4 5]
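The difference between the two forms is easy to trip over, so here is a minimal sketch of it. The names y and result are illustrative only: np.sort returns a new array and leaves its input alone, while the sort method reorders the array itself and returns None.

```python
import numpy as np

x = np.array([5, 1, 3, 2, 4])

y = np.sort(x)       # new sorted array; x keeps its original order
print(x, y)

result = x.sort()    # sorts x itself and returns None
print(result, x)
```

Writing x = x.sort() is therefore a mistake: it would replace the array with None.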
We can also return the indices of the sorted elements:
x = np.array([5, 1, 3, 2, 4])
i = np.argsort(x)
print(i)
[1 3 2 4 0]
which can be useful for indexing:
x[i]
array([1, 2, 3, 4, 5])
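The same index array can reorder a second array that is aligned with the first. The sketch below uses made-up data (the names and scores arrays are hypothetical, chosen only for illustration) and sorts the names by their scores:

```python
import numpy as np

# Hypothetical data: each name is paired with the score at the same position
names = np.array(["ada", "bob", "cyd", "dee"])
scores = np.array([72, 95, 58, 88])

order = np.argsort(scores)        # indices that sort scores in ascending order
names_by_score = names[order]     # apply the same ordering to names
print(names_by_score)

# Reversing the index array gives descending order
print(names[order[::-1]])
```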
Sorting Multiple Dimensions¶
Sorting extends to multidimensional arrays:
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (5, 5))
print(X)
[[6 3 7 4 6]
[9 2 6 7 4]
[3 7 7 2 5]
[4 1 7 5 1]
[4 0 9 5 8]]
If we call np.sort on the whole array without specifying an axis:
np.sort(X)
array([[3, 4, 6, 6, 7],
[2, 4, 6, 7, 9],
[2, 3, 5, 7, 7],
[1, 1, 4, 5, 7],
[0, 4, 5, 8, 9]])
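NumPy sorts along the last axis by default (axis=-1), so for a 2-D array each row is sorted independently. Passing axis=1 explicitly gives the same result: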
# sort each row of X
np.sort(X, axis=1)
array([[3, 4, 6, 6, 7],
[2, 4, 6, 7, 9],
[2, 3, 5, 7, 7],
[1, 1, 4, 5, 7],
[0, 4, 5, 8, 9]])
We can sort along each column too:
# sort each column of X
np.sort(X, axis=0)
array([[3, 0, 6, 2, 1],
[4, 1, 7, 4, 4],
[4, 2, 7, 5, 5],
[6, 3, 7, 5, 6],
[9, 7, 9, 7, 8]])
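Note that sorting along an axis treats every row or column as an independent array, so values that originally sat in the same row no longer line up afterwards. If the rows should stay intact and only their order should change, one option is to combine np.argsort with indexing. The following is a sketch of that idea, using the 5x5 array printed earlier and sorting its rows by the first column; kind="stable" is used here so that tied keys keep their original relative order.

```python
import numpy as np

# Same values as the 5x5 array X printed above
X = np.array([[6, 3, 7, 4, 6],
              [9, 2, 6, 7, 4],
              [3, 7, 7, 2, 5],
              [4, 1, 7, 5, 1],
              [4, 0, 9, 5, 8]])

# Indices that sort the first column; a stable sort keeps ties in input order
order = np.argsort(X[:, 0], kind="stable")

# Reorder whole rows; the values inside each row are left untouched
rows_sorted = X[order]
print(rows_sorted)
```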
Partial Sorts and Partitioning¶
We don’t have to sort an entire array. Sometimes we want to partition values:
x = rand.randint(10, size=10)
x
array([0, 9, 6, 4, 0, 7, 3, 5, 1, 6])
np.partition(x, 3)
array([0, 0, 1, 3, 6, 4, 5, 6, 9, 7])
After np.partition(x, 3), the element at index 3 holds the value it would have in a fully sorted array. The first three values in the result are the three smallest in the array, and the remaining positions contain the remaining values. Within each of the two partitions, the elements have arbitrary order.
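A typical use is picking the k smallest values without paying for a full sort. The sketch below reuses the values of x printed above; the names k, part and smallest are illustrative. Because the order inside each partition is not guaranteed, the selected values are sorted before printing.

```python
import numpy as np

# Same values as the array x printed above
x = np.array([0, 9, 6, 4, 0, 7, 3, 5, 1, 6])
k = 3

part = np.partition(x, k)

# The first k entries are the k smallest values, in no particular order
smallest = part[:k]
print(np.sort(smallest))

# The element at index k sits between the two partitions
print(part[:k].max() <= part[k] <= part[k:].min())
```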
We can partition along an arbitrary axis of a multidimensional array:
X = rand.randint(10, size=25).reshape(5, 5)
X
array([[8, 3, 5, 1, 7],
[1, 8, 1, 1, 2],
[5, 8, 7, 8, 9],
[8, 2, 1, 1, 8],
[4, 5, 7, 2, 5]])
# partition within each column (axis=0)
np.partition(X, 2, axis=0)
array([[1, 2, 1, 1, 2],
[4, 3, 1, 1, 5],
[5, 5, 5, 1, 7],
[8, 8, 7, 8, 8],
[8, 8, 7, 2, 9]])
# partition within each row (axis=1)
np.partition(X, 2, axis=1)
array([[1, 3, 5, 8, 7],
[1, 1, 1, 8, 2],
[5, 7, 8, 8, 9],
[1, 1, 2, 8, 8],
[2, 4, 5, 7, 5]])
np.argpartition returns the indices that would partition the array along the given axis:
np.argpartition(X, 2, axis=1)
array([[3, 1, 2, 0, 4],
[0, 2, 3, 1, 4],
[0, 2, 1, 3, 4],
[2, 3, 1, 0, 4],
[3, 0, 1, 2, 4]])
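These indices can be fed back into the array to recover the partitioned values. The sketch below does this with np.take_along_axis and then pulls out the two smallest values of each row; the array repeats the values of X printed above, and the names idx, partitioned and smallest_two are illustrative.

```python
import numpy as np

# Same values as the 5x5 array X printed above
X = np.array([[8, 3, 5, 1, 7],
              [1, 8, 1, 1, 2],
              [5, 8, 7, 8, 9],
              [8, 2, 1, 1, 8],
              [4, 5, 7, 2, 5]])
k = 2

# Indices that partition each row around position k
idx = np.argpartition(X, k, axis=1)

# Gather the values row by row using those indices
partitioned = np.take_along_axis(X, idx, axis=1)

# The first k columns hold each row's k smallest values (unordered),
# so sort them to get a predictable result
smallest_two = np.sort(partitioned[:, :k], axis=1)
print(smallest_two)
```

The same pattern works with any aligned array of the same shape: passing idx to np.take_along_axis reorders it to match the partition of X.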