NumPy Basics - Creating Arrays
The fundamental building block of scientific computing is the N-dimensional array.
Our focus is going to be on 1-dimensional and 2-dimensional arrays, i.e. vectors and matrices, because these are the types of arrays most familiar to economists who are less familiar with scientific computing. Everything we discuss extends to N dimensions when the user feels comfortable moving into ‘the next dimension.’
In this notebook we focus on creating arrays, either by manually inputting the data ourselves, or using some built-in functionality to create an array with certain structures. In a later notebook we discuss how to import an array from an existing file on our machine.
Fixed Type Arrays
Python's standard library includes an `array` module that allows us to create a dense array in which all elements are of a uniform type:
import array
number_list = list(range(10))
my_array = array.array('i', number_list)
my_array
array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
another_array = array.array('f', number_list)
another_array
array('f', [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
Note that `'i'` indicates that the contents of `my_array` are integers, whilst `'f'` indicates that the contents of `another_array` are floating point numbers.
Python’s `array` module offers an efficient way of storing arrays, but we will often want to perform operations on, or with, the data stored in an array. NumPy offers this functionality, so we now turn to creating NumPy arrays, and we will stay in the NumPy paradigm for the remainder of the module.
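To make the difference concrete, the short sketch below compares how the `*` operator behaves on a standard-library array and on a NumPy array. The variable names are illustrative only.

```python
import array
import numpy as np

values = [1, 2, 3]

# With the standard-library array, * means repetition, as it does for lists
std_array = array.array('i', values)
repeated = std_array * 2

# With a NumPy array, * is applied element by element
np_array = np.array(values)
doubled = np_array * 2

print(repeated)  # array('i', [1, 2, 3, 1, 2, 3])
print(doubled)   # [2 4 6]
```

Element-wise arithmetic of this kind is what we mean by performing operations on the data stored in an array.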
NumPy Arrays
import numpy as np
We can use the `np.array` function to create an array from a Python list:
array_numpy = np.array(number_list)
array_numpy
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
or more simply:
np.array([1,2,3,4])
array([1, 2, 3, 4])
Recall from last week that Python allows lists to hold multiple types of data. NumPy will not allow this, and constrains all the contents of an array to have the same type.
If we try to enter different data types into an array, NumPy will re-cast the data to a single type through ‘up-casting.’ Let’s see that in action:
np.array([1, 5, 4, 9.2])
array([1. , 5. , 4. , 9.2])
np.array([9.2, 'hello'])
array(['9.2', 'hello'], dtype='<U32')
We now see the term `dtype`, which is the data type of the array we created. We can let NumPy choose the data type for us, or we can set the data type manually ourselves.
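The `dtype` of any array can be inspected directly. The sketch below rebuilds the two mixed-type arrays from above and checks which type NumPy settled on. `np.result_type` is included only to illustrate how a common type is chosen, and the variable names are hypothetical.

```python
import numpy as np

mixed_numbers = np.array([1, 5, 4, 9.2])  # integers and one float
mixed_text = np.array([9.2, 'hello'])     # a float and a string

# The dtype attribute reports the common type NumPy settled on
print(mixed_numbers.dtype)    # float64
print(mixed_text.dtype.kind)  # U (Unicode string)

# np.result_type reports the up-cast type without building an array
common = np.result_type(np.int32, np.float64)
print(common)  # float64
```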
NumPy Data Types
Here are a few examples where we manually set the data type, using common types we may come across:
np.array([1.0, 2.0, 3.0], dtype='int')
array([1, 2, 3])
np.array([1.0, 2.0, 3.0], dtype='float')
array([1., 2., 3.])
np.array([1.0, 2.0, 3.0], dtype='complex')
array([1.+0.j, 2.+0.j, 3.+0.j])
np.array([1.0, 2.0, 3.0], dtype=np.float64)
array([1., 2., 3.])
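The examples above use whole-number values, so nothing is lost when the type changes. As a minimal sketch of what happens with fractional values, the block below converts an existing array with the `astype` method. The array `prices` is made up for illustration.

```python
import numpy as np

prices = np.array([1.7, 2.2, -1.7])

# Casting floats to integers drops the fractional part (rounds toward zero)
as_int = prices.astype(int)
print(as_int)  # [ 1  2 -1]

# Round first if nearest-integer behaviour is what you want
rounded = np.round(prices).astype(int)
print(rounded)  # [ 2  2 -2]
```

Which of the two is appropriate depends on the application, so it is worth choosing deliberately rather than relying on the cast alone.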
These are not the only data types NumPy can work with, and it’s important to know the others.
Here’s a list of most of the data types that you will encounter:
| Data type | Description |
|---|---|
| `bool_` | Boolean (True or False) stored as a byte |
| `int_` | Default integer type (same as C `long`; normally either `int64` or `int32`) |
| `intc` | Identical to C `int` (normally `int32` or `int64`) |
| `intp` | Integer used for indexing (same as C `ssize_t`; normally either `int32` or `int64`) |
| `int8` | Byte (-128 to 127) |
| `int16` | Integer (-32768 to 32767) |
| `int32` | Integer (-2147483648 to 2147483647) |
| `int64` | Integer (-9223372036854775808 to 9223372036854775807) |
| `uint8` | Unsigned integer (0 to 255) |
| `uint16` | Unsigned integer (0 to 65535) |
| `uint32` | Unsigned integer (0 to 4294967295) |
| `uint64` | Unsigned integer (0 to 18446744073709551615) |
| `float_` | Shorthand for `float64` |
| `float16` | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| `float32` | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
| `float64` | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
| `complex_` | Shorthand for `complex128` |
| `complex64` | Complex number, represented by two 32-bit floats |
| `complex128` | Complex number, represented by two 64-bit floats |
Source: Jake VanderPlas (2016), Python Data Science Handbook: Essential Tools for Working with Data, O’Reilly Media.
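The ranges and bit counts listed above do not need to be memorised. As a small sketch, `np.iinfo` and `np.finfo` report them for any integer or floating-point type. The selection of types in the loops is arbitrary.

```python
import numpy as np

# Integer limits for a few fixed-width types
for int_type in (np.int8, np.int16, np.uint8, np.uint16):
    info = np.iinfo(int_type)
    print(int_type.__name__, info.min, info.max)

# Number of mantissa bits for each floating-point type
for float_type in (np.float16, np.float32, np.float64):
    print(float_type.__name__, np.finfo(float_type).nmant)
```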
More on Creating Arrays
We often will not want to manually enter the contents of an array ourselves. It is more efficient to create arrays from scratch using routines built into NumPy.
Here are several examples, including extensions to creating 2-D arrays:
# Create a length-10 integer array filled with integer zeros
np.zeros(10, dtype=int)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)
array([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])
# Create a 3x5 array filled with 4
np.full((3, 5), 4)
array([[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4],
[4, 4, 4, 4, 4]])
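Unlike the first two examples, the `np.full` call above does not pass a `dtype`. In that case NumPy infers the type from the fill value, which the following sketch checks using illustrative variable names.

```python
import numpy as np

int_fill = np.full((3, 5), 4)             # integer fill value
float_fill = np.full((3, 5), 4.0)         # float fill value
forced = np.full((3, 5), 4, dtype=float)  # dtype given explicitly

print(int_fill.shape)  # (3, 5)
print(int_fill.dtype.kind, float_fill.dtype.kind, forced.dtype.kind)  # i f f
```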
Some of the more useful functionality for producing arrays is the ability to produce a sequence of numbers according to a pattern.
The `arange` function allows us to create a linear sequence with a given step size:
np.arange(0, 20, 2)
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
np.arange(0, 20, 0.2)
array([ 0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ,
2.2, 2.4, 2.6, 2.8, 3. , 3.2, 3.4, 3.6, 3.8, 4. , 4.2,
4.4, 4.6, 4.8, 5. , 5.2, 5.4, 5.6, 5.8, 6. , 6.2, 6.4,
6.6, 6.8, 7. , 7.2, 7.4, 7.6, 7.8, 8. , 8.2, 8.4, 8.6,
8.8, 9. , 9.2, 9.4, 9.6, 9.8, 10. , 10.2, 10.4, 10.6, 10.8,
11. , 11.2, 11.4, 11.6, 11.8, 12. , 12.2, 12.4, 12.6, 12.8, 13. ,
13.2, 13.4, 13.6, 13.8, 14. , 14.2, 14.4, 14.6, 14.8, 15. , 15.2,
15.4, 15.6, 15.8, 16. , 16.2, 16.4, 16.6, 16.8, 17. , 17.2, 17.4,
17.6, 17.8, 18. , 18.2, 18.4, 18.6, 18.8, 19. , 19.2, 19.4, 19.6,
19.8])
It’s important to notice that the end point is not included in the array, and the `arange` function does not have an option to ‘force’ the end point into the array.
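One common workaround, sketched here for integer steps only, is to push the stop value one step further so that the original end point falls inside the half-open interval. The names `start`, `stop` and `step` are illustrative.

```python
import numpy as np

start, stop, step = 0, 20, 2

without_end = np.arange(start, stop, step)
# Pushing stop one step further brings the original end point into the array
with_end = np.arange(start, stop + step, step)

print(without_end[-1], len(without_end))  # 18 10
print(with_end[-1], len(with_end))        # 20 11
```

With non-integer steps, floating-point rounding makes the length of the result harder to predict, so this trick is best kept to integer sequences.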
Another way of creating a linear sequence is to use the `linspace` function. An advantage of this function is that it allows us to choose whether we want the end point included or not:
# evenly spaced between 0 and 20
np.linspace(0, 20, 11)
array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18., 20.])
np.linspace(0, 20, 10, endpoint=False)
array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18.])
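Because `linspace` takes the number of points rather than the step size, it can be useful to ask what spacing it actually used. A minimal sketch with the `retstep=True` argument, which returns the spacing alongside the array:

```python
import numpy as np

# End point included: 11 points, so 10 gaps of width 2
points, step = np.linspace(0, 20, 11, retstep=True)
print(step)  # 2.0

# End point excluded: 10 points covering the same 10 gaps
open_points, open_step = np.linspace(0, 20, 10, endpoint=False, retstep=True)
print(open_step)  # 2.0
```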
We do not need to restrict ourselves to sequences uniformly distributed in linear space. The `logspace` function allows us to create arrays uniformly distributed in log-space:
np.logspace(0, 4.0, num=10)
array([1.00000000e+00, 2.78255940e+00, 7.74263683e+00, 2.15443469e+01,
5.99484250e+01, 1.66810054e+02, 4.64158883e+02, 1.29154967e+03,
3.59381366e+03, 1.00000000e+04])
Creating an array this way is equivalent to:
y = np.linspace(0, 4, num=10)
y
array([0. , 0.44444444, 0.88888889, 1.33333333, 1.77777778,
2.22222222, 2.66666667, 3.11111111, 3.55555556, 4. ])
np.set_printoptions(precision=4)
np.power(10, y).astype(np.float64)
array([1.0000e+00, 2.7826e+00, 7.7426e+00, 2.1544e+01, 5.9948e+01,
1.6681e+02, 4.6416e+02, 1.2915e+03, 3.5938e+03, 1.0000e+04])
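Rather than comparing the printed values by eye, the equivalence can be checked numerically. The sketch below uses `np.allclose` for the comparison and also shows the `base` argument of `logspace`. The variable names are illustrative.

```python
import numpy as np

exponents = np.linspace(0, 4, num=10)
direct = np.logspace(0, 4, num=10)
by_hand = 10.0 ** exponents

# Both routes give the same values up to floating-point tolerance
print(np.allclose(direct, by_hand))  # True

# A different base: powers of two from 2**0 to 2**4
powers_of_two = np.logspace(0, 4, num=5, base=2)
print(powers_of_two)
```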
More handy ways to create arrays
NumPy has a built-in function to create an identity matrix:
np.eye(3)
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
NumPy can also create `empty` matrices, but you might not get what you expect, because NumPy doesn’t set the array values to zero:
np.empty(5)
array([4.6444e-310, 0.0000e+000, 6.9326e-310, 2.4209e-322, 2.3715e-322])
np.empty(3)
array([1., 1., 1.])
The documentation notes that skipping the initialisation can be marginally faster, but the values are arbitrary until we overwrite them, so this is ‘user beware’ territory.
If you want an array of zeros:
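A reasonable pattern, sketched here with made-up names, is to use `np.empty` only when every element is about to be overwritten:

```python
import numpy as np

n = 5
squares = np.empty(n)  # contents are arbitrary at this point

# Overwrite every slot before the array is read
for i in range(n):
    squares[i] = i ** 2

print(squares)  # [ 0.  1.  4.  9. 16.]
```

If any element might be read before it is assigned, `np.zeros` is the safer choice.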
np.zeros(5)
array([0., 0., 0., 0., 0.])
Or a matrix:
np.zeros([5,5])
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])
Creating a Matrix of Random Values
We can also use NumPy to create an array or matrix of numbers that are randomly generated from some distribution, using `np.random`.
For example, an array of uniformly distributed random numbers:
np.random.random(5)
array([0.7782, 0.5411, 0.4425, 0.0814, 0.8379])
Or a 2-D array:
np.random.random((3, 3))
array([[0.3723, 0.8678, 0.8891],
[0.4403, 0.3162, 0.5736],
[0.9002, 0.9659, 0.9528]])
If we want to draw numbers from a standard normal distribution:
np.random.normal(0, 1, (3, 3))
array([[-0.4057, -0.77 , 1.9894],
[-0.1713, -1.9382, -1.2112],
[-0.5041, -0.5712, 1.0889]])
Or generate random integers:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))
array([[8, 1, 2],
[0, 4, 0],
[3, 7, 0]])
As these examples show, we can do all of that using NumPy too.
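The random draws shown above change every time the cells are run. If reproducible numbers are needed, one option is NumPy's `Generator` interface, created with `np.random.default_rng`. This is a sketch of an alternative to the `np.random.random`, `np.random.normal` and `np.random.randint` calls used in this notebook, and the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

uniform_draws = rng.random(5)                # uniform on [0, 1)
normal_draws = rng.normal(0, 1, (3, 3))      # standard normal, 3x3
integer_draws = rng.integers(0, 10, (3, 3))  # integers in [0, 10)

# A second generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform_draws, rng_again.random(5)))  # True
```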
Challenges
1. Create a 10x10 NumPy array of random numbers drawn from a normal distribution with mean 5 and standard deviation 3.
2. Convert the matrix you created in (1) to be of ‘integer type’.
3. Convert the matrix in (1) to be of type string.