NumPy Basics - Creating Arrays

The fundamental building block of scientific computing is the N-dimensional array.

Our focus is going to be on 1-dimensional and 2-dimensional arrays, i.e., vectors and matrices, because these are the types of arrays most familiar to economists who are less familiar with scientific computing. Everything we discuss extends to N dimensions once you feel comfortable moving into ‘the next dimension.’

In this notebook we focus on creating arrays, either by entering the data manually or by using built-in functions that create arrays with a particular structure. In a later notebook we discuss how to import an array from an existing file on our machine.

Fixed Type Arrays

Python’s standard library includes an array module that lets us create a dense array in which all elements share a single type:

import array

number_list = list(range(10))

my_array = array.array('i', number_list)
my_array
array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
another_array = array.array('f', number_list)
another_array
array('f', [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])

Note that the type code 'i' indicates that the contents of my_array are (signed C) integers, whilst 'f' indicates that the contents of another_array are (single-precision) floating-point numbers.
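To see what the type codes mean in practice, here is a short sketch (the arrays and values are illustrative, not taken from the cells above): the array module converts compatible values when they are stored and rejects incompatible ones with a TypeError.

```python
import array

# 'i' stores signed C ints; 'f' stores single-precision C floats
ints = array.array('i', [0, 1, 2])
floats = array.array('f', [0, 1, 2])  # the integers are converted to floats

# Appending a float to an integer array is rejected rather than silently truncated
try:
    ints.append(2.5)
    float_rejected = False
except TypeError:
    float_rejected = True

print(ints.typecode, floats.typecode)  # i f
print(floats.tolist())                 # [0.0, 1.0, 2.0]
print(float_rejected)                  # True
```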

Python’s array module offers an efficient way of storing arrays, but we will often want to perform operations on, or with, the data stored in an array, and the array module provides little support for this. NumPy offers this functionality, so we now turn to creating NumPy arrays, and we will stay in the NumPy paradigm for the remainder of the module.

NumPy Arrays

import numpy as np

We can use the np.array function to create an array from a Python list, such as the number_list defined above:

array_numpy = np.array(number_list)
array_numpy
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

or more simply:

np.array([1,2,3,4])
array([1, 2, 3, 4])
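The same function builds a 2-D array (a matrix) from a nested list, with one inner list per row. A minimal sketch with made-up values, using the ndim and shape attributes to confirm what was created:

```python
import numpy as np

# A 1-D array (vector) from a flat list
vector = np.array([1, 2, 3, 4])

# A 2-D array (matrix) from a list of equal-length lists: one inner list per row
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(vector.ndim, vector.shape)  # 1 (4,)
print(matrix.ndim, matrix.shape)  # 2 (2, 3)
print(matrix[1, 2])               # 6  (row 1, column 2, counting from 0)
```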

Recall from last week that Python lists can hold several different types of data at once. NumPy does not allow this: all the elements of an array must have the same type.

If we try to enter different data types into an array, NumPy converts all of the data to a single type through ‘upcasting’ (for example, integers become floats when a float is present). Let’s see that in action:

np.array([1, 5, 4, 9.2])
array([1. , 5. , 4. , 9.2])
np.array([9.2, 'hello'])
array(['9.2', 'hello'], dtype='<U32')
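Every NumPy array carries a dtype attribute, so we can check which type NumPy chose. A short sketch (the example values are our own) showing that the type is promoted just far enough to hold every element:

```python
import numpy as np

# Each array gets the single dtype that can represent all of its elements
all_ints = np.array([1, 5, 4, 9])
ints_and_float = np.array([1, 5, 4, 9.2])
mixed_kinds = np.array([True, 2, 3.5])
number_and_text = np.array([9.2, 'hello'])

print(ints_and_float.dtype)        # float64
print(mixed_kinds.dtype)           # float64 (bool -> int -> float)
print(number_and_text.dtype.kind)  # U (Unicode string)
print(all_ints.dtype.kind)         # i (signed integer; the width is platform dependent)
```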

We now see the term dtype, which is the data type of the array we created. We can let NumPy infer the data type for us, or we can set it ourselves using the dtype argument.

NumPy Data Types

Here are a few examples where we set the data type manually, using common types we may come across:

np.array([1.0, 2.0, 3.0], dtype='int')
array([1, 2, 3])
np.array([1.0, 2.0, 3.0], dtype='float')
array([1., 2., 3.])
np.array([1.0, 2.0, 3.0], dtype='complex')
array([1.+0.j, 2.+0.j, 3.+0.j])
np.array([1.0, 2.0, 3.0], dtype=np.float64)
array([1., 2., 3.])
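Setting dtype can change the values themselves, not just how they are labelled. A minimal sketch (the values are illustrative) showing that asking for integers truncates floats toward zero, and that the dtype determines the storage size per element reported by itemsize:

```python
import numpy as np

# Requesting an integer dtype for non-integer floats truncates toward zero
truncated = np.array([1.7, -1.7, 2.9], dtype='int')
print(truncated.tolist())  # [1, -1, 2]

# The dtype also fixes how many bytes each element occupies
single = np.array([1.0, 2.0, 3.0], dtype=np.float32)
double = np.array([1.0, 2.0, 3.0], dtype=np.float64)
print(single.itemsize, double.itemsize)  # 4 8
```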

These are not the only data types NumPy supports, and it is worth knowing what the others are.

Here is a list of the data types you are most likely to encounter:

Data type     Description
bool_         Boolean (True or False) stored as a byte
int_          Default integer type (same as C long; normally either int64 or int32)
intc          Identical to C int (normally int32 or int64)
intp          Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8          Byte (-128 to 127)
int16         Integer (-32768 to 32767)
int32         Integer (-2147483648 to 2147483647)
int64         Integer (-9223372036854775808 to 9223372036854775807)
uint8         Unsigned integer (0 to 255)
uint16        Unsigned integer (0 to 65535)
uint32        Unsigned integer (0 to 4294967295)
uint64        Unsigned integer (0 to 18446744073709551615)
float_        Shorthand for float64 (the np.float_ alias was removed in NumPy 2.0; use float64)
float16       Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32       Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64       Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_      Shorthand for complex128 (the np.complex_ alias was removed in NumPy 2.0; use complex128)
complex64     Complex number, represented by two 32-bit floats
complex128    Complex number, represented by two 64-bit floats

Source: Jake VanderPlas (2016), Python Data Science Handbook: Essential Tools for Working with Data, O’Reilly Media.
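Rather than memorising the ranges in the table, we can ask NumPy for them with np.iinfo (integer types) and np.finfo (floating-point types). A short sketch; the overflow example at the end is our own illustration of why these ranges matter:

```python
import numpy as np

# Integer limits from the table, queried directly
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
print(np.iinfo(np.uint16).max)                       # 65535

# Machine epsilon of float64 reflects its 52-bit mantissa
print(np.finfo(np.float64).eps == 2.0 ** -52)        # True

# Integer arithmetic on arrays wraps around silently when it exceeds the limits
small = np.array([127], dtype=np.int8)
wrapped = small + 1
print(wrapped.tolist(), wrapped.dtype)               # [-128] int8
```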

More on Creating Arrays

We often do not want to enter the contents of an array by hand. It is usually more efficient to create arrays from scratch using routines built into NumPy.

Here are several examples, including extensions to creating 2-D arrays:

# Create a length-10 integer array filled with integer zeros
np.zeros(10, dtype=int)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
# Create a 3x5 array filled with 4
np.full((3, 5), 4)
array([[4, 4, 4, 4, 4],
       [4, 4, 4, 4, 4],
       [4, 4, 4, 4, 4]])
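When we already have an array and want another one of the same shape, NumPy’s np.zeros_like, np.ones_like and np.full_like functions (not otherwise used in this notebook) save us from typing the shape again. A minimal sketch with an arbitrary 2x3 template:

```python
import numpy as np

template = np.arange(6).reshape(2, 3)   # a 2x3 integer array to copy the shape from

# The *_like functions copy the shape (and, by default, the dtype) of an existing array
zeros = np.zeros_like(template)
ones = np.ones_like(template, dtype=float)  # override the dtype
fours = np.full_like(template, 4)

# np.full infers the dtype from the fill value when dtype is not given
halves = np.full((2, 3), 0.5)

print(zeros.shape, zeros.dtype == template.dtype)  # (2, 3) True
print(ones.dtype, halves.dtype)                    # float64 float64
print(fours.tolist())                              # [[4, 4, 4], [4, 4, 4]]
```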

Some of the most useful routines produce a sequence of numbers that follows a pattern.

The np.arange function creates a linear sequence with a given step size:

np.arange(0, 20, 2)
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
np.arange(0, 20, 0.2)
array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ,  1.2,  1.4,  1.6,  1.8,  2. ,
        2.2,  2.4,  2.6,  2.8,  3. ,  3.2,  3.4,  3.6,  3.8,  4. ,  4.2,
        4.4,  4.6,  4.8,  5. ,  5.2,  5.4,  5.6,  5.8,  6. ,  6.2,  6.4,
        6.6,  6.8,  7. ,  7.2,  7.4,  7.6,  7.8,  8. ,  8.2,  8.4,  8.6,
        8.8,  9. ,  9.2,  9.4,  9.6,  9.8, 10. , 10.2, 10.4, 10.6, 10.8,
       11. , 11.2, 11.4, 11.6, 11.8, 12. , 12.2, 12.4, 12.6, 12.8, 13. ,
       13.2, 13.4, 13.6, 13.8, 14. , 14.2, 14.4, 14.6, 14.8, 15. , 15.2,
       15.4, 15.6, 15.8, 16. , 16.2, 16.4, 16.6, 16.8, 17. , 17.2, 17.4,
       17.6, 17.8, 18. , 18.2, 18.4, 18.6, 18.8, 19. , 19.2, 19.4, 19.6,
       19.8])

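Notice that the stop value (here 20) is not included in the array, and np.arange has no option to force it in. With a non-integer step such as 0.2, floating-point rounding can also make the number of elements hard to predict; the NumPy documentation recommends np.linspace in such cases.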
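A minimal sketch of the end-point behaviour. The float example is our own; NumPy’s documentation states that for floating-point arguments the length of the result is ceil((stop - start)/step), which is where the extra element comes from:

```python
import numpy as np

# Integer step: the stop value 20 is excluded
evens = np.arange(0, 20, 2)
print(evens[-1])             # 18

# Float step: the length is ceil((stop - start) / step), computed in floating point,
# so rounding error here yields 4 elements instead of 3, the last one approximately 1.3
surprise = np.arange(1, 1.3, 0.1)
print(len(surprise))         # 4

# A more predictable pattern: fix the number of elements with an integer arange, then scale
predictable = 1 + 0.1 * np.arange(3)
print(len(predictable))      # 3
```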

Another way of creating a linear sequence is the np.linspace function, which takes the number of points rather than the step size. An advantage of this function is that it lets us choose whether or not to include the end point:

# 11 evenly spaced points from 0 to 20, end point included
np.linspace(0, 20, 11)
array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18., 20.])
np.linspace(0, 20, 10, endpoint=False)
array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18.])
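Because np.linspace is specified by the number of points, the step size is implied: (stop - start)/(num - 1) with the end point, or (stop - start)/num without it. Passing retstep=True asks NumPy to return that step as well; a minimal sketch reusing the values above:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points
points, step = np.linspace(0, 20, 11, retstep=True)
print(step)          # 2.0  = (20 - 0) / (11 - 1)

# Without the end point, the interval is split into num equal pieces instead
points_open, step_open = np.linspace(0, 20, 10, endpoint=False, retstep=True)
print(step_open)     # 2.0  = (20 - 0) / 10

print(points[-1], points_open[-1])  # 20.0 18.0
```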

We do not need to restrict ourselves to sequences evenly spaced on a linear scale. The np.logspace function creates arrays evenly spaced on a log scale; here the arguments 0 and 4.0 are the exponents of the first and last values, 10^0 and 10^4:

np.logspace(0, 4.0, num=10)
array([1.00000000e+00, 2.78255940e+00, 7.74263683e+00, 2.15443469e+01,
       5.99484250e+01, 1.66810054e+02, 4.64158883e+02, 1.29154967e+03,
       3.59381366e+03, 1.00000000e+04])

Creating an array this way is equivalent to raising 10 to the power of each element of a linspace array:

y = np.linspace(0, 4, num=10)
y
array([0.        , 0.44444444, 0.88888889, 1.33333333, 1.77777778,
       2.22222222, 2.66666667, 3.11111111, 3.55555556, 4.        ])
# Print arrays with 4 digits of precision (this setting persists for the rest of the session)
np.set_printoptions(precision=4)
np.power(10, y)
array([1.0000e+00, 2.7826e+00, 7.7426e+00, 2.1544e+01, 5.9948e+01,
       1.6681e+02, 4.6416e+02, 1.2915e+03, 3.5938e+03, 1.0000e+04])
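We can confirm the equivalence numerically with np.allclose, which compares arrays up to floating-point tolerance. The sketch below also shows the base parameter of np.logspace and the related np.geomspace function, neither of which appears elsewhere in this notebook:

```python
import numpy as np

exponents = np.linspace(0, 4, num=10)

# logspace(start, stop) treats start and stop as exponents of the base (10 by default)
print(np.allclose(np.logspace(0, 4.0, num=10), 10 ** exponents))        # True

# The base argument changes the base; here 2**0 ... 2**4
print(np.allclose(np.logspace(0, 4, num=5, base=2), [1, 2, 4, 8, 16]))  # True

# np.geomspace takes the end points themselves rather than their exponents
print(np.allclose(np.geomspace(1, 10_000, num=10), 10 ** exponents))    # True
```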

More Handy Ways to Create Arrays

NumPy has a built-in function to create an identity matrix:

np.eye(3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
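np.eye also accepts a number of columns and a diagonal offset k, and the related np.identity and np.diag functions build other common structured matrices. A minimal sketch:

```python
import numpy as np

identity = np.identity(3)            # square identity matrix, same as np.eye(3)
superdiag = np.eye(3, k=1)           # ones on the diagonal above the main one
rectangular = np.eye(2, 3)           # 2 rows, 3 columns, ones on the main diagonal
diagonal = np.diag([1.0, 2.0, 3.0])  # diagonal matrix from a list of entries

print(np.array_equal(identity, np.eye(3)))  # True
print(superdiag.tolist())   # [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
print(rectangular.shape)    # (2, 3)
print(np.diag(diagonal).tolist())  # [1.0, 2.0, 3.0]  (np.diag of a matrix extracts its diagonal)
```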

NumPy can also create ‘empty’ arrays with np.empty, but you may not get what you expect: NumPy does not initialise the values, so the array contains whatever happened to be in that block of memory:

np.empty(5)
array([4.6444e-310, 0.0000e+000, 6.9326e-310, 2.4209e-322, 2.3715e-322])
np.empty(3)
array([1., 1., 1.])

According to the NumPy documentation, skipping initialisation can be marginally faster than np.zeros, but it means we must set every value ourselves, so np.empty should be used with caution.
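The safe pattern is to allocate with np.empty only when every element will be written before it is read, as in this minimal sketch (the squares are an arbitrary example):

```python
import numpy as np

n = 5
squares = np.empty(n)        # contents are arbitrary until assigned

# Safe use of np.empty: overwrite every element before reading any of them
for i in range(n):
    squares[i] = i ** 2

print(squares.tolist())      # [0.0, 1.0, 4.0, 9.0, 16.0]
```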

If you want an array of zeros, use np.zeros:

np.zeros(5)
array([0., 0., 0., 0., 0.])

Or a matrix:

np.zeros([5,5])
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

Creating a Matrix of Random Values

We can also use NumPy to create an array or matrix of numbers that are randomly generated from some distribution, using np.random.

For example, an array of random numbers uniformly distributed on [0, 1):

np.random.random(5)
array([0.7782, 0.5411, 0.4425, 0.0814, 0.8379])

Or a 2-D array:

np.random.random((3, 3))
array([[0.3723, 0.8678, 0.8891],
       [0.4403, 0.3162, 0.5736],
       [0.9002, 0.9659, 0.9528]])

If we want to draw numbers from a standard normal distribution (mean 0, standard deviation 1):

np.random.normal(0, 1, (3, 3))
array([[-0.4057, -0.77  ,  1.9894],
       [-0.1713, -1.9382, -1.2112],
       [-0.5041, -0.5712,  1.0889]])

Or generate random integers:

# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))
array([[8, 1, 2],
       [0, 4, 0],
       [3, 7, 0]])

NumPy’s np.random module offers many other distributions, used in the same way.
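The np.random functions used above belong to NumPy’s legacy interface. For new code, the NumPy documentation recommends creating a Generator with np.random.default_rng and calling methods on it; seeding it makes results reproducible. A minimal sketch of the equivalents (the seed is arbitrary, and the draws will not match the outputs printed above):

```python
import numpy as np

rng = np.random.default_rng(seed=12345)   # the seed value is arbitrary

uniform = rng.random(5)                   # like np.random.random(5): floats in [0, 1)
normal = rng.normal(0, 1, (3, 3))         # like np.random.normal(0, 1, (3, 3))
integers = rng.integers(0, 10, (3, 3))    # like np.random.randint(0, 10, (3, 3)): 10 is excluded

# A new generator with the same seed reproduces exactly the same draws
rng_again = np.random.default_rng(seed=12345)
print(np.array_equal(uniform, rng_again.random(5)))  # True
```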

Challenges:

  1. Create a 10x10 NumPy array of random numbers drawn from a normal distribution with mean 5 and standard deviation 3.

  2. Convert the matrix you created in (1) to integer type.

  3. Convert the matrix in (1) to string type.