Computation on Arrays - Universal Functions

We should now feel relatively comfortable creating an array and then indexing and slicing it as needed.

Next, we turn to computation on arrays. NumPy implements most of the standard computations we need to perform on arrays using universal functions, or ufuncs for short. NumPy’s ufuncs are implemented as vectorized operations, so they are in general extremely fast; we should always prefer them to writing our own methods.

Let’s get started exploring the world of ufuncs by loading NumPy into our notebook:

import numpy as np
np.random.seed(1234567890)

Slowness of loops

Python’s default implementation of some operations is quite slow, mostly because it is a dynamic, interpreted language. Types are flexible, so a series of type checks must be performed at each operation, meaning operations are generally not compiled to efficient machine code.

This manifests itself when many small operations are repeated, as in a loop that does the same thing to each element of an array:

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
        
values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)
array([0.33333333, 0.14285714, 0.25      , 0.5       , 0.2       ])
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)
2.12 s ± 36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The bottleneck here is the type checking and function dispatch that the Python engine performs at each cycle of the loop: each time, it examines the object’s type and looks up the correct function to use for that type.

Introducing UFuncs

NumPy provides a convenient interface into a statically typed, compiled routine. We simply perform an operation on the array, and it is applied to each element. Recall that every element of a NumPy array must be of the same type, so we avoid all the type-checking-related slowdowns.

Here’s an example using a NumPy ufunc to perform the same operation as above:

print(compute_reciprocals(values))
print(1.0 / values)
[0.33333333 0.14285714 0.25       0.5        0.2       ]
[0.33333333 0.14285714 0.25       0.5        0.2       ]
%timeit (1.0 / big_array)
2 ms ± 57.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Vectorized operations in NumPy are implemented via ufuncs, which offer the ability to quickly execute repeated operations on values in NumPy arrays.

Ufuncs are extremely flexible – we can also operate between two arrays:

np.arange(5) / np.arange(1, 6)
array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

And multidimensional arrays:

x = np.arange(9).reshape((3, 3))
2 ** x
array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]])
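Ufuncs also combine arrays of different but compatible shapes via broadcasting (covered in more detail elsewhere). A quick sketch: adding a column vector to a row vector yields a full addition table.

```python
import numpy as np

# A (3, 1) column plus a (3,) row broadcasts to a (3, 3) grid
# where grid[i, j] == col[i] + row[j].
col = np.arange(3).reshape((3, 1))
row = np.arange(3)
grid = col + row
print(grid)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
```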

NumPy’s UFuncs

NumPy’s ufuncs make use of Python’s native arithmetic operators, so they are easy to use:

x = np.arange(10)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 5 =", x * 5)
print("x / 5 =", x / 5)
print("x // 5 =", x // 5)  # floor division
x     = [0 1 2 3 4 5 6 7 8 9]
x + 5 = [ 5  6  7  8  9 10 11 12 13 14]
x - 5 = [-5 -4 -3 -2 -1  0  1  2  3  4]
x * 5 = [ 0  5 10 15 20 25 30 35 40 45]
x / 5 = [0.  0.2 0.4 0.6 0.8 1.  1.2 1.4 1.6 1.8]
x // 5 = [0 0 0 0 0 1 1 1 1 1]
print("-x     = ", -x)
print("x ** 5 = ", x ** 5)
print("x % 5  = ", x % 5)
-x     =  [ 0 -1 -2 -3 -4 -5 -6 -7 -8 -9]
x ** 5 =  [    0     1    32   243  1024  3125  7776 16807 32768 59049]
x % 5  =  [0 1 2 3 4 0 1 2 3 4]

Ufuncs can be strung together in any combination, and they follow the standard order of operations:

-(0.5*x + 1) ** 2
array([ -1.  ,  -2.25,  -4.  ,  -6.25,  -9.  , -12.25, -16.  , -20.25,
       -25.  , -30.25])

Each of these arithmetic operators is a wrapper around a specific NumPy function; for example:

all(np.add(x, 2) == x + 2)
True

This table relates each arithmetic operator to its equivalent NumPy function:

Operator    Equivalent ufunc    Description
+           np.add              Addition (e.g., 1 + 1 = 2)
-           np.subtract         Subtraction (e.g., 3 - 2 = 1)
-           np.negative         Unary negation (e.g., -2)
*           np.multiply         Multiplication (e.g., 2 * 3 = 6)
/           np.divide           Division (e.g., 3 / 2 = 1.5)
//          np.floor_divide     Floor division (e.g., 3 // 2 = 1)
**          np.power            Exponentiation (e.g., 2 ** 3 = 8)
%           np.mod              Modulus/remainder (e.g., 9 % 4 = 1)

Source: Jake VanderPlas (2016), Python Data Science Handbook: Essential Tools for Working with Data, O’Reilly Media.
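We can spot-check the table’s operator/ufunc pairings directly; each comparison below should hold element-wise:

```python
import numpy as np

x = np.arange(1, 6)

# Each operator should agree element-wise with its ufunc counterpart
assert np.all((x - 2) == np.subtract(x, 2))
assert np.all((-x) == np.negative(x))
assert np.all((x * 3) == np.multiply(x, 3))
assert np.all((x / 2) == np.divide(x, 2))
assert np.all((x // 2) == np.floor_divide(x, 2))
assert np.all((x ** 2) == np.power(x, 2))
assert np.all((x % 2) == np.mod(x, 2))
print("all operator/ufunc pairs agree")
```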

Absolute Value

NumPy also plays well with Python’s built-in absolute value function:

x = np.array([-2, -1, 0, 1, 2])
abs(x)
array([2, 1, 0, 1, 2])
np.absolute(x)
array([2, 1, 0, 1, 2])
np.abs(x)
array([2, 1, 0, 1, 2])
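np.abs also handles complex input, returning the magnitude of each element rather than raising an error:

```python
import numpy as np

# For complex numbers, np.abs returns the magnitude
# sqrt(real**2 + imag**2) of each element.
z = np.array([3 - 4j, 4 + 3j, 2j])
print(np.abs(z))
# [5. 5. 2.]
```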

Exponents and Logarithms

We often work with exponentials and logarithms:

x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))
x     = [1, 2, 3]
e^x   = [ 2.71828183  7.3890561  20.08553692]
2^x   = [2. 4. 8.]
3^x   = [ 3  9 27]
x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))
x        = [1, 2, 4, 10]
ln(x)    = [0.         0.69314718 1.38629436 2.30258509]
log2(x)  = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]

NumPy has specialized versions that are useful for maintaining precision with small valued inputs:

x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x))
print("log(1 + x) =", np.log1p(x))
exp(x) - 1 = [0.         0.0010005  0.01005017 0.10517092]
log(1 + x) = [0.         0.0009995  0.00995033 0.09531018]
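To see why these specialized versions matter, compare the naive computation with np.expm1 for a very small input; the naive version loses accuracy to catastrophic cancellation when subtracting two nearly equal numbers:

```python
import numpy as np

tiny = 1e-10
# Naive exp(x) - 1 subtracts two numbers that agree to ~10 digits,
# wiping out most of the significant digits of the answer;
# expm1 computes the same quantity to full precision.
naive = np.exp(tiny) - 1
precise = np.expm1(tiny)
print(naive)    # noticeably off from 1e-10
print(precise)  # essentially exact
```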

There’s also a bunch of trigonometric functions, but these are less common in our work so we skip over them.

Some useful UFuncs live in SciPy

The SciPy module (that we discuss further later) has some additional ufuncs that are more specialized in nature. (This may be suggested by how we import them!)

Here are some useful ones for statistics:

from scipy import special
# Error function (integral of the Gaussian),
# its complement, and its inverse
x = np.array([0, 0.3, 0.7, 1.0])
print("erf(x)  =", special.erf(x))
print("erfc(x) =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))
erf(x)  = [0.         0.32862676 0.67780119 0.84270079]
erfc(x) = [1.         0.67137324 0.32219881 0.15729921]
erfinv(x) = [0.         0.27246271 0.73286908        inf]
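One reason erf is useful in statistics: the standard normal CDF can be written in terms of it, as \(\Phi(x) = \frac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)\). A minimal sketch (the function name norm_cdf is our own):

```python
import numpy as np
from scipy import special

def norm_cdf(x):
    """Standard normal CDF via the error function:
    Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))."""
    return 0.5 * (1.0 + special.erf(x / np.sqrt(2.0)))

# Familiar values: Phi(0) = 0.5, Phi(1.96) ~ 0.975
print(norm_cdf(np.array([-1.96, 0.0, 1.96])))
```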

‘Advanced’ UFunc Features

Specifying output

We are used to seeing

y = np.multiply(x, 10)

But we can achieve the same result using the out argument, which is available in all ufuncs, provided the array passed as out already exists:

x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)
[ 0. 10. 20. 30. 40.]

So why might we be interested in this feature?

Notice that:

y = np.zeros(10)
np.power(2, x, out=y[::2])
print(y)
[ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]

is equivalent to: y[::2] = 2 ** x.

If we perform this operation on a large array:

big_array = np.random.randint(1, 100, size=10000000)
y = np.zeros(2 * big_array.size)
%timeit np.power(2, big_array, out=y[::2])
265 ms ± 9.01 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit y[::2] = 2 ** big_array
307 ms ± 8.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The version using out is measurably faster.

What’s the difference? The latter executes two operations: first, it creates a temporary array to hold the result of 2 ** big_array; second, it copies those values into the y array. Using out skips the temporary allocation and the copy.
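One way to confirm that out really writes in place is to note that a ufunc returns the array we passed as out, rather than allocating a new one:

```python
import numpy as np

x = np.arange(5)
y = np.empty(5)
# Ufuncs return the out array itself, so the result is the very
# same object we pre-allocated -- no new array was created.
result = np.multiply(x, 10, out=y)
assert result is y
print(y)
# [ 0. 10. 20. 30. 40.]
```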

Aggregation

Some aggregates can be computed directly from an array when we want to reduce it using a particular operation:

x = np.arange(1, 6)
np.add.reduce(x)
15
np.multiply.reduce(x)
120
np.add.accumulate(x)
array([ 1,  3,  6, 10, 15])
np.multiply.accumulate(x)
array([  1,   2,   6,  24, 120])
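For these common cases, NumPy also provides dedicated aggregation functions (np.sum, np.prod, np.cumsum, np.cumprod) that compute the same results:

```python
import numpy as np

x = np.arange(1, 6)
# These dedicated aggregates match the ufunc-based reductions above
assert np.add.reduce(x) == np.sum(x)
assert np.multiply.reduce(x) == np.prod(x)
assert np.all(np.add.accumulate(x) == np.cumsum(x))
assert np.all(np.multiply.accumulate(x) == np.cumprod(x))
print(np.sum(x), np.prod(x))
# 15 120
```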

Outer products

x = np.arange(1, 6)
np.multiply.outer(x, x)
array([[ 1,  2,  3,  4,  5],
       [ 2,  4,  6,  8, 10],
       [ 3,  6,  9, 12, 15],
       [ 4,  8, 12, 16, 20],
       [ 5, 10, 15, 20, 25]])
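The outer method is not specific to multiply; every binary ufunc has one. For instance, np.add.outer builds a table of all pairwise sums, a handy pattern for evaluating a function over a grid of inputs:

```python
import numpy as np

q = np.arange(4)
# total[i, j] == q[i] + q[j]: every pairwise sum at once
total = np.add.outer(q, q)
print(total)
# [[0 1 2 3]
#  [1 2 3 4]
#  [2 3 4 5]
#  [3 4 5 6]]
```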

Challenge

Suppose we have two firms in a market operating under Cournot competition. Let \(q_1\) and \(q_2\) denote the quantities produced by each firm respectively, and let \(Q = q_1 + q_2\).

Market demand can be either linear:

\[ P(Q) = a \times Q + b \]

or be isoelastic:

\[ P(Q) = k Q^ {-\epsilon} \]

Assume that marginal cost of production is \(c\) for both firms, and they can only produce integer valued outputs in the range \(q_i \in [0,10]\).

  1. Write a function that returns the market price for the linear demand function for all possible market quantities. Return the price function as a matrix that can be indexed so that price[5,5] yields the market price when both firms produce 5 units of output.

  2. Repeat 1. for the isoelastic case.

  3. Write a function that returns the profit for firm 1, for any combination of inputs, for either demand model.

    1. What is firm 1’s profit when both firms produce 5 units of output, when a = -3, b = 30 and c = 1?

    2. What is firm 1’s profit when both firms produce 5 units of output, when k = -1, \(\epsilon\) = -3 and c = 1?

Additional Info on ufuncs:

See https://docs.scipy.org/doc/numpy/reference/ufuncs.html for more ufuncs inside NumPy, and https://docs.scipy.org/doc/scipy/reference/special.html for ufuncs in SciPy’s special module.