Saving and Loading Data

Once we are constructing and computing on arrays, at some point we might want to save our results. We may also be interested in importing an existing array or data from our files to work on.

This notebook looks into methods to do this:

import numpy as np

Saving text files with savetxt

The simplest way to save an array is to write it out to a plain text file. NumPy’s savetxt function allows us to do this easily:

x = np.array([[1, 2, 3], 
              [4, 5, 6],
              [7, 8, 9]], np.int32)
np.savetxt("test.txt", x)

We can verify that the array was saved by running a shell command from within our Python / Jupyter session:

!ls *.txt

test.txt

The savetxt function gives us a lot of flexibility. For example, we can choose the number format (such as how many decimal places are written) and the delimiter that separates the individual elements in the text file:

np.savetxt("test2.txt", x, fmt="%2.3f", delimiter=",")
np.savetxt("test3.txt", x, fmt="%04d", delimiter=" :-) ")

!ls *.txt

test.txt  test2.txt  test3.txt

!head *.txt

==> test.txt <==
1.000000000000000000e+00 2.000000000000000000e+00 3.000000000000000000e+00
4.000000000000000000e+00 5.000000000000000000e+00 6.000000000000000000e+00
7.000000000000000000e+00 8.000000000000000000e+00 9.000000000000000000e+00

==> test2.txt <==
1.000,2.000,3.000
4.000,5.000,6.000
7.000,8.000,9.000

==> test3.txt <==
0001 :-) 0002 :-) 0003
0004 :-) 0005 :-) 0006
0007 :-) 0008 :-) 0009
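The fmt argument can also be a sequence with one format per column. The following is a minimal sketch of that idea; the sample values are made up for illustration, and the output is written to an in-memory buffer rather than a file so that the example leaves nothing behind:

```python
import io
import numpy as np

# Made-up example data: an id column followed by two measurements.
data = np.array([[1, 2.5, 3.14159],
                 [2, 4.0, 2.71828]])

# An in-memory text buffer stands in for a file on disk.
buffer = io.StringIO()

# One format per column: integer, one decimal place, three decimal places.
np.savetxt(buffer, data, fmt=["%d", "%.1f", "%.3f"], delimiter=";")

text = buffer.getvalue()
print(text)
```

The number of formats has to match the number of columns in the array.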

We can also tell NumPy how we want new lines to be stored, and we can add comments at the beginning and end of the array that will not be read back in when we load the data.

# or to go over the top
np.savetxt('test4.txt', x, fmt='%2.3f', delimiter=',', 
               newline='\n', header='this is a header', 
               footer='and a footer', comments='## ')

!head test4.txt

## this is a header
1.000,2.000,3.000
4.000,5.000,6.000
7.000,8.000,9.000
## and a footer
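To see that header and footer lines really are skipped on loading, here is a small round-trip sketch. It uses an in-memory buffer in place of test4.txt (an assumption made only to keep the example self-contained) and relies on loadtxt treating lines that start with # as comments by default:

```python
import io
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], np.int32)

# Write the array with a commented header and footer into a text buffer.
buffer = io.StringIO()
np.savetxt(buffer, x, fmt="%d", delimiter=",",
           header="this is a header", footer="and a footer",
           comments="## ")

lines = buffer.getvalue().splitlines()
print(lines[0])   # the header line, prefixed with the comments string
print(lines[-1])  # the footer line, prefixed with the comments string

# Rewind and read the data back; the commented lines are ignored.
buffer.seek(0)
y = np.loadtxt(buffer, delimiter=",")
print(y.shape)
```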

Loading text files with loadtxt

Now that we have seen how to write an array to a file, it is unsurprising that there is a loadtxt function that allows us to read in an array from a plain text file too:

y = np.loadtxt("test.txt")
print(y)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

It has similar functionality, where we can specify the characters used to delimit the individual elements:

y = np.loadtxt("test2.txt", delimiter=",")
print(y)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

y = np.loadtxt("test3.txt", delimiter=" :-) ")
print(y)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]

And we can also tell NumPy the data type that we want the individual elements to be once they are read in:

y = np.loadtxt("test4.txt", delimiter=",", dtype='complex')
y

array([[1.+0.j, 2.+0.j, 3.+0.j],
       [4.+0.j, 5.+0.j, 6.+0.j],
       [7.+0.j, 8.+0.j, 9.+0.j]])

We can also read in parts of an array by selecting the columns to read in, and whether we want the results to be read into one large array or unpacked into multiple:

y,z = np.loadtxt('test4.txt', delimiter=',', usecols=(0, 2), unpack=True)

print(y,z)

[1. 4. 7.] [3. 6. 9.]
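These options can be combined. The sketch below assumes a small made-up file (held in an in-memory buffer) whose first line holds column names that are not marked as a comment, so it is skipped with skiprows before the first and last columns are unpacked as integers:

```python
import io
import numpy as np

# Made-up file contents: one line of column names, then three data rows.
text = io.StringIO("a,b,c\n1,2,3\n4,5,6\n7,8,9\n")

# Skip the name line, keep columns 0 and 2, and unpack them into two arrays.
first, last = np.loadtxt(text, delimiter=",", skiprows=1,
                         usecols=(0, 2), unpack=True, dtype=int)

print(first, last)
```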

Loading text files with genfromtxt

NumPy also provides genfromtxt, which reads an array from a text file in much the same way as loadtxt and can additionally cope with missing values. The basic functionality looks the same:

np.genfromtxt('test4.txt', delimiter=",")

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])

np.genfromtxt('test4.txt', delimiter=',', 
                  skip_header=2, skip_footer=1, 
                  usecols=(0, -1))

array([4., 6.])

np.genfromtxt('test4.txt', delimiter=',', 
                  skip_header=2, skip_footer=1, 
                  usecols=(0, -1), 
                  names="A, C", dtype=['int', 'float'] )

array((-1, 6.), dtype=[('A', '<i8'), ('C', '<f8')])
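One thing genfromtxt can do that the examples above do not show is cope with missing entries. A minimal sketch, using made-up data with one empty field and an in-memory buffer instead of a file:

```python
import io
import numpy as np

# Made-up data: the middle value of the second row is missing.
raw = "1,2,3\n4,,6\n7,8,9\n"

# By default a missing float entry is read in as nan.
with_nan = np.genfromtxt(io.StringIO(raw), delimiter=",")
print(with_nan)

# filling_values replaces missing entries with a value of our choice.
filled = np.genfromtxt(io.StringIO(raw), delimiter=",", filling_values=0)
print(filled)
```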

Using NumPy’s native format

We have so far focussed on saving an array to a plain text file, and most of the time this is the recommended way to go. Sometimes, however, we may want to save and load arrays in NumPy’s own binary format, .npy.

The functions to do this are straightforward:

np.save('x_mat.npy', x)

Note that in this case we cannot see into the array using standard tools because the array is written in NumPy’s binary format:

!head *.npy

�NUMPYv{'descr': '<i4', 'fortran_order': False, 'shape': (3, 3), }

If we are given an array saved as .npy, we can readily load it back into our session with:

np.load('x_mat.npy')

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]], dtype=int32)
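One practical difference between the two formats is that .npy stores the data type, while a text file does not. The sketch below compares both round trips for the same array; the temporary directory and file names are assumptions made so that the example cleans up after itself:

```python
import os
import tempfile
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], np.int32)

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "x.txt")
    npy_path = os.path.join(tmp, "x.npy")

    # Save the same array once as text and once in the binary format.
    np.savetxt(txt_path, x)
    np.save(npy_path, x)

    from_txt = np.loadtxt(txt_path)
    from_npy = np.load(npy_path)

print(from_txt.dtype)  # float64: loadtxt reads floats unless told otherwise
print(from_npy.dtype)  # int32: the original dtype is stored in the file
```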

Challenge

Let’s use our combined tools of aggregate functions and loading and saving text data to analyse some weather data from ZRH airport. The data are contained in the file data/zrh_weather.txt in this repository.

Load the data for the maximum, mean and minimum temperature into a NumPy array called weather. Also load in the dates using the code excerpt at the bottom of these questions (filling in the necessary information in place of the XXX placeholders).
Find the hottest temperature at Zurich airport over the duration of the data
On what date did the hottest temperature occur? (Did it happen only once?)
On what date did the minimum temperature occur?
On what date was the largest difference between the maximum and the minimum temperature?
Save the maximum, mean and minimum temperature to the file weather_changes.txt for the week around the date of the largest temperature difference. In the file you save write a header that says “The seven days centered around the largest temperature change”, ensuring that line begins with a triple #.

from datetime import datetime

str2date = lambda x: datetime.strptime(x.decode("utf-8"), '%Y-%m-%d')


dates = np.genfromtxt('XXX', delimiter='XXX',
                      skip_header=XXX, usecols=(XXX),
                     dtype='object').astype(str)

!head -1 data/zrh_weather.txt

"Date"&"CEST"&"Max_TemperatureC"&"Mean_TemperatureC"&"Min_TemperatureC"&"Dew_PointC"&"MeanDew_PointC"&"Min_DewpointC"&"Max_Humidity"&"Mean_Humidity"&"Min_Humidity"&"Max_Sea_Level_PressurehPa"&"Mean_Sea_Level_PressurehPa"&"Min_Sea_Level_PressurehPa"&"Max_VisibilityKm"&"Mean_VisibilityKm"&"Min_VisibilitykM"&"Max_Wind_SpeedKm_h"&"Mean_Wind_SpeedKm_h"&"Max_Gust_SpeedKm_h"&"Precipitationmm"&"CloudCover"&"Events"&"WindDirDegrees"

weather = np.genfromtxt('data/zrh_weather.txt', delimiter='&', 
                        skip_header=1, usecols=(3,4,5)) 

from datetime import datetime

str2date = lambda x: datetime.strptime(x.decode("utf-8"), '%Y-%m-%d')


dates = np.genfromtxt('data/zrh_weather.txt', delimiter='&',
                      skip_header=1, usecols=(1),
                     dtype='object').astype(str)

# max temperature
np.max(weather[:,0], axis=0)

34.0

# (first) date of max temperature:
dates[np.argmax(weather[:,0], axis=0)]

'2017-07-06'

np.unique(weather[:,0], return_counts=True)

(array([13., 14., 18., 19., 20., 21., 22., 24., 25., 26., 27., 28., 29.,
        30., 31., 32., 34.]),
 array([1, 1, 2, 1, 1, 6, 5, 7, 1, 3, 5, 6, 3, 1, 3, 3, 2]))
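A more direct way to answer "did it happen only once?" is to compare the column against its maximum and look at every matching position. This is a sketch on made-up temperatures, not on the ZRH data:

```python
import numpy as np

# Made-up daily maximum temperatures for five days.
max_temps = np.array([28., 34., 31., 34., 25.])

hottest = max_temps.max()

# Positions of all days that reached the hottest temperature.
hottest_days = np.flatnonzero(max_temps == hottest)

print(hottest, hottest_days, len(hottest_days))
```

With the real data, indexing dates with hottest_days would list every date on which the maximum occurred, whereas argmax only reports the first one.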

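# date of the lowest value in column 0 (the daily maximum temperature);
# use weather[:,2] instead to find the date of the lowest daily minimum
dates[np.argmin(weather[:,0], axis=0)]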

'2017-08-10'

# largest change
np.max(weather[:,0] - weather[:,2], axis=0)

20.0

dates[np.argmax(weather[:,0] - weather[:,2], axis=0)]

'2017-07-05'

# what were the max, mean and min temperatures on that day
weather[np.argmax(weather[:,0]- weather[:,2]),]

array([31., 21., 11.])

reference_index = np.argmax(weather[:,0]- weather[:,2])
reference_index

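# three days either side of the reference day; columns 0:3 are max, mean and min
weather_changes = weather[reference_index-3:reference_index+4, 0:3]

# save the selected week, with a header line that begins with a triple #
np.savetxt('weather_changes.txt', weather_changes, fmt='%2.2f', delimiter=',', 
               newline='\n', 
               header='The seven days centered around the largest temperature change', 
               comments='### ')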
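Slicing a fixed number of rows around an index assumes that the index is at least three rows away from either end of the array; close to the start, a negative start index would silently select the wrong rows. The helper below is a hypothetical sketch (centered_window is not a NumPy function) that clips the window to the array bounds, shown on made-up data:

```python
import numpy as np

def centered_window(n_rows, center, half_width=3):
    """Return a slice of up to 2*half_width + 1 rows around center, clipped to the array."""
    start = max(center - half_width, 0)
    stop = min(center + half_width + 1, n_rows)
    return slice(start, stop)

# Made-up stand-in for the weather array: 10 days x 3 columns (max, mean, min).
fake_weather = np.arange(30, dtype=float).reshape(10, 3)

middle = fake_weather[centered_window(len(fake_weather), 5)]
near_start = fake_weather[centered_window(len(fake_weather), 1)]

print(middle.shape)      # a full seven-day window
print(near_start.shape)  # fewer rows, because the window is clipped at the start
```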

Additional References

http://www.python-course.eu/numpy_reading_writing.php
https://docs.scipy.org/doc/numpy-1.12.0/user/basics.io.genfromtxt.html

Python for Economics and Business Research