Files and Directories

1 Writing files with write and print

So far, we have not talked about how to get data into and out of Python. That is the focus of this section. Before we start, we need some data that we can write to a file. So let’s write down a poem (all credits go to xkcd here).

poem = '''Never have I felt so close to another soul
          And yet so helplessly alone
          As when I google an error
          And there's one result
          A thread by someone with the same problem
          And no answer
          Last posted to in 2003
          Who were you, DenverCoder9?
          What did you see?!'''

Before we can write into a file, we have to open it and create a file object: fileobj = open(filename, mode). The only thing that might look mysterious in this command is the mode argument. It is a short string; in this section we always pass two characters.

The first character denotes what you want to do with the file and must be one of the following:

  • r: read the file,

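  • w: write into the file; it is created if it does not exist and emptied if it does,

  • x: write into the file, but only if it does not exist yet (otherwise a FileExistsError is raised),

  • a: append to the file, creating it if it does not exist.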

The second character determines the file’s type; if you leave it out, text is assumed:

  • t: text, or

  • b: binary.
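The difference between w, a and x is easiest to see by trying them on the same file. The following is a minimal sketch, not part of the original example: the file name modes_demo.txt is hypothetical, the file lives in a temporary directory so that nothing in your working directory is touched, and the with statement it uses is explained in section 1.2.

```python
import os
import tempfile

# Temporary directory, removed automatically at the end of the block.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.txt')  # hypothetical file name

    # 'w' creates the file, or empties it if it already exists.
    with open(path, 'wt') as f:
        f.write('first\n')

    # 'a' keeps the existing content and writes at the end.
    with open(path, 'at') as f:
        f.write('second\n')

    # 'x' refuses to open a file that already exists.
    try:
        with open(path, 'xt') as f:
            f.write('never written\n')
        x_failed = False
    except FileExistsError:
        x_failed = True

    # 'r' reads the file back.
    with open(path, 'rt') as f:
        content = f.read()

print(repr(content))  # 'first\nsecond\n'
print(x_failed)       # True
```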

1.1 Writing strings with write and print

In the following, I will concentrate on writing text since that will be the more common case for us lowly economists. So, let’s open up our text file!

my_file = open('xkcd_poem', 'wt')

The next step is to actually write something into the text file with the write method.

my_file.write(poem)
325

What kind of output is that? It is the number of characters written, which equals the length of our poem string.

len(poem)
325

Now that we have opened and written into this file, we finally have to close it.

my_file.close()

There should now be a new file in your working directory. Instead of the write method you can also use the good old print function. The difference is that print puts a space between its arguments and adds a newline at the end. To make the two commands do exactly the same, you can change the default values of the sep and end arguments.

my_file = open('xkcd_poem2', 'wt')
print(poem, file = my_file, sep = '', end = '')
my_file.close()
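With a single argument, sep has nothing to separate, so its effect only shows up once print receives several arguments. A small sketch under assumptions of my own (the file name print_demo.txt and the two names are only for illustration):

```python
import os
import tempfile

# Temporary directory, removed automatically at the end of the block.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'print_demo.txt')  # hypothetical file name

    my_file = open(path, 'wt')
    # Defaults: sep=' ' between the arguments, end='\n' after the last one.
    print('Alfred', 'Marshall', file=my_file)
    # Custom separator and no trailing newline.
    print('John', 'Keynes', file=my_file, sep=';', end='')
    my_file.close()

    my_file = open(path, 'rt')
    content = my_file.read()
    my_file.close()

print(repr(content))  # 'Alfred Marshall\nJohn;Keynes'
```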

1.2 Using with to close files automatically

It’s annoying that you have to close each file with an explicit statement. Why not close it automatically once you have done whatever you wanted to do with it? You can do this with the with statement, which uses the file object as a context manager.

with open('xkcd_poem3', 'wt') as my_file:
    my_file.write(poem)

This code block does the same as the two above, but it automatically closes my_file at the end of the indented code block, which consists only of the line my_file.write(poem) in our example.
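You can check the closing yourself with the closed attribute of the file object. The sketch below uses a hypothetical file name in a temporary directory; it also shows that a closed file can no longer be written to.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'closed_demo.txt')  # hypothetical file name

    with open(path, 'wt') as my_file:
        my_file.write('some text')
        inside = my_file.closed   # still open inside the indented block

    outside = my_file.closed      # closed as soon as the block has ended

    # Writing to a closed file raises a ValueError.
    try:
        my_file.write('more text')
        write_failed = False
    except ValueError:
        write_failed = True

print(inside, outside, write_failed)  # False True True
```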

1.3 Reading strings with read, readline and readlines

We not only want to save strings but also read them back into Python. The simplest way to do this is the read method.

with open('xkcd_poem', 'rt') as my_file: 
    poem = my_file.read()
print(poem)
Never have I felt so close to another soul
          And yet so helplessly alone
          As when I google an error
          And there's one result
          A thread by someone with the same problem
          And no answer
          Last posted to in 2003
          Who were you, DenverCoder9?
          What did you see?!

Alternatively, we can also use the readline method, which reads one line at a time from our file.

with open('xkcd_poem', 'rt') as my_file: 
    print(my_file.readline())
    print(my_file.readline())
Never have I felt so close to another soul

          And yet so helplessly alone

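The blank lines in the output appear because each line returned by readline still ends with its newline character and print adds another one. How can we use readline to build a loop which reads the whole poem in the end?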

poem = ''
with open('xkcd_poem', 'rt') as my_file: 
    while True:
        line = my_file.readline()
        if not line: 
            break
        poem += line
print(poem)
Never have I felt so close to another soul
          And yet so helplessly alone
          As when I google an error
          And there's one result
          A thread by someone with the same problem
          And no answer
          Last posted to in 2003
          Who were you, DenverCoder9?
          What did you see?!

So, why does this work? We created an infinite loop that only breaks when not line evaluates to True. Otherwise, the line is appended to the string poem. Once there are no more lines in our text file, readline returns '', which evaluates to False, so not line is True and the loop breaks. It might look complicated at first, but you will get the hang of it soon.
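For comparison, a file object can also be looped over directly: each iteration yields one line, and the loop stops by itself at the end of the file. A minimal sketch, with a hypothetical file name and a short stand-in text instead of the poem:

```python
import os
import tempfile

text = 'first line\nsecond line\nthird line'  # stand-in for the poem

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'loop_demo.txt')  # hypothetical file name
    with open(path, 'wt') as my_file:
        my_file.write(text)

    rebuilt = ''
    with open(path, 'rt') as my_file:
        # No explicit readline call and no break are needed here.
        for line in my_file:
            rebuilt += line

print(rebuilt == text)  # True
```

This does the same job as the while loop above, just with less bookkeeping.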

Finally, there is a third way of reading poems: the readlines method. This method reads the text file line by line and returns a list with one element per line.

with open('xkcd_poem', 'rt') as my_file: 
    lines = my_file.readlines()
lines
['Never have I felt so close to another soul\n',
 '          And yet so helplessly alone\n',
 '          As when I google an error\n',
 "          And there's one result\n",
 '          A thread by someone with the same problem\n',
 '          And no answer\n',
 '          Last posted to in 2003\n',
 '          Who were you, DenverCoder9?\n',
 '          What did you see?!']

We can of course also build up our poem again.

poem = ''
for line in lines:
    poem += line
print(poem)
Never have I felt so close to another soul
          And yet so helplessly alone
          As when I google an error
          And there's one result
          A thread by someone with the same problem
          And no answer
          Last posted to in 2003
          Who were you, DenverCoder9?
          What did you see?!

or

print(''.join(lines))
Never have I felt so close to another soul
          And yet so helplessly alone
          As when I google an error
          And there's one result
          A thread by someone with the same problem
          And no answer
          Last posted to in 2003
          Who were you, DenverCoder9?
          What did you see?!
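Note that readlines keeps the trailing '\n' of every line, which is why joining with '' restores the original text. If you would rather have the lines without their newline characters, one option is to read the whole file and split it with the string method splitlines; strip additionally removes leading and trailing whitespace. A sketch with a hypothetical file name and stand-in text:

```python
import os
import tempfile

text = 'first line\n    second line\nthird line'  # stand-in for the poem

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'strip_demo.txt')  # hypothetical file name
    with open(path, 'wt') as my_file:
        my_file.write(text)

    with open(path, 'rt') as my_file:
        raw_lines = my_file.readlines()            # newlines are kept

    with open(path, 'rt') as my_file:
        clean_lines = my_file.read().splitlines()  # newlines are dropped

# strip() also removes the indentation at the start of a line.
stripped_lines = [line.strip() for line in raw_lines]

print(raw_lines)       # ['first line\n', '    second line\n', 'third line']
print(clean_lines)     # ['first line', '    second line', 'third line']
print(stripped_lines)  # ['first line', 'second line', 'third line']
```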

We have learned quite a bit about text files now. Usually, however, we will want to store our data in csv files, as they give us more structure. This is the topic of the next two sections.

1.4 Reading and writing csv files for lists of lists

To read and write csv files, we will first need to import the necessary module.

import csv

Suppose we have a list of lists.

my_economists = [
    ['Alfred', 'Marshall'],
    ['John', 'Keynes'],
    ['Paul', 'Krugman']
]

To write this structure into a csv file, we will have to do three things:

  • open a file,

  • make clear that we want to use the csv writer,

  • use a function to actually write the data.

Let’s do this.

# newline='' lets the csv module handle line endings itself
with open('my_economists.csv', 'wt', newline='') as my_file:
    csvout = csv.writer(my_file)
    csvout.writerows(my_economists)

To read it back in, we use the reader function.

with open('my_economists.csv', 'rt') as my_file: 
    csvin = csv.reader(my_file)
csvin
<_csv.reader at 0x7fb460f1bf90>

As you can see, csv.reader returns a reader object, which we have to iterate over to get at the rows. This has to happen while the file is still open.

my_economists = []
with open('my_economists.csv', 'rt') as my_file: 
    csvin = csv.reader(my_file)
    for row in csvin:
        my_economists.append(row)
my_economists
[['Alfred', 'Marshall'], ['John', 'Keynes'], ['Paul', 'Krugman']]

Or, more elegantly, we can use a list comprehension.

with open('my_economists.csv', 'rt') as my_file: 
    csvin = csv.reader(my_file)
    my_economists = [row for row in csvin]
my_economists
[['Alfred', 'Marshall'], ['John', 'Keynes'], ['Paul', 'Krugman']]
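One thing to keep in mind is that csv.reader returns every field as a string, even if you wrote a number. The following sketch extends the example with a birth-year column (an addition of mine, not part of the original data) and converts it back to int after reading. The file name is hypothetical.

```python
import csv
import os
import tempfile

# The third column (year of birth) is added here for illustration only.
rows = [
    ['Alfred', 'Marshall', 1842],
    ['John', 'Keynes', 1883],
    ['Paul', 'Krugman', 1953],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'types_demo.csv')  # hypothetical file name

    with open(path, 'wt', newline='') as my_file:
        csv.writer(my_file).writerows(rows)

    with open(path, 'rt', newline='') as my_file:
        rows_back = [row for row in csv.reader(my_file)]

print(rows_back[0])  # ['Alfred', 'Marshall', '1842']

# Convert the year column back to integers by hand.
converted = [[first, last, int(year)] for first, last, year in rows_back]
print(converted == rows)  # True
```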

1.5 Reading and writing csv files for lists of dictionaries

Now suppose we have a list of dictionaries.

my_economists = [
    {'first' : 'Alfred', 'last' : 'Marshall'},
    {'first' : 'John', 'last' : 'Keynes'},
    {'first' : 'Paul', 'last' : 'Krugman'},
]

Let’s first save this list to a file. Here, we need DictWriter. We pass it not only the file object my_file but also the list of field names, which determines the column order and is written as the header line by writeheader.

# newline='' lets the csv module handle line endings itself
with open('my_economists.csv', 'wt', newline='') as my_file:
    csvout = csv.DictWriter(my_file, ['first', 'last'])
    csvout.writeheader()
    csvout.writerows(my_economists)

Reading them back in is very similar to the list-of-lists case, but we need DictReader, which takes the field names from the header line.

with open('my_economists.csv', 'rt') as my_file: 
    csvin = csv.DictReader(my_file)
    my_economists = [row for row in csvin]
my_economists
[{'first': 'Alfred', 'last': 'Marshall'},
 {'first': 'John', 'last': 'Keynes'},
 {'first': 'Paul', 'last': 'Krugman'}]

Note that if we had not written a header line, it would have been necessary to pass ['first', 'last'] as a second argument to the DictReader function.
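A minimal sketch of that headerless case, again with a hypothetical file name in a temporary directory: the file is written without calling writeheader, so the field names have to be supplied when reading.

```python
import csv
import os
import tempfile

my_economists = [
    {'first': 'Alfred', 'last': 'Marshall'},
    {'first': 'John', 'last': 'Keynes'},
    {'first': 'Paul', 'last': 'Krugman'},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'no_header_demo.csv')  # hypothetical file name

    # Write the rows only; writeheader() is deliberately not called.
    with open(path, 'wt', newline='') as my_file:
        csvout = csv.DictWriter(my_file, ['first', 'last'])
        csvout.writerows(my_economists)

    # Without a header line, the field names are passed explicitly.
    with open(path, 'rt', newline='') as my_file:
        csvin = csv.DictReader(my_file, ['first', 'last'])
        rows_back = [dict(row) for row in csvin]

print(rows_back == my_economists)  # True
```

If the field names were left out here, DictReader would treat the first data row as the header and Alfred Marshall would be missing from the result.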