For most users of Python, working with files is an essential skill. This page therefore guides you through the interaction with different file types, as well as some of the modules and packages you may encounter. As always, keep in mind that this is not a full tutorial covering every possible scenario, but rather a starting point for you to build upon.



Introduction

Open and close files

To interact with any type of file, we first need to know a few general functions. The first is open(file, mode='r', ...), which is used to open a file. The mode parameter determines how the file is opened, and thereby what we are able to do with it. The following table is taken from docs.python.org.

Character    Meaning
'r'          open for reading (default)
'w'          open for writing, truncating the file first [1]
'x'          open for exclusive creation, failing if the file already exists
'a'          open for writing, appending to the end of the file if it exists
'b'          binary mode
't'          text mode (default)
'+'          open for updating (reading and writing)

[1] Truncating means removing the file contents without deleting the file.
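
As a minimal sketch of how the modes in the table behave, the example below writes to a throwaway file in a temporary directory. The file name demo.txt and the variable names are only assumptions made for illustration.

```python
import os
import tempfile

# Work in a temporary directory so that no real files are touched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')    # hypothetical file name

    # 'w' creates the file, or truncates it if it already exists
    with open(path, 'w') as fid:
        fid.write('first\n')

    # 'a' keeps the existing contents and appends to the end
    with open(path, 'a') as fid:
        fid.write('second\n')

    with open(path, 'r') as fid:
        after_append = fid.read()

    # 'x' refuses to open a file that already exists
    try:
        with open(path, 'x') as fid:
            fid.write('never written\n')
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # Opening with 'w' again truncates the file, so the old lines are gone
    with open(path, 'w') as fid:
        fid.write('third\n')

    with open(path, 'r') as fid:
        after_truncate = fid.read()

print(repr(after_append))       # 'first\nsecond\n'
print(exclusive_failed)         # True
print(repr(after_truncate))     # 'third\n'
```
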

It is also important to close the file after we are done with it, which we will show in the following examples. If you are unsure why it is important to close the file, please have a look at this post on Stack Overflow discussing the topic.

Code Block
languagepy
titleClosing file - Method 1
collapsetrue
# This is the most intuitive method, as we are explicitly closing the file when we are finished.
# However, if an exception is triggered during the program, the .close-function will not be executed.


# Open the file "input" for reading
fid = open('input.txt', 'r')

# Read the file as wanted...

# Close the file using the .close()-method
fid.close()

# We can verify that the file is closed using the bool .closed
print(fid.closed)		# This will print "True"


Code Block
languagepy
titleClosing file - Method 2
collapsetrue
# In this method we take into account that an exception might be raised while working with the file,
# and we make sure to close the file despite this


# Open the file "input.txt" for reading. This is done before the try-block,
# so that 'fid' is guaranteed to exist when the finally-block runs.
fid = open('input.txt', 'r')
try:
	data = fid.read()	# Read the file as wanted...
except OSError:		# If an I/O error is raised during the try-block
	print('Something went wrong when reading from the file')
finally:	# Make sure the file is closed whether or not an exception was raised
	fid.close()

# We can verify that the file is closed using the bool .closed
print(fid.closed)		# This will print "True"


Code Block
languagepy
titleClosing file - Method 3
collapsetrue
# The third, and by docs.python.org recommended method, is to use the 'with' keyword.
# This makes sure the file is properly closed after the suite finishes,
# but also uses less space than the try-finally blocks.


with open('input.txt', 'r') as fid:
	# Read the file as wanted...
	# Be aware of this indent!
	data = fid.read()
# The file is automatically and properly closed after the suite finishes

# We can verify that the file is closed using the bool .closed
print(fid.closed)		# This will print "True"
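
To illustrate why the 'with' keyword is the safer choice, the sketch below raises an exception inside the with-block on purpose and then checks that the file was closed anyway. The temporary file and the simulated ValueError are assumptions made only for this demonstration.

```python
import os
import tempfile

# Create a small temporary file to read from (hypothetical content)
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Line 1\n')
    path = tmp.name

try:
    with open(path, 'r') as fid:
        first_line = fid.readline()
        # Simulate something going wrong while the file is still open
        raise ValueError('simulated failure while processing the file')
except ValueError as err:
    print('Caught:', err)

# The file was closed when the with-block was left, despite the exception
print(fid.closed)       # True

os.remove(path)         # Clean up the temporary file
```
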


Code Block
languagepy
titleNot closing file
collapsetrue
# Open file for reading
fid = open('input.txt', 'r')

# Read the file as wanted...

# Forgetting to close file

# We can verify that the file is still open using the bool .closed
print(fid.closed)		# This will now print "False"

As a final note on how to open files, we will open a file that is not located in our current directory (normally the directory from which the Python program is executed). To do so, we can use an absolute path, as shown in the example below. A path relative to the current directory, such as '../input.txt', works as well.

Code Block
languagepy
# Open file in current directory
fid = open('input.txt', 'r')

# Open file in another directory
fid = open('C:/Users/input.txt', 'r')
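
Hard-coded absolute paths only work on one machine. As a hedged alternative, the sketch below builds an absolute path relative to the location of the script using pathlib from the standard library. The folder name data and the file name input.txt are hypothetical.

```python
from pathlib import Path

# Directory containing the running script; fall back to the current working
# directory when __file__ is not defined (e.g. in an interactive session)
try:
    script_dir = Path(__file__).resolve().parent
except NameError:
    script_dir = Path.cwd()

# Hypothetical file placed in a 'data' folder next to the script
data_file = script_dir / 'data' / 'input.txt'

print(data_file)                    # The full absolute path
print(data_file.is_absolute())      # True

# Only try to read the file if it actually exists
if data_file.exists():
    with open(data_file, 'r') as fid:
        print(fid.read())
```
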

Basic methods for reading and writing to files

This section shows some basic methods for reading from and writing to files. To avoid spending too much space on simple functions, we recommend reading the documentation at docs.python.org for more details.

Expand
titleRead from files


Code Block
languagetext
titleinput.txt
Line 1
Line 2
Last line


Code Block
languagepy
titleExample code
with open('input.txt','r') as fid:
    print('\nExample 1:')
    print(fid.read())       # Reads the whole file

with open('input.txt','r') as fid:
    print('\nExample 2:')
    print(fid.readline())   # Reads a single line

with open('input.txt','r') as fid:
    print('\nExample 3:')
    print(fid.readline(3))  # Reads at most n characters of the current line.
                            # However, it never reads more than one line, even if n exceeds the length of the line.

with open('input.txt','r') as fid:
    print('\nExample 4:')
    print(fid.readlines())   # Reads the whole file as a list

with open('input.txt','r') as fid:
    print('\nExample 5:')
    print(fid.read(3))      # Reads 3 letters

with open('input.txt','r') as fid:
    print('\nExample 6:')
    for line in fid:
        print(line, end='') # Reads lines using a loop.
                            # This is a memory efficient, fast and simple method.


Code Block
languagepy
titleOutput of example code
Example 1:
Line 1
Line 2
Last line

Example 2:
Line 1


Example 3:
Lin

Example 4:
['Line 1\n', 'Line 2\n', 'Last line']

Example 5:
Lin

Example 6:
Line 1
Line 2
Last line
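
As Example 4 shows, lines read from a file keep their trailing newline character. A minimal sketch of two common ways to get rid of it is given below. It recreates the example input in a temporary file, which is an assumption made so that the snippet runs on its own.

```python
import os
import tempfile

# Recreate the example input file in a temporary location
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Line 1\nLine 2\nLast line')
    path = tmp.name

# Loop over the file and strip the trailing newline from every line
with open(path, 'r') as fid:
    lines = [line.rstrip('\n') for line in fid]

# read().splitlines() gives the same result, but loads the whole file at once
with open(path, 'r') as fid:
    lines_alt = fid.read().splitlines()

os.remove(path)         # Clean up the temporary file

print(lines)            # ['Line 1', 'Line 2', 'Last line']
print(lines == lines_alt)   # True
```
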



Expand
titleWrite to files


Code Block
languagepy
titleExample code
with open('output.txt','a') as fid:
    fid.write('\nExample 1\n')
    fid.write('Write simple text or numbers converted to strings\n')
    fid.write(str(5))
    fid.write('%d' % 4)

with open('output.txt','a') as fid:
    fid.write('\nExample 2\n')
    arr = ['Array element 1', 'Array element 2']
    fid.writelines(arr)


Code Block
languagetext
titleoutput.txt
Example 1
Write simple text or numbers converted to strings
54
Example 2
Array element 1Array element 2
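
Note in the output above that writelines() does not add any line breaks between the elements. If each element should end up on its own line, the newline has to be added explicitly. The sketch below shows two possible ways of doing this, writing to a temporary file chosen only for illustration.

```python
import os
import tempfile

arr = ['Array element 1', 'Array element 2']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'output.txt')

    with open(path, 'w') as fid:
        # Option 1: add the newline to each element before writing
        fid.writelines(element + '\n' for element in arr)
        # Option 2: join the elements with newlines and write once
        fid.write('\n'.join(arr) + '\n')

    with open(path, 'r') as fid:
        content = fid.read()

print(content)      # Both options write one element per line
```
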



Text file [.txt]

Text files are a very common way of storing information, mostly due to their simplicity. Although the term "text file" can describe a general category (as opposed to binary files) and covers language-specific files with other suffixes (e.g. .m and .py), we will in this section only consider the suffix .txt. As seen in the introduction, several methods for working with text files have already been established. However, most of these are not efficient, especially when handling large quantities of data such as matrices.

Using NumPy

To read all about input and output in NumPy, visit the NumPy documentation. As for our example information (introduced at the top of this page), we will first aim to write it to a file. For this, we will be using the function numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None). At first, this may seem like a lot of parameters for one simple function, but hopefully, you will soon see them as a pure benefit.

Code Block
languagepy
titlenumpy.savetxt() - Example
collapsetrue
import numpy as np

mat = np.array([[0, 0, 0],
                [10, 0, 0],
                [10, 10, 0],
                [0, 10, 0]])

with open('output.txt', 'w') as fid:
    # We will go through all parameters here, but keep in mind that you don't have to include them if you're just keeping the default
    np.savetxt(fid,     # 'fname' refers to the id of our file; fid
               mat,     # 'X' is our matrix/array to be saved to the file
               '%d',    # 'fmt' refers to the format of our data, in this case, we want to store it as integers,'d'
               '\t',    # 'delimiter' is the string or character separating the columns of our matrix, e.g. '\t', meaning the tab delimiter
               '\n',    # 'newline' is the string or character separating the lines. '\n' is the newline specifier, which is default
               'Coordinates\nx\ty\tz\t',    # In the 'header' we can add the information we want to save above our data
                                            # Note that we could also have done this using some lines of 'fid.write()'
                                            # , but for our short text, this is easier
               '',      # 'footer' is similar to the header of the file, but located below the data
               '',      # 'comments' is prepended to the header and footer using '#' as default. In our case, we want ''
               None     # Choose 'encoding' for output file. 'None' is default
               )

# Or similar:
with open('output2.txt', 'w') as fid:
    np.savetxt(fid, mat, fmt='%d', delimiter='\t', header='Coordinates\nx\ty\tz\t', comments='')

# ---
# output.txt will now look like this:
Coordinates
x	y	z	
0	0	0
10	0	0
10	10	0
0	10	0
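
The fmt parameter is worth a closer look when the data is not integer-valued. The sketch below writes a small float array with three decimals. It uses io.StringIO as an in-memory stand-in for a file, and the array values are made up for the example.

```python
import io
import numpy as np

# Hypothetical float data
values = np.array([[0.0, 1.23456789],
                   [2.5, 1000.0]])

buffer = io.StringIO()      # In-memory stand-in for a text file
np.savetxt(buffer, values, fmt='%.3f', delimiter='\t')

text = buffer.getvalue()
print(text)
# 0.000	1.235
# 2.500	1000.000
```
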

As opposed to numpy.savetxt(), the function numpy.loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, ...) is used to load data from a text file.

Code Block
languagetext
titleinput.txt
collapsetrue
Coordinates
x	y	z	
0	0	0
10	0	0
10	10	0
0	10	0


Code Block
languagepy
titlenumpy.loadtxt() - Example
collapsetrue
import numpy as np

with open('input.txt', 'r') as fid:
    # Keyword arguments are used here, since 'usecols' (and not 'unpack') follows 'skiprows' in the parameter order
    mat = np.loadtxt(fid,               # 'fname' is the file id
                     dtype=int,         # 'dtype' is the data type of the resulting array
                     comments='#',      # 'comments', characters or list of characters used to indicate the start of a comment
                     delimiter=None,    # 'delimiter', string used to separate values. None (default) means any whitespace
                     converters=None,   # 'converters', dictionary mapping column number to a function
                     skiprows=2,        # 'skiprows', since we have information on the two first rows, we will skip them to load the array
                     unpack=False       # 'unpack', if 'True', the columns may be unpacked using 'x, y, z = loadtxt(..)'
                     # There are some more parameters that may be used in loadtxt, like 'usecols', but we will not cover them here
                     )
    print(mat)

# Or similar:
with open('input.txt', 'r') as fid:
    mat = np.loadtxt(fid, dtype = int, skiprows = 2)
    print(mat)
    
# ---
# Both methods print the same:
[[ 0  0  0]
 [10  0  0]
 [10 10  0]
 [ 0 10  0]]
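
The parameters usecols and unpack mentioned in the comments above can be sketched as follows. The example loads only the x- and y-columns of the same data and unpacks them into two separate arrays. The file content is supplied through io.StringIO, which is an assumption made so that the snippet is self-contained.

```python
import io
import numpy as np

# Same content as input.txt, held in memory for the sake of the example
text = 'Coordinates\nx\ty\tz\n0\t0\t0\n10\t0\t0\n10\t10\t0\n0\t10\t0\n'

# Load only columns 0 and 1, and unpack them into one array per column
x, y = np.loadtxt(io.StringIO(text), dtype=int, skiprows=2,
                  usecols=(0, 1), unpack=True)

print(x)    # [ 0 10 10  0]
print(y)    # [ 0  0 10 10]
```
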


Comma separated value file [.csv]

A CSV-file is a delimited text file that uses commas to separate values (other delimiters such as semicolons, colons or tabs may also be used). Because CSV-files are written in plain text, they are easy to read manually as well as to import into software such as Excel.
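
As a small sketch of the point about alternative delimiters, the csv module from the standard library (introduced in the next section) accepts a delimiter argument. The semicolon-separated content below is made up for the example and held in memory using io.StringIO.

```python
import csv
import io

# Hypothetical semicolon-separated content
text = 'x;y;z\n0;0;0\n10;0;0\n'

# Tell the reader which character separates the values
rows = list(csv.reader(io.StringIO(text), delimiter=';'))

print(rows)     # [['x', 'y', 'z'], ['0', '0', '0'], ['10', '0', '0']]
```
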

Using csv

csv is a Python module that implements classes to read and write tabular data in csv format.

Code Block
languagetext
titleinput.csv
collapsetrue
x,y,z
0,0,0
10,0,0
10,10,0
0,10,0

 First, let's import the data as a regular Python list.

Code Block
languagepy
titleRead as a list - Example
collapsetrue
import csv

with open('input.csv', 'r') as fid:
    my_list = []
    reader = csv.reader(fid)
    for row in reader:
        my_list.append(row)
    print(my_list)

# Or, using a more compressed version
with open('input.csv', 'r') as fid:
    my_list = list(csv.reader(fid))
    print(my_list)

# ---
# The following will be printed for both cases:
[['x', 'y', 'z'], ['0', '0', '0'], ['10', '0', '0'], ['10', '10', '0'], ['0', '10', '0']]

The second example will display how you may use csv to import the data to a Python dictionary.

Code Block
languagepy
titleRead as a dictionary - Example
collapsetrue
import csv

with open('input.csv', 'r') as fid:
    my_dict = {}        # Make empty dictionary
    reader = csv.DictReader(fid)
    for row in reader:  # 'row' will be read as a dictionary
        for column, value in row.items():       # Extract values from the 'row'-dictionary
            my_dict.setdefault(column, []).append(value)    # Append the values to our main dictionary
                    # Here, we need to use the function 'setdefault' to make sure the values of x are
                    # appended as a list, and not just continuously updated as new value.
    print('---')
    print(my_dict)
    print('---')
    print(my_dict['x'])
    print('---')
    print(my_dict['x'][1], my_dict['y'][1], my_dict['z'][1])

# ---
# The following will be printed:
---
{'x': ['0', '10', '10', '0'], 'y': ['0', '0', '10', '10'], 'z': ['0', '0', '0', '0']}
---
['0', '10', '10', '0']
---
10 0 0

The previous example using the Python dictionary container is usually not applied for the exact type of information we used ((x,y,z)-values). A more commonly used input-file for the case of a dictionary could be:

Code Block
languagetext
titleinput_dictionary.csv
collapsetrue
Name,Age,Hair_Color
Robb,17,black
Jon,17,black
Sansa,13,red
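
For such a file, csv.DictReader gives one dictionary per person, which is often the more natural structure. The sketch below reads the content above from memory (an assumption made to keep the snippet self-contained) and converts the age to an integer, since csv returns every value as a string.

```python
import csv
import io

# Same content as input_dictionary.csv, held in memory for the example
text = 'Name,Age,Hair_Color\nRobb,17,black\nJon,17,black\nSansa,13,red\n'

people = []
for row in csv.DictReader(io.StringIO(text)):
    row['Age'] = int(row['Age'])    # Values are read as strings, so convert the age
    people.append(row)

# Look up individual fields by column name
print(people[2]['Name'], people[2]['Age'])      # Sansa 13

# Build a simple name-to-age lookup
ages = {person['Name']: person['Age'] for person in people}
print(ages)     # {'Robb': 17, 'Jon': 17, 'Sansa': 13}
```
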


Using NumPy

Seeing as CSV-files are a specific type of text file, the functions introduced for regular text files (np.loadtxt() and np.savetxt()) are also applicable here. The only differences from the text-file examples are the delimiter parameter and the suffix of the output file.

Code Block
languagepy
titlenumpy.savetxt() - Example
collapsetrue
import numpy as np

mat = np.array([[0, 0, 0],
                [10, 0, 0],
                [10, 10, 0],
                [0, 10, 0]])

with open('output.csv', 'w') as fid:
    # We will go through all parameters here, but keep in mind that you don't have to include them if you're just keeping the default
    np.savetxt(fid,     # 'fname' refers to the id of our file; fid
               mat,     # 'X' is our matrix/array to be saved to the file
               '%d',    # 'fmt' refers to the format of our data, in this case, we want to store it as integers,'d'
               ',',    # 'delimiter' is the string or character separating the columns of our matrix
               '\n',    # 'newline' is the string or character separating the lines. '\n' is the newline specifier, which is default
               'Coordinates\nx,y,z',    # In the 'header' we can add the information we want to save above our data
                                            # Note that we could also have done this using some lines of 'fid.write()'
                                            # , but for our short text, this is easier
               '',      # 'footer' is similar to the header of the file, but located below the data
               '',      # 'comments' is prepended to the header and footer using '#' as default. In our case, we want ''
               None     # Choose 'encoding' for output file. 'None' is default
               )

# Or similar:
with open('output.csv', 'w') as fid:
    np.savetxt(fid, mat, fmt='%d', delimiter=',', header='Coordinates\nx,y,z', comments='')

# ---
# output.csv will now look like this:
Coordinates
x,y,z
0,0,0
10,0,0
10,10,0
0,10,0


Code Block
languagetext
titleinput.csv
collapsetrue
Coordinates
x,y,z
0,0,0
10,0,0
10,10,0
0,10,0


Code Block
languagepy
titlenumpy.loadtxt() - Example
collapsetrue
import numpy as np

with open('input.csv', 'r') as fid:
    # Keyword arguments are used here, since 'usecols' (and not 'unpack') follows 'skiprows' in the parameter order
    mat = np.loadtxt(fid,               # 'fname' is the file id
                     dtype=int,         # 'dtype' is the data type of the resulting array
                     comments='#',      # 'comments', characters or list of characters used to indicate the start of a comment
                     delimiter=',',     # 'delimiter', string used to separate values. In our case, ','
                     converters=None,   # 'converters', dictionary mapping column number to a function
                     skiprows=2,        # 'skiprows', since we have information on the two first rows, we will skip them to load the array
                     unpack=False       # 'unpack', if 'True', the columns may be unpacked using 'x, y, z = loadtxt(..)'
                     # There are some more parameters that may be used in loadtxt, like 'usecols', but we will not cover them here
                     )
    print(mat)

# Or similar:
with open('input.csv', 'r') as fid:
    mat = np.loadtxt(fid, dtype = int, skiprows = 2, delimiter = ',')
    print(mat)
    
# ---
# Both methods print the same:
[[ 0  0  0]
 [10  0  0]
 [10 10  0]
 [ 0 10  0]]

Another function that may be used is numpy.genfromtxt(). This function is equivalent to numpy.loadtxt() when no data is missing, but it is particularly useful when some values are missing. A simple example displaying one of the unique parameters of np.genfromtxt() is shown below. For further information, please visit the function documentation.

Code Block
languagetext
titleinput.csv - Missing data
collapsetrue
Coordinates
x,y,z
,,
10,,
10,10,
,10,


Code Block
languagepy
titlenumpy.genfromtxt() - Example
collapsetrue
import numpy as np

with open('input.csv', 'r') as fid:
    mat = np.genfromtxt(fid, delimiter = ',', skip_header = 2, filling_values = 0)   # Pass the opened file 'fid'
    print(mat)


# ---
# This will print:
[[ 0.  0.  0.]
 [10.  0.  0.]
 [10. 10.  0.]
 [ 0. 10.  0.]]
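
If filling_values is left out, np.genfromtxt() marks missing float entries as nan instead. The sketch below shows this default behaviour and one possible way of replacing the gaps afterwards. The data and the replacement value -1 are made up for the example, and io.StringIO stands in for a file.

```python
import io
import numpy as np

# Hypothetical CSV content with two missing values
text = 'x,y,z\n1,,3\n4,5,\n'

# Without filling_values, missing entries become nan
mat = np.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1)
print(mat)

# Locate the missing entries and replace them afterwards
mask = np.isnan(mat)
filled = np.where(mask, -1.0, mat)
print(filled)
```
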



