Working with files is an essential skill for most Python users. This page therefore guides you through working with different file types, as well as some of the modules and packages you may encounter along the way. As always, keep in mind that this is not a complete tutorial covering every scenario, but rather a starting point to build upon.


Introduction

Open and close files

To interact with any type of file, we first need to know a few general functions. The first is open(file, mode='r', ...), which opens a file. The mode parameter determines how the file is opened and what we are able to do with it. The following table is taken from docs.python.org.

Character    Meaning
'r'          open for reading (default)
'w'          open for writing, truncating [1] the file first
'x'          open for exclusive creation, failing if the file already exists
'a'          open for writing, appending to the end of the file if it exists
'b'          binary mode
't'          text mode (default)
'+'          open for updating (reading and writing)

[1] Truncating means removing the file contents without deleting the file.
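
To make the modes in the table more concrete, here is a minimal sketch that tries 'w', 'a' and 'x' on a scratch file in a temporary directory. The file name modes_demo.txt and all variable names are assumptions made for this illustration only.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'w') as fid:    # 'w' creates the file (or truncates an existing one)
    fid.write('first\n')

with open(path, 'a') as fid:    # 'a' keeps the contents and appends to the end
    fid.write('second\n')

with open(path, 'r') as fid:    # 'r' only allows reading
    after_append = fid.read()

with open(path, 'w') as fid:    # Opening with 'w' again removes the old contents
    fid.write('third\n')

with open(path, 'r') as fid:
    after_truncate = fid.read()

try:
    with open(path, 'x') as fid:    # 'x' fails, because the file already exists
        exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

print(repr(after_append))       # 'first\nsecond\n'
print(repr(after_truncate))     # 'third\n'
print(exclusive_failed)         # True
```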

It is also important to close the file after we are done with it, which we will show in the following examples. If you are unsure why it is important to close the file, please have a look at this post on Stack Overflow discussing the topic.

Closing file - Method 1
# This is the most intuitive method, as we explicitly close the file when we are finished.
# However, if an exception is raised before we reach it, the .close()-method will not be executed.


# Open the file "input" for reading
fid = open('input.txt', 'r')

# Read the file as wanted...

# Close the file using the .close()-method
fid.close()

# We can verify that the file is closed using the bool .closed
print(fid.closed)		# This will print "True"
Closing file - Method 2
# In this method we take into account that an exception might be raised while working with the file,
# and we make sure to close the file regardless


# Open the file "input" for reading. This is done before the try-block,
# so that 'fid' is guaranteed to exist when we reach 'finally'
fid = open('input.txt', 'r')
try:
	fid.read()	# Read the file as wanted...
except OSError:	# If reading fails during the try-block
	print('Something went wrong when reading the file')
finally:	# Make sure the file is closed whether or not an exception occurred
	fid.close()

# We can verify that the file is closed using the bool .closed
print(fid.closed)		# This will print "True"
Closing file - Method 3
# The third method, and the one recommended by docs.python.org, is to use the 'with' keyword.
# This makes sure the file is properly closed after the suite finishes, even if an exception is raised,
# and it is also shorter than the try-finally blocks.


with open('input.txt', 'r') as fid:
	content = fid.read()	# Read the file as wanted...
	# Be aware of this indent!
# The file is automatically and properly closed after the suite finishes

# We can verify that the file is closed using the bool .closed
print(fid.closed)		# This will print "True"
Not closing file
# Open file for reading
fid = open('input.txt', 'r')

# Read the file as wanted...

# Forgetting to close file

# We can verify that the file is still open using the bool .closed
print(fid.closed)		# This will now print "False"
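
The claim that 'with' closes the file even when something goes wrong can be checked with a small sketch. The scratch file and the deliberately raised ValueError are assumptions for illustration; they only simulate a failure in the middle of reading.

```python
import os
import tempfile

# Hypothetical scratch file, so that the sketch does not depend on input.txt
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')
with open(path, 'w') as fid:
    fid.write('Line 1\n')

try:
    with open(path, 'r') as fid:
        fid.read()
        raise ValueError('simulated failure while working with the file')
except ValueError:
    pass    # The exception is handled here, after 'with' has already closed the file

closed_after_error = fid.closed
print(closed_after_error)   # True
```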

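As a final note on opening files, we will open a file that is not located in our current directory (the working directory from which the Python program is executed, which is often, but not always, the folder containing the program file). To do so, we can use an absolute path, as shown in the example below. A relative path that includes the folder names also works.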

# Open file in current directory
fid = open('input.txt', 'r')

# Open file in another directory
fid = open('C:/Users/input.txt', 'r')
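
As an alternative to writing paths as plain strings, the standard library module pathlib can build them in a way that works across operating systems. The sketch below is an assumption-based illustration: it uses a temporary folder instead of C:/Users, and the variable names are made up for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical folder standing in for "another directory"
folder = Path(tempfile.mkdtemp())
file_path = folder / 'input.txt'    # The '/' operator joins path components

file_path.write_text('Line 1\n')    # Create the file so that it can be opened below

# open() accepts Path objects as well as strings
with open(file_path, 'r') as fid:
    first_line = fid.readline()

is_absolute = file_path.is_absolute()
print(is_absolute)          # True
print(repr(first_line))     # 'Line 1\n'
```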

Basic methods for reading and writing to files

This section shows some basic functions for reading from and writing to files. To avoid spending too much space on simple functions, we recommend that you read the documentation at docs.python.org for more details.

input.txt
Line 1
Line 2
Last line
Example code
with open('input.txt','r') as fid:
    print('\nExample 1:')
    print(fid.read())       # Reads the whole file

with open('input.txt','r') as fid:
    print('\nExample 2:')
    print(fid.readline())   # Reads a single line

with open('input.txt','r') as fid:
    print('\nExample 3:')
    print(fid.readline(3))   # Reads at most n characters of the current line.
                             # However, it never reads more than one line, even if n exceeds the length of the line.

with open('input.txt','r') as fid:
    print('\nExample 4:')
    print(fid.readlines())   # Reads the whole file as a list

with open('input.txt','r') as fid:
    print('\nExample 5:')
    print(fid.read(3))      # Reads 3 characters

with open('input.txt','r') as fid:
    print('\nExample 6:')
    for line in fid:
        print(line, end='') # Reads lines using a loop.
                            # This is a memory efficient, fast and simple method.
Output of example code
Example 1:
Line 1
Line 2
Last line

Example 2:
Line 1


Example 3:
Lin

Example 4:
['Line 1\n', 'Line 2\n', 'Last line']

Example 5:
Lin

Example 6:
Line 1
Line 2
Last line
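
The methods readline(n) and readlines(hint) are easy to mix up, so the following sketch compares them. It uses io.StringIO as an in-memory stand-in for input.txt, which is an assumption made to keep the example self-contained.

```python
import io

# In-memory stand-in for input.txt, with the same three lines
text = 'Line 1\nLine 2\nLast line'

# readline(n) returns at most n characters, and never more than one line
part_of_line = io.StringIO(text).readline(3)
whole_line = io.StringIO(text).readline(100)

# readlines(hint) returns a list of whole lines, and stops reading
# once the total number of characters read exceeds the hint
lines = io.StringIO(text).readlines(3)

print(repr(part_of_line))   # 'Lin'
print(repr(whole_line))     # 'Line 1\n'
print(lines)                # ['Line 1\n']
```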
Example code
with open('output.txt','a') as fid:
    fid.write('\nExample 1\n')
    fid.write('Write simple text or numbers converted to strings\n')
    fid.write(str(5))
    fid.write('%d' % 4)

with open('output.txt','a') as fid:
    fid.write('\nExample 2\n')
    arr = ['Array element 1', 'Array element 2']
    fid.writelines(arr)
output.txt
Example 1
Write simple text or numbers converted to strings
54
Example 2
Array element 1Array element 2
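
Note that writelines() does not add line separators, which is why the two array elements end up on the same line above. If each element should be written on its own line, one possible approach is sketched below; the scratch file name is an assumption for illustration.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'writelines_demo.txt')
arr = ['Array element 1', 'Array element 2']

with open(path, 'w') as fid:
    # Add the newline character to each element ourselves
    fid.writelines(element + '\n' for element in arr)

with open(path, 'r') as fid:
    written = fid.read()

print(repr(written))    # 'Array element 1\nArray element 2\n'
```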

Text file [.txt]

Text files are a very common way of storing information, mostly due to their simplicity. Although "text file" can describe any file stored as plain text (as opposed to a binary file), including language-specific files with other suffixes (e.g. .py), we will in this section only consider the suffix .txt. As seen in the introduction, several methods for working with text files have already been introduced. However, most of these are not very efficient, especially when handling large quantities of data, e.g. matrices.

Using NumPy

To read all about input and output in NumPy, visit the NumPy documentation. As for our example information (introduced at the top of this page), we will first aim to write it to a file. For this, we will use the function numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None). At first, this may seem like a lot of parameters for one simple function, but hopefully you will soon see them as a benefit.

numpy.savetxt() - Example
import numpy as np

mat = np.array([[0, 0, 0],
                [10, 0, 0],
                [10, 10, 0],
                [0, 10, 0]])

with open('output.txt', 'w') as fid:
    # We will go through all parameters here, but keep in mind that you don't have to include those you are keeping at their default
    np.savetxt(fid,     # 'fname' refers to the id of our file; fid
               mat,     # 'X' is our matrix/array to be saved to the file
               '%d',    # 'fmt' refers to the format of our data. In this case, we want to store it as integers, 'd'
               '\t',    # 'delimiter' is the string or character separating the columns of our matrix, e.g. '\t', meaning the tab delimiter
               '\n',    # 'newline' is the string or character separating the lines. '\n' is the newline specifier, which is default
               'Coordinates\nx\ty\tz\t',    # In the 'header' we can add the information we want to save above our data
                                            # Note that we could also have done this using some lines of 'fid.write()',
                                            # but for our short text, this is easier
               '',      # 'footer' is similar to the header of the file, but located below the data
               '',      # 'comments' is prepended to the header and footer, using '# ' as default. In our case, we want ''
               None     # Choose 'encoding' for the output file. 'None' is default
               )

# Or similar:
with open('output2.txt', 'w') as fid:
    np.savetxt(fid, mat, fmt='%d', delimiter='\t', header='Coordinates\nx\ty\tz\t', comments='')

# ---
# output.txt will now look like this:
Coordinates
x	y	z	
0	0	0
10	0	0
10	10	0
0	10	0
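
The 'fmt' parameter matters most when the data is not integer valued. As a small sketch, assuming a made-up array of decimal numbers and a scratch file name, the format '%.2f' stores every value with two decimals:

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch file and example values
path = os.path.join(tempfile.mkdtemp(), 'floats_demo.txt')
values = np.array([[1.5, 2.25],
                   [10.0, 0.5]])

# '%.2f' writes each value as a decimal number with two digits after the point
np.savetxt(path, values, fmt='%.2f', delimiter='\t')

with open(path, 'r') as fid:
    saved_text = fid.read()

print(repr(saved_text))     # '1.50\t2.25\n10.00\t0.50\n'
```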

As opposed to numpy.savetxt(), numpy.loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, ...) is used to load data from a text file.

input.txt
Coordinates
x	y	z	
0	0	0
10	0	0
10	10	0
0	10	0
numpy.loadtxt() - Example
import numpy as np

with open('input.txt', 'r') as fid:
    mat = np.loadtxt(fid,   # 'fname' is the file id
                  int,      # 'dtype' is the data-type of the resulting array
                  '#',      # 'comments', characters or list of characters used to indicate the start of a comment
                  None,     # 'delimiter', string used to separate values. In our case None, meaning any whitespace
                  None,     # 'converters', dictionary mapping column number to a function
                  2,        # 'skiprows', since we have information on the two first rows, we will skip them to load the array
                  unpack=False  # 'unpack', if 'True', the columns may be unpacked using 'x, y, z = loadtxt(..)'
                                # It is passed by keyword, since 'usecols' comes before it in the parameter list
                  # There are some more parameters that may be used in loadtxt, like 'usecols', but we will not cover them here
                  )
    print(mat)

# Or similar:
with open('input.txt', 'r') as fid:
    mat = np.loadtxt(fid, dtype = int, skiprows = 2)
    print(mat)
    
# ---
# Both methods print the same:
[[ 0  0  0]
 [10  0  0]
 [10 10  0]
 [ 0 10  0]]
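
The 'unpack' parameter mentioned in the comments above can be illustrated with a short sketch. It reads the same content from an in-memory io.StringIO object instead of input.txt, which is an assumption made to keep the example self-contained.

```python
import io

import numpy as np

# In-memory stand-in for input.txt, with the same tab-separated content
text = 'Coordinates\nx\ty\tz\t\n0\t0\t0\n10\t0\t0\n10\t10\t0\n0\t10\t0\n'

# With unpack=True the array is transposed, so each column can be
# assigned to its own variable
x, y, z = np.loadtxt(io.StringIO(text), dtype=int, skiprows=2, unpack=True)

print(x)    # [ 0 10 10  0]
print(y)    # [ 0  0 10 10]
print(z)    # [0 0 0 0]
```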

Comma separated value file [.csv]

A CSV-file is a delimited text file that uses commas to separate values (other delimiters, e.g. semicolons, colons or tabs, may also be used). Because CSV-files are written in plain text, they are easy both to read manually and to import into software such as Excel.

Using csv

csv is a Python module that implements classes to read and write tabular data in csv format.
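
The examples in this section focus on reading. For completeness, here is a minimal sketch of writing with csv.writer. The scratch file name and the rows are assumptions for illustration. The file is opened with newline='', as recommended by the csv documentation.

```python
import csv
import os
import tempfile

# Hypothetical scratch file and example rows
path = os.path.join(tempfile.mkdtemp(), 'output_demo.csv')
rows = [['x', 'y', 'z'], [0, 0, 0], [10, 0, 0]]

with open(path, 'w', newline='') as fid:
    writer = csv.writer(fid)
    writer.writerows(rows)      # Writes one line per inner list

# Read the file back to check what was stored. All values come back as strings
with open(path, 'r', newline='') as fid:
    read_back = list(csv.reader(fid))

print(read_back)    # [['x', 'y', 'z'], ['0', '0', '0'], ['10', '0', '0']]
```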

input.csv
x,y,z
0,0,0
10,0,0
10,10,0
0,10,0

 First, let's import the data as a regular Python list.

Read as a list - Example
import csv

with open('input.csv', 'r') as fid:
    my_list = []
    reader = csv.reader(fid)
    for row in reader:
        my_list.append(row)
    print(my_list)

# Or, using a more compressed version
with open('input.csv', 'r') as fid:
    my_list = list(csv.reader(fid))
    print(my_list)

# ---
# The following will be printed for both cases:
[['x', 'y', 'z'], ['0', '0', '0'], ['10', '0', '0'], ['10', '10', '0'], ['0', '10', '0']]
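
Notice that csv.reader returns every value as a string. If numbers are needed, the values have to be converted. One possible way is sketched below, using io.StringIO as an in-memory stand-in for input.csv (an assumption made to keep the example self-contained).

```python
import csv
import io

# In-memory stand-in for input.csv
text = 'x,y,z\n0,0,0\n10,0,0\n10,10,0\n0,10,0\n'

rows = list(csv.reader(io.StringIO(text)))
header = rows[0]                                            # The column names
data = [[int(value) for value in row] for row in rows[1:]]  # The remaining rows, as integers

print(header)   # ['x', 'y', 'z']
print(data)     # [[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0]]
```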

The second example will display how you may use csv to import the data to a Python dictionary.

Read as a dictionary - Example
import csv

with open('input.csv', 'r') as fid:
    my_dict = {}        # Make empty dictionary
    reader = csv.DictReader(fid)
    for row in reader:  # 'row' will be read as a dictionary
        for column, value in row.items():       # Extract values from the 'row'-dictionary
            my_dict.setdefault(column, []).append(value)    # Append the values to our main dictionary
                    # Here, we use the method 'setdefault' to make sure the values of each column are
                    # appended to a list, and not just overwritten by each new value.
    print('---')
    print(my_dict)
    print('---')
    print(my_dict['x'])
    print('---')
    print(my_dict['x'][1], my_dict['y'][1], my_dict['z'][1])

# ---
# The following will be printed:
---
{'x': ['0', '10', '10', '0'], 'y': ['0', '0', '10', '10'], 'z': ['0', '0', '0', '0']}
---
['0', '10', '10', '0']
---
10 0 0

A Python dictionary is usually not used for the type of information in the previous example ((x,y,z)-values). A more typical input file for a dictionary could be:

input_dictionary.csv
Name,Age,Hair_Color
Robb,17,black
Jon,17,black
Sansa,13,red
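
For a file like this, it is often convenient to keep one dictionary per row. The following is a minimal sketch of that idea, again using io.StringIO as a stand-in for the file. The variable names are assumptions for illustration.

```python
import csv
import io

# In-memory stand-in for input_dictionary.csv
text = 'Name,Age,Hair_Color\nRobb,17,black\nJon,17,black\nSansa,13,red\n'

# Each row becomes a dictionary with the column names as keys
people = list(csv.DictReader(io.StringIO(text)))

# The values are strings, so 'Age' is converted before it is used as a number
ages = {person['Name']: int(person['Age']) for person in people}

print(people[2]['Hair_Color'])  # red
print(ages)                     # {'Robb': 17, 'Jon': 17, 'Sansa': 13}
```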

Using NumPy

Seeing as CSV-files are a specific type of text file, the functions introduced for regular text files (np.loadtxt() and np.savetxt()) are also applicable here. The only differences are the value of the delimiter parameter and the suffix of the file.

numpy.savetxt() - Example
import numpy as np

mat = np.array([[0, 0, 0],
                [10, 0, 0],
                [10, 10, 0],
                [0, 10, 0]])

with open('output.csv', 'w') as fid:
    # We will go through all parameters here, but keep in mind that you don't have to include those you are keeping at their default
    np.savetxt(fid,     # 'fname' refers to the id of our file; fid
               mat,     # 'X' is our matrix/array to be saved to the file
               '%d',    # 'fmt' refers to the format of our data. In this case, we want to store it as integers, 'd'
               ',',     # 'delimiter' is the string or character separating the columns of our matrix
               '\n',    # 'newline' is the string or character separating the lines. '\n' is the newline specifier, which is default
               'Coordinates\nx,y,z',    # In the 'header' we can add the information we want to save above our data
                                        # Note that we could also have done this using some lines of 'fid.write()',
                                        # but for our short text, this is easier
               '',      # 'footer' is similar to the header of the file, but located below the data
               '',      # 'comments' is prepended to the header and footer, using '# ' as default. In our case, we want ''
               None     # Choose 'encoding' for the output file. 'None' is default
               )

# Or similar:
with open('output.csv', 'w') as fid:
    np.savetxt(fid, mat, fmt='%d', delimiter=',', header='Coordinates\nx,y,z', comments='')

# ---
# output.csv will now look like this:
Coordinates
x,y,z
0,0,0
10,0,0
10,10,0
0,10,0
input.csv
Coordinates
x,y,z
0,0,0
10,0,0
10,10,0
0,10,0
numpy.loadtxt() - Example
import numpy as np

with open('input.csv', 'r') as fid:
    mat = np.loadtxt(fid,   # 'fname' is the file id
                  int,      # 'dtype' is the data-type of the resulting array
                  '#',      # 'comments', characters or list of characters used to indicate the start of a comment
                  ',',      # 'delimiter', string used to separate values. In our case, ','
                  None,     # 'converters', dictionary mapping column number to a function
                  2,        # 'skiprows', since we have information on the two first rows, we will skip them to load the array
                  unpack=False  # 'unpack', if 'True', the columns may be unpacked using 'x, y, z = loadtxt(..)'
                                # It is passed by keyword, since 'usecols' comes before it in the parameter list
                  # There are some more parameters that may be used in loadtxt, like 'usecols', but we will not cover them here
                  )
    print(mat)

# Or similar:
with open('input.csv', 'r') as fid:
    mat = np.loadtxt(fid, dtype = int, skiprows = 2, delimiter = ',')
    print(mat)
    
# ---
# Both methods print the same:
[[ 0  0  0]
 [10  0  0]
 [10 10  0]
 [ 0 10  0]]

Another function that may be used is numpy.genfromtxt(). This function is equivalent to numpy.loadtxt() when no data is missing, but is very useful when some values are missing. A simple example showing one of the parameters unique to np.genfromtxt() is shown below. For further information, please visit the function documentation.

input.csv - Missing data
Coordinates
x,y,z
,,
10,,
10,10,
,10,
numpy.genfromtxt() - Example
import numpy as np

with open('input.csv', 'r') as fid:
    # Pass the opened file 'fid', so that the file opened by 'with' is the one being read
    mat = np.genfromtxt(fid, delimiter = ',', skip_header = 2, filling_values = 0)
    print(mat)


# ---
# This will print:
[[ 0.  0.  0.]
 [10.  0.  0.]
 [10. 10.  0.]
 [ 0. 10.  0.]]
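
To see what 'filling_values' changes, it can help to compare with the default behaviour, where missing entries are loaded as nan. The sketch below uses a small made-up data set in an io.StringIO object. Both the data and the variable names are assumptions for illustration.

```python
import io

import numpy as np

# Small made-up data set with two missing values
text = '1,,3\n4,5,\n'

# Without 'filling_values', missing entries become nan (not a number)
raw = np.genfromtxt(io.StringIO(text), delimiter=',')

# With 'filling_values', missing entries are replaced by the chosen value
filled = np.genfromtxt(io.StringIO(text), delimiter=',', filling_values=0)

print(np.isnan(raw))    # True where a value was missing
print(filled)
```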


