To interact with any type of file, we first need to know a few general functions. The first is open(file, mode='r', ...), which is used to open a file. The mode parameter determines how the file is opened and what we are able to do with it. The following table is taken from docs.python.org.
Character | Meaning |
---|---|
'r' | open for reading (default) |
'w' | open for writing, truncating the file first (see the note below) |
'x' | open for exclusive creation, failing if the file already exists |
'a' | open for writing, appending to the end of the file if it exists |
'b' | binary mode |
't' | text mode (default) |
'+' | open for updating (reading and writing) |

Note: truncating means removing the file contents without deleting the file.
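As a minimal sketch of how the modes in the table behave, the snippet below writes, appends to, and truncates a small file. The file name demo.txt and the temporary directory are assumptions made for illustration only; the with statement used here closes the file automatically.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo.txt')  # hypothetical file name

# 'w' creates the file (or truncates an existing one)
with open(path, 'w') as fid:
    fid.write('first\n')

# 'a' keeps the existing contents and appends to the end
with open(path, 'a') as fid:
    fid.write('second\n')

# 'r' (the default) opens the file for reading
with open(path, 'r') as fid:
    after_append = fid.read()

# 'x' refuses to overwrite: the file already exists, so open() raises
try:
    with open(path, 'x') as fid:
        fid.write('never written\n')
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

# 'w' again truncates, so the old contents are gone
with open(path, 'w') as fid:
    fid.write('third\n')
with open(path, 'r') as fid:
    after_truncate = fid.read()

print(repr(after_append))
print(exclusive_failed)
print(repr(after_truncate))
```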
It is also important to close the file after we are done with it, which we will show in the following examples. If you are unsure why it is important to close the file, please have a look at this post on Stack Overflow discussing the topic.
```python
# This is the most intuitive method, as we are explicitly closing the file when we are finished.
# However, if an exception is raised before that point, the .close() call will not be executed.

# Open the file "input.txt" for reading
fid = open('input.txt', 'r')

# Read the file as wanted...

# Close the file using the .close() method
fid.close()

# We can verify that the file is closed using the boolean attribute .closed
print(fid.closed)  # This will print "True"
```
```python
# In this method we take into account that an exception might be raised during the program,
# and we make sure to close the file despite this.

# Open the file "input.txt" for reading. This is done before the try block:
# if open() itself fails, there is no file object to close in 'finally'.
fid = open('input.txt', 'r')
try:
    # Read the file as wanted...
    content = fid.read()
except OSError:
    # If an I/O error is raised during the try block
    print('Something went wrong when reading the file')
finally:
    # Make sure the file is closed whether or not an exception was raised
    fid.close()

# We can verify that the file is closed using the boolean attribute .closed
print(fid.closed)  # This will print "True"
```
```python
# The third method, and the one recommended by docs.python.org, is to use the 'with' keyword.
# This makes sure the file is properly closed after the suite finishes,
# and it also takes less space than the try-finally blocks.
with open('input.txt', 'r') as fid:
    # Read the file as wanted...
    # Be aware of this indent!
    content = fid.read()

# The file is automatically and properly closed after the suite finishes.
# We can verify that the file is closed using the boolean attribute .closed
print(fid.closed)  # This will print "True"
```
```python
# Open file for reading
fid = open('input.txt', 'r')

# Read the file as wanted...

# Forgetting to close the file

# We can check whether the file is closed using the boolean attribute .closed
print(fid.closed)  # This will now print "False"
```
As a final note on how to open files, we will open a file that is not located in our current directory (typically the directory from which the Python program is run). One way to do so is to use an absolute path, as shown in the example below.
```python
# Open file in current directory
fid = open('input.txt', 'r')

# Open file in another directory
fid = open('C:/Users/input.txt', 'r')
```
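Paths can also be built programmatically instead of being typed out as strings. The following is a small sketch using the standard pathlib module; the folder name data is a hypothetical example and is not part of the original text.

```python
from pathlib import Path

# A relative path is interpreted relative to the current working directory
relative = Path('data') / 'input.txt'   # 'data' is a hypothetical folder

# Joining it onto the current working directory gives an absolute path
absolute = Path.cwd() / relative

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True

# Either form can be passed straight to open(), e.g. open(absolute, 'r'),
# provided the file actually exists at that location.
```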
This section will display some basic functions for reading from and writing to files. To avoid spending too much space on simple functions, we recommend reading the documentation at docs.python.org for more details.
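The sketch below illustrates the most common file-object methods for this purpose: write(), writelines(), read(), readline() and readlines(), plus plain iteration over the file. The file name example.txt and its contents are assumptions chosen for illustration.

```python
import os
import tempfile

# Hypothetical file inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

with open(path, 'w') as fid:
    fid.write('x y z\n')                      # write a single string
    fid.writelines(['0 0 0\n', '10 0 0\n'])   # write a list of strings

with open(path, 'r') as fid:
    whole_file = fid.read()       # the entire file as one string

with open(path, 'r') as fid:
    first_line = fid.readline()   # one line, including the trailing '\n'
    remaining = fid.readlines()   # the rest of the lines as a list

with open(path, 'r') as fid:
    # Iterating over the file yields one line at a time
    stripped = [line.rstrip('\n') for line in fid]

print(stripped)
```

Note that write() and writelines() do not add line breaks themselves, so each string carries its own '\n'.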
Text files are a very common format for storing information, mostly due to their simplicity. Although "text file" can describe a general category (as opposed to binary files) and covers many language-specific suffixes (e.g. .mat and .py), we will in this section only consider the suffix .txt. As seen in the introduction, several methods for working with text files have already been established. However, most of these are not considered effective, especially when handling large quantities of data, e.g. in matrices.
To read all about input and output in NumPy, visit the NumPy documentation. As for our example information (introduced at the top of this page), we will first aim to write it to a file. For this, we will be using the function numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None). At first, this may seem like a lot of parameters for one simple function, but hopefully, you will soon see it as a pure benefit.
```python
import numpy as np

mat = np.array([[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0]])

with open('output.txt', 'w') as fid:
    # We will go through all parameters here, but keep in mind that you don't
    # have to include them if you're just keeping the default
    np.savetxt(fid,   # 'fname' refers to the id of our file; fid
               mat,   # 'X' is our matrix/array to be saved to the file
               '%d',  # 'fmt' is the format of our data; here we want to store integers, 'd'
               '\t',  # 'delimiter' is the string separating the columns, here the tab character
               '\n',  # 'newline' is the string separating the lines; '\n' is the default
               'Coordinates\nx\ty\tz\t',  # 'header' holds the information we want above our data.
                      # We could also have done this using some lines of 'fid.write()',
                      # but for our short text, this is easier
               '',    # 'footer' is similar to the header, but located below the data
               '',    # 'comments' is prepended to the header and footer, '# ' by default; we want ''
               None   # 'encoding' for the output file; None is the default
               )

# Or similar:
with open('output2.txt', 'w') as fid:
    np.savetxt(fid, mat, fmt='%d', delimiter='\t', header='Coordinates\nx\ty\tz\t', comments='')

# ---
# output.txt will now look like this (columns are tab-separated):
# Coordinates
# x   y   z
# 0   0   0
# 10  0   0
# 10  10  0
# 0   10  0
```
As opposed to numpy.savetxt(), numpy.loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, ...) is used to load data from a text file.
```
Coordinates
x y z
0 0 0
10 0 0
10 10 0
0 10 0
```
```python
import numpy as np

with open('input.txt', 'r') as fid:
    mat = np.loadtxt(fid,    # 'fname' is the file id
                     int,    # 'dtype' is the data type of the resulting array
                     '#',    # 'comments', characters or list of characters marking the start of a comment
                     None,   # 'delimiter', string used to separate values; None means any whitespace
                     None,   # 'converters', dictionary mapping column number to a function
                     2,      # 'skiprows'; the first two rows hold text, so we skip them
                     unpack=False  # 'unpack'; if True, the columns may be unpacked using 'x, y, z = loadtxt(..)'
                     # 'unpack' is passed by keyword because the next positional parameter is 'usecols'.
                     # There are more parameters in loadtxt, like 'usecols', but we will not cover them here
                     )
print(mat)

# Or similar:
with open('input.txt', 'r') as fid:
    mat = np.loadtxt(fid, dtype=int, skiprows=2)
print(mat)

# ---
# Both methods print the same:
# [[ 0  0  0]
#  [10  0  0]
#  [10 10  0]
#  [ 0 10  0]]
```
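The usecols and unpack parameters mentioned in the comments above can be sketched as follows. To keep the snippet self-contained, the file contents are supplied through io.StringIO rather than a file on disk; that substitution is an assumption made for illustration, and the data mirror the example file.

```python
import io
import numpy as np

# Same layout as the example file: two text rows, then tab-separated integers
text = 'Coordinates\nx\ty\tz\n0\t0\t0\n10\t0\t0\n10\t10\t0\n0\t10\t0\n'

# 'usecols' selects columns 0 and 1; 'unpack=True' returns one array per column
x, y = np.loadtxt(io.StringIO(text), dtype=int, skiprows=2,
                  usecols=(0, 1), unpack=True)

print(x)
print(y)
```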
A CSV file is a delimited text file that uses commas (or, in some variants, semicolons, colons, tabs, etc.) to separate values. Because CSV files are written in plain text, they are easy both to read manually and to import into software such as Excel.
csv is a Python module that implements classes to read and write tabular data in CSV format.
```
x,y,z
0,0,0
10,0,0
10,10,0
0,10,0
```
First, let's import the data as a regular Python list.
```python
import csv

with open('input.csv', 'r') as fid:
    my_list = []
    reader = csv.reader(fid)
    for row in reader:
        my_list.append(row)
print(my_list)

# Or, using a more compressed version
with open('input.csv', 'r') as fid:
    my_list = list(csv.reader(fid))
print(my_list)

# ---
# The following will be printed for both cases:
# [['x', 'y', 'z'], ['0', '0', '0'], ['10', '0', '0'], ['10', '10', '0'], ['0', '10', '0']]
```
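As the printed result shows, csv.reader returns every field as a string. A minimal sketch of separating the header from the data and converting the values to integers is given below; the CSV text is supplied through io.StringIO only to keep the snippet self-contained.

```python
import csv
import io

# Same contents as the example file, held in memory for illustration
csv_text = 'x,y,z\n0,0,0\n10,0,0\n10,10,0\n0,10,0\n'

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # the first row holds the column names
rows = [[int(value) for value in row] for row in reader]  # convert each field

print(header)
print(rows)
```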
The second example will display how you may use csv to import the data to a Python dictionary.
```python
import csv

with open('input.csv', 'r') as fid:
    my_dict = {}  # Make empty dictionary
    reader = csv.DictReader(fid)
    for row in reader:  # 'row' will be read as a dictionary
        for column, value in row.items():  # Extract values from the 'row' dictionary
            # Append the values to our main dictionary.
            # Here, we need to use the function 'setdefault' to make sure the values of x are
            # appended to a list, and not just continuously overwritten by the new value.
            my_dict.setdefault(column, []).append(value)

print('---')
print(my_dict)
print('---')
print(my_dict['x'])
print('---')
print(my_dict['x'][1], my_dict['y'][1], my_dict['z'][1])

# ---
# The following will be printed:
# ---
# {'x': ['0', '10', '10', '0'], 'y': ['0', '0', '10', '10'], 'z': ['0', '0', '0', '0']}
# ---
# ['0', '10', '10', '0']
# ---
# 10 0 0
```
The previous example, using a Python dictionary, is not usually applied to the kind of information we used ((x, y, z) values). A more typical input file for a dictionary could be:
```
Name,Age,Hair_Color
Robb,17,black
Jon,17,black
Sansa,13,red
```
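For data like this, each row maps naturally onto one dictionary. The sketch below reads the rows with csv.DictReader and looks values up by column name; the in-memory io.StringIO source and the variable names are assumptions for illustration.

```python
import csv
import io

# Same contents as the file above, held in memory for illustration
csv_text = 'Name,Age,Hair_Color\nRobb,17,black\nJon,17,black\nSansa,13,red\n'

# Each row becomes a dictionary keyed by the column names in the first line
people = list(csv.DictReader(io.StringIO(csv_text)))

# Build a lookup from name to age, converting the age string to an integer
ages = {person['Name']: int(person['Age']) for person in people}

print(people[2]['Hair_Color'])
print(ages)
```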
Since CSV files are a specific type of text file, the functions introduced for regular text files (np.loadtxt() and np.savetxt()) are also applicable here. The only differences are the delimiter parameter and the suffix of the output file.
```python
import numpy as np

mat = np.array([[0, 0, 0], [10, 0, 0], [10, 10, 0], [0, 10, 0]])

with open('output.csv', 'w') as fid:
    # We will go through all parameters here, but keep in mind that you don't
    # have to include them if you're just keeping the default
    np.savetxt(fid,   # 'fname' refers to the id of our file; fid
               mat,   # 'X' is our matrix/array to be saved to the file
               '%d',  # 'fmt' is the format of our data; here we want to store integers, 'd'
               ',',   # 'delimiter' is the string separating the columns of our matrix
               '\n',  # 'newline' is the string separating the lines; '\n' is the default
               'Coordinates\nx,y,z',  # 'header' holds the information we want above our data.
                      # We could also have done this using some lines of 'fid.write()',
                      # but for our short text, this is easier
               '',    # 'footer' is similar to the header, but located below the data
               '',    # 'comments' is prepended to the header and footer, '# ' by default; we want ''
               None   # 'encoding' for the output file; None is the default
               )

# Or similar:
with open('output.csv', 'w') as fid:
    np.savetxt(fid, mat, fmt='%d', delimiter=',', header='Coordinates\nx,y,z', comments='')

# ---
# output.csv will now look like this:
# Coordinates
# x,y,z
# 0,0,0
# 10,0,0
# 10,10,0
# 0,10,0
```
```
Coordinates
x,y,z
0,0,0
10,0,0
10,10,0
0,10,0
```
```python
import numpy as np

with open('input.csv', 'r') as fid:
    mat = np.loadtxt(fid,    # 'fname' is the file id
                     int,    # 'dtype' is the data type of the resulting array
                     '#',    # 'comments', characters or list of characters marking the start of a comment
                     ',',    # 'delimiter', string used to separate values; in our case ','
                     None,   # 'converters', dictionary mapping column number to a function
                     2,      # 'skiprows'; the first two rows hold text, so we skip them
                     unpack=False  # 'unpack'; if True, the columns may be unpacked using 'x, y, z = loadtxt(..)'
                     # 'unpack' is passed by keyword because the next positional parameter is 'usecols'.
                     # There are more parameters in loadtxt, like 'usecols', but we will not cover them here
                     )
print(mat)

# Or similar:
with open('input.csv', 'r') as fid:
    mat = np.loadtxt(fid, dtype=int, skiprows=2, delimiter=',')
print(mat)

# ---
# Both methods print the same:
# [[ 0  0  0]
#  [10  0  0]
#  [10 10  0]
#  [ 0 10  0]]
```
Another function that may be used is numpy.genfromtxt(). This function behaves much like numpy.loadtxt() when no data is missing, but it is very useful when some values are missing. A simple example displaying one of the parameters unique to np.genfromtxt() is shown below. For further information, please visit the function documentation.
```
Coordinates
x,y,z
,,
10,,
10,10,
,10,
```
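```python
import numpy as np

# genfromtxt accepts the file name directly, so no explicit open() is needed.
# 'skip_header' skips the two text rows; 'filling_values' replaces missing entries with 0
mat = np.genfromtxt('input.csv', delimiter=',', skip_header=2, filling_values=0)
print(mat)

# ---
# This will print:
# [[ 0.  0.  0.]
#  [10.  0.  0.]
#  [10. 10.  0.]
#  [ 0. 10.  0.]]
```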