A file is a contiguous set of bytes used to store data. This data is organized in a specific format and can be anything from a simple text file to a complicated program executable. In the end, these bytes are translated into binary 1s and 0s for easier processing by the computer.
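To make the "bytes, then bits" idea concrete, here is a minimal sketch that writes two characters to a file and reads them back as raw bytes. The file name example.txt and the temporary directory are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical file, created in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")

    # Write two characters as text...
    with open(path, "w", encoding="ascii") as f:
        f.write("Hi")

    # ...then read the same file back as raw bytes.
    with open(path, "rb") as f:
        raw = f.read()

byte_values = list(raw)                        # each byte as an integer 0-255
bit_strings = [format(b, "08b") for b in raw]  # each byte as eight binary digits

print(raw)          # b'Hi'
print(byte_values)  # [72, 105]
print(bit_strings)  # ['01001000', '01101001']
```

The same two bytes can be viewed as text, as integers, or as bits; only the interpretation changes.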
A file has 3 main parts:
What this data represents depends on the format specification used, which is typically indicated by an extension. For example, a file that has an extension of .txt most likely conforms to the text file specification. There are hundreds, if not thousands, of file extensions out there.
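As a small illustration, the standard library's pathlib module can pull the extension out of a file name. The file names below are made up for this sketch, and the extension is only a hint about the format, not a guarantee.

```python
from pathlib import Path

# Hypothetical file names, used only to show how the extension is extracted.
names = ["notes.txt", "photo.png", "archive.tar.gz", "README"]

# Path.suffix returns the last extension, including the leading dot,
# or an empty string when the name has no extension.
suffixes = {name: Path(name).suffix for name in names}

for name, suffix in suffixes.items():
    print(f"{name!r} -> {suffix!r}")
```

Nothing stops someone from renaming a .png file to .txt, so code that really needs to know the format usually has to inspect the content as well.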
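In Python, a file operation takes place in the following order: open the file, read from or write to it, and close it.
We use the open function to open a file; it returns a file object that we then use for reading and writing.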
Syntax
variable = open(filename, mode)

mode:
"r" - Read - Default value. Opens a file for reading; raises an error if the file does not exist
"a" - Append - Opens a file for appending; creates the file if it does not exist
"w" - Write - Opens a file for writing; creates the file if it does not exist and erases its existing content if it does
"x" - Create - Creates the specified file; raises an error if the file already exists

In addition, you can specify whether the file should be handled in text or binary mode:
"t" - Text - Default value. Text mode
"b" - Binary - Binary mode (e.g. images)
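The sketch below exercises the "x", "a", "w" and "r" modes in turn so the differences are visible. It works inside a temporary directory, and the file names modes.txt and missing.txt are assumptions made only for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes.txt")  # hypothetical file name

    # "x" creates the file and fails if it already exists.
    with open(path, "x") as f:
        f.write("first\n")
    try:
        with open(path, "x"):
            pass
        second_create_failed = False
    except FileExistsError:
        second_create_failed = True

    # "a" keeps the existing content and adds to the end.
    with open(path, "a") as f:
        f.write("second\n")
    with open(path, "r") as f:
        after_append = f.read()

    # "w" discards the existing content before writing.
    with open(path, "w") as f:
        f.write("third\n")
    with open(path, "r") as f:
        after_write = f.read()

    # "r" on a file that does not exist raises FileNotFoundError.
    try:
        with open(os.path.join(tmp_dir, "missing.txt"), "r"):
            pass
        missing_read_failed = False
    except FileNotFoundError:
        missing_read_failed = True

print(second_create_failed)  # True
print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_write))     # 'third\n'
print(missing_read_failed)   # True
```

The behavior of "w" is the one that most often surprises people: opening an existing file in that mode empties it, even if nothing is written afterwards.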
Writing to file
testFile = open("test.txt", "w")
# write strings to file
about_me = "I am a developer.\nI am nice.\nI am tall."
testFile.write(about_me)
testFile.close()
Using with
The with statement automatically takes care of closing the file once execution leaves the with block, even in cases of error. I highly recommend that you use the with statement as much as possible, as it allows for cleaner code and makes handling any unexpected errors easier for you.
with open("test.txt", "a") as test2File:
    # write strings to file
    about_me = "I am a developer2.\nI am nice2.\nI am tall2."
    test2File.write(about_me)
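To check the claim that with closes the file even when something goes wrong, the following sketch raises an error on purpose inside a with block and then inspects the file object's closed attribute. The file name demo.txt and the simulated ValueError are assumptions for illustration; the second half shows a roughly equivalent manual version using try/finally.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

    try:
        with open(path, "w") as f:
            f.write("partial data")
            raise ValueError("something went wrong")  # simulated failure
    except ValueError:
        pass

    # The with block closed the file on the way out, despite the exception.
    closed_after_error = f.closed

    # Roughly what with does for us, written out by hand.
    g = open(path, "a")
    try:
        g.write("\nmore data")
    finally:
        g.close()
    closed_manually = g.closed

print(closed_after_error)  # True
print(closed_manually)     # True
```

The with version is shorter and leaves no way to forget the close call.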
Reading a file
print("reading test.txt file...")
with open("test.txt", "r") as reader:
    # read all content
    #print(reader.read())
    # read line by line
    print("read line by line")
    lines = reader.readlines()
    for line in lines:
        # each line keeps its trailing "\n", so suppress print's own newline
        print(line, end="")
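For comparison, here is a small sketch of three common ways to read the same text file: read() for the whole content, readlines() for a list of lines, and iterating over the file object directly. It writes its own sample file in a temporary directory, which is an assumption made so the example can run on its own.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.txt")  # temporary copy, for illustration
    with open(path, "w") as f:
        f.write("I am a developer.\nI am nice.\nI am tall.")

    # 1. The whole file as a single string.
    with open(path, "r") as reader:
        whole_text = reader.read()

    # 2. A list of lines; each line keeps its trailing newline, if it has one.
    with open(path, "r") as reader:
        all_lines = reader.readlines()

    # 3. Iterating over the file object reads one line at a time, which
    #    avoids loading a large file into memory all at once.
    with open(path, "r") as reader:
        stripped_lines = [line.rstrip("\n") for line in reader]

print(repr(whole_text))
print(all_lines)       # ['I am a developer.\n', 'I am nice.\n', 'I am tall.']
print(stripped_lines)  # ['I am a developer.', 'I am nice.', 'I am tall.']
```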
Note that you should always close your files. In some cases, due to buffering, changes made to a file may not show up until you close it.
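The buffering effect can be observed by checking the file size on disk before and after flushing. This is a sketch under the assumption of Python's default buffering, where a small write usually stays in memory; the exact size seen before the flush may vary by platform and Python implementation. The file name buffered.txt is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "buffered.txt")  # hypothetical file name

    f = open(path, "w")
    f.write("hello")
    # The data is often still in Python's buffer here, so the size is often 0.
    size_before_flush = os.path.getsize(path)

    f.flush()  # push the buffered data to the operating system
    size_after_flush = os.path.getsize(path)

    f.close()  # closing also flushes whatever is left in the buffer
    size_after_close = os.path.getsize(path)

print(size_before_flush)
print(size_after_flush)  # 5
print(size_after_close)  # 5
```

Calling close (or letting a with block do it) is what guarantees that the last piece of data reaches the file.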
Character encoding
Another common problem that you may face is the encoding of the byte data. An encoding is a translation from byte data to human-readable characters. This is typically done by assigning a numerical value to represent each character. The two most common encodings are ASCII and Unicode. ASCII can only represent 128 characters, while Unicode has room for up to 1,114,112 characters (code points).
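The character-to-number mapping can be inspected directly with the built-in ord and chr functions. This short sketch also confirms the size of the Unicode range through sys.maxunicode; the sample characters are arbitrary choices for illustration.

```python
import sys

# ord() gives the numerical value (code point) assigned to a character,
# and chr() goes the other way.
code_point_a = ord("A")          # 65, inside the ASCII range 0-127
code_point_e_acute = ord("é")    # 233, outside the ASCII range
char_from_65 = chr(65)           # 'A'

# The highest Unicode code point is 0x10FFFF, so counting from 0
# there are 1,114,112 possible code points.
total_code_points = sys.maxunicode + 1

print(code_point_a, code_point_e_acute, char_from_65)
print(total_code_points)  # 1114112
```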
ASCII is actually a subset of Unicode (UTF-8), meaning that every ASCII character has the same numerical value in both. It's important to note that parsing a file with the incorrect character encoding can lead to failures or misrepresentation of characters. For example, if a file was created using the UTF-8 encoding and you try to parse it using the ASCII encoding, any character outside of those 128 values causes an error; in Python, this is a UnicodeDecodeError.
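Both failure modes can be reproduced in a few lines. In this sketch the word "café" and the file name utf8.txt are assumptions chosen for illustration: the file is written as UTF-8, then read back with the right encoding, with ASCII (which fails), and with Latin-1 (which silently produces the wrong characters).

```python
import os
import tempfile

text = "café"  # the "é" is outside the 128 ASCII characters

# For pure ASCII text, the ASCII and UTF-8 encodings produce identical bytes.
same_bytes_for_ascii = "cafe".encode("ascii") == "cafe".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "utf8.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Correct encoding: the text round-trips unchanged.
    with open(path, "r", encoding="utf-8") as f:
        read_as_utf8 = f.read()

    # Wrong encoding, failure case: ASCII cannot decode the bytes of "é".
    try:
        with open(path, "r", encoding="ascii") as f:
            f.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # Wrong encoding, misrepresentation case: Latin-1 accepts any byte,
    # so there is no error, just the wrong characters.
    with open(path, "r", encoding="latin-1") as f:
        read_as_latin1 = f.read()

print(same_bytes_for_ascii)  # True
print(read_as_utf8)          # café
print(ascii_failed)          # True
print(read_as_latin1)        # cafÃ©
```

Passing encoding explicitly to open avoids depending on the platform default, which differs between systems.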