Skip to main content

2.2.3 File Operations and Serialization

File reading/writing and serialization flowchart

Where this section fits

This section shows how program data can be saved and loaded again later. File reading and writing, CSV, JSON, and serialization are the foundation for dataset processing, training logs, configuration files, and saving model results. They are also a key step from temporary code in memory to real projects.

Learning objectives

  • Master basic file reading and writing operations (open, read, write)
  • Understand the role and benefits of the with statement
  • Learn how to handle common data formats such as CSV and JSON
  • Understand the concepts of serialization and deserialization

Why do we need file operations?

So far, the data in your programs has lived in memory — once the program closes, the data is gone. But in real-world scenarios:

  • Trained AI models need to be saved to a file so they can be loaded later
  • Datasets are stored in CSV files and need to be read into the program
  • Training logs need to be written to files for later analysis
  • Configuration parameters are stored in JSON files and need to be loaded at startup

File operations let your program persist data.


File I/O basics

Open a file: open()

# Basic syntax
file = open("file_path", "mode", encoding="encoding")

Common modes:

ModeMeaningWhen the file does not exist
"r"Read (default)Error
"w"Write (overwrite)Create automatically
"a"Append (add to the end)Create automatically
"x"Create (error if file already exists)Create automatically
"rb"Read binary fileError
"wb"Write binary fileCreate automatically

Write to a file

# Method 1: Manually open and close (not recommended)
file = open("hello.txt", "w", encoding="utf-8")
file.write("Hello, world!\n")
file.write("I am learning Python file operations.\n")
file.close() # Don't forget to close the file!

# Method 2: Use the with statement (recommended!)
with open("hello.txt", "w", encoding="utf-8") as file:
file.write("Hello, world!\n")
file.write("I am learning Python file operations.\n")
# When you leave the with block, the file is closed automatically, so no manual close() is needed
Why is the with statement recommended?

The with statement has two benefits:

  1. Automatically closes the file — you do not need to worry about forgetting close()
  2. Exception-safe — even if an error occurs, the file will still be closed properly

From now on, when writing file operations, always use with.

Read a file

# Read the entire content
with open("hello.txt", "r", encoding="utf-8") as file:
content = file.read()
print(content)

# Read line by line
with open("hello.txt", "r", encoding="utf-8") as file:
for line in file:
print(line.strip()) # strip() removes the newline at the end of the line

# Read all lines into a list
with open("hello.txt", "r", encoding="utf-8") as file:
lines = file.readlines()
print(lines) # ['Hello, world!\n', 'I am learning Python file operations.\n']

Append content

# "a" mode: append to the end of the file without overwriting existing content
with open("log.txt", "a", encoding="utf-8") as file:
file.write("2026-02-09: Started learning\n")
file.write("2026-02-09: Finished Chapter 1\n")

Write multiple lines

lines = ["Line 1\n", "Line 2\n", "Line 3\n"]

with open("output.txt", "w", encoding="utf-8") as file:
file.writelines(lines) # Write a list of strings

# Or use print to write to a file
with open("output.txt", "w", encoding="utf-8") as file:
print("Line 1", file=file) # print can direct output to a file
print("Line 2", file=file)
print("Line 3", file=file)

Real-world examples: working with different file formats

CSV files

CSV (Comma-Separated Values) is one of the most common data file formats:

import csv

# Write CSV
students = [
["Name", "Age", "Score"],
["Zhang San", 20, 85],
["Li Si", 21, 92],
["Wang Wu", 19, 78],
]

with open("students.csv", "w", newline="", encoding="utf-8") as file:
writer = csv.writer(file)
writer.writerows(students)

# Read CSV
with open("students.csv", "r", encoding="utf-8") as file:
reader = csv.reader(file)
header = next(reader) # Read the header row
print(f"Column names: {header}")

for row in reader:
name, age, score = row
print(f"{name}, {age} years old, score: {score}")

# Read as dictionaries (more convenient)
with open("students.csv", "r", encoding="utf-8") as file:
reader = csv.DictReader(file)
for row in reader:
print(f"{row['Name']}'s score is {row['Score']}")

JSON files

JSON is the most common data format in web development and APIs:

import json

# Write JSON
config = {
"model": "ResNet-50",
"learning_rate": 0.001,
"epochs": 100,
"batch_size": 32,
"classes": ["cat", "dog", "bird"],
"use_gpu": True
}

with open("config.json", "w", encoding="utf-8") as file:
json.dump(config, file, ensure_ascii=False, indent=2)

# Read JSON
with open("config.json", "r", encoding="utf-8") as file:
loaded_config = json.load(file)

print(f"Model: {loaded_config['model']}")
print(f"Learning rate: {loaded_config['learning_rate']}")
print(f"Classes: {loaded_config['classes']}")

Generated config.json content:

{
"model": "ResNet-50",
"learning_rate": 0.001,
"epochs": 100,
"batch_size": 32,
"classes": ["cat", "dog", "bird"],
"use_gpu": true
}
ensure_ascii=False

By default, json.dump() converts Chinese characters into Unicode escapes (such as \u732b). Adding ensure_ascii=False keeps the Chinese characters as they are, making the file easier to read.

Text log files

from datetime import datetime

def log(message, filename="app.log"):
"""Write a log entry"""
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
with open(filename, "a", encoding="utf-8") as file:
file.write(f"[{timestamp}] {message}\n")

# Use it
log("Program started")
log("Loaded dataset: train.csv")
log("Started training model")
log("Training complete, accuracy: 92.5%")

Generated log file:

[2026-02-09 14:30:01] Program started
[2026-02-09 14:30:02] Loaded dataset: train.csv
[2026-02-09 14:30:03] Started training model
[2026-02-09 14:35:15] Training complete, accuracy: 92.5%

Path handling: pathlib

pathlib is the recommended way to handle paths in Python 3. It is more modern and easier to use than os.path:

from pathlib import Path

# Create Path objects
data_dir = Path("data")
train_file = data_dir / "train" / "data.csv" # Use / to join paths!
print(train_file) # data/train/data.csv

# Check paths
print(train_file.exists()) # Whether the file exists
print(train_file.is_file()) # Whether it is a file
print(data_dir.is_dir()) # Whether it is a directory

# Get file information
path = Path("model.pth")
print(path.name) # model.pth (file name)
print(path.stem) # model (without extension)
print(path.suffix) # .pth (extension)
print(path.parent) # . (parent directory)

# Create directories
Path("output/results").mkdir(parents=True, exist_ok=True)

# List files in a directory
for file in Path(".").glob("*.py"):
print(file)

# Recursively find all CSV files
for csv_file in Path("data").rglob("*.csv"):
print(csv_file)

# Convenient file read/write methods
Path("note.txt").write_text("Hello!", encoding="utf-8")
content = Path("note.txt").read_text(encoding="utf-8")
print(content) # Hello!

Serialization: saving Python objects

What is serialization?

Serialization means converting Python objects (lists, dictionaries, class instances, and so on) into a format that can be saved to a file. Deserialization means doing the reverse: restoring Python objects from a file.

FormatModuleReadabilitySpeedSafetyUse case
JSONjson✅ GoodMedium✅ SafeConfiguration files, API data
CSVcsv✅ GoodFast✅ SafeTabular data
picklepickle❌ BinaryFast❌ UnsafePython objects

pickle: save any Python object

import pickle

# Save Python object
data = {
"scores": [85, 92, 78, 95],
"names": ["Zhang San", "Li Si", "Wang Wu", "Zhao Liu"],
"metadata": {"class": "Class A", "year": 2026}
}

with open("data.pkl", "wb") as file: # Note: "wb" (binary write)
pickle.dump(data, file)

# Load Python object
with open("data.pkl", "rb") as file: # Note: "rb" (binary read)
loaded_data = pickle.load(file)

print(loaded_data["names"]) # ['Zhang San', 'Li Si', 'Wang Wu', 'Zhao Liu']
pickle safety warning

Never load a pickle file from an untrusted source! pickle can execute arbitrary code, and a maliciously crafted pickle file can run dangerous operations on your computer. Only load pickle files created by yourself or from trusted sources.


Comprehensive example: student grade management system

import json
from pathlib import Path
from datetime import datetime

class GradeBook:
"""Grade management system with file persistence"""

def __init__(self, filename="gradebook.json"):
self.filename = Path(filename)
self.students = {}
self.load() # Load data at startup

def load(self):
"""Load data from a file"""
if self.filename.exists():
with open(self.filename, "r", encoding="utf-8") as f:
self.students = json.load(f)
print(f"✅ Loaded data for {len(self.students)} students")
else:
print("📝 Creating a new gradebook")

def save(self):
"""Save data to a file"""
with open(self.filename, "w", encoding="utf-8") as f:
json.dump(self.students, f, ensure_ascii=False, indent=2)

def add_score(self, name, subject, score):
"""Add a score"""
if name not in self.students:
self.students[name] = {}
self.students[name][subject] = score
self.save()
print(f"✅ Saved {name}'s {subject} score ({score})")

def get_report(self, name):
"""Get a student report"""
if name not in self.students:
print(f"❌ Student not found: {name}")
return

scores = self.students[name]
print(f"\n{'='*30}")
print(f" {name}'s Grade Report")
print(f"{'='*30}")
for subject, score in scores.items():
print(f" {subject}: {score}")
avg = sum(scores.values()) / len(scores)
print(f"{'─'*30}")
print(f" Average score: {avg:.1f}")
print(f"{'='*30}")

def export_csv(self, filename="grades.csv"):
"""Export as CSV"""
import csv
subjects = set()
for scores in self.students.values():
subjects.update(scores.keys())
subjects = sorted(subjects)

with open(filename, "w", newline="", encoding="utf-8") as f:
writer = csv.writer(f)
writer.writerow(["Name"] + subjects)
for name, scores in self.students.items():
row = [name] + [scores.get(s, "") for s in subjects]
writer.writerow(row)
print(f"✅ Exported to {filename}")

# Use it
gb = GradeBook()
gb.add_score("Zhang San", "Math", 85)
gb.add_score("Zhang San", "English", 92)
gb.add_score("Zhang San", "Python", 95)
gb.add_score("Li Si", "Math", 78)
gb.add_score("Li Si", "English", 88)
gb.get_report("Zhang San")
gb.export_csv()

Hands-on exercises

Exercise 1: File statistics tool

from pathlib import Path

def file_stats(filename):
"""Return line, character, word, and longest-line statistics."""
path = Path(filename)
lines = path.read_text(encoding="utf-8").splitlines()
longest_index, longest_line = max(
enumerate(lines, start=1),
key=lambda item: len(item[1]),
default=(0, ""),
)
return {
"lines": len(lines),
"characters": sum(len(line) for line in lines),
"words": sum(len(line.split()) for line in lines),
"longest_line_number": longest_index,
"longest_line": longest_line,
}

Path("sample.txt").write_text("hello world\nthis is Python\n", encoding="utf-8")
print(file_stats("sample.txt"))

Exercise 2: Diary app

Write a simple diary app:

  • Support writing new diary entries (automatically add a timestamp)
  • Support viewing all diary entries
  • Store diary entries in a text file so data is not lost after the program closes

Exercise 3: Configuration file manager

import json
from pathlib import Path

DEFAULT_CONFIG = {"theme": "light", "language": "en", "page_size": 20}

def load_config(filename="config.json"):
"""Load a configuration file, or create a default config if it does not exist."""
path = Path(filename)
if not path.exists():
save_config(DEFAULT_CONFIG.copy(), filename)
return json.loads(path.read_text(encoding="utf-8"))

def save_config(config, filename="config.json"):
"""Save the config to a file."""
Path(filename).write_text(json.dumps(config, indent=2), encoding="utf-8")

def update_config(key, value, filename="config.json"):
"""Update a configuration item."""
config = load_config(filename)
config[key] = value
save_config(config, filename)
return config

print(update_config("theme", "dark"))

Summary

OperationCodeNotes
Write filewith open("f.txt", "w") as f:"w" overwrites, "a" appends
Read filewith open("f.txt", "r") as f:.read(), .readlines()
Write JSONjson.dump(data, file)Dictionary → JSON file
Read JSONjson.load(file)JSON file → Dictionary
Write CSVcsv.writer(file).writerow()List → CSV row
Read CSVcsv.reader(file)CSV row → List
Path handlingPath("data") / "file.txt"pathlib is recommended
Core idea

File operations give your program a "memory" — data can persist across program runs. In AI development, you will often read and write many kinds of files: datasets (CSV), configurations (JSON/YAML), model weights (.pth), and training logs (.log). Mastering file operations is a fundamental skill for becoming a developer.