Skip to main content

3.5.2 Basics of Relational Databases

Basics of Relational Databases

Elective Chapter

This chapter is optional. If you only want to do data analysis and modeling, you can skip it for now. But if you plan to build AI applications in the future (such as RAG systems or AI Agents), database knowledge is essential.

Learning Objectives

  • Understand what a database is and why we need it
  • Master the core concepts of relational databases
  • Understand the meaning of tables, rows, columns, primary keys, and foreign keys
  • Learn about common database management systems

Why do AI engineers need to learn databases?

You may be thinking: "I already know how to use Pandas to read CSV files, so why do I still need to learn databases?"

ScenarioCSV filesDatabase
Data sizeOkay for tens of thousands of rows, slows to a crawl at millionsEasily handles hundreds of millions of rows
CollaborationHard to know who changed what, easy to conflictSupports concurrent read/write with access control
Data securityIf the file is deleted, it is goneHas backups, transactions, and crash recovery
Query speedMust scan everything each timeHas indexes, millisecond-level queries
Data relationshipsManually merge multiple filesUse one SQL JOIN to get it done

Real-world examples:

  • Building a RAG system → user Q&A records need to be stored in a database
  • Building an AI Agent → the memory system needs persistent storage
  • Building a recommendation system → user behavior data lives in the database
  • Doing data analysis → 99% of enterprise data is stored in databases

What is a relational database?

A comparison with Excel

If you have used Excel before, you already understand 80% of the concepts of relational databases:

Excel conceptDatabase conceptExplanation
One Excel fileOne databaseA container that stores all data
One SheetOne tableStores one type of data
One rowOne record (Row / Record)One specific data entity
One columnOne field (Column / Field)One attribute of the data
Column headerColumn nameThe attribute name
Cell data typeData typeInteger, text, date, and so on

Understand it with an example

Suppose you run an online store and need to manage users and orders:

Users table (users):

idnameemailagecity
1Zhang San[email protected]28Beijing
2Li Si[email protected]35Shanghai
3Wang Wu[email protected]22Guangzhou

Orders table (orders):

order_iduser_idproductamountorder_date
1011iPhone 1679992024-11-01
1021AirPods9992024-11-05
1032MacBook149992024-11-10

These two tables are connected through user_id—this is where the name relational database comes from: tables have relationships with each other.


Core concepts

Primary Key

A primary key is the unique identifier for each record, like an ID card number: it cannot be repeated and cannot be empty.

In the users table: id is the primary key → each user has a unique id
In the orders table: order_id is the primary key → each order has a unique order_id
Why do we need a primary key?

Imagine there is no primary key: if two users are both named "Zhang San", how would you tell them apart? The primary key solves this problem—even if the names are the same, the ids must be different.

Foreign Key

A foreign key is a field that references the primary key of another table and is used to build relationships between tables.

The user_id in the orders table is the foreign key—it points to the id in the users table and indicates "which user this order belongs to."

Common data types

TypeExplanationExample
INTEGERInteger1, 42, -100
REAL / FLOATFloating-point number3.14, 99.9
TEXT / VARCHARText string"Zhang San", "hello"
DATEDate2024-11-01
DATETIMEDate and time2024-11-01 14:30:00
BOOLEANBoolean valueTRUE / FALSE
BLOBBinary dataImages, files (less commonly used)

Constraints

Constraints are rules for data that help ensure data quality:

ConstraintPurposeExample
PRIMARY KEYPrimary key, unique and not nullid
NOT NULLCannot be emptyname NOT NULL
UNIQUEValue cannot be duplicatedemail UNIQUE
DEFAULTDefault valuecity DEFAULT 'Unknown'
FOREIGN KEYForeign key, references another tableuser_id REFERENCES users(id)

Common database management systems

DatabaseFeaturesUse cases
SQLiteZero configuration, stored in a single fileLearning, small apps, mobile apps
MySQLThe most popular open-source databaseWeb applications, small to medium projects
PostgreSQLThe most powerful open-source databaseLarge projects, AI applications (supports vector search)
SQL ServerMade by MicrosoftEnterprise Windows environments
This chapter uses SQLite

SQLite does not require installing any server, and Python comes with the sqlite3 module, making it ideal for learning. All SQL syntax is also applicable in other databases.

The safest default order when learning databases for the first time

A more stable order is usually:

  1. First get familiar with “tables, primary keys, and foreign keys”
  2. Then learn SQL queries
  3. Then connect Python to the database
  4. Finally, learn database design

This is less confusing than trying to memorize a lot of SQL details right from the start.


Hands-on: Create your first database

import sqlite3

# 1. Connect to the database (it will be created automatically if it does not exist)
conn = sqlite3.connect("my_shop.db")
cursor = conn.cursor()

# 2. Create the users table
cursor.execute("""
CREATE TABLE IF NOT EXISTS users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
email TEXT UNIQUE,
age INTEGER,
city TEXT DEFAULT 'Unknown'
)
""")

# 3. Insert data
cursor.execute("INSERT INTO users (name, email, age, city) VALUES ('Zhang San', '[email protected]', 28, 'Beijing')")
cursor.execute("INSERT INTO users (name, email, age, city) VALUES ('Li Si', '[email protected]', 35, 'Shanghai')")
cursor.execute("INSERT INTO users (name, email, age, city) VALUES ('Wang Wu', '[email protected]', 22, 'Guangzhou')")

# 4. Commit changes
conn.commit()

# 5. Query data
cursor.execute("SELECT * FROM users")
rows = cursor.fetchall()
for row in rows:
print(row)
# (1, 'Zhang San', '[email protected]', 28, 'Beijing')
# (2, 'Li Si', '[email protected]', 35, 'Shanghai')
# (3, 'Wang Wu', '[email protected]', 22, 'Guangzhou')

# 6. Close the connection
conn.close()

Congratulations! You just created a database, created a table, and inserted 3 rows of data.

What is the most important thing to learn from this small example first?

The most important thing is not every SQL keyword, but that the smallest database workflow is actually very straightforward:

  1. Connect to the database
  2. Create a table
  3. Insert data
  4. Query the results

Once this flow makes sense, learning SQL and Python database connections later will feel much less abstract.


Summary

ConceptOne-sentence understanding
DatabaseA "folder" that stores all tables
TableAn "Excel worksheet" for one kind of data
Primary keyThe "ID card number" of each record
Foreign keyThe "link" connecting two tables
ConstraintA "rule" that ensures data quality

What should you take away from this lesson?

  • The most important thing about relational databases is not "storing many tables," but that tables can establish relationships through keys
  • Primary keys uniquely identify records, and foreign keys connect tables
  • Once this foundation is clear, SQL and multi-table analysis will make much more sense

Hands-on practice

Exercise 1: Design table structures

Design two tables for a library management system:
- books table: title, author, publication year, price, category
- borrows table: borrowing records (who borrowed which book, borrow date, return date)

Think about:
1. What is the primary key of each table?
2. Which foreign keys does the borrows table need?
3. Which fields should have NOT NULL constraints?

Exercise 2: Practice with SQLite

# Use sqlite3 to create the books table and borrows table designed above
# Insert 5 books and 3 borrowing records
# Query all data and print it