Skip to content

3.5.2 Basics of Relational Databases

Basics of Relational Databases

  • Understand what a database is and why we need it
  • Master the core concepts of relational databases
  • Understand the meaning of tables, rows, columns, primary keys, and foreign keys
  • Learn about common database management systems

Why do AI engineers need to learn databases?

Section titled “Why do AI engineers need to learn databases?”

You may be thinking: “I already know how to use Pandas to read CSV files, so why do I still need to learn databases?”

flowchart LR
A["CSV / Excel files"] --> B{"Data size"}
B -->|"tens of thousands of rows"| C["Pandas is enough"]
B -->|"millions of rows"| D["Need a database"]
B -->|"multiple people use it at the same time"| D
B -->|"need secure storage"| D
style C fill:#e8f5e9,stroke:#2e7d32,color:#333
style D fill:#fff3e0,stroke:#e65100,color:#333
ScenarioCSV filesDatabase
Data sizeOkay for tens of thousands of rows, slows to a crawl at millionsEasily handles hundreds of millions of rows
CollaborationHard to know who changed what, easy to conflictSupports concurrent read/write with access control
Data securityIf the file is deleted, it is goneHas backups, transactions, and crash recovery
Query speedMust scan everything each timeHas indexes, millisecond-level queries
Data relationshipsManually merge multiple filesUse one SQL JOIN to get it done

Real-world examples:

  • Building a RAG system → user Q&A records need to be stored in a database
  • Building an AI Agent → the memory system needs persistent storage
  • Building a recommendation system → user behavior data lives in the database
  • Doing data analysis → 99% of enterprise data is stored in databases

If you have used Excel before, you already understand 80% of the concepts of relational databases:

Excel conceptDatabase conceptExplanation
One Excel fileOne databaseA container that stores all data
One SheetOne tableStores one type of data
One rowOne record (Row / Record)One specific data entity
One columnOne field (Column / Field)One attribute of the data
Column headerColumn nameThe attribute name
Cell data typeData typeInteger, text, date, and so on

Suppose you run an online store and need to manage users and orders:

Users table (users):

Orders table (orders):

  • order_id=101: user_id=1, iPhone 16, 7999, 2024-11-01
  • order_id=102: user_id=1, AirPods, 999, 2024-11-05
  • order_id=103: user_id=2, MacBook, 14999, 2024-11-10

These two tables are connected through user_id—this is where the name relational database comes from: tables have relationships with each other.


A primary key is the unique identifier for each record, like an ID card number: it cannot be repeated and cannot be empty.

In the users table: id is the primary key → each user has a unique id
In the orders table: order_id is the primary key → each order has a unique order_id

A foreign key is a field that references the primary key of another table and is used to build relationships between tables.

flowchart LR
subgraph users["users table"]
U["id (primary key) | name | email"]
end
subgraph orders["orders table"]
O["order_id (primary key) | user_id (foreign key) | product"]
end
orders -->|"user_id references"| users
style users fill:#e3f2fd,stroke:#1565c0,color:#333
style orders fill:#fff3e0,stroke:#e65100,color:#333

The user_id in the orders table is the foreign key—it points to the id in the users table and indicates “which user this order belongs to.”

TypeExplanationExample
INTEGERInteger1, 42, -100
REAL / FLOATFloating-point number3.14, 99.9
TEXT / VARCHARText string”Zhang San”, “hello”
DATEDate2024-11-01
DATETIMEDate and time2024-11-01 14:30:00
BOOLEANBoolean valueTRUE / FALSE
BLOBBinary dataImages, files (less commonly used)

Constraints are rules for data that help ensure data quality:

ConstraintPurposeExample
PRIMARY KEYPrimary key, unique and not nullid
NOT NULLCannot be emptyname NOT NULL
UNIQUEValue cannot be duplicatedemail UNIQUE
DEFAULTDefault valuecity DEFAULT 'Unknown'
FOREIGN KEYForeign key, references another tableuser_id REFERENCES users(id)

Keep this page’s proof of learning as a small evidence card:

Schema
table names, keys, relationships, and sample rows
Query
SQL or Python database code used
Output
result rows, row count, or saved extract
Failure Check
wrong join key, unsafe query, missing transaction, or schema mismatch
Expected Output
query plus result table and one data-quality note
DatabaseFeaturesUse cases
SQLiteZero configuration, stored in a single fileLearning, small apps, mobile apps
MySQLThe most popular open-source databaseWeb applications, small to medium projects
PostgreSQLThe most powerful open-source databaseLarge projects, AI applications (supports vector search)
SQL ServerMade by MicrosoftEnterprise Windows environments

The safest default order when learning databases for the first time

Section titled “The safest default order when learning databases for the first time”

A more stable order is usually:

  1. First get familiar with “tables, primary keys, and foreign keys”
  2. Then learn SQL queries
  3. Then connect Python to the database
  4. Finally, learn database design

This is less confusing than trying to memorize a lot of SQL details right from the start.


import sqlite3
# 1. Connect to the database (it will be created automatically if it does not exist)
conn = sqlite3.connect("my_shop.db")
cursor = conn.cursor()
# 2. Create the users table
cursor.execute("""
CREATE TABLE IF NOT EXISTS users (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
email TEXT UNIQUE,
age INTEGER,
city TEXT DEFAULT 'Unknown'
)
""")
# 3. Insert data
cursor.execute("INSERT INTO users (name, email, age, city) VALUES ('Zhang San', '[email protected]', 28, 'Beijing')")
cursor.execute("INSERT INTO users (name, email, age, city) VALUES ('Li Si', '[email protected]', 35, 'Shanghai')")
cursor.execute("INSERT INTO users (name, email, age, city) VALUES ('Wang Wu', '[email protected]', 22, 'Guangzhou')")
# 4. Commit changes
conn.commit()
# 5. Query data
cursor.execute("SELECT * FROM users")
rows = cursor.fetchall()
for row in rows:
print(row)
# (1, 'Zhang San', '[email protected]', 28, 'Beijing')
# (2, 'Li Si', '[email protected]', 35, 'Shanghai')
# (3, 'Wang Wu', '[email protected]', 22, 'Guangzhou')
# 6. Close the connection
conn.close()

Congratulations! You just created a database, created a table, and inserted 3 rows of data.

What is the most important thing to learn from this small example first?

Section titled “What is the most important thing to learn from this small example first?”

The most important thing is not every SQL keyword, but that the smallest database workflow is actually very straightforward:

  1. Connect to the database
  2. Create a table
  3. Insert data
  4. Query the results

Once this flow makes sense, learning SQL and Python database connections later will feel much less abstract.


root(("Relational Database"))
Core concepts
Database
Table
Row
Column
Key mechanisms
Primary Key PK
Foreign Key FK
Constraints
Data types
Common databases
SQLite (for learning)
MySQL
PostgreSQL
SQL Server
ConceptOne-sentence understanding
DatabaseA “folder” that stores all tables
TableAn “Excel worksheet” for one kind of data
Primary keyThe “ID card number” of each record
Foreign keyThe “link” connecting two tables
ConstraintA “rule” that ensures data quality

What should you take away from this lesson?

Section titled “What should you take away from this lesson?”
  • The most important thing about relational databases is not “storing many tables,” but that tables can establish relationships through keys
  • Primary keys uniquely identify records, and foreign keys connect tables
  • Once this foundation is clear, SQL and multi-table analysis will make much more sense

Design two tables for a library management system:
- books table: title, author, publication year, price, category
- borrows table: borrowing records (who borrowed which book, borrow date, return date)
Think about:
1. What is the primary key of each table?
2. Which foreign keys does the borrows table need?
3. Which fields should have NOT NULL constraints?
# Use sqlite3 to create the books table and borrows table designed above
# Insert 5 books and 3 borrowing records
# Query all data and print it
Reference implementation and walkthrough
  • A simple library database usually needs a books table with book_id as primary key and a borrowing table with its own borrow_id plus foreign keys to the book and borrower.
  • Use NOT NULL for required fields such as title, author, borrower, and borrow date. Optional fields, such as return date, can be nullable because they are not known at checkout time.
  • After creating tables, insert two or three rows and run a join query. Schema design is only convincing when a realistic question can be answered from it.