
1.2.1 Git Basics

[Figure: Git four-area workflow diagram]

Where This Lesson Fits

This lesson first helps you understand why code needs version control. You will start from real pain points like “file-naming hell,” “I broke it and can’t go back,” and “team collaboration is difficult,” and build a basic understanding of repositories, the staging area, commits, and branches.

Learning Objectives

  • Understand why version control is needed (through real pain-point scenarios)
  • Master Git’s four core concepts: repository, staging area, commit, and branch
  • Complete Git installation and initial configuration

A World Without Git

Before learning Git, let’s look at how developers managed code without it:

Pain Point 1: File-Naming Hell

You write an AI model training script and keep modifying and saving it:

train.py
train_v2.py
train_v2_final.py
train_v2_final_really_final.py
train_v2_final_really_final_fixed_bug.py
train_v2_final_really_final_fixed_bug_boss_said_change_it_again.py

A week later, you want to go back to the version “before the first bug fix” — which file was it?

Pain Point 2: You Broke It and Can’t Go Back

You enthusiastically refactor model.py and change 200 lines of code. You run it — error. You change it again — even more errors. You want to restore the previous version, but you’ve already hit Ctrl+S countless times, and now you can’t go back.

Pain Point 3: Team Collaboration Is a Disaster

You and a coworker are editing the same file at the same time. You change the first half; they change the second half. You each save your version, then send files back and forth through WeChat. Who merges them? How do you merge them? What if you overwrite each other’s changes?

Git solves these three problems:

| Pain point | How Git solves it |
| --- | --- |
| File-naming hell | Records a new version every time you commit, so you don’t need to rename files manually |
| You broke it and can’t go back | You can roll back to any committed version at any time |
| Team collaboration is difficult | Everyone works on their own branch; Git then merges the branches and flags any conflicting edits for you to resolve |

What Is Git?

In one sentence: Git is a code version control tool. It records every change to your code, so you can view history, roll back versions, and collaborate with others at any time.

A few key points:

  • Git is free and open source
  • Git is a local tool: it works without an internet connection (GitHub is an online hosting service for Git repositories, not Git itself)
  • Git is an industry standard — almost all software companies and open-source projects use Git
  • Git was created in 2005 by Linus Torvalds, the father of Linux

Git’s Four Core Concepts

Think of Git as an intelligent save system. If writing code is like playing a big game, Git lets you save and load at any time.

Concept 1: Repository

Repository = a project folder managed by Git.

The difference between a normal folder and a Git repository is like the difference between an ordinary notebook and a magic notebook with a record of “all changes.”

# Turn a normal folder into a Git repository
cd my-project
git init

After running git init, a hidden .git directory appears in the folder. This is where Git stores all version records. You don’t need to open it; just know it’s there.
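If you want to confirm what git init did, a minimal sketch like the following may help. It assumes a Unix-like shell (Git Bash on Windows should also work), and the throwaway directory created with mktemp is only an illustration, not part of the lesson’s project.

```shell
# Work in a throwaway directory so nothing real is touched
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Before git init: no .git directory is listed
ls -a

# Turn the folder into a Git repository (-q suppresses the status message)
git init -q

# After git init: the hidden .git directory shows up
ls -a

# Ask Git where it keeps the records for this repository
git rev-parse --git-dir
```

Deleting the .git directory would remove the whole version history, which is why it is best left alone.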

Concept 2: Staging Area

This is one of Git’s most distinctive designs. The staging area is an intermediate place where you prepare what will go into the next commit.

Use moving house as an analogy:

  1. You have lots of things in your room (working directory — the files you are editing)
  2. You choose some things to place by the door (staging area — the files you selected and are preparing to record)
  3. The moving company arrives and loads the items at the door onto the truck (commit — formally recording this change)

# You modified 3 files: model.py, train.py, notes.txt

# Put only model.py and train.py at the "door" (staging area)
git add model.py train.py

# notes.txt stays in the "room" and will not be committed this time

Why do we need a staging area? Because you may have changed 5 files, but only want to record changes to 2 of them this time. The staging area lets you precisely control which changes are included in each commit.
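The selective staging described above can be observed with git status. The sketch below is only an illustration: the temporary directory, the file contents, and the "Demo User" identity are placeholder assumptions chosen so the commands run on their own.

```shell
# Throwaway repository for the demonstration
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
# Placeholder identity, stored for this demo repository only
git config user.name "Demo User"
git config user.email "demo@example.com"

# Record a first version of all three files
echo "model, first version" > model.py
echo "training, first version" > train.py
echo "notes, first version" > notes.txt
git add .
git commit -q -m "Add initial files"

# Modify all three files
echo "model, second version with changes" > model.py
echo "training, second version with changes" > train.py
echo "notes, second version with changes" > notes.txt

# Put only two of them at the "door"
git add model.py train.py

# Short status: an M in the first column means staged,
# an M in the second column means modified but not staged
git status --short

# Only the staged files will be part of the next commit
git diff --cached --name-only
```

Here model.py and train.py appear as staged, while notes.txt is reported as modified but left out of the next commit.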

Concept 3: Commit

Commit = one formal version record. It’s like a save point in a game.

Each commit includes:

  • Which files were changed
  • What changed specifically (what was added or deleted on each line)
  • When it was committed
  • Who committed it
  • A description message (explaining what this change did)

git commit -m "Fix the bug where the learning rate was too high during model training"

The text in quotes after -m is the commit message, which explains what this change did. A good commit message should let other people (including your future self) know immediately what changed.
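To see the pieces of information listed above inside a real commit, something like the following sketch could be used. The temporary directory, the contents of train.py, and the "Demo User" identity are assumptions made up for this illustration.

```shell
# Throwaway repository for the demonstration
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
# Placeholder identity, stored for this demo repository only
git config user.name "Demo User"
git config user.email "demo@example.com"

# First commit: a training script with a learning rate that is too high
echo "learning_rate = 0.1" > train.py
git add train.py
git commit -q -m "Add training script"

# Second commit: the fix
echo "learning_rate = 0.001" > train.py
git add train.py
git commit -q -m "Fix the bug where the learning rate was too high during model training"

# Who committed, when, and with which message
git log -1 --format="%an <%ae> | %ad | %s"

# The same commit with the list of changed files and the line-by-line differences
git show --stat --patch HEAD
```

In the output of git show, lines starting with "-" were removed and lines starting with "+" were added.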

A project’s commit history might look like this:

Commit #5: "Add data augmentation"               ← latest
Commit #4: "Fix the bug where the learning rate was too high during model training"
Commit #3: "Add CNN model definition"
Commit #2: "Complete data loading module"
Commit #1: "Project initialization, add README" ← earliest

You can return to any commit point at any time, just like loading a game save.

Concept 4: Branch

Branch = an independent development line. It’s like a parallel universe.

Imagine your project is a main road (main branch). You want to try a new feature (for example, changing the model architecture), but you’re not sure it will work. You don’t want to make changes directly on the main road — what if you break it?

At this point, you can “branch off” into a new road (a new branch) and make any changes you want there. If it works, merge the new road back into the main road; if it fails, just delete the new road, and the main road stays completely unaffected.

main branch:    ● ─── ● ─── ● ─── ● ─── ●  (stable code)
                       \        ↗
feature branch:         ● ─── ●  (trying a new feature)
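As a preview, the road analogy could be sketched with the commands below. This is a rough illustration rather than the full branching workflow: the branch name try-new-architecture, the file contents, and the "Demo User" identity are hypothetical, and the sketch assumes Git 2.28 or later (for git init -b and git switch).

```shell
# Throwaway repository for the demonstration
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q -b main
# Placeholder identity, stored for this demo repository only
git config user.name "Demo User"
git config user.email "demo@example.com"

# The main road: one stable commit
echo "baseline model" > model.py
git add model.py
git commit -q -m "Add baseline model"

# Branch off onto a new road and experiment there
git switch -q -c try-new-architecture
echo "experimental model" > model.py
git commit -q -a -m "Try a new model architecture"

# Back on main, the file is untouched by the experiment
git switch -q main
cat model.py

# If the experiment worked: merge the new road back into the main road
git merge -q try-new-architecture
cat model.py

# If it had failed instead, deleting the branch would discard it:
# git branch -D try-new-architecture
```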

Branches will be explained in detail in later chapters. For now, you only need to know that they exist.


Complete Workflow (See the Big Picture First)

The full Git workflow for managing code is:

You modify files  →  Select files to record (add)  →  Record formally (commit)  →  Push to the cloud (push)
Working directory    Staging area                     Local repository             Remote repository (GitHub)

A concrete example:

# 1. You write a new model file
# (At this point, the file is in the "working directory"; Git knows you changed something, but it hasn’t been recorded yet.)

# 2. Put it into the staging area
git add model.py

# 3. Commit formally (record this change)
git commit -m "Add ResNet model definition"

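# 4. Push to GitHub (so the cloud also has this record)
# (This only works once the repository has been connected to a remote such as a GitHub repository.)
git push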
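A plain git push needs to know where to send the commits. The sketch below shows one way the connection could be set up. To stay runnable offline it uses a local bare repository as a stand-in for GitHub, which is an assumption of this example; with a real GitHub repository you would pass its URL to git remote add instead. The paths, file contents, and "Demo User" identity are placeholders, and Git 2.28 or later is assumed.

```shell
work_root="$(mktemp -d)"

# A bare repository standing in for the GitHub repository (demo assumption)
git init -q --bare "$work_root/remote.git"

# Your local project
git init -q -b main "$work_root/project"
cd "$work_root/project"
# Placeholder identity, stored for this demo repository only
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "class ResNet: pass" > model.py
git add model.py
git commit -q -m "Add ResNet model definition"

# Tell Git where the remote lives and call it "origin"
git remote add origin "$work_root/remote.git"

# First push: -u remembers origin/main, so a plain "git push" works afterwards
git push -q -u origin main

# The remote now lists the same commit on its main branch
git ls-remote origin
```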

Install Git

macOS

# Method 1: Use Homebrew (recommended)
brew install git

# Method 2: Type git directly in the terminal; macOS will prompt you to install Xcode Command Line Tools
git --version

Ubuntu / Debian

sudo apt update
sudo apt install git

Windows

# Using winget
winget install Git.Git

# After installation, restart the terminal, then verify
git --version

You can also download the installer from git-scm.com. During installation, you can keep the default options.

Verify the Installation

git --version
# Output looks like: git version 2.43.0

If you see a version number, the installation was successful.


Initial Configuration

After installing Git, you need to tell it who you are. This information will appear in every commit record.

# Set your name (use English; this will be shown on GitHub)
git config --global user.name "Zhang San"

# Set your email (recommended: the same email you registered with GitHub)
# The address below is a placeholder; replace it with your own
git config --global user.email "you@example.com"

# Set the default branch name for new repositories to main (GitHub’s default; this setting needs Git 2.28 or later)
git config --global init.defaultBranch main

# View the configuration to confirm it worked
git config --list

About --global

--global means this is a global configuration that applies to all Git repositories on your computer. If a specific project needs a different configuration (for example, a company project using a company email), you can set it separately in that project without --global.
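A per-project override might look like the sketch below. The company address and the temporary directory are hypothetical values used only for illustration.

```shell
# Throwaway repository standing in for the company project
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# No --global: the value is stored in this repository's .git/config only
git config user.email "zhangsan@company.example"

# Inside this repository, the project-level value takes precedence
git config user.email

# List every configured value of user.email together with the file it comes from
git config --show-origin --get-all user.email
```

Other repositories on the same computer keep using the global value.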


Try It Out

Now let’s create your first Git repository and experience the full workflow:

# Create a new project
mkdir my-first-repo
cd my-first-repo

# Initialize the Git repository
git init
# Output: Initialized empty Git repository in .../my-first-repo/.git/

# Create two files
# (single quotes on the outside keep interactive shells from treating "!" as a history command)
echo "# My First Git Repository" > README.md
echo 'print("Hello Git!")' > hello.py

# Check status — Git will tell you which files have changed
git status
# You will see README.md and hello.py shown in red (untracked files)

# Add the files to the staging area
git add .
# "." means everything in the current directory, including its subdirectories

# Check status again — the files turn green (staged, ready to commit)
git status

# Commit!
git commit -m "Project initialization: add README and hello.py"
# Output: [main (root-commit) abc1234] Project initialization: add README and hello.py

# View commit history
git log --oneline
# Output: abc1234 Project initialization: add README and hello.py

Congratulations, you’ve completed your first Git commit!

Now try modifying a file and committing again:

# Modify hello.py
echo "print('Hello Git! I am learning AI.')" > hello.py

# See what changed
git diff
# Your changes will be highlighted in red/green

# Add and commit
git add hello.py
git commit -m "Update greeting"

# View history — now there are two records
git log --oneline
# Output:
# def5678 Update greeting
# abc1234 Project initialization: add README and hello.py

Two commits, two save points. You can return to either one at any time.
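One way to return to an earlier save point is sketched below. It rebuilds the two commits in a separate throwaway directory so that it can be run on its own; the directory and the "Demo User" identity are placeholder assumptions, and git restore needs Git 2.23 or later.

```shell
# Throwaway repository for the demonstration
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
# Placeholder identity, stored for this demo repository only
git config user.name "Demo User"
git config user.email "demo@example.com"

# Rebuild the two save points
echo 'print("Hello Git!")' > hello.py
git add hello.py
git commit -q -m "Project initialization: add hello.py"
echo 'print("Hello Git! I am learning AI.")' > hello.py
git commit -q -a -m "Update greeting"

# Look at the file as it was one commit ago, without changing anything
git show HEAD~1:hello.py

# Bring that older content back into the working directory
git restore --source=HEAD~1 hello.py
cat hello.py

# Git now reports hello.py as modified; discard that to get the latest commit back
git restore hello.py
cat hello.py
```

HEAD~1 means "one commit before the latest"; a commit ID from git log --oneline can be used in its place.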


Summary

| Concept | One-sentence explanation | Analogy |
| --- | --- | --- |
| Repository | A project folder managed by Git | A magic notebook with “undo history” |
| Staging area | The intermediate place for preparing a commit | Things placed at the door during a move, waiting to be loaded |
| Commit | One formal version record | A game save |
| Branch | An independent development line | A parallel universe |
| Working directory | The files you are currently editing | The draft you are writing |

Core Understanding

Git’s core local workflow is only three steps: modify files → add (stage) → commit. Pushing to a remote and all later Git operations are built on top of this foundation.