Software Engineering for Self-Directed Learners
  • Revision Control

    What

    Can explain revision control

    Revision control is the process of managing multiple versions of a piece of information. In its simplest form, this is something that many people do by hand: every time you modify a file, save it under a new name that contains a number, each one higher than the number of the preceding version.

    Manually managing multiple versions of even a single file is an error-prone task, though, so software tools to help automate this process have long been available. The earliest automated revision control tools were intended to help a single user to manage revisions of a single file. Over the past few decades, the scope of revision control tools has expanded greatly; they now manage multiple files, and help multiple people to work together. The best modern revision control tools have no problem coping with thousands of people working together on projects that consist of hundreds of thousands of files.

    Revision control software will track the history and evolution of your project, so you don't have to. For every change, you'll have a log of who made it; why they made it; when they made it; and what the change was.

    Revision control software makes it easier for you to collaborate when you're working with other people. For example, when people more or less simultaneously make potentially incompatible changes, the software will help you to identify and resolve those conflicts.

    It can help you to recover from mistakes. If you make a change that later turns out to be an error, you can revert to an earlier version of one or more files. In fact, a really good revision control tool will even help you to efficiently figure out exactly when a problem was introduced.

    It will help you to work simultaneously on, and manage the drift between, multiple versions of your project. Most of these reasons are equally valid, at least in theory, whether you're working on a project by yourself, or with a hundred other people.

    -- [adapted from bryan-mercurial-guide



     

    Mercurial: The Definitive Guide by Bryan O'Sullivan retrieved on 2012/07/11

    RCS : Revision Control Software are the software tools that automate the process of Revision Control i.e. managing revisions of software artifacts.

    Revision: A revision (some seem to use it interchangeably with version while others seem to distinguish the two -- here, let us treat them as the same, for simplicity) is a state of a piece of information at a specific time that is a result of some changes to it e.g., if you modify the code and save the file, you have a new revision (or a version) of that file.

    Revision control is also known as Version Control Software (VCS), and a few other names.

    Revision Control Software

    In the context of RCS, what is a Revision? Give an example.

    A revision (some seem to use it interchangeably with version while others seem to distinguish the two -- here, let us treat them as the same, for simplicity) is a state of a piece of information at a specific time that is a result of some changes to it. For example, take a file containing program code. If you modify the code and save the file, you have a new revision (or a version) of that file.

    • a. Help a single user manage revisions of a single file
    • b. Help a developer recover from a incorrect modification to a code file
    • c. Makes it easier for a group of developers to collaborate on a project
    • d. Manage the drift between multiple versions of your project
    • e. Detect when multiple developers make incompatible changes to the same file
    • f. All of them are benefits of RCS

    f

    Suppose You are doing a team project with Tom, Dick, and Harry but those three have not even heard the term RCS. How do you explain RCS to them as briefly as possible, using the project as an example?

    Repositories

    Can explain repositories

    Repository (repo for short): The database of the history of a directory being tracked by an RCS software (e.g. Git).

    The repository is the database where the meta-data about the revision history are stored. Suppose you want to apply revision control on files in a directory called ProjectFoo. In that case you need to set up a repo (short for repository) in ProjectFoo directory, which is referred to as the working directory of the repo. For example, Git uses a hidden folder named .git inside the working directory.

    You can have multiple repos in your computer, each repo revision-controlling files of a different working directly, for examples, files of different projects.



    In the context of RCS, what is a repo?

    Saving History

    Can explain saving history

    Tracking and Ignoring

    In a repo, we can specify which files to track and which files to ignore. Some files such as temporary log files created during the build/test process should not be revision-controlled.

    Staging and Committing
     

    Committing saves a snapshot of the current state of the tracked files in the revision control history. Such a snapshot is also called a commit (i.e. the noun).

    When ready to commit, we first stage the specific changes we want to commit. This intermediate step allows us to commit only some changes while saving other changes for a later commit.


     


    Identifying Points in History

    Each commit in a repo is a recorded point in the history of the project that is uniquely identified by an auto-generated hash e.g. a16043703f28e5b3dab95915f5c5e5bf4fdc5fc1.

    We can tag a specific commit with a more easily identifiable name e.g. v1.0.2

    Using History

    Can explain using history

    RCS tools store the history of the working directory as a series of commits. This means we should commit after each change that we want the RCS to 'remember' for us.

    To see what changed between two points of the history, you can ask the RCS tool to diff the two commits in concern.

    To restore the state of the working directory at a point in the past, you can checkout the commit in concern. i.e., we can traverse the history of the working directory simply by checking out the commits we are interested in.

    RCS : Revision Control Software are the software tools that automate the process of Revision Control i.e. managing revisions of software artifacts.

    Remote Repositories

    Can explain remote repositories

    Remote repositories are copies of a repo that are hosted on remote computers. They are especially useful for sharing the revision history of a codebase among team members of a multi-person project. They can also serve as a remote backup of your code base.

    You can clone a remote repo onto your computer which will create a copy of a remote repo on your computer, including the version history as the remote repo.

    You can push new commits in your clone to the remote repo which will copy the new commits onto the remote repo. Note that pushing to a remote repo requires you to have write-access to it.

    You can pull from the remote repos to receive new commits in the remote repo. Pulling is used to sync your local repo with latest changes to the remote repo.

    While it is possible to set up your own remote repo on a server, an easier option is to use a remote repo hosting service such as GitHub or BitBucket.

    A fork is a remote copy of a remote repo. If there is a remote repo that you want to push to but you do not have write access to it, you can fork the remote repo, which gives you your own remote repo that you can push to.

    A pull request is mechanism for contributing code to a remote repo. It is a formal request sent to the maintainers of the repo asking them to pull your new code to their repo.

    Here is a scenario that includes all the concepts introduced above (click on the slide to advance the animation):

    Branching

    Can explain branching

    Branching is the process of evolving multiple versions of the software in parallel. For example, one team member can create a new branch and add an experimental feature to it while the rest of the team keeps working on another branch. Branches can be given names e.g. master, release, dev.

    A branch can be merged into another branch. Merging usually result in a new commit that represents the changes done in the branch being merged.

    Branching and merging

    Merge conflicts happen when you try to merge two branches that had changed the same part of the code and the RCS software cannot decide which changes to keep. In those cases we have to ‘resolve’ those conflicts manually.

    In the context of RCS, what is the branching? What is the need for branching?.

    In the context of RCS, what is the merging branches? How can it lead to merge conflicts?.

    DRCS vs CRCS

    Can explain DRCS vs CRCS

    RCS can be done in two ways: the centralized way and the distributed way.

    Centralized RCS (CRCS for short)uses a central remote repo that is shared by the team. Team members download (‘pull’) and upload (‘push’) changes between their own local repositories and the central repository. Older RCS tools such as CVS and SVN support only this model. Note that these older RCS do not support the notion of a local repo either. Instead, they force users to do all the versioning with the remote repo.

    The centralized RCS approach without any local repos (e.g., CVS, SVN)

    Distributed RCS (DRCS for short, also known as Decentralized RCS) allows multiple remote repos and pulling and pushing can be done among them in arbitrary ways. The workflow can vary differently from team to team. For example, every team member can have his/her own remote repository in addition to their own local repository, as shown in the diagram below. Git and Mercurial are some prominent RCS tools that support the distributed approach.

    The decentralized RCS approach

    Forking Flow

    Can explain forking workflow

    In the forking workflow, the 'official' version of the software is kept in a remote repo designated as the 'main repo'. All team members fork the main repo create pull requests from their fork to the main repo.

    To illustrate how the workflow goes, let’s assume Jean wants to fix a bug in the code. Here are the steps:

    1. Jean creates a separate branch in her local repo and fixes the bug in that branch.
    2. Jean pushes the branch to her fork.
    3. Jean creates a pull request from that branch in her fork to the main repo.
    4. Other members review Jean’s pull request.
    5. If reviewers suggested any changes, Jean updates the PR accordingly.
    6. When reviewers are satisfied with the PR, one of the members (usually the team lead or a designated 'maintainer' of the main repo) merges the PR, which brings Jean’s code to the main repo.
    7. Other members, realizing there is new code in the upstream repo, sync their forks with the new upstream repo (i.e. the main repo). This is done by pulling the new code to their own local repo and pushing the updated code to their own fork.

    Feature Branch Flow

    Can explain feature branch flow

    Feature branch workflow is similar to forking workflow except there are no forks. Everyone is pushing/pulling from the same remote repo. The phrase feature branch is used because each new feature (or bug fix, or any other modification) is done in a separate branch and merged to master branch when ready.

    Centralized Flow

    Can explain centralized flow

    The centralized workflow is similar to the feature branch workflow except all changes are done in the master branch.