Version Control Systems

by Ron Murawski © 2004-2008 -- Last update: April 4, 2008

Introduction

Version control systems enable developers to maintain a historical record of every file in a project. The project record is usually referred to as a repository. It might seem that the repository would quickly grow enormous as changes accumulate, but in practice a diff program is used to record only what changed. Using diff, the revision control system needs to keep just one full copy of each file (the latest version or, in some systems, the original) plus a database of the small diff-generated delta files containing all the changes.
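To make the delta idea concrete, here is a minimal Python sketch. It is not how any particular version control system stores its data; it only assumes that a file is a list of lines and uses the standard difflib module to find the changed spans. The function names make_delta and apply_delta and the sample file contents are made up for illustration.

```python
from difflib import SequenceMatcher

def make_delta(old, new):
    """Keep only the spans of `old` that must be replaced to obtain `new`."""
    opcodes = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2])
            for tag, i1, i2, j1, j2 in opcodes
            if tag != "equal"]

def apply_delta(old, delta):
    """Rebuild the newer version from the older version plus its delta."""
    result = list(old)
    # Work from the end of the file so earlier line numbers stay valid.
    for i1, i2, replacement in reversed(delta):
        result[i1:i2] = replacement
    return result

# Hypothetical file contents, one list entry per line.
revision_1 = ["int main() {", "    return 0;", "}"]
revision_2 = ["int main() {", "    puts(\"hello\");", "    return 0;", "}"]

# The delta is what gets stored instead of a second full copy.
delta = make_delta(revision_1, revision_2)
rebuilt = apply_delta(revision_1, delta)
```

Swapping the arguments to make_delta would give a reverse delta, which is the arrangement a system uses when it keeps the latest version in full and reconstructs older ones on demand.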

For programmers new to version control systems the major purposes are:

  • Maintaining a coherent codebase history for several programmers (or just one)
  • Automatically interleaving commits by several programmers to the same files
  • Reverting back to previous versions in case of introduced bugs

Version control systems support the concept of branches, where one programmer can branch off into his/her own experimental project while others can concurrently work on the main branch (trunk) or on other branches. These branches can later be merged back into the main trunk, but merging usually involves some human interpretation and editing wherever the versioning software flags conflicting edits with automatically generated comments.


Older Version Control Packages

The grandfather of version control is RCS (Revision Control System). I have not used RCS for many years and I believe that no one is doing any further development on it. Its main audience was small teams of disciplined programmers. Usage was cumbersome in that files had to be "checked out" in order to work on them. If one programmer on a team checked out a file, then others could not work on it until the file was returned (committed) back into the repository. RCS's fatal flaw was its annoying habit of versioning each file separately; there was no concept of a group of changes crossing file or directory boundaries.

If RCS is the grandfather of version control software, then CVS (Concurrent Versions System) is the father. In theory CVS is great -- it allows multiple programmers to edit the same files and then automates the interleaving of the changes. When there is a conflict in editing the same line of code, the software flags both edited lines and inserts a warning comment. CVS's biggest problem is that it is based on RCS. Because of this it is not capable of reverting back to a previous version that crossed file boundaries.


Important Features For Version Control Systems

For large projects, the most important feature for a modern versioning system is that committed changes be atomic. In simple words, a group of editing changes across disparate files and directories should get committed as a single unit. Once commits are handled atomically, it becomes possible to revert back to previous project versions. Except for RCS and CVS, all of the below-listed version control systems have atomic commits and are client-server unless marked as distributed.
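The following toy Python sketch shows what "atomic" means here. It is an illustration only: the Repository class, its methods, and the file paths are invented for this example, and real systems do not store full snapshots in memory like this. The point is that a commit touching several files either produces one new revision or leaves the history untouched, and that any earlier revision can then be retrieved as a whole.

```python
class Repository:
    """Toy in-memory repository: every revision is a snapshot of the whole tree."""

    def __init__(self):
        self.revisions = [{}]  # revision 0 is the empty tree

    def commit(self, changes):
        """Apply all changes as one new revision, or none of them.

        `changes` maps a path to its new content, or to None to delete it.
        """
        tree = dict(self.revisions[-1])  # work on a copy, never on history
        for path, content in changes.items():
            if content is None:
                if path not in tree:
                    raise KeyError(f"cannot delete unknown path: {path}")
                del tree[path]
            else:
                tree[path] = content
        self.revisions.append(tree)  # publish only after every change succeeded
        return len(self.revisions) - 1

    def checkout(self, revision):
        """Return the full tree as it was at `revision`."""
        return dict(self.revisions[revision])


repo = Repository()
r1 = repo.commit({"src/main.c": "v1", "doc/README": "v1"})
r2 = repo.commit({"src/main.c": "v2", "doc/README": "v2"})
old_tree = repo.checkout(r1)  # both files come back as they were at r1
```

Because the new tree is built on a copy and appended only at the end, a failure halfway through a commit cannot leave some files updated and others not.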

Other versioning features to look for are proper handling of files and directories that have been renamed, deleted, or moved.

Free Open Source Version Control Packages (in alphabetical order)
  • Aegis - Tedious to set up and use. Can only be pseudo-networked due to its file-system based orientation.
  • Arch - Distributed. Good choice for an individual or a small team. Easy to set up and use. Repository not as concise as other VC projects. This is primarily a Linux project; the Windows versions are not up to date.
  • Bazaar - Distributed. Requires Python interpreter to run; CVS compatible commands; extensive 3rd party plug-ins and utilities. Windows version might require Cygwin package.
  • Codeville - Distributed. Has a unique merging algorithm. Good choice for an individual. Although it scales to very large projects, Codeville is not yet mature, so it lacks some expected niceties, such as GUI interfaces. Easy to set up and use. Requires Python interpreter.
  • CVS - An old VC system with limited features.
  • Darcs - Distributed. Written in Haskell, but binary packages are available.
  • Mercurial - Distributed. Probably the fastest VC; Command set similar to CVS. Fairly popular. Requires Python interpreter.
  • Monotone - Easy to use; somewhat slow; doesn't scale well. Older commits are needlessly difficult to identify.
  • RCS - An old VC system with very limited features.
  • Subversion - Probably the most popular; fully featured. Command set almost identical to CVS. Fast; repository is concise. Lots of 3rd party support for GUI interfaces and specialty VC. Not easy to set up.
  • Vesta - Linux/Unix only. Not widely used; no user guide manual.

Comparison of Version Control Systems

I have not tested all of the above-listed systems. What I did find was a version control systems comparison. For me, of all the reviewed systems, it was Subversion that stood out the most due to its full features, its ability to restrict access to just one directory per user, and its great documentation. This is a major project that supports file/directory deleting, moving and renaming. It maintains backward compatibility by using almost the same command set as the venerable and still-popular CVS. There are Subversion pre-built packages available for download for Windows, RedHat, SuSE, [Net | Open | Free] BSD, Linux, Solaris, and Mac OS X. Better yet, there are cross-platform GUI front-ends for the Subversion revision system available from RapidSVN. For Windows-based programmers there is also TortoiseSVN, which is a simple-to-use Subversion client implemented as a Windows shell extension. Best of all from my vantage point is the fact that there is an O'Reilly book:

  • Pilato, Michael & others; Version Control with Subversion; O'Reilly; 350p; ISBN 0-596-00448-6; paperback. $34.95.

Although the paperback book is available there is no need to buy it, for the Subversion on-line book (same information) is always available at the website.

For the dedicated, there is svk, a Perl add-on. Svk enables Subversion to support replication with automatic propagation of changes to the parent repository. The GUI clients for Subversion also work for svk. The downside to svk is that it takes quite a bit of configuration to get it to go and documentation is very sparse.


Getting Started with Subversion

Setting up the Subversion server was not too difficult, but configuration was finicky. Make one small mistake and nothing will happen when you try to create or query the repository. I'm still having trouble setting up a password-protected repository and I don't know if the problem is my own mistake setting up the server or if it is a Windows 98-related issue. Be advised that the server itself cannot run on Windows 98 because the backend database, Berkeley DB, refuses to run on Win98 systems.

I am currently using Subversion to maintain the Horizon chess engine codebase, some other small projects, and all the files for this web site. I am reporting on my results as they happen.


Windows 98 and Subversion Passwords

All the Subversion clients have troubles running on Windows 98. The two that work are Rapid 0.4.0 and Tortoise 1.0.3. Rapid 0.4.0 runs fine; you can browse the repository, checkout directories and make commits. Tortoise 1.0.3 seems almost trouble-free and has some unique diff capabilities.

At one time, my greatest problem was the issue of username-password access to the repository. Many of the GUIs claim Subversion's design is at fault because, when password-protection is enabled and anonymous access is denied, the server exits with an error instead of asking for a username/password when anonymous access is attempted. According to the Tortoise mailing list: "Subversion always tries first with the default username (UID or User ID). But there's something wrong with the API function which fetches the UID on Win98. That wouldn't be a problem if Subversion would simply ask for a username in that case (as it does if the UID is not accepted by the server), but it exits with an error." Win98 also gets blame because users do not log on and the Windows system call that some of the GUI clients use to fetch the name of the current logged-in user fails. The best workaround for enabling password support seems to be allowing anonymous read access to the repository. With anonymous read access enabled, Rapid 0.4.0 can commit changes for logged-in users. The downside to this approach is that the repository contents are publicly available to anyone with a Subversion GUI client and the proper URL path.
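As a sketch only, and assuming the repository is served by svnserve (the server setup is not detailed here), the anonymous-read workaround corresponds to settings along these lines in the repository's conf/svnserve.conf. The small Python helper below just generates that text; the function name svnserve_conf and the password file name "passwd" are assumptions for illustration.

```python
import configparser
import io

def svnserve_conf(anon="read", auth="write", password_db="passwd"):
    """Return the text of a [general] section for svnserve.conf."""
    config = configparser.ConfigParser()
    config["general"] = {
        "anon-access": anon,         # what users without a login may do
        "auth-access": auth,         # what authenticated users may do
        "password-db": password_db,  # file holding the username = password pairs
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()

# Anonymous users can read; only users listed in the password file can commit.
conf_text = svnserve_conf()
```

Calling svnserve_conf(anon="none") would produce the stricter setup that denies anonymous access entirely, which is the configuration that caused the Win98 clients to fail.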

The Tortoise mailing list has work-arounds listed for password access but, for a long time, I was not able to make them work. It was a bit frustrating that trying to bring up Tortoise's settings crashed the program, since that's where the username/password is set! But, one day, Tortoise started working as promised. The username/password somehow got stored permanently and now everything works wonderfully well. I have no idea what I did to get it to go, but Tortoise has been trouble-free ever since.


TortoiseSVN

A great choice for Win98 is TortoiseSVN. Other than the password problem I experienced in the beginning, I find it to be free of bugs. As I go along I'm discovering some really powerful features. There is a program called TortoiseMerge, which is a high-end diff program. You can input up to three versions of a file and it will highlight all differences and can merge all of these changes into one file. If there is a conflict it allows you to decide which change to use. If three files sounds excessive, just imagine that you have been editing some files. Then, you receive an email with someone's suggested changes. There you are with three versions: your own edited copy, the other emailed version, and the base file in the repository (if the base copy has changed since the last time you looked, then you will have four files!). TortoiseMerge is the solution.
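The idea behind a three-way merge can be sketched in a few lines of Python. This is not TortoiseMerge's algorithm: it is a deliberately simplified version that assumes all three files have the same number of lines (edits change lines in place but never add or remove them), whereas real merge tools first align the files with a diff. The function name merge3 and the conflict markers are illustrative choices.

```python
def merge3(base, mine, theirs):
    """Line-by-line three-way merge; all three inputs must have the same length."""
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles edits that keep the line count")
    merged, conflicts = [], 0
    for b, m, t in zip(base, mine, theirs):
        if m == t:        # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "mine" changed this line
            merged.append(m)
        else:             # both changed it differently: leave it for a human
            conflicts += 1
            merged += ["<<<<<<< mine", m, "=======", t, ">>>>>>> theirs"]
    return merged, conflicts

# Hypothetical versions of one file.
base   = ["x = 1", "y = 2", "z = 3"]
mine   = ["x = 10", "y = 2", "z = 3"]   # my edit to the first line
theirs = ["x = 1", "y = 2", "z = 30"]   # the emailed edit to the last line

merged, conflicts = merge3(base, mine, theirs)
```

The base file is what makes the decision possible: without it there is no way to tell which of two differing lines is the new one.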

While it's nice to have a high-end diff program available it would seem that it wouldn't be of much practical use. Well, imagine my surprise when I found out that TortoiseMerge is exactly what I needed! Here's the scenario: You are about to commit a directory containing your coding changes. You are about to fill out the little text box that documents your changes... and then you scratch your head because you don't remember exactly just what those changes were. Just double-click on any of the changed files and TortoiseMerge pops up showing you all changed lines side-by-side. So you start to fill in your description and then you double-click on the next changed file, see what changes you made and expand your description a little more. Nice and painless and always accurate. I like that! What could use a bit of work is the default highlighting color scheme. It strikes me as garish and in need of toning down.
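The same review-before-you-describe habit can be approximated without a GUI. The sketch below assumes you have the committed text and the working text of a file in hand and uses Python's standard difflib.unified_diff to list the changed lines; the function name summarize_changes and the sample file are hypothetical.

```python
import difflib

def summarize_changes(path, committed_text, working_text):
    """Return a unified diff between the committed and working versions of a file."""
    diff = difflib.unified_diff(
        committed_text.splitlines(keepends=True),
        working_text.splitlines(keepends=True),
        fromfile=f"{path} (committed)",
        tofile=f"{path} (working copy)",
    )
    return "".join(diff)

# Hypothetical file contents.
committed = "int f() {\n    return 0;\n}\n"
working = "int f() {\n    return 1;\n}\n"

summary = summarize_changes("src/f.c", committed, working)
print(summary)  # lines starting with - were removed, lines starting with + were added
```

Reading the diff for each changed file in turn gives the raw material for an accurate commit description.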


RapidSVN

Compared to Tortoise, Rapid is a more standard sort of program. The simple menu system and shortcut bar make it easy to navigate and manipulate both your external repository and your local working copy. It also handles password-protected repositories. But I miss having TortoiseMerge built into the commit logic to help me remember all of my changes.


GUI Client Techniques

When you check out your Subversion repository (a one-time event) into your Windows filespace, you create a "working copy". Both Tortoise and Rapid use these files. A quick look at the .svn subdirectories under every working-copy subdirectory tells me that these are journals of committed changes containing items such as revision dates and checksums, plus copies of the original committed files. These files should only be changed by Tortoise or Rapid. It seems to me that editing them would be disastrous. I would suggest to both Rapid and to Tortoise that perhaps making these directories hidden and the files read-only would help ensure that users do not modify them inadvertently.
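To illustrate why a client would keep checksums and pristine copies, here is a small Python sketch of how locally modified files could be detected. It does not reflect the actual layout or format of the .svn directories; it assumes the pristine and working files are simply available as dictionaries of path to bytes, and the choice of SHA-256 and the function names are assumptions made for this example.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Fingerprint of a file's content."""
    return hashlib.sha256(data).hexdigest()

def modified_files(pristine, working):
    """Return paths that are new or whose content no longer matches the recorded checksum."""
    recorded = {path: checksum(data) for path, data in pristine.items()}
    return sorted(
        path for path, data in working.items()
        if recorded.get(path) != checksum(data)
    )

# Hypothetical working copy: one file untouched, one edited, one newly added.
pristine = {"engine.c": b"search();", "README": b"Horizon"}
working = {"engine.c": b"search(); eval();", "README": b"Horizon", "notes.txt": b"todo"}

changed = modified_files(pristine, working)
```

With the pristine copies kept locally, a client can find and display changes without contacting the server, which is one reason tampering with those files would cause trouble.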

I am using Tortoise to do all of my commits. I let TortoiseMerge tell me the exact changes and build up an accurate description of the changes. When I do the actual commit I have a nice Tortoise-assisted description of all my changes.

The speed of commits to the repository is truly astounding. Like it says on the Subversion webpage: "In general, the time required for a Subversion operation is proportional to the size of the changes resulting from that operation, not to the absolute size of the project in which the changes are taking place. This is a property of the Subversion repository model." I find Subversion to be substantially faster than CVS on commits; at least 10 times faster -- and maybe as much as 100 times faster!


Subversion GUI Client Comparison

Using Windows 98SE I found the following annoyances:

  • TortoiseSVN 1.0.3: It is not straightforward to get Tortoise to store your username-password. Until this is achieved no commits are possible.
  • RapidSVN 0.4.0: I wish there were a handy diff program available as in Tortoise.
  • RapidSVN 0.5.0: Doesn't work at all. Won't launch.
  • SvnUp 0.7.0: Cannot browse the repository.

And here is the good news:

  • TortoiseSVN 1.0.3: is fully-featured, and works extremely well. For Windows users this is by far the best interface to use. The companion TortoiseMerge program, integrated into the commit logic, is a huge plus. It is the only GUI client with extensive documentation on setup and use.
  • RapidSVN 0.4.0: is fully-featured and works nicely. It has no password problem.

Final Report Card

  • A -- TortoiseSVN 1.0.3
  • B -- RapidSVN 0.4.0
  • F -- RapidSVN 0.5.0
  • F -- SvnUp 0.7.0

If you find an error, a bad link, or just want to comment, please email Ron Murawski