Tuesday, August 20, 2013

Crash course for CVS users switching to git

I've used CVS for a long time and have been generally happy with it. Some of its quirks have caused me to consider Subversion, but none were pressing enough for me to switch. I've tried to use git in the past - honestly, I have - and have always ended up extremely frustrated. But I have crossed the mental Rubicon and want to share what was so difficult for me to grasp, in case there are any dinosaurs like me still out there.

How I use(d) CVS

It is probably best to describe my usage of CVS first. CVS has a model of a single central repository, and multiple simultaneous editors. Each checkout by an editor is referred to as a 'sandbox'. Editors work in their sandbox and commit to the repository.

One additional thing to note is that CVS has a concept of 'tags', which are a way to label a snapshot of all the files in the repository (a tag could also apply to a subset, but we're keeping this simple). A quirk here is that a "tag -b", or "branch tag", is a label that can be used to denote a separate line of development.

This model has allowed me to use the following workflow:

  • Create some product and release version 1.0
  • cvs tag -b VERSION_1_0
    • The name format is another CVS quirk, but this branch tag lets me not only label the files that make up version 1.0, but create a branch point where I can develop new features.
  • Begin working on version 2.0. As features get completed, commit to the (main branch of the) repository. NEVER commit a broken configuration as this will upset other developers when they check out the code.
  • A bug is reported in version 1.0
  • Go to a different directory, and cvs checkout -r VERSION_1_0
    • This new sandbox will be devoted to identifying/fixing this bug.
  • Fix the bug, and cvs commit it. Then cvs tag BUG_1001.
    • This effectively creates VERSION_1_0 / BUG_1001.
  • Continue until you release and branch tag VERSION_1_1.
    • Always do the tag in a clean sandbox (with a fresh checkout) to ensure it functions.
  • Meanwhile, the other sandbox can continue developing VERSION_2_0
  • Prior to release of version 2.0, review bug fixes to see if they are applicable.
    • cvs update -j BUG_1001, etc. as necessary (CVS has no separate 'merge' command; merging is done through 'update -j').
  • Delete any sandboxes used for failed experiments or resolved bugs
This is an iterative process that gets more involved with more people and features, etc, but the key is that simultaneous development happens, each in its own sandbox.

This is not how git wants to work.

Development under git

There are two key things to recognize when developing with git:

  • Discard the idea of a sandbox
  • Discard the idea of a commit

I'll also add:

  • Discard the idea of a repository

In git, most frustratingly to me, the 'sandbox' and the 'repository' are gone. Instead, there is only your work area. When you commit files, they go into (by default) the .git directory in your work area. If you remove your work area, you remove your repository (the .git directory). You must be completely disabused of these CVS ideas in order to use git well.
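
Here is a minimal sketch of that point, assuming git is installed and you are in a POSIX shell. The scratch directory and the "Example" identity are placeholders of mine, not anything git requires.

```shell
# Scratch work area; nothing here touches a real project.
work=$(mktemp -d)
cd "$work"

git init -q                                  # the repository is created inside the work area
git config user.name "Example"               # placeholder identity, local to this repository
git config user.email "example@example.invalid"

echo "hello" > README
git add README
git commit -q -m "first commit"

ls -d .git                                   # the commit lives here, next to the files
commits_before=$(git rev-list --count HEAD)  # number of commits reachable from HEAD

# Removing .git removes the history; only the plain files are left behind.
rm -rf .git
ls -A
```

After the 'rm -rf', README is still on disk but there is nothing left to check out from.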

Also in git, branches are not 'clean'. Under my CVS workflow, a sandbox was a branch. I would commit what I wanted and delete the entire sandbox after I'd verified my commit. Any stray files created would be removed. In git, any untracked files created in your work area will be carried around from branch to branch as you check out.
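
To see the stray files follow you around, here is a small sketch under the same assumptions (git installed, POSIX shell). The branch name 'experiment' and the file names are made up for illustration.

```shell
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example"               # placeholder identity
git config user.email "example@example.invalid"

echo "tracked" > tracked.txt
git add tracked.txt
git commit -q -m "base"
base=$(git symbolic-ref --short HEAD)        # whatever the default branch is called here

git checkout -q -b experiment
echo "scratch" > stray.txt                   # never added, so git does not track it

git checkout -q "$base"                      # back to the original branch
ls                                           # stray.txt is still in the work area
status=$(git status --porcelain)             # untracked files are listed with '??'
echo "$status"
```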

The key idea of git that you have to get used to is what is called the 'index'. You can think of the index as your commit target. When you are happy that your code doesn't break things, you stage it in the index. Think of this as a lightweight 'commit':

  • Create files onefish.txt and twofish.txt
  • Modify files redfish.txt and bluefish.txt
  • See these files and see that they are good.
  • git add '*fish.txt'
The result of 'git add' isn't just "Hey, add these files", like it is in CVS -- instead, it's "Hey, I want these changes to be staged in the index". When this command completes, the 'index' has been updated, just as though you had done a 'commit' to it. Then, a 'git commit' records the changes in the index in the repository. The files in your work area are not a part of this equation AT ALL.
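
The fish steps above can be sketched roughly as follows, assuming git is installed and a POSIX shell. The file contents and the placeholder identity are mine.

```shell
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example"               # placeholder identity
git config user.email "example@example.invalid"

# Two files already exist in the repository.
echo "red" > redfish.txt
echo "blue" > bluefish.txt
git add redfish.txt bluefish.txt
git commit -q -m "initial fish"

# Create two new files and modify the two existing ones.
echo "one" > onefish.txt
echo "two" > twofish.txt
echo "red, edited" >> redfish.txt
echo "blue, edited" >> bluefish.txt

git add '*fish.txt'                          # quoted so git, not the shell, expands the pattern

staged=$(git diff --cached --name-only)      # what the index holds relative to the last commit
unstaged=$(git diff --name-only)             # work-area changes not yet staged
echo "$staged"

git commit -q -m "more fish"                 # records exactly what was staged
```

'git diff --cached' is a handy way to look at the index before committing; here it lists all four files, and plain 'git diff' lists nothing because everything has been staged.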

If you edit README, 'git add' README to the index, edit README again, and then 'git commit', you are committing the first set of changes because the second set was not staged before the commit. I suspect many people will use the 'git commit -a' form of the command by default, which does an implicit 'add' to the index of all modified and removed files, but not new files not yet 'git add'ed. This actually makes good sense because, as I noted above, git branches are not clean.
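
The README case is easy to try for yourself. This is a sketch assuming git is installed and a POSIX shell; the file contents are invented.

```shell
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example"               # placeholder identity
git config user.email "example@example.invalid"

echo "first edit" > README
git add README                               # stage the first version
echo "second edit" >> README                 # edit again, after staging
git commit -q -m "add README"

committed=$(git show HEAD:README)            # only the staged version was committed
pending=$(git status --porcelain)            # the second edit is still an unstaged change
echo "$committed"
echo "$pending"

# 'commit -a' stages modified and removed tracked files first, then commits.
git commit -q -a -m "second edit"
final=$(git show HEAD:README)
```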

I don't like the 'commit -a' form as it seems too risky at this time -- I prefer to manually 'git add' each change before the commit. But sometimes I forget to add a new file that I created. Fortunately, git makes this really easy to fix: 'git commit --amend'. No matter how many things you got wrong with the last commit, you can just fix it like this. Add new files, new changes, fix the commit message, etc. That's really handy.
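
A sketch of the forgotten-file case, assuming git is installed and a POSIX shell. The file names and messages are hypothetical. Note that '--amend' replaces the last commit rather than adding a new one, so it is best kept to commits you have not yet shared.

```shell
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example"               # placeholder identity
git config user.email "example@example.invalid"

echo "int main(void) { return 0; }" > main.c
echo "/* helper */" > helper.c
git add main.c
git commit -q -m "Add progam"                # typo in the message, and helper.c was forgotten

git add helper.c                             # stage what was missed
git commit -q --amend -m "Add program and helper"

count=$(git rev-list --count HEAD)           # still a single commit
subject=$(git log -1 --format=%s)            # the corrected message
files=$(git ls-tree --name-only HEAD)        # both files are in that commit
echo "$count: $subject"
echo "$files"
```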

So this is a workflow that I'm happy with for now. Creating branches is a simple 'git checkout -b branchname', switching branches is the same without the '-b' parameter. Crap in your work area remains in your work area, and the index remains as well. Some may like this because you can commit to a different branch than you started out on, but CVS has a simpler "cvs commit -r branchname" (and git probably does too).
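
Here is a sketch of committing to a different branch than the one you started on, assuming git is installed and a POSIX shell. The branch name 'bugfix' and the file are made up.

```shell
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example"               # placeholder identity
git config user.email "example@example.invalid"

echo "v1" > app.txt
git add app.txt
git commit -q -m "base"
base=$(git symbolic-ref --short HEAD)        # name of the branch we started on

echo "fix" >> app.txt
git add app.txt                              # staged while still on the original branch

git checkout -q -b bugfix                    # create and switch; the staged change comes along
git commit -q -m "Fix on the bugfix branch"

git checkout -q "$base"                      # same command without -b switches back
base_content=$(git show "$base":app.txt)     # the original branch never saw the fix
fix_content=$(git show bugfix:app.txt)
```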

The problem I have is the crud that persists in your work area as you switch branches. Technically I don't know where else it would go, but it seems bad form to me to carry this around. What do people do here? Do you check this in-progress and probably-won't-compile code into the branch while you switch to work on a bug? Is "never commit something that doesn't work" no longer a valid rule? Or do you leave it and 'git clone' the repository into another 'sandbox'?

Using 'git clone' seems like an acceptable solution, except it makes me ill at ease to 'rm -rf' the directory because it carries around the entire repository in its .git folder. And without a central server, there is no backup if you make a mistake and accidentally remove the last copy of the project's .git folder. Under CVS you may lose your changes, but in git you lose everything.

So I've set up git as a server, which I'm using as my master repository, and I'm developing in multiple "sandboxes" cloned from that central repository, using minor feature branches in each work area as I go. When I believe I am feature complete, I push to the server as my final 'commit' and, if appropriate, 'tag'.
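
A rough sketch of that setup, with a local bare repository standing in for the server. It assumes git is installed and a POSIX shell; the paths, file names and identity are placeholders, and a real server would be reached over ssh or similar rather than a local path.

```shell
root=$(mktemp -d)
git init -q --bare "$root/server.git"        # stands in for the central server

# First sandbox: do the work, then push as the final 'commit' and tag.
git clone -q "$root/server.git" "$root/sandbox-a"
cd "$root/sandbox-a"
git config user.name "Example"               # placeholder identity
git config user.email "example@example.invalid"
echo "v1" > product.txt
git add product.txt
git commit -q -m "Release 1.0"
git tag VERSION_1_0
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch" VERSION_1_0     # send the branch and the tag to the server

# Second sandbox: a fresh clone sees the pushed work and the tag.
git clone -q "$root/server.git" "$root/sandbox-b"
cd "$root/sandbox-b"
tags=$(git tag)
content=$(cat product.txt)

# Throwing away a sandbox is now safe; the server still has the history.
rm -rf "$root/sandbox-a"
server_commits=$(git --git-dir="$root/server.git" rev-list --count --all)
echo "tags: $tags, commits on server: $server_commits"
```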

Once CVS users can get into the flow of adding to the index as a lightweight commit, the rest of git is fairly intuitive. Though I clearly still have much more to learn, I think that this fundamental understanding will make me a very happy git user.

Next project: do something about Blogger's horrid editor and/or my blog styling.
