Digital Vomit: 2013.08

Tuesday, August 20, 2013

Crash course for CVS users switching to git

I've used CVS for a long time and have been generally happy with it. Some of its quirks have caused me to consider Subversion, but none were really pressing enough for me to switch. I've tried to use git in the past - honestly, I have - and have always ended up extremely frustrated. But I have crossed the mental Rubicon and wanted to share what was so difficult for me to grasp, in case there are any dinosaurs like me still out there.

How I use(d) CVS

It is probably best to describe my usage of CVS first. CVS has a model of a single central repository, and multiple simultaneous editors. Each checkout by an editor is referred to as a 'sandbox'. Editors work in their sandbox and commit to the repository.

One additional thing to note is that CVS has a concept of 'tags', which are a way to label a snapshot of all the files in the repository (a tag could also apply to a subset, but we're keeping this simple). A quirk here is that "tag -b", a "branch tag", creates a label that can be used to denote a separate line of development.
This model has allowed me to use the following workflow:

Create some product and release version 1.0
cvs tag -b VERSION_1_0

The name format is another CVS quirk, but this branch tag lets me not only label the files that make up version 1.0, but create a branch point where I can develop new features.

Begin working on version 2.0. As features get completed, commit to the (main branch of the) repository. NEVER commit a broken configuration as this will upset other developers when they check out the code.
A bug is reported in version 1.0
Go to a different directory, and cvs checkout -r VERSION_1_0

This new sandbox will be devoted to identifying/fixing this bug.

Fix the bug, and cvs commit it. Then cvs tag BUG_1001.

This effectively creates VERSION_1_0 / BUG_1001.

Continue until you release and branch tag VERSION_1_1.

Always do the tag in a clean sandbox (with a fresh checkout) to ensure it functions.

Meanwhile, the other sandbox can continue developing VERSION_2_0
Prior to release of version 2.0, review bug fixes to see if they are applicable.

Merge BUG_1001, etc. as necessary. CVS has no separate merge command; this is done with cvs update -j BUG_1001.

Delete any sandboxes used for failed experiments or resolved bugs

This is an iterative process that gets more involved with more people and features, etc, but the key is that simultaneous development happens, each in its own sandbox.

This is not how git wants to work.

Development under git

There are two key things to recognize when developing with git:

Discard the idea of a sandbox
Discard the idea of a commit

I'll also add:

Discard the idea of a repository

In git, most frustratingly to me, the 'sandbox' and the 'repository' are gone. Instead, there is only your work area. When you commit files, they go into (by default) the .git directory in your work area. If you remove your work area, you remove your repository (the .git directory). You must be completely disabused of these CVS ideas in order to use git well.

Also in git, branches are not 'clean'. Under my CVS workflow, a sandbox was a branch. I would commit what I wanted and delete the entire sandbox after I'd verified my commit. Any stray files created would be removed. In git, any untracked files created in your work area will be carried around from branch to branch as you check out.
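The carry-over is easy to see for yourself. The following is a minimal sketch, not a recommended workflow: it builds a throwaway repository in a temporary directory and shows a stray file surviving a branch switch. The file name scratch.txt, the branch name bugfix and the demo identity are all made up for illustration.

```shell
#!/bin/sh
# Sketch: untracked files stay in the work area when you switch branches.
set -e
repo=$(mktemp -d)                      # throwaway repository for the demo
cd "$repo"
git init -q .
git config user.name "Demo User"       # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo "tracked content" > README
git add README
git commit -q -m "Initial commit"
start=$(git symbolic-ref --short HEAD) # whatever the default branch is called

git checkout -q -b bugfix              # new branch for some bug
echo "scratch" > scratch.txt           # stray file, never 'git add'ed
git checkout -q "$start"               # back to the original branch

status=$(git status --porcelain)       # untracked files are listed with '??'
echo "$status"
```

The stray file is still sitting in the work area after the switch, and git reports it as untracked on whichever branch you happen to be on.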

The key idea of git that you have to get used to is what is called the 'index'. You can think of the index as your commit target. When you are happy that your code doesn't break things, you stage it in the index. Think of this as a lightweight 'commit':

Create files onefish.txt and twofish.txt
Modify files redfish.txt and bluefish.txt
See these files and see that they are good.
git add '*fish.txt'

The result of 'git add' isn't just "Hey, add these files", like it is in CVS -- instead, it's "Hey, I want these changes to be staged in the index". When this command completes, the 'index' has been updated, just as though you had done a 'commit' to it. Then, a 'git commit' pushes the changes in the index into the repository. The files in your work area are not a part of this equation AT ALL.

If you edit README, 'git add' README to the index, edit README again, and then 'git commit', you are committing the first set of changes because the second set was not staged before the commit. I suspect many people will use the 'git commit -a' form of the command by default, which does an implicit 'add' to the index of all modified and removed files, but not new files not yet 'git add'ed. This actually makes good sense because, as I noted above, git branches are not clean.
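Here is the README scenario as a small script, as a sketch under the assumption that you run it somewhere disposable. It creates its own temporary repository; the demo identity and the file contents are invented for the example.

```shell
#!/bin/sh
# Sketch: 'git commit' records what was staged, not what is in the work area.
set -e
repo=$(mktemp -d)                      # throwaway repository for the demo
cd "$repo"
git init -q .
git config user.name "Demo User"       # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo "first edit" > README
git add README                         # stage the first edit in the index
echo "second edit" > README            # edit again, but do not stage it
git commit -q -m "Add README"

committed=$(git show HEAD:README)      # what actually went into the repository
working=$(cat README)                  # what is still sitting in the work area
echo "committed: $committed"
echo "work area: $working"
```

The commit holds the first edit, while the second edit is still waiting in the work area until it is staged with another 'git add'.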

I don't like the 'commit -a' form as it seems too risky at this time -- I prefer to manually 'git add' each change before the commit. But sometimes I forget to add a new file that I created. Fortunately, git makes this really easy to fix: 'git commit --amend'. No matter how many things you got wrong with the last commit, you can just fix it like this. Add new files, new changes, fix the commit message, etc. That's really handy.
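A minimal sketch of that fix-up, again in a throwaway repository. The file names follow the fish example above; the commit messages and the demo identity are invented.

```shell
#!/bin/sh
# Sketch: use 'git commit --amend' to add a file forgotten in the last commit.
set -e
repo=$(mktemp -d)                      # throwaway repository for the demo
cd "$repo"
git init -q .
git config user.name "Demo User"       # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo "red" > redfish.txt
echo "blue" > bluefish.txt
git add redfish.txt
git commit -q -m "Add fish files"      # oops: bluefish.txt was never staged

git add bluefish.txt                   # stage the forgotten file
git commit -q --amend -m "Add red and blue fish files"

count=$(git rev-list --count HEAD)     # number of commits on this branch
git ls-tree --name-only HEAD           # files recorded in the amended commit
```

There is still only one commit afterwards: the amended commit replaces the original rather than adding a second one on top of it.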

So this is a workflow that I'm happy with for now. Creating branches is a simple 'git checkout -b branchname', switching branches is the same without the '-b' parameter. Crap in your work area remains in your work area, and the index remains as well. Some may like this because you can commit to a different branch than you started out on, but CVS has a simpler "cvs commit -r branchname" (and git probably does too).
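To illustrate the index travelling with you, this sketch stages a change on one branch and commits it on another. It uses a throwaway repository; the branch name bugfix and the demo identity are made up.

```shell
#!/bin/sh
# Sketch: staged changes follow you to a new branch and can be committed there.
set -e
repo=$(mktemp -d)                      # throwaway repository for the demo
cd "$repo"
git init -q .
git config user.name "Demo User"       # hypothetical identity, local to this repo
git config user.email "demo@example.com"

echo "base" > README
git add README
git commit -q -m "Initial commit"
start=$(git symbolic-ref --short HEAD) # whatever the default branch is called

echo "fix" >> README
git add README                         # staged while still on the starting branch
git checkout -q -b bugfix              # the index and work area come along
git commit -q -m "Fix on bugfix branch"

on_start=$(git rev-list --count "$start")
on_bugfix=$(git rev-list --count bugfix)
echo "commits on $start: $on_start, on bugfix: $on_bugfix"
```

The starting branch is left untouched; the staged change lands only on the branch that was checked out at commit time.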

The problem I have is the crud that persists in your work area as you switch branches. Technically I don't know where else it would go, but it seems bad form to me to carry this around. What do people do here? Do you check this in-progress and probably-won't-compile code into the branch while you switch to work on a bug? Is "never commit something that doesn't work" no longer a valid rule? Or do you leave it and 'git clone' the repository into another 'sandbox'?

Using 'git clone' seems like an acceptable solution, except it makes me ill at ease to 'rm -rf' the directory because it carries around the entire repository in its .git folder. And without a central server, there is no backup if you make a mistake and accidentally remove the last copy of the project's .git folder. Under CVS you may lose your changes, but in git you lose everything.

So I've set up git as a server which I'm using as my master repository and developing in multiple "sandboxes" cloned from the central repository, using minor feature branches in each work area as I go. When I believe I am feature complete, I push to the server as my final 'commit' and, if appropriate, 'tag'.
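The arrangement can be tried out locally before involving a real server. This is only a sketch: a bare repository on the local disk stands in for the server, and the directory names central.git, sandbox1 and sandbox2, the tag name and the demo identity are assumptions for the example.

```shell
#!/bin/sh
# Sketch: a central (bare) repository with disposable cloned "sandboxes".
set -e
base=$(mktemp -d)                             # everything lives under a temp dir
git init -q --bare "$base/central.git"        # stand-in for the server repository

git clone -q "$base/central.git" "$base/sandbox1"   # git warns that it is empty
cd "$base/sandbox1"
git config user.name "Demo User"              # hypothetical identity, local to this clone
git config user.email "demo@example.com"

echo "hello" > README
git add README
git commit -q -m "Initial commit"
git tag VERSION_1_0                           # label the release locally
branch=$(git symbolic-ref --short HEAD)       # whatever the default branch is called
git push -q origin "$branch" VERSION_1_0      # the final 'commit' to the server

cd "$base"
rm -rf "$base/sandbox1"                       # safe: the work was pushed
git clone -q "$base/central.git" "$base/sandbox2"   # a fresh sandbox
```

Once the push has happened, removing a sandbox only loses whatever was never pushed, which is roughly the safety net CVS gave.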

Once CVS users can get into the flow of adding to the index as a lightweight commit, the rest of git is fairly intuitive. Though I clearly still have much more to learn, I think that this fundamental understanding will make me a very happy git user.

Next project: do something about Blogger's horrid editor and/or my blog styling.

Thursday, August 15, 2013

Part 2: Create a SOAP::Lite Server that uses Basic Authentication for password verification

Today is a better day than yesterday (see yesterday's part one of this post for details). Despite Google appearing to be ever more useless in the chaff that is today's WWW, and Bing no better, I've made some good progress with my SOAP experiment. It turns out the key to my next problem was solved in 2004. But let's recap:

I want to write a SOAP server. I want to authenticate clients somehow - username and password are a good start. I want every request to be authenticated, and a Basic Authentication Realm will work just fine.

Why make every request authenticate? Because HTTP is a stateless protocol. When you "log in" to a website, you really are just requesting a cookie. This cookie is a magic number that is stored on the server and means "your session". When you load another page on that same site, the browser sends that cookie along as well. But the server still has to check the validity of that cookie every time it is sent, whether the next page load is made in 2 seconds or 2 years. Basic Authentication is similar, except it's built into the browser. This has downsides though; the biggest is that your browser asks you to log in, so the website developer can't make a fancy branded prompt. But I digress...

The second issue I will run into is that I want to be able to retrieve those credentials from the class methods being called. There are a number of reasons to do this: perhaps different clients will get different responses, or perhaps I want to log some AAA data. This is where I need to search back to 2004.

So let's revisit with some source code.

The SOAP server

soapDaemon.pl

    use strict;
    use SOAP::Transport::HTTP +'trace';

    # don't want to die on 'Broken pipe' or Ctrl-C
    #$SIG{PIPE} = $SIG{INT} = 'IGNORE';
    $SIG{PIPE} = 'IGNORE';

    #my $daemon = SOAP::Transport::HTTP::Daemon
    my $daemon = BasicAuthDaemon
          -> new (LocalPort => 8000, Reuse => 1)
          -> dispatch_to('myClass')
          ;

    print "Connect to SOAP server at ", $daemon->url, "\n";
    $daemon->handle;

This is the same code that can be found in the documentation all over the internet. I've added tracing and disabled the ^C handler, but this should be straightforward to understand. Again, note that I am replacing the default HTTP::Daemon with my custom BasicAuthDaemon. We'll run on port 8000 and send requests to the 'myClass' object.

The modified HTTP::Daemon class

soapDaemon.pl

    package BasicAuthDaemon;
    use strict;
    use warnings;
    use MIME::Base64;
    use SOAP::Lite +'trace';
    use SOAP::Transport::HTTP ();
    our @ISA= 'SOAP::Transport::HTTP::Daemon';

    sub handle {
        my $self = shift->new;
        while ( my $c = $self->accept ) {
            while ( my $r = $c->get_request ) {
                $self->request($r);

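                # Parse "Authorization: Basic <base64 of user:password>". Note that
                # declaring with "my ... if" leaves the variables in an undefined state.
                my $header = $r->headers->authorization || '';
                my ($type, $creds) = $header =~ /^(\S+)\s+(\S+)/ ? ($1, $2) : ('', '');
                my ($user, $pass) = ('', '');
                ($user, $pass) = ( split( /:/, decode_base64( $creds ), 2 ), '', '' )[0, 1]
                    if( $type eq 'Basic' );
                print "type: [$type], user:[$user], pass:[$pass]\n";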

                if( $user eq 'user' and $pass eq 'password' ) {
                    $self->{'auth'} = "123123123";
                    SOAP::Transport::HTTP::Server::handle $self;
                    #$self->SUPER::handle;
                } else {
                    $self->response( $self->make_fault(
                          $SOAP::Constants::FAULT_CLIENT, 'Authentication required',
                          'Give authentication credentials for Basic Realm security'
                      ));
                  }
                  $c->send_response( $self->response );
              }

    # replaced ->close, thanks to Sean Meisner <Sean.Meisner@VerizonWireless.com>
    # shutdown() doesn't work on AIX. close() is used in this case. Thanks to Jos Clijmans <jos.clijmans@recyfin.be>
              $c->can('shutdown')
               ? $c->shutdown(2)
               : $c->close();
              $c->close;
        }
    }
    1;

This code is in the same file as the server code above. I actually have this as the top of the file but it takes a bit more explaining. Firstly, this is still just test code because you're unlikely to use HTTP::Daemon in production. I think this actually gets easier in production because the handle() method doesn't have a service loop where requests have to be processed. See here for some clarification.

However, what I've done is gone into the file perl5/SOAP/Transport/HTTP.pm and found the handle() method within the SOAP::Transport::HTTP::Daemon package, and copied it into my code. I've left the original comments and expanded formatting to show that this is a copy-paste and not my own code. My additions are bold and removals are italic.

My added logic takes the HTTP request and references into the authorization headers that are sent for the Basic Authentication mechanism. You can use a similar routine to extract and verify cookie data, etc.

If the username and password match, I jam a value into $self, which is a SOAP context. I'm simply creating a hash entry called 'auth', which could be enhanced. Is there a 'stash' in this architecture? I don't think so... Should I call it something more unique like 'myAppAuth'? Probably. But this is where we begin.

Aside from adding my custom logic, one more modification was necessary: remove the call to $self->SUPER::handle. This is necessary because our superclass has the same event loop and will just hang waiting for another client. Instead we need to call the superclass of our superclass, which is SOAP::Transport::HTTP::Server. I pass $self as the first argument because this is how perl seems to do OO things, and it works nicely.

Finally, if the credentials don't match, I return a SOAP error.

The client (in perl)

soapClient.pl

    use strict;
    use SOAP::Lite +'trace';

    my $soapClient = new SOAP::Lite
        uri => 'http://example.com/myClass',
        proxy => 'http://user:password@localhost:8000/',
        ;

    my $result = $soapClient->hi( 'this is how the world ends', 'kabaam' );
    unless ($result->fault) {
        print "\nresult: [" . $result->result . "]\n\n";
    } else {
        print join ' --=-- ',
        $result->faultcode,
        $result->faultstring,
        $result->faultdetail, "\n";
    }

This is the same client as before. Again, the uri names the class being called and the host name appears to be ignored. The proxy is the actual web server, and is where we place the username and password.

The client (in PHP)

soapClient.php

    <?php
    // Credentials go in 'login' and 'password' and must match what the server checks.
    $client = new SoapClient(null,
        array('location' => "http://localhost:8000/",
              'uri'      => "http://test-uri/myClass",
              'login'    => "user",
              'password' => "password",
        ));

    $res = $client->hi( 'from php!' );
    print "got: $res\n";

Here's the same SOAP client written in PHP for comparison. Placing the username and password in the URL doesn't work in PHP, so these have to be explicitly added. Note again that the server doesn't seem to care about the server portion of the URI. On error, PHP throws an exception.

The SOAP class

myClass.pm

    package myClass;

    use strict;
    use vars qw(@ISA);
    @ISA = qw(Exporter SOAP::Server::Parameters);
    use SOAP::Lite;# +'trace';

    sub hi {
        my $evp = pop;
        my $context = $evp->context;

        my ($name, @args) = @_;
        print "[$name] got arguments: [@args]\n";

        return "hello, world, auth user=$context->{'auth'}";
    }

    sub bye {
        return "goodbye, cruel world";
    }
1;

Finally, here's the class handling the SOAP calls. Not being a perl expert, there is magic happening in here that I don't fully understand.

First, this code must exist in an external file. I surmised that this could be placed in the same file as the daemon code, and it does work for how I was using it yesterday. However, this version is a subclass of SOAP::Server::Parameters (doesn't that seem like how it should work?) For reasons I don't understand, the subclassing doesn't work if this code is in the same file, even if I place it in the BEGIN{} block. If anyone knows why, I'd love to hear an explanation.

So now that we're a subclass of SOAP::Server::Parameters, we automatically get a new parameter added as the last argument to every function call we make. We pop this value off the end of the parameter stack, saving it as our environment pointer. This is really a SOAP::SOM object but what I want is the original SOAP context. Fortunately, this is easy to access through the ->context method.

Once I get the context, the 'auth' value I created is readily available, and I return it in the call to hi() for verification. Fortunately, it works like - and appears to be slightly - magic.

Wednesday, August 14, 2013

Create a SOAP::Lite Server that uses Basic Authentication for password verification

Today was a rough day. I'm writing a SOAP service in perl. The basics required to make this work are fairly straightforward, but, as usual, the documentation and I can't seem to understand one another very well. As far as I can tell, this recipe exists nowhere on the internet.

Let me start with the basic example. This is what a standalone server looks like:

    use strict;
    use SOAP::Transport::HTTP;
    # don't want to die on 'Broken pipe'
    $SIG{PIPE} = 'IGNORE';
    my $daemon = SOAP::Transport::HTTP::Daemon
        -> new (LocalPort => 8000, Reuse => 1)
        -> dispatch_to('class::method')
        ;
    print "Contact to SOAP server at ", $daemon->url, "\n";
    $daemon->handle;

How much simpler can this get? We use the required SOAP module, set up a signal handler, and create an HTTP daemon. We set it to run on an unprivileged port (8000) and set up socket reuse. It has a single method that can be called from the 'class' package. We print out the URL and start handling requests.

If you're really just starting out, this should work to define the 'class::method' handler, for reference:

    package class;
    use strict;
    sub method { return 'Hello, world!' };

Go ahead and put it at the bottom of the file with the daemon code.

Now, to test this service you also need to write a SOAP client. Here's that code:

    use strict;
    use SOAP::Lite;
    my $soapClient = new SOAP::Lite
        uri => 'http://example.com/class',
        proxy => 'http://user:password@localhost:8000/',
        ;
    my $result = $soapClient->method( 'some_args' );
    unless ($result->fault) {
        print "\nresult: [" . $result->result . "]\n\n";
    } else {
        print join ' ** ',
            $result->faultcode,
            $result->faultstring,
            $result->faultdetail, "\n";
    }

It's a bit longer because of the error checking, but still fairly straightforward. We use the SOAP module, create a client, and call the method. Then we either print the result or the fault information. There are a couple things to note in here:

The 'uri' parameter can contain any hostname. The only thing the server seems to care about is the name 'class'. This identifies what code is going to be dispatched.
The name 'class' in the client 'uri' must match the dispatch_to parameter in the server.
The 'proxy' is the actual URL the request will be sent to. It can be http, https, mailto:, etc.
We have added username and password credentials to the URL. This is important later.

Now, running this should Just Work. $soapClient becomes a remote representation of 'class', and we can call method() as though it were a local function. Of course it's not a local function and the web is stateless, so each request also passes that username and password we want to use to authenticate.

Here's what needs to happen on the server to receive and process the username and password. At the top of the file with the server code, add this:

    # subclass the default soap server daemon to handle authenticated requests
    package BasicAuthDaemon;
    use strict;
    use MIME::Base64;
    use SOAP::Transport::HTTP ();
    our @ISA= 'SOAP::Transport::HTTP::Daemon';
    sub handle {
        my $self = shift->new;
        while ( my $c = $self->accept ) {
            while ( my $r = $c->get_request ) {
                $self->request($r);
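                my $header = $r->headers->authorization || '';
                my ($type, $crypt) = $header =~ /^(\S+)\s+(\S+)/ ? ($1, $2) : ('', '');
                # Declare first, assign conditionally: "my ... if" is undefined behaviour.
                my ($user, $pass) = ('', '');
                ($user, $pass) = ( split( /:/, decode_base64( $crypt ), 2 ), '', '' )[0, 1]
                    if( $type eq 'Basic' );
                print "user:[$user], pass:[$pass]\n";
                # Must match the credentials in the client's proxy URL.
                if( $user eq 'user' and $pass eq 'password' ) {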
                    #$self->SUPER::handle;
                    SOAP::Transport::HTTP::Server::handle $self;
                } else {
                    $self->response( $self->make_fault(
                        $SOAP::Constants::FAULT_CLIENT, 'Authentication required',
                        'Give authentication credentials for Basic Realm security'
                    ));
                }
                $c->send_response( $self->response );
            }
            $c->can('shutdown')
                ? $c->shutdown(2)
                : $c->close();
            $c->close;
        }
    }

What we're doing here is taking the code that can be found around line 688 of SOAP/Transport/HTTP.pm (in the SOAP::Transport::HTTP::Daemon package) and copying it into the 'BasicAuthDaemon' package that we are defining here. We use SOAP::Transport::HTTP and declare ourselves to be of the same type as the package we are subclassing.

Assuming the code is still legible despite the formatting, the code in blue that we added serves to extract the authentication data from the request headers, which looks like "Basic BASE64DATA==". We call the BASE64DATA "crypt" in the code but there's nothing secure about this - be sure to send credentials over SSL in production. We then decode the Base64 data to get the "user:password" string that was passed before the '@' in the client's URL. It is up to the reader to expand the code to do proper database lookups for production release.
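To make the header format concrete, here is a small shell sketch of both directions: what the client puts on the wire and how the server takes it apart. It assumes a base64 command that accepts -d for decoding (as in GNU coreutils), and it uses the same 'user:password' pair as the client above.

```shell
#!/bin/sh
# Sketch: build and take apart a Basic Authentication header value.
set -e

# Client side: "Basic " followed by base64("user:password").
creds=$(printf '%s' 'user:password' | base64)
header="Basic $creds"
echo "Authorization: $header"

# Server side: split off the scheme, decode the rest, split at the first colon.
scheme=${header%% *}                                  # text before the first space
decoded=$(printf '%s' "${header#* }" | base64 -d)     # assumes 'base64 -d' is available
user=${decoded%%:*}                                   # text before the first colon
pass=${decoded#*:}                                    # everything after the first colon
echo "scheme=[$scheme] user=[$user] pass=[$pass]"
```

Anyone who can see the header can run the same decode, which is why the credentials need SSL in production.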

However, do note the commented code in red: the call to $self->SUPER::handle will hang if left inline. Instead, I've replaced this with an explicit call to the handle method for HTTP::Server, which appears to be the super class for Transport::HTTP::Daemon. I'm also not an expert on SOAP so the make_fault may not be technically correct but it's working for now.

To complete the transition, we then simply redefine our daemon to use our derived class:

    #my $daemon = SOAP::Transport::HTTP::Daemon
    my $daemon = BasicAuthDaemon