Passing by reference, and dog leashes

Pass-by-reference and pass-by-value are pretty confusing when you start learning to code. When I first saw them, I know I ignored the distinction (until I got tired of my code not doing what I expected). Throwing collections into the mix just makes it worse.

Today, though, we stumbled on a pretty decent analogy for passing-by-reference: a reference to an object is like a leash to a dog. Let’s take our dog Dagwood for a walk.

Dog dagwood = new Dog(“Dagwood”);

new Dog() creates, of course, a new Dog object. Dog dagwood creates a reference that can point to any Dog object -- it's really the leash, but we name our references for what they point to, rather than what they are: a reference, a handle, a leash. The equals sign takes the leash, and hooks it to Dagwood's collar. Now we can take Dagwood for a walk.

dagwood.walk();

To tell Dagwood it's time to walk, we tug on the leash. He feels the tug, and gets the message, so he starts following us. We come to a busy road, and wait for the crossing signal, but Dagwood's oblivious, and tries to cross anyway.

dagwood.halt();

Since we're stopped, he feels the tug of the leash again, gets the message, and stops. We're sending messages to Dagwood through his leash. In OO terms, sending a message to an object means calling one of its methods. We're calling methods on our Dagwood through our reference to him, through the leash.

Storing a reference in an array

In the park, we find a snack shop. We're getting hungry, but the snack shop doesn't let dogs inside. Luckily, there's a chain link fence, and in our eyes, a chain link fence is nothing but a big row of places for us to attach a dog leash. We tie a spare leash to the end of the fence, and attach it to Dagwood's collar.

Dog[] fence = new Dog[10]; // only room for 10 dogs
fence[0] = dagwood;

What's happening here in OO terms is that our reference to Dagwood, our leash, is copied into the zeroth slot on the fence. It's not our leash, but it's one just like it. So now there are two leashes on Dagwood: one in our hand, and one on the fence. We'll take our leash off Dagwood, since we can't very well hold it while we're in the store.

dagwood = null;


Don't worry, he's fine...he's still tied to the fence, by that other leash. Let's go buy cashews.

When we come out of the store, we want to re-attach our leash to Dagwood.

dagwood = fence[0];

Now let's untie him from the fence, and head over to the lake.

fence[0] = null;

Passing by reference

Passing references to methods works in much the same way. Dagwood got kind of stinky, swimming in the lake, so let's bring him to the groomer for a bath.

DogGroomer.shampoo(dagwood);

When you pass a reference to a method, your reference is copied into that method. Again, it's like a new leash, one just like ours, springs into the groomer's hand -- now Dagwood's attached to us, and the groomer. He gets fidgety when he's getting bathed, so it's just as well.

From the groomer's perspective, it might look like this:

void shampoo(Dog doggie) {
wet(doggie);
apply(shampoo, doggie);
rinse(doggie);
towelDry(doggie);
}

The groomer doesn't care what Dagwood's name is, she just keeps calling him "doggie." That's ok, she must see a lot of dogs during the day...names aren't that important to her. The interesting thing is, even though it's the groomer who's shampooing our dog, since we still have a leash on him, we can observe him getting cleaner.

When she's done, the procedure ends, the method returns, and her leash to Dagwood disappears. Which is fine, because he's stopped fidgeting, now that he's dry.

Garbage collection

We head back home through the park. Dagwood's itching to run around, but we're tired, so we just unleash him. Hopefully we can find him before it gets dark...

dagwood = null;


Unfortunately, the dog catcher spots him running around without a leash, which is illegal in these parts -- a stray dog will hang around forever, eating up resources. The dog catcher carries off our poor Dagwood, and destroys him. We take it in stride, and try to keep the whole circle of life allocation-deallocation in mind.

So...

So that's how references work. It's why code like this (C#) will ensure the balloon bouquet has at least one balloon that says "Happy Birthday!":

List balloons = GetBalloons();
Balloon printed = balloons.Find(Balloon.IsPrinted);
if (printed == null) {
printed = new Balloon();
printed.PrintMessage("Happy Birthday!");
balloons.Add(printed);
}
return balloons;

My GoRuCo 2008 highlights

Aaron and I had a great time at GoRuCo 2008 last Saturday. Here are my highlights.

Bryan Helmkamp, Story-Driven Development

Bryan Helmkamp’s talk on SDD (slides, 1.6 MB PDF) reminded me of what Scott Hanselman calls “spec documents with teeth.” If I get it right, as you develop user stories, you use a standard format, and code parses your story file, and ties the text directly to functional tests you write. The stories aren’t executable themselves, but it brings them closer together.

Each story has a name, a description, and scenarios; the descriptions follow the format “As a …, I want to …, so that …”:

As a headhunter looking for Rails developers
I want to search for CVs with 8 years experience
So that I can make an exorbitant commission

The scenarios are different ways that story might play out. Each scenario has a description, which follows the format "Given ... When ... Then ...":

Scenario: Enough experience.
Given a CV with 9 years of Rails experience
When I search for qualified Rails candidates
Then I should find the CV
And I should realize the candidate is full of shit

Then code reads your story files, and uses the text following the keywords to connect to executable functional tests you write:

steps_for :cvs do
  Given "a CV with 3 years of Rails experience" do
    @cv = Developer.create!(:first_name => "Joe",
      :last_name => "Developer", :rails_exp => 3,
      :gender => "Male")
  end
end

steps_for :cvs do
  When "I search for qualified Rails candidates" do
    @results = Developer.find_qualified(8)
  end
end

The code that actually performs the test is just ActiveRecord. If you want to test the UI, there's a DSL called Webrat that simulates the browser. It seems to live halfway between Watir and mechanize, and it doesn't do javascript. It looks like this:

steps_for :cvs_with_ui do
  Given "a CV with 3 years of Rails experience" do
    visits new_developer_path
    fills_in "First name", :with => "Joe"
    fills_in "Last name", :with => "Developer"
    selects "4", :from => "Rails Experience"
    chooses "Male"
    clicks_button "Create"
  end
end

I'm curious about chooses "Male" -- I don't see how it connects that text with the drop-down it's supposed to change, unless it looks at values for all drop-downs on the page. He gave a nice breakdown of the differences between Webrat and Selenium, and how SDD fits into an Agile team.

Giles Bowkett, Archaeopteryx

That's ARK-ee-OP-ter-ix, or Arcx for short. Made by Giles (soft 'g') BOH-ket, or boh-KET, I wasn't exactly sure which. It is, in his words, "a Ruby MIDI generator," and "a system for auto-generating, self-modifying music."

Giles had hands-down the most entertaining talk of the day. Instead of poring through each token of the code, he compared taking VC money (or "weasel-brained muppet fucker" money) to being an artist with a patron -- as the programmer/artist, your job is to make stuff that makes your VC/patron look good. He showed some of Leonardo da Vinci's designs that weren't constructed until recently; that it took this long, he said, was a failure of da Vinci's time. Similarly, staying within a VC's idea of what's possible is a failure of wasted passion and energy. Start-ups are so cheap now, you can ignore VCs -- so follow your passion, create an open-source-enriched ecosystem around it, and make money by servicing the niche market you made. If your startup is great, you can say "my career is this thing"; if it's mediocre, you can say "my career includes this thing". Just remember that good artists ship.

Which brings us to Arcx, Giles' idea for a crowd-interfacing, automatic music machine. Someone asked whether it was wise to name a new project after an extinct species, and Giles got all clever: archaeopteryx was either the last dinosaur or the first bird, and the project could be either the last DJ tool, or the first automatic music machine, played by the crowd. He played us some samples, and talked through the code just a bit, dropping hits about the CLOS-like structure of his code. As fun as his talk was, I would've liked to hear more music, and more about the lambda-passing and CLOS stuff.

He also pointed out the most interesting ruby book I'd never heard of, Practical Ruby Projects: Ideas for the Eclectic Programmer.

Chris Wanstrath, ParseTree

Out of all the talks, Chris' was the one I'll be using first. Lispers say "code is data," and I can see why that's so powerful -- but I haven't really tried it yet. ParseTree brings some of that coolness to ruby:

require 'ruby2ruby'

def to_ruby(&blk)
   blk.to_ruby
end

puts to_ruby { 1 + 1 }
puts to_ruby { |i| i == 42 }

def to_sexp(&blk)
   blk.to_sexp
end

puts to_sexp { 1 + 1 }
puts to_sexp { |i| i == 42 }

...which produces:

proc {
  (1 + 1)
}
proc { |i|
  (i == 42)
}
[:proc, nil, [:call,
   [:lit, 1], :+, [:array, [:lit, 1]]]]
[:proc, [:dasgn_curr, :i], [:call,
   [:dvar, :i], :==, [:array, [:lit, 42]]]]

Most of the examples he gave generated query syntax in a ruby-idiomatic way: say you have an ORM, and you want users to search for users like this:

old_cat_people = Users.select do |u|
   u.favorite_pet == "cat" && u.age > 100
end

How could you turn that into SQL? Call to_sexp on the query block (that's "to S-expression"), and evaluate the abstract syntax tree it returns. Like this:

class Users
   def self.select(&query)
      query = query.to_sexp

      # Now, query looks like this:
      # [:proc,
      #    [:dasgn_curr, :u],
      #       [:and,
      #          [:call,
      #             [:call, [:dvar, :u], :favorite_pet],
      #             :==,
      #             [:array, [:str, "cat"]]],
      #          [:call,
      #             [:call, [:dvar, :u], :age],
      #             :>,
      #             [:array, [:lit, 100]]]]]

      # Time to evaluate the AST.
   end
end

Admittedly, it's not that trivial, but that's the gist of it -- and I think the gem helps you with this. (Cue the smug Lispers: this stuff is natural in Lisp, the way passing anonymous functions blocks is in ruby.)

But here's the interesting thing: we're on our way to building .Net's LINQ right into ruby. Remember, it stands for Language Integrated Query. LINQ is a big deal for .Net folks, because it's handy. Microsoft put a lot of work into it, and it still needs its own new syntax. I think that's a pretty clear example of the power of being able to see your own AST, and code off it.

Ryan Davis, Hurting Code for Fun and Profit

Ryan's talk was nominally about using tools like Heckle and Flog to beat the evil out of code, but my favorite part was his introspection-driven development. I know I'll want to refer others to his slides and audio throughout my career.

Some of his tips for improving as a programmer:

  • Read. 1 technical book a month. Sites like c2.com -- get close to the experts.
  • Focus. Only a few smart blogs: not zillions, not flame-prone. (Flame-retardant blogs?)
  • Grow. Learn 1 new language a year. Learn things outside of programming.
  • Learn from the pottery challenge story in Art & Fear: practice, a lot.

Ryan is also a fellow Dvorak typist, and pretty emphatic about it.

Johnson

Johnson is a ruby gem that executes JavaScript code. (It's a successor to RKelly, which did the same thing.) I don't know why I think this is so cool. Most people agreed the main use case for something like this is testing, but it seems to me there might be neater tricks to play. We'll see how I feel after playing with it for a while.

GoRuCo 2009?

I'm definitely going next year -- see you there.

Trading Space for Speed: Memoizing with Ruby Facets

Recently, I talked about a faster, cheaper way to calculate Fibonacci numbers. One of the optimizations I made was to remember the value of each Fibonacci number: since F(7) is always 13, instead of recalculating it each time N=7, we can stuff 7 -> 13 into a look-up table for future reference. The function builds up a cheat-sheet, to avoid doing the re-work. It remembers.

This is called memoization, and it’s a nice way to trade memory for performance. But it only works when the function always returns the same answer for a given set of arguments — otherwise it’s first-in wins, forever. This property of a function, returning the same answer for the same args, is called referential transparency.

A Sample Implementation

There are lots of ways you could memoize a function. Hash tables are a natural choice, since they map a key to a value, just like functions map arguments to a value. Even if you implement it differently, a hash table is a good working model for memoization.

Let’s briefly consider factorials. The regular version:

class Unmemoized
    def factorial(n)
        puts n
        if n < 1
            1
        else
            n * factorial(n-1)
        end
    end
end

unmemoized = Unmemoized.new

5.downto(1) { |i| puts "\t#{unmemoized.factorial(i)}" }

...and the memoized version:

class Memoized
    attr_reader :factorial_memo
    def initialize
        @factorial_memo = {}
    end

    def factorial(n)
        puts n
        unless @factorial_memo.has_key? n
            if n < 1
                @factorial_memo[n] = 1
            else
                @factorial_memo[n] = n * factorial(n-1)
            end
        end

        @factorial_memo[n]
    end
end

memoized = Memoized.new

5.downto(1) { |i| puts "\t#{memoized.factorial(i)}" }

puts memoized.factorial_memo.inspect

Printing the hashtable is especially telling: {5=>120, 0=>1, 1=>1, 2=>2, 3=>6, 4=>24} It reads like a look-up table for factorials.

Memoization in Facets

As relatively easy as that example is, it has its drawbacks: we need to track our previous results in a separate variable, the memoization code is mixed up with the actual calculation (the part we care about), we can't easily use it with other functions, and the pattern only works for functions of one argument. Facets makes memoization trivial, and removes all these issues.

require 'facets/memoize'

class FacetsMemoized
    def factorial(n)
        puts n
        if n < 1
            1
        else
            n * factorial(n-1)
        end
    end

    memoize :factorial # <= HINT
end

facets_memoized = FacetsMemoized.new

5.downto(1) { |i| puts "\t#{facets_memoized.factorial(i)}" }

In case you missed it, this is just like Unmemoized above, except we added line 13, memoize :factorial...that's it. Just like attr_reader and friends, you can pass a list of symbols to memoize, and it'll work on functions with any number of arguments:

require 'facets/memoize'

class MemoizedMath
    def add(n, m)
        n + m
    end
    def mult(n, m)
        n * m
    end
    memoize :add, :mult
end

When You Might Use Memoization, and What to Avoid

There are a number of places where this is useful: calculating a value by successive approximation, finding the path to the root node in an immutable tree structure, finding the Nth number in a recursively-defined series, even simple derived values (like 'abc'.upcase). In general, a function is a good candidate if it only looks at its arguments (no global, class, or member variables, no files or databases) -- especially if those arguments are immutable.

Relying on side-effects (printing to standard out, writing to a database or file, or updating a variable) in memoized methods is a bad idea: they'll only happen the first time your method is called with those arguments, which is probably not what you intend. (Unless you're printing the arguments to illustrate how memoizing works.) On the other hand, relying on side-effects is generally a bad idea anyway. Even if you don't use a functional programming language, you can still benefit from minimizing state changes.

Further Reading

If memoization sounds interesting to you, you might like Oliver Steele's article about memoizing JavaScript functions. If you're curious about immutability, you might like this Joshua Bloch interview. If you're interested in functional programming, there are worse places to start than the excellent Structure and Interpretation of Computer Programs. And of course, there's more where that came from, in Ruby Facets.

Why We Abstract, and What To Do When We Can’t

Whenever you see yourself writing the same thing down more than once, there’s something wrong and you shouldn’t be doing it, and the reason is not because it’s a waste of time to write something down more than once. It’s because there’s some idea here, a very simple idea, which has to do with the Sigma notation…not depending upon what it is I’m adding up. And I would like to be able to always…divide the things up into as many pieces as I can, each of which I understand separately. I would like to understand the way of adding things up, independently of what it is I’m adding up.

- Gerald Sussman, SICP Lecture 2a, “Higher-order Procedures” (emphasis added)

The purpose of abstracting is not to be vague, but to create a new semantic level in which one can be absolutely precise.

- Edsger W. Dijkstra, The Humble Programmer

What Larry Wall said about Perl holds true: “When you say something in a small language, it comes out big. When you say something in a big language, it comes out small.” The same is true for English. The reason that biologist Ernst Haeckel could say “Ontogeny recapitulates phylogeny” in only three words was that he had these powerful words with highly specific meanings at his disposal. We allow inner complexity of the language because it enables us to shift the complexity away from the individual utterance.

- Hal Fulton, The Ruby Way, Introduction (emphasis added)

Programming is our thoughts, and with better ways to express them, we can spend more time thinking them, and less time expressing them.

3 + 3 + 3 + 3 + 3 + 3 is hard…hard to read (how many threes?), hard to get right (I lost count!), hard to reason about (piles of operations!). 3 x 6 is easy, once you learn multiplication. This is a good trade-off. We should look for ways to add abstractions, new semantic levels, to our programs.

If you’re doing the same thing twice, stop, and look for the common idea. Peel the idea away from the context, from the details. Grasp the idea, and then use it over and over. As a bonus, you’ll type less, re-use code, and debug less.

“But I can’t find ways to do that!”

When you look at similar bits of code, and can’t find a good way to remove the duplication, you’re hitting the limits of either your language, or your knowledge.

Programming languages put up very real walls, they force you down their paths, often by leaving out features. A language without recursion puts up a wall in front of recursive solutions; a language without first-class functions makes it tough to write higher-order functions. Language limitations are the cause of Greenspun’s Tenth Rule.

Sometimes, the language is not the problem. Sometimes you just can’t find your way through. This is why you read Refactoring, and Design Patterns, but really, this is why you learn other programming languages. Think about the right way to factor the problem.

If you can’t remove the duplication, you need to work around your language, or learn some new tricks.