Romans, rubies and the D

There’s an increasing interest with the D programming language amongst my readers so I figured I’ll post a bunch of short posts about D and see what happens.

Anyway, here’s a classic example showing Ruby’s capabilities taken from Seven Languages in Seven Weeks:

class Roman
  def self.method_missing name, *args
    roman = name.to_s
    roman.gsub!("IV", "IIII")
    roman.gsub!("IX", "VIIII")
    roman.gsub!("XL", "XXXX")
    roman.gsub!("XC", "LXXXX")

    (roman.count("I") +
     roman.count("V") * 5 +
     roman.count("X") * 10 +
     roman.count("L") * 50 +
     roman.count("C") * 100)
  end
end

puts Roman.X
puts Roman.XC
puts Roman.XII

This snippet creates a simple Roman numbers API using Ruby’s method_missing – it basically translates method name to its decimal value. For simplicities sake it misses error checking and the like.

Here’s my version in the D programming language:

import std.stdio;
import std.array;
import std.algorithm;

struct Roman {
    @property static auto opDispatch(string number)() {
        enum num = number.replace("IV", "IIII")
                         .replace("IX", "VIIII")
                         .replace("XL", "XXXX")
                         .replace("XC", "LXXXX");

        enum value = num.count('I')
                   + num.count('V') * 5
                   + num.count('X') * 10
                   + num.count('L') * 50
                   + num.count('C') * 100;

        return value;
    }
}

void main() {
    writeln(Roman.XV);
    writeln(Roman.IIX);
    writeln(Roman.VXCII);
}

Again, for the sake of simplicity it doesn’t check for any errors and whatnot.
This code works the same way Ruby’s version does. Using opDispatch (D’s version of method_missing) it first translates the method name into a simpler form and then counts different characters obtaining the decimal value.

The biggest and most significant difference is the fact, that D’s version has no runtime overhead at all!

D’s opDispatch is a template, meaning that it’s instantiated at compile time making string number a compile time value.
Now, making a use of such a compile time value and assigning the result to an enum triggers CTFE in D – the code is run at compile time leaving no runtime overhead at all!

(Obviously there might be some overhead in opDispatch!RomanLiteral() call, but it will most likely be inlined by the compiler.)

We end up having a simple, readable API using high level features in a systems programming language that’s as fast as C. This is one of the numerous reasons I like D so much.

About these ads

18 responses to “Romans, rubies and the D

  1. Really cool.

    I have a doubt, in the number.replace bit, how all those .replace lines work? Since it is aligned to number it is assumed to be number.replace in each one of them?

    • D’s a free-form-whitespace language like C, so the “replace” stuff is exactly the same as this:

      number.replace(“IV”, “IIII”).replace(“IX”, “VIIII”).replace(“XL”, “XXXX”).replace(“XC”, “LXXXX”);

      In other words, the result of one replace is passed to the next replace.

      • Is it actually doing that?
        If so, how come IIX comes out as 10?
        IIX should first encounter IX -> VIIII which would transform it to IVIIII correct?
        Then IVIIII would be processed by IV -> IIII which would result in IIIIIIII which is 8 if you count the ‘I’s (which is what the counting code does).
        But the code does not generate a result of 8; it generates a result of 10.
        To me, this indicated that IIX is being converted to IVIIII and then counted as five ‘I’s plus one V which is 5 + 5 = 10.

        It looks to me like the results of replace are not being fed into each other. Am I missing something?

      • Ok, brain fart. Ignore what I said in my other comment (I do not know how to delete it).
        For some insane reason, I was evaluating left to right instead of right to left. Something about thinking about Roman numerals at the same time I guess.

    • Since strings in the D programming language are immutable `replace’ returns a new string with replacements each time, so we can chain method calls this way.

      I personally don’t like method chaining, but oh well.

    • Whitespace is unimportant in D. It is function chaining. Note that D’s replace returns a new string with the replaced text, while ruby’s gsub! changes the receiver string.

  2. This is a version that does not depend on the inliner:

    struct Roman {
        template opDispatch(string number) {
            enum num = number.replace("IV", "IIII")
                .replace("IX", "VIIII")
                .replace("XL", "XXXX")
                .replace("XC", "LXXXX");
    
            enum value = num.count('I')
                + num.count('V') * 5
                + num.count('X') * 10
                + num.count('L') * 50;
    
            enum opDispatch = value;
        }
    }
    
    void main() {
        writeln(Roman.XV);
        writeln(Roman.IIX);
        writeln(Roman.VXCII);
    }
    

    This is what the code generator will see:

    struct Roman{}
    void main() {
        writeln(15);
        writeln(10);
        writeln(97);
    }
    

    That does not seem right, but the original ruby code behaves the same.

    // Added sourcecode tags, hope you don’t mind.

  3. @alexmatto, the implementation of replace will return an array with requested change, each call to replace is just feeding that change to the next replace. It is actually syntactic sugar here:

    replace(replace(number,…), …)…;

  4. Sorry to be that guy, but you forgot `+ num.count(‘C’) * 100;` in the D version :)

    Thanks for posting this. I’ve heard many good things about D, so now I’m going to have to learn a bit more!

  5. Here is a non-destructive Ruby version more similar to what you have for D, and less verbose:

     class Roman
      def self.method_missing name, *args
        roman = name.to_s
        .gsub("IV", "IIII")
        .gsub("IX", "VIIII")
        .gsub("XL", "XXXX")
        .gsub("XC", "LXXXX")
    
        (roman.count("I") +  
         roman.count("V") * 5 +
         roman.count("X") * 10 + 
         roman.count("L") * 50 +
         roman.count("C") * 100)
      end
    end
    
    puts Roman.X
    puts Roman.XC
    puts Roman.XII
    

    Before 1.9 you would need \’s to continue the lines. And something a little more advanced for the counting part:

    %w[I V X L C].inject {|sum, letter| sum + roman.count(letter)}
    

    // Added sourcecode tags.

  6. You have chosen some interesting Roman numbers to test out the D version.
    I am no expert but I believe that it is only valid to have a single ‘lower’ digit before a higher one.
    So, IIX (for eight) would be properly written as VIII.
    I am not sure where you are going with VXCII but my guess is that it is not valid to put VX before C. Actually, VX itself is probably not valid as VX and V would appear to be the same number. Assuming then that VXCII is more properly written as VCII then I guess it would be 97. Your code gives the right answer but I think it is entirely an accident if you follow the code logic. It simplifies VXCII to VLXXXXII and then just gives a numerical value to each digit. I would imagine that VLXXXXII should be 87 as VL would be 45 according to the ‘subtract when leading digit is smaller’ rule and XXXXII would add 42 to that.

    The problem is more in the choice of test values than it is in the code itself. Of course, the code is not exhaustive as shown by the fact that it does not handle VL properly even though I expect that VL is valid (and does not mean 55). Of course, you left C = 100 in the D code as mentioned above.

    Of course, I may just not be able to read Roman numerals properly that is certainly a possibility.

    • All this is true, however I didn’t aim for a working, usable API but rather a nice example.
      Note that with this code you can do all kinds of weird stuff, like…

      auto fooriendship = Roman.MyLittlePonyIX;
      

      …and it’ll happily accept that.

      • Note that D has template constraints, so if you can produce a roman numeral validation function (takes a string and returns true if the string is a valid roman numeral), you can guard your implementation with this, and the compiler only accepts code which passes valid roman numerals:

        (forgive me for not knowing the markup for this site, I have no idea how to make this “code”):

        @property static auto opDispatch(string number)()
          if (isValidRomanNumeral!number)
        {...}
        

        You might say “why use template constraints, just validate the number in the function!”

        Well, template constraints serve two very useful purposes. First, the location of the error is the call, not somwhere inside a (possibly devilishly complex) template function. Second, failing to pass the template constraint acts as if the function doesn’t exist, so the compiler is free to try other templates, allowing overloading just based on template constraints.

        BTW, this is a really cool example of D and opDispatch. I would also point out that this only works because roman numerals can be D symbols. It doesn’t work for other numbering systems that may start with a digit. For another example that works with all types of strings, see http://dlang.org/phobos/std_conv.html#octal

        -Steve

        // Added sourcecode tags.

  7. I see that you are trying to prove a point about D language, but your example is wrong! I am not talking about programming (I am not worthy to judge that) but about Roman Numerals. Strictly speaking, see the wikipedia page: http://en.wikipedia.org/wiki/Roman_numerals , more precisely the points:

    * The symbols “I”, “X”, “C”, and “M” can be repeated three times in succession, but no more. (They may appear more than three times if they appear non-sequentially, such as XXXIX.) “D”, “L”, and “V” can never be repeated.
    * “I” can be subtracted from “V” and “X” only. “X” can be subtracted from “L” and “C” only. “C” can be subtracted from “D” and “M” only. “V”, “L”, and “D” can never be subtracted
    * Only one small-value symbol may be subtracted from any large-value symbol.

    IIX does not exist, neither VXCII.

    Sorry for being pedantic.

    an italian guy used to read roman numerals everywhere ;)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s