Converting Decimals to Roman Numerals with Bash

Decimals to Roman numerals—here we hit all the limitations of Bash shell scripting.

My last few articles have given me a chance to relive my undergraduate computer science degree and code a Roman numeral to decimal converter. It's quite handy when you're watching old movies (when was MCMLVII anyway?), and the basic coding algorithm was reasonably straightforward. (See Dave's "Roman Numerals and Bash" and "More Roman Numerals and Bash".)

The trick with Roman numerals, however, is that it's what's known as a subtractive notation. In other words, it's not a position → value or even symbol → value notation, but a sort of hybrid. MM = 2000, and C = 100, but MMC and MCM are quite different: the former is 2100, and the latter is 1000 + (–100 + 1000) = 1900.

This means that the conversion isn't quite as simple as a mapping table, which makes it a good homework assignment for young comp-sci students!

Let's Write Some Code

In the Roman numeral to decimal conversion, a lot of the key work was done by this simple function:


mapit() {
   case $1 in
     I|i) value=1 ;;
     V|v) value=5 ;;
     X|x) value=10 ;;
     L|l) value=50 ;;
     C|c) value=100 ;;
     D|d) value=500 ;;
     M|m) value=1000 ;;
      * ) echo "Error: Value $1 unknown" >&2 ; exit 2 ;;
   esac
}

You'll need this function to proceed, but as a cascading set of conditional statements. Indeed, in its simple form, you could code a decimal to Roman numeral converter like this:


while [ $decvalue -gt 0 ] ; do

  if [ $decvalue -gt 1000 ] ; then
    romanvalue="$romanvalue M"
    decvalue=$(( $decvalue - 1000 ))
  elif [ $decvalue -gt 500 ] ; then
    romanvalue="$romanvalue D"
    decvalue=$(( $decvalue - 500 ))
  elif [ $decvalue -gt 100 ] ; then
    romanvalue="$romanvalue C"
    decvalue=$(( $decvalue - 100 ))
  elif [ $decvalue -gt 50 ] ; then
    romanvalue="$romanvalue L"
    decvalue=$(( $decvalue - 50 ))
  elif [ $decvalue -gt 10 ] ; then
    romanvalue="$romanvalue X"
    decvalue=$(( $decvalue - 10 ))
  elif [ $decvalue -gt 5 ] ; then
    romanvalue="$romanvalue V"
    decvalue=$(( $decvalue - 5 ))
  elif [ $decvalue -ge 1 ] ; then
    romanvalue="$romanvalue I"
    decvalue=$(( $decvalue - 1 ))
  fi

done

This actually works, though the results are, um, a bit clunky:


$ sh 2roman.sh 25
converts to roman numeral  X X I I I I I

Or, more overwhelming:


$ sh 2roman.sh 1900
converts to roman numeral  M D C C C L X X X X V I I I I I

I suppose there is some sort of charm to the latter, but there also are much, much better ways to simplify this. You can do all the math, but since my approach to coding is often "be lazy, get it done, move on", let's recognize that there are a very small number of special case numeric values:


900 = CM
400 = CD
90  = XC
40  = XL
9   = IX
4   = IV

That's really it. The notation allows only a single character subtracting from another, so you can't have CCM or IIX (the latter being correctly written as VIII), and some of the other possible two-character notations don't make sense. For example, why use VX when V is the same value?

So given that, all you really need to do is expand the if-elseif block to add the five possible values above, which makes for a pretty darn long code block. But before I share it, did you catch the error in the above code?

It's the other reason that the resultant Roman numerals are so darn long, actually. Let's look at just the very first conditional statement:


if [ $decvalue -gt 1000 ] ; then
  romanvalue="$romanvalue M"
  decvalue=$(( $decvalue - 1000 ))

Here's the question to ask yourself: what happens if the $decvalue is exactly 1000? Isn't that "M"? Yes, it is. Which means that all these conditionals are wrong; instead of being -gt they should be -ge.

With that fix, here's the big block of code:


while [ $decvalue -gt 0 ] ; do

  if [ $decvalue -ge 1000 ] ; then
    romanvalue="$romanvalue M"
    decvalue=$(( $decvalue - 1000 ))
  elif [ $decvalue -ge 900 ] ; then
    romanvalue="$romanvalue CM"
    decvalue=$(( $decvalue - 900 ))
  elif [ $decvalue -ge 500 ] ; then
    romanvalue="$romanvalue D"
    decvalue=$(( $decvalue - 500 ))
  elif [ $decvalue -ge 400 ] ; then
    romanvalue="$romanvalue CD"
    decvalue=$(( $decvalue - 400 ))
  elif [ $decvalue -ge 100 ] ; then
    romanvalue="$romanvalue C"
    decvalue=$(( $decvalue - 100 ))
  elif [ $decvalue -ge 90 ] ; then
    romanvalue="$romanvalue XC"
    decvalue=$(( $decvalue - 90 ))
  elif [ $decvalue -ge 50 ] ; then
    romanvalue="$romanvalue L"
    decvalue=$(( $decvalue - 50 ))
  elif [ $decvalue -ge 40 ] ; then
    romanvalue="$romanvalue XL"
    decvalue=$(( $decvalue - 40 ))
  elif [ $decvalue -ge 10 ] ; then
    romanvalue="$romanvalue X"
    decvalue=$(( $decvalue - 10 ))
  elif [ $decvalue -ge 9 ] ; then
    romanvalue="$romanvalue IX"
    decvalue=$(( $decvalue - 9 ))
  elif [ $decvalue -ge 5 ] ; then
    romanvalue="$romanvalue V"
    decvalue=$(( $decvalue - 5 ))
  elif [ $decvalue -ge 4 ] ; then
    romanvalue="$romanvalue IV"
    decvalue=$(( $decvalue - 4 ))
  elif [ $decvalue -ge 1 ] ; then
    romanvalue="$romanvalue I"
    decvalue=$(( $decvalue - 1 ))
  fi

done

It works (albeit with some easily removed spaces) with some basic numeric tests:


$ sh 2roman.sh 71
converts to roman numeral  L X X I
$ sh 2roman.sh 1997
converts to roman numeral  M CM XC V I I
$ sh 2roman.sh 666
converts to roman numeral  D C L X V I

The problem is, that's a long and graceless block of code, even if it does solve the problem.

Making the Code More Concise

Obviously, every block of code has the very same format:


elif [ $decvalue -ge VALUE ] ; then
  romanvalue="$romanvalue NOTATION-FOR-VALUE"
  decvalue=$(( $decvalue - VALUE ))

As highlighted, there are only two values to consider: the numeric value VALUE and the one or two character NOTATION-FOR-VALUE. As an example, VALUE=90, NOTATION=XC. The logical function then is:


SubIfValue()
{
  # if $decvalue >= $2 then add $3 to romanvalue
  # and subtract $2 from decvalue

  if [ $decvalue -ge $1 ] ; then
    romanvalue="${romanvalue}$2"
    decvalue=$(( $decvalue - $1 ))
  fi
}

This would produce a series of invocations like this:


SubIfValue  500 "D"
SubIfValue  400 "CD"
SubIfValue  100 "C"
SubIfValue   90 "XC"

But there's a problem with this. The loop has to iterate and subtract the largest possible value each time through; otherwise, you get very odd results.

So algorithmically, you still need to have the if-then-elif loop anyway:


if ( SubIfValue 500 "D" fails ) then

The problem is, that's really hard to do cleanly because you can't actually return values from a Bash shell function. Rather than be too clunky, therefore, I'm going to find a compromise on my quest to clean up the code. I can leave the function mostly the same:


SubValue()
{
  # add $3 to romanvalue and subtract $2 from decvalue

  romanvalue="${romanvalue}$2"
  decvalue=$(( $decvalue - $1 ))

}

The sequence of calls is going to look more succinct, however:


if [ $decvalue -ge 1000 ] ; then
  SubValue 1000 "M"
elif [ $decvalue -ge 900 ] ; then
  SubValue 900 "CM"
elif [ $decvalue -ge 500 ] ; then
  SubValue 500 "D"
elif [ $decvalue -ge 400 ] ; then
  SubValue 400 "CD"
elif [ $decvalue -ge 100 ] ; then
  SubValue 100 "C"
elif [ $decvalue -ge 90 ] ; then
  SubValue 90 "XC"
elif [ $decvalue -ge 50 ] ; then
  SubValue 50 "L"
elif [ $decvalue -ge 40 ] ; then
  SubValue 40 "XL"
elif [ $decvalue -ge 10 ] ; then
  SubValue 10 "X"
elif [ $decvalue -ge 9 ] ; then
  SubValue 9 "IX"
elif [ $decvalue -ge 5 ] ; then
  SubValue 5 "V"
elif [ $decvalue -ge 4 ] ; then
  SubValue 4 "IV"
elif [ $decvalue -ge 1 ] ; then
  SubValue 1 "I"
fi

And, that's the fully functional script once you wrap it in the while;do / done loop.

A few tests:


$ sh 2roman.sh 1991
converts to roman numeral MCMXCI
$ sh 2roman.sh 2222
converts to roman numeral MMCCXXII
$ sh 2roman.sh 1234
converts to roman numeral MCCXXXIV

So there you go. Solved. Now I can't recall why it seemed so daunting when I was in college. I will note that in a more sophisticated programming language, you could come up with a considerably shorter solution, particularly if you could utilize a two-dimensional array for number/characters pairs. But, that's not the Bash shell, so we work with what we have, right?

See you next time. Meanwhile, do you have an interesting programming puzzle? Drop me a note, and I'll have a look at it!

Dave Taylor has been hacking shell scripts on UNIX and Linux systems for a really long time. He's the author of Learning Unix for Mac OS X and Wicked Cool Shell Scripts. You can find him on Twitter as @DaveTaylor, and you can reach him through his tech Q&A site: Ask Dave Taylor.

Load Disqus comments