Tuesday, March 27, 2012

A Dollar is not a Krona, a Deciliter is not a Cup - adventures with Google Translate and Units

Sweden makes very interesting use of its official @Sweden Twitter account: each week, a different individual gets to run it, so lots of interesting Swedish stuff flows through your Twitter feed.

While following this account during the week that @vassaste_kniven was @Sweden, I came across an interesting misfeature of Google Translate. He loves to cook, and was trying to translate his recipes with the help of Google Translate. He complained:
  •  I know, but if you google translate the page 3 dl turns to 3 cups.  :) 
Now "1 dl" (one deciliter) is not the same as "a cup". If you ask Google "1 cup in dl", it gives the right answer:
  • 1 US cup = 2.36588237 dl
Odd. So Google Search knows this, but Google Translate does not!
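
To put a number on how much that matters in a recipe, here is a tiny Python sketch. It assumes Google's "cup" is the US customary cup (8 US fluid ounces), which matches the 2.36588237 dl figure Google itself gave; the constant and function names are made up just for illustration.

```python
# Exact definitions, in millilitres.
US_FL_OZ_ML = 29.5735295625      # 1 US fluid ounce (1/128 of a US gallon)
US_CUP_ML = 8 * US_FL_OZ_ML      # 1 US customary cup = 8 US fl oz = 236.5882365 ml
DL_ML = 100.0                    # 1 decilitre

def cups_to_dl(cups: float) -> float:
    """Convert US customary cups to decilitres."""
    return cups * US_CUP_ML / DL_ML

# The recipe case: "3 dl" came out of Google Translate as "3 cups".
intended_dl = 3.0
translated_dl = cups_to_dl(3)
print(f"1 cup = {cups_to_dl(1):.4f} dl")
print(f"3 cups = {translated_dl:.2f} dl, {translated_dl / intended_dl:.2f}x the intended 3 dl")
```

So anyone following the translated recipe would use well over twice as much of that ingredient.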

I started playing around with units in Google Translate to see how wrong it could get, and I quickly went down the rabbit hole into total bizarro world.

What about the money, money, money?

The first unit that came to mind was money. For example, if you translate "20 kronor" (that's 20 Swedish crowns, i.e. currency unit SEK), Google Translate turns it into "$ 20"! Now, 20 bucks is, according to the search "20 USD in SEK":

  • 20 U.S. dollars = 133.313781 Swedish kronor
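
Taking the exchange rate implied by that search result (only a snapshot, since rates move every day), a quick Python sketch of how far off "$ 20" is. The names are hypothetical, and nothing here queries a live rate.

```python
# Rate implied by the search result above: 20 USD = 133.313781 SEK.
# This is a one-off snapshot, not a current exchange rate.
SEK_PER_USD = 133.313781 / 20

def sek_to_usd(sek: float) -> float:
    """Convert Swedish kronor to US dollars at the snapshot rate."""
    return sek / SEK_PER_USD

print(f"Implied rate: {SEK_PER_USD:.4f} SEK per USD")
print(f"20 kronor = ${sek_to_usd(20):.2f}, not $20")
print(f"'$ 20' overstates the amount by a factor of {20 / sek_to_usd(20):.2f}")
```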

Thinking it rather crazy that Google translates straight to US dollars, I then, sort of accidentally, tried translating "3 kronor" instead, and to my amazement got "3 crowns".

I was like... what?!

20 kronor was one thing, 3 kronor another?

I was baffled. My bafflatron started to blink. So I started playing.

  • "1 krona" is "1 crown"
  • "2 kronor" is "two crowns"
  • "3 kronor" is "three crowns"
  • "4 kronor" is "SEK 4" for some reason!?
  • "5 kronor" is "SEK 5"
  • "6 kronor" is "6 crowns" (and so on up to "8 kronor")
  • "9 kronor" is suddenly "9 dollars"
  • "10 kronor" is "SEK 10", and we are back to currency units!?
  • "11 kronor" through "19 kronor" all get the "SEK" treatment
  • "20 kronor" is suddenly "$ 20", with the dollar sign rather than the word "dollars" that "9 kronor" got for some reason?
  • "100 kronor" is "$100"
  • "1000 kronor" is "$1000"
  • "1001 kronor" is "SEK 1001"
I could go on, but you can try it yourself at Google Translate.

But it doesn't end here. It gets even weirder.

I know from playing around with Japanese translations that Google treats complete sentences (ending with a period) differently from incomplete ones. That is, you can get a completely different translation depending on whether Google knows the sentence has ended (especially in Japanese): "O-genki desu ka" translates to "How are you", while "O-genki desu ka." with the period translates to "How are you doing".

So I tried the numbers with a period, and if I wasn't baffled before, my bafflatron now started smoking and glowing red:
  • "1 krona." is "1 crown." (no change but the period)
  • "2 kronor." is "2 crowns." (went from written to numeric form!)
  • "3 kronor." is "3 crowns." (same change as for 2)
  • "4 kronor." became "4 million."

My bafflatron immediately exploded! Holy Haleakala, what is going on here!? I kept going:
  • "5 kronor." is also "5 million."
  • "6 kronor." up to "8 kronor." also come out as that many "million"!?
  • "9 kronor." is suddenly "9 dollars." again
  • "10 kronor." is "$ 10."
  • "100 kronor." is "$ 100."
  • "1000 kronor." is "1000 dollars." (no longer a dollar sign?)
  • "1001 kronor." is "1001 dollars."
I could go on but you can play yourself too.

Back to the cookery...

I tried some other Swedish and European units, and got more fun results:
  • "1 dl" is "1 cup"
  • "2 dl" is "2 cups"
  • "3 dl" is "3 dl"... okay, that's at least accurate! :)
  • "4 dl" is "4 cups"
  • "5 dl" is "5 dl"... right again, and the same for 6 and 7
  • "8 dl" is "8 ounces". WHAT? A new unit again? A US fluid ounce is actually only about 0.296 dl!!
  • "9 dl" is "9 dl"
  • "10 dl" is "10 ml"
Stop right there... a dl (deciliter) and an ml (milliliter) are not the same thing: there are 100 ml in a dl, so 10 dl is a whole liter, not 10 ml.
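
The same kind of sanity check for the ounce and milliliter mix-ups, as a small Python sketch. It assumes the "ounces" in the translation are US fluid ounces (an imperial fluid ounce is about 28.41 ml, which wouldn't change the picture much), and the `shortfall` helper is hypothetical, just for illustration.

```python
# Exact definitions, in millilitres.
US_FL_OZ_ML = 29.5735295625  # 1 US fluid ounce (assumed meaning of "ounces")
DL_ML = 100.0                # 1 decilitre

def shortfall(intended_dl: float, translated_ml: float) -> float:
    """How many times smaller the translated amount is than the original one."""
    return intended_dl * DL_ML / translated_ml

# (original text, Google Translate output, amount in dl, output converted to ml)
cases = [
    ("8 dl", "8 ounces", 8, 8 * US_FL_OZ_ML),
    ("10 dl", "10 ml", 10, 10.0),
]

print(f"1 US fl oz = {US_FL_OZ_ML / DL_ML:.3f} dl")
for original, translated, amount_dl, translated_ml in cases:
    print(f"{original!r} -> {translated!r}: {translated_ml:.1f} ml instead of "
          f"{amount_dl * DL_ML:.0f} ml, {shortfall(amount_dl, translated_ml):.1f}x too little")
```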

Again, I could go on, but you can play for yourself.

Adding periods didn't seem to change anything this time. But what if I spelled out "deciliter"?

Well, it got a lot better: it just decided, quite randomly, whether to write out "decilitre" or "dl" in the result, e.g.
  • "1 deciliter" was "1 decilitre"
  • "2 deciliter" was "2 decilitres"
  • "3 deciliter" was "3 dl"
...and so on

My conclusion so far

Having played a lot with Google Translate recently, I can't say I understand it, but I have developed a feel for it. It seems to try to map idiom to idiom. In Japanese, there are a bunch of "set phrases" which, translated literally, mean one thing, but have become idioms that mean something else.

For example, "hajimemashite" literally means something like "as for the first time", but it is used as a general greeting when meeting someone for the first time, and hence gets translated (quite reasonably) by Google to "Nice to meet"... coz that's what you mean when you say it.

For some reason Google seems to want to translate "20 kronor" idiomatically to "$ 20", almost as if it were translating a phrase like "I bet you twenty bucks that he won't do X": you wouldn't want the translation tool to turn that into a bet of 133 kronor and 31 öre in Swedish :)

My theory is that this idiomatic logic is what makes it fail so badly on units.

Anyway... there is probably more fun to be had with other units, but I just wanted to emphasize one thing with this...

  • ...do NOT trust Google Translate with units. Not even a tiny bit.

Zap out