Tuesday, March 27, 2012

A Dollar is not a Krona, a Deciliter is not a Cup - adventures with Google Translate and Units

Sweden makes a very interesting use of its official @Sweden Twitter account: each week, a different individual gets to run the account, which lets lots of interesting Swedish stuff flow through your Twitter feed.

While I was following the account during the week that @vassaste_kniven was @Sweden, an interesting misfeature of Google Translate came up. The guy loves to cook, and was trying to translate his recipes with the help of Google Translate. He complained:
  •  I know, but if you google translate the page 3 dl turns to 3 cups.  :) 
Now, "1 dl" (one deciliter) is not the same as "1 cup". If you ask Google "1 cup in dl", it gives the right answer:
  • 1 US cup = 2.36588237 dl
Odd. So Google knows this, but Google translate does not!
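For reference, here's the actual arithmetic as a small Python sketch. It assumes US customary units, where a cup is defined as 8 US fluid ounces (exactly 236.5882365 ml); the function names are just made up for illustration.

```python
# Assumption: US customary cup = 8 US fluid ounces = exactly 236.5882365 ml.
ML_PER_US_CUP = 236.5882365
ML_PER_DL = 100.0  # 1 deciliter = 100 milliliters

def dl_to_us_cups(dl: float) -> float:
    """Convert deciliters to US customary cups."""
    return dl * ML_PER_DL / ML_PER_US_CUP

def us_cups_to_dl(cups: float) -> float:
    """Convert US customary cups to deciliters."""
    return cups * ML_PER_US_CUP / ML_PER_DL

print(f"1 US cup = {us_cups_to_dl(1):.3f} dl")    # 1 US cup = 2.366 dl
print(f"3 dl = {dl_to_us_cups(3):.2f} US cups")   # 3 dl = 1.27 US cups
```

So the 3 dl in the recipe is only about 1.27 US cups, not 3.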

I started playing around with units in Google Translate to see how wrong it could get, and I quickly went down the rabbit hole into total bizarro world.

What about the money, money, money?

The first unit that came to mind was money. For example, if you translate "20 kronor" (that's 20 Swedish crowns, i.e. the currency unit SEK), Google gives you "$ 20"! But 20 bucks is, according to the search "20 USD in SEK":

  • 20 U.S. dollars = 133.313781 Swedish kronor
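Dividing that out gives the exchange rate at the time (about 6.67 kronor per dollar), and a quick Python sketch shows how far off "$ 20" is. The rate is simply the one from the search result above (real rates move daily), and `sek_to_usd` is a hypothetical helper name.

```python
# Rate implied by the search result above, at the time of writing:
# 20 U.S. dollars = 133.313781 Swedish kronor. Real rates change daily.
SEK_PER_USD = 133.313781 / 20

def sek_to_usd(sek: float) -> float:
    """Convert Swedish kronor to US dollars at the quoted rate."""
    return sek / SEK_PER_USD

for amount in (3, 9, 20, 100, 1000):
    print(f"{amount} kronor = about ${sek_to_usd(amount):.2f}")
```

At that rate, 20 kronor is about $3.00, nowhere near $20.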

So, thinking it rather crazy that Google translates straight to US dollars, I then sort of accidentally tried translating "3 kronor" instead, and to my amazement got "3 crowns".

I was like... what?!

20 kronor was one thing, 3 kronor another?

I was baffled. My bafflatron started to blink. So I started playing.

  • "1 krona" is "1 crown"
  • "2 kronor" is "two crowns"
  • "3 kronor" is "three crowns"
  • "4 kronor" is "SEK 4" for some reason!?
  • "5 kronor" is "SEK 5"
  • "6 kronor" is "6 crowns" (up to 8)
  • "9 kronor" is suddenly "9 dollars"
  • "10 kronor" is "SEK 10" and we are back to currency units!?
  • "11 kronor" through "19 kronor" all get the "SEK" treatment
  • "20 kronor" is suddenly "$ 20" - with the dollar sign, not the word "dollars" that "9 kronor" got, for some reason?
  • "100 kronor" is "$100"
  • "1000 kronor" is "$1000"
  • "1001 kronor" is "SEK 1001"
I could go on, but you can try it yourself at Google Translate.
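To make the pattern (or lack of one) easier to see, here's a rough Python sketch that takes the outputs listed above and flags which ones switched currency. The `currency_of` check is a crude regex heuristic I'm assuming for illustration; it counts "crown(s)" as SEK, since that's the literal English for krona.

```python
import re

# Translations of "N kronor" (no trailing period), as listed above.
OBSERVED = {
    1: "1 crown", 2: "two crowns", 3: "three crowns", 4: "SEK 4",
    5: "SEK 5", 6: "6 crowns", 9: "9 dollars", 10: "SEK 10",
    20: "$ 20", 100: "$100", 1000: "$1000", 1001: "SEK 1001",
}

def currency_of(text: str) -> str:
    """Crude guess at which currency a translated phrase names."""
    if re.search(r"\$|dollars?\b", text):
        return "USD"
    if re.search(r"\bSEK\b|\bcrowns?\b", text):
        return "SEK"
    return "unknown"

for amount, output in OBSERVED.items():
    verdict = "ok" if currency_of(output) == "SEK" else "WRONG CURRENCY"
    print(f"{amount:>5} kronor -> {output!r:<14} {verdict}")
```

Only 9, 20, 100 and 1000 kronor get turned into dollars; everything around them stays in SEK.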

But it doesn't end here. It gets even weirder.

I know from playing around with Japanese translations that Google treats complete sentences (ending with a period) differently from incomplete ones. That is, you can get a completely different translation depending on whether Google knows the sentence has ended (especially in Japanese): e.g. "O-genki desu ka" translates to "How are you", while "O-genki desu ka." (with the period) translates to "How are you doing".

So I tried the numbers with a period, and if I wasn't baffled before, my bafflatron now started smoking and glowing red:
  • "1 krona." is "1 crown." (no change but the period)
  • "2 kronor." is "2 crowns." (went from written to numeric form!)
  • "3 kronor." is "3 crowns." (same change as 2)
  • "4 kronor." became "4 million."

My bafflatron immediately exploded! Holy haleakala what is going on here!? I kept going:
  • "5 kronor." is also "5 million."
  • "6 kronor." up to "8 kronor." is also as many "million"!?
  • "9 kronor." is suddenly "9 dollars." again
  • "10 kronor." is "$ 10."
  • "100 kronor." is "$ 100."
  • "1000 kronor." is "1000 dollars." (no longer a dollar sign?)
  • "1001 kronor." is "1001 dollars."
I could go on but you can play yourself too.
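A similar Python sketch can diff the two lists and show which amounts changed for reasons other than just gaining a period. It only covers amounts that appear in both lists above (20 kronor with a period isn't listed), it reads "up to 8" as covering 6 kronor, and it ignores spacing so "$100" vs "$ 100" doesn't count as a change.

```python
# Translations reported above, without and with a trailing period.
WITHOUT_PERIOD = {
    1: "1 crown", 2: "two crowns", 3: "three crowns", 4: "SEK 4",
    5: "SEK 5", 6: "6 crowns", 9: "9 dollars", 10: "SEK 10",
    100: "$100", 1000: "$1000", 1001: "SEK 1001",
}
WITH_PERIOD = {
    1: "1 crown.", 2: "2 crowns.", 3: "3 crowns.", 4: "4 million.",
    5: "5 million.", 6: "6 million.", 9: "9 dollars.", 10: "$ 10.",
    100: "$ 100.", 1000: "1000 dollars.", 1001: "1001 dollars.",
}

def normalize(text: str) -> str:
    """Drop a trailing period and all spaces so only real changes remain."""
    return text.rstrip(".").replace(" ", "")

def changed_by_period(before: dict[int, str], after: dict[int, str]) -> list[int]:
    """Amounts whose translation changed beyond just gaining a period."""
    return [n for n in before if normalize(after[n]) != normalize(before[n])]

print(changed_by_period(WITHOUT_PERIOD, WITH_PERIOD))
# [2, 3, 4, 5, 6, 10, 1000, 1001]
```

Only 1, 9 and 100 kronor survive the period unchanged.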

Back to the cookery...

I tried some other Swedish and European units, and got more fun results:
  • "1 dl" is "1 cup"
  • "2 dl" is "2 cups"
  • "3 dl" is "3 dl" ....okay, that's at least accurate! :)
  • "4 dl" is "4 cups"
  • "5 dl" is "5 dl" ....again right, also for 6 and 7
  • "8 dl" is "8 ounces". WHAT? A new unit again? A US fluid ounce is actually only about 0.3 dl (29.6 ml)!!
  • "9 dl" is "9 dl"
  • "10 dl" is "10 ml"
Stop right there... A dl (deciliter) and an ml (milliliter) are not the same thing (there are 100 ml in a dl).
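How bad are these swaps? The amount cancels out, so each mistranslation is off by a fixed factor: the ratio between the two units. A quick Python sketch, assuming US customary cups and fluid ounces (and that "ounces" in the "8 dl" case means fluid ounces):

```python
# Milliliters per unit. Cup and fluid ounce are US customary units (assumed).
ML_PER_UNIT = {
    "ml": 1.0,
    "dl": 100.0,
    "fl oz": 29.5735295625,
    "cup": 236.5882365,
}

def volume_ratio(src: str, dst: str) -> float:
    """How many times the intended volume you get if N src becomes N dst."""
    return ML_PER_UNIT[dst] / ML_PER_UNIT[src]

# (amount, original unit, unit Google Translate produced), from the list above.
for amount, src, dst in [(1, "dl", "cup"), (8, "dl", "fl oz"), (10, "dl", "ml")]:
    ratio = volume_ratio(src, dst)
    print(f"{amount} {src} -> {amount} {dst}: {ratio:.2f}x the intended volume")
```

So "1 cup" is about 2.37 times the volume actually meant, "8 ounces" about 0.30 times, and "10 ml" a hundredth of it.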

Again, I could go on, but you can play for yourself.

Adding periods didn't seem to change anything this time. But what if I spelled out "deciliter"?

Well, it got a lot better; it just decided, quite randomly, whether to write out "decilitre" or "dl" in the result, e.g.
  • "1 deciliter" was "1 decilitre"
  • "2 deciliter" was "2 decilitres"
  • "3 deciliter" was "3 dl"
...and so on
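What a unit-aware translator arguably ought to check is that the quantity survives, whatever the spelling. Here's a minimal Python sketch of such a check, assuming a small hand-made table of unit spellings (`UNIT_ALIASES` is illustrative, not exhaustive, and unknown units simply raise a `KeyError`):

```python
import re

# Spellings that denote the same unit (illustrative, not exhaustive).
UNIT_ALIASES = {
    "dl": "dl", "deciliter": "dl", "deciliters": "dl",
    "decilitre": "dl", "decilitres": "dl",
    "ml": "ml", "milliliter": "ml", "milliliters": "ml",
    "millilitre": "ml", "millilitres": "ml",
}
ML_PER_UNIT = {"dl": 100.0, "ml": 1.0}

def to_ml(phrase: str) -> float:
    """Parse a phrase like '3 dl' or '2 decilitres' into milliliters."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\.?\s*", phrase)
    if match is None:
        raise ValueError(f"cannot parse {phrase!r}")
    amount, unit = float(match.group(1)), match.group(2).lower()
    return amount * ML_PER_UNIT[UNIT_ALIASES[unit]]  # KeyError on unknown units

def same_quantity(source: str, translation: str) -> bool:
    """True if the translation denotes the same volume as the source."""
    return abs(to_ml(source) - to_ml(translation)) < 1e-9

for source, translation in [("1 deciliter", "1 decilitre"),
                            ("2 deciliter", "2 decilitres"),
                            ("3 deciliter", "3 dl"),
                            ("10 dl", "10 ml")]:
    print(source, "->", translation, same_quantity(source, translation))
```

By that measure the spelled-out results above all pass, while "10 dl" becoming "10 ml" fails.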

My conclusion so far

Having played a lot with Google Translate recently, I can't say I understand it, but I have developed a feel for it. It seems to try to map idiom to idiom. In Japanese there are a bunch of "set phrases" which, translated literally, mean one thing, but which have been idiomized to mean another.

For example, "hajimemashite" literally means something like "as for the first time", but it is used as a general greeting when meeting someone for the first time, and hence gets translated (quite reasonably) by Google to "Nice to meet"... coz that's what you mean when you say it.

For some reason Google seems to want to idiomatically translate "20 kronor" to "$ 20", much as if you were translating a phrase like "I bet you twenty bucks that he won't do X": you wouldn't want the translation tool to have you betting 133 kronor and 31 öre in Swedish :)

My theory is that this idiomatic logic is what makes it fail so badly on units.
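To picture that theory, here's a toy Python sketch of a translator that learns whole phrases from example translations and simply picks the most frequent match. The corpus is entirely made up for illustration, and this is an assumption about the general mechanism, not a claim about how Google Translate actually works internally; but it shows how "20 kronor" could come out as "$ 20" without any notion of what a krona is.

```python
from collections import Counter, defaultdict

# A made-up toy "parallel corpus" of (Swedish phrase, English phrase) pairs.
# Hypothetical data for illustration only; NOT Google's data.
TOY_CORPUS = [
    ("20 kronor", "$ 20"),
    ("20 kronor", "$ 20"),
    ("20 kronor", "20 crowns"),
    ("3 kronor", "three crowns"),
    ("1 dl", "1 cup"),
    ("1 dl", "1 cup"),
    ("1 dl", "1 dl"),
]

def build_phrase_table(corpus):
    """Count how often each source phrase was paired with each target."""
    table = defaultdict(Counter)
    for source, target in corpus:
        table[source][target] += 1
    return table

def translate(table, phrase):
    """Pick the most frequent target seen for this exact phrase."""
    if phrase not in table:
        return phrase  # unseen phrases pass through untranslated
    return table[phrase].most_common(1)[0][0]

table = build_phrase_table(TOY_CORPUS)
print(translate(table, "20 kronor"))  # $ 20
print(translate(table, "3 kronor"))   # three crowns
print(translate(table, "1 dl"))       # 1 cup
```

Each phrase is looked up on its own, so nothing forces "3 kronor" and "20 kronor" to come out in the same currency.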

Anyway... there is probably more fun to be had with other units, but with this I just wanted to emphasize one thing:

  • ...do NOT trust Google Translate with units. Not even a tiny bit.

Zap out



Steve Neal said...

Yep - I love Google Translate. I also hate Google Translate. When I use it on Swedish (I speak bad Swedish but often use it as a sanity check) SVT translates to BBC... Very odd.

Anonymous said...

A long time ago when I was trying to do a translation project (from Swedish to English, incidentally), I discovered how awful and weird Google Translate could be. But from what I understand it's a system based on some kind of statistical analysis of matching corpuses (corpusii?). I don't know if it's still there, but there used to be an option to improve the translation by submitting what the translation *should have* been -- ie, adding your own version to the statistical pile.

So presumably there were all sorts of inconsistent/incorrect initial examples of translations from one language's units to another's. What is certain is that they are not (presently) doing any form of semantic recognition of WHAT a krona or a dollar is.

Anonymous said...

Er, actually, the initial examples wouldn't even have to be incorrect. But when you have thousands of examples like:

XXX Kronor => YYY Dollars
XXX SEK => XXX Swedish Kronor
X.XXX.XXX SEK => Y million
etc etc

Then a statistical method is going to yield some peculiar correlations.
