SEO for Foreign-Language and Non-ASCII Character Sets

April 12, 2008 By Tad Reeves

This is something I’ve wanted to nail down for some time: how do you do Search Engine Optimization for foreign character sets?

SEO is getting to be more and more a normal thing to do, and less and less of a hidden black art.  Google has made it plain enough times that what they want is good, fresh, updated, relevant content,  and not a bunch of garbage. 

Pursuant to that, you’ve got a ton of fairly well-documented best practices for SEO’ing your site.  And if you don’t know the first thing about SEO at all, well, read a good book on the subject.  My favourites are:

Or you can just hit SEOMoz or SEOBook for some hot tips.

But one unfortunate thing is that most of the best SEO data is coming from people who are ignorant Americans like me.  Despite my love of geography and far-off places, I can speak no foreign languages fluently, except for some Korean bad words I learned from fellow soccer players.

What does that have to do with anything?

Take the preceding picture I just linked to where I’m doing a soccer throw-in.

Assuming you could edit that page, if you ask any search engine novice to optimize that page to show up well for its subject matter, they’d probably tell you to hit the easy things first.  They’d tell you to optimize:

– HTML <title> tag
– <meta name="description"> tag
– <meta name="keywords"> tag
– <h1> text
– Body text
– text of inbound links
– filename of the page

Ideally your page would have “Soccer Throw-In” or a more distinctive phrase as its title and <h1> text, and would have a description and set of meta keywords that followed along.  Ideally, as well, you’d have a filename like “/soccer-throw-in.html” or similar.

Easy, right?  Of course it is — in English.

But, let’s say you have similar items in German, or worse, Japanese, Greek and Russian!

As an example, the Japanese word for “soccer” is “サッカー”.  What do you use as the page title for that?  The filename?
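To make the filename question concrete, here is a minimal Python sketch of the usual ASCII-only "slug" approach. The function names (`slugify_ascii`, `page_filename`) are made up for illustration and don't come from any particular CMS or library; the rule itself (lowercase, keep only a-z and 0-9, join with hyphens) is just one common convention that I'm assuming here.

```python
import re


def slugify_ascii(title: str) -> str:
    """Lowercase the title and join its runs of ASCII letters/digits with hyphens.

    Hypothetical helper: anything outside a-z and 0-9 is simply dropped.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def page_filename(title: str) -> str:
    """Build a filename in the style of /soccer-throw-in.html from a page title."""
    return "/" + slugify_ascii(title) + ".html"


print(page_filename("Soccer Throw-In"))  # /soccer-throw-in.html
print(repr(slugify_ascii("サッカー")))     # '' (nothing survives the ASCII filter)
```

The English title comes out fine, but the Japanese word produces an empty slug, so an ASCII-only rule leaves you with no keyword in the filename at all.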

If you do a google.co.jp search for “サッカー”, one of the first results you get is the Wikipedia article for “サッカー”, which has a displayed URL of:

http://ja.wikipedia.org/wiki/サッカー

Now, of course, anyone with any technical sense will tell you that you can’t put characters outside 7-bit ASCII into the URL of an HTTP request, as that violates the spec.

But of course, pasting such a URL into your browser automatically percent-encodes it to:

http://ja.wikipedia.org/wiki/%E3%82%B5%E3%83%83%E3%82%AB%E3%83%BC
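If you want to see what the browser is doing there, the following is a small sketch using Python's standard `urllib.parse` module. The readable form of the URL is my assumption, reconstructed by decoding the address above, and I'm assuming UTF-8 as the encoding (which is what the `%E3...` bytes correspond to).

```python
from urllib.parse import quote, unquote

# Readable form of the address, as a person would type or read it (assumed).
readable = "http://ja.wikipedia.org/wiki/サッカー"

# Percent-encode every byte of the UTF-8 form that is not URL-safe ASCII.
# ":" and "/" are marked safe so the scheme and path separators are left alone.
encoded = quote(readable, safe=":/")
print(encoded)
# http://ja.wikipedia.org/wiki/%E3%82%B5%E3%83%83%E3%82%AB%E3%83%BC

# Decoding goes the other way and recovers the Japanese text.
decoded = unquote(encoded)
print(decoded == readable)  # True

# Only the encoded form is plain 7-bit ASCII.
print(readable.isascii(), encoded.isascii())  # False True
```

Each of the four Japanese characters takes three bytes in UTF-8, which is why a four-character word turns into twelve `%XX` groups on the wire.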

So, it has the benefit of (a) showing up with the proper Japanese term in the search engine result page, improving the apparent relevance of the result, and (b), well, showing up in the top 10 listings at all, so you’d think it has SOME positive impact on ranking.

European terms are much easier, as there are common transliterations for many of the non-7-bit-ASCII characters that one would use in normal usage. 

For example, Google for the beautiful German city of Düsseldorf.  Clearly, one wouldn’t want to have to title all one’s pages as “Dusseldorf”, as that would mean “village of idiots”, as opposed to Düsseldorf, which refers to the Düssel, a small tributary of the River Rhine.  The u-umlaut is generally transliterated as “ue”, so by Googling for “Duesseldorf” you get an acceptable result, as Google knows what you’re talking about.
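As a rough illustration of that kind of transliteration, here is a small Python sketch. The mapping table and the function name `transliterate_german` are my own for this example, covering only the conventional German replacements (ä/ae, ö/oe, ü/ue, ß/ss); it is not a general-purpose transliterator, and I'm not claiming any search engine works this way internally.

```python
import unicodedata

# Conventional ASCII replacements for German letters (illustrative table).
GERMAN_ASCII = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def transliterate_german(text: str) -> str:
    """Replace German umlauts and sharp s with their usual ASCII spellings."""
    # Normalize to NFC first so "u" + combining diaeresis becomes a single "ü".
    text = unicodedata.normalize("NFC", text)
    return "".join(GERMAN_ASCII.get(ch, ch) for ch in text)


print(transliterate_german("Düsseldorf"))  # Duesseldorf
```

The result is plain ASCII, so it can go straight into a title or filename without any percent-encoding.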

It’s not so easy with other languages like Greek, Hebrew, Hindi, etc.

I’m very interested in any input or feedback on this, as it’s a massive gray area right now, and I don’t know if ANYONE has this one covered well.