Minority languages cut short in the SMS

I read today on Slashdot about the reason for the 160-character length limit on SMSs. It is an interesting story, and relevant to us today, since it is one of the reasons for the current shape of the cell phone landscape. Our languages, of course, didn't escape unharmed.

We recently started working on a training manual for localising in Northern Sotho together with Mosekola. In places we tried to give attention to issues that are specifically applicable to cell phones, such as limitations on length. Nokia has had cell phones available in a few South African languages for some time. It is usually Zulu, Xhosa, Afrikaans and (Southern) Sotho. The first three are the three biggest languages of South Africa. Sotho, however, is only number 7. The main reason for choosing it over Northern Sotho is probably that Northern Sotho uses the characters "š" and "Š", which aren't available in the simple set of characters for SMSs. Interestingly enough, even a few characters for Afrikaans are missing from this set, e.g. "ë". I know some cell phones can handle these characters, but I'm pretty sure that "ýïû" aren't handled anywhere. I would like to hear if I'm wrong.
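As a rough sketch of what "not available in the simple set" means in practice, the Python snippet below assumes that the simple set is the GSM 7-bit default alphabet (GSM 03.38) together with its extension table. The names GSM_BASIC, GSM_EXTENSION and missing_from_gsm are made up for this illustration, and what a particular handset can actually display or type is a separate matter.

```python
# Basic character set of the GSM 7-bit default alphabet (GSM 03.38),
# listed in table order; the escape code itself is left out.
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ"
    "ÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)

# Characters reachable through the extension table (escape + one more code).
GSM_EXTENSION = "^{}\\[~]|€\f"


def missing_from_gsm(text):
    """Return the characters in text that the 7-bit alphabet cannot encode."""
    allowed = set(GSM_BASIC + GSM_EXTENSION)
    return [ch for ch in text if ch not in allowed]


if __name__ == "__main__":
    # Characters mentioned in this post for Northern Sotho and Afrikaans.
    print(missing_from_gsm("šŠëýïû"))
```

Any character this reports as missing forces the whole message out of the 7-bit encoding, if the phone can send it at all.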

Other plans have been made for many languages, especially in the Far East. For some reason my current phone can type Greek characters. Brilliant "strategy" from Samsung for the South African market.

It is sad that many languages simply don't have the market strength and public voice to request better technology. Without choice, normal market forces won't sort this out.

Comments

š and ý

Well, I'd wonder about š and ý. They're both pretty common in Central Europe, and probably every cell phone sold here (in the Czech Republic) supports these two (at least all I've ever used have). On the other hand, ë, ï and û cannot be written with my Nokia cellphone.

Re: š and ý

Thanks Martin. Yes, at least for Northern Sotho we haven't had big font problems on computers because of the existing coverage of the š character in most fonts. The same goes for Afrikaans, which uses characters also found in other languages. However, I haven't seen a cell phone in South Africa that can do those two characters (š and ý), and probably a few others as well, although they are admittedly quite rare in Afrikaans. My guess is that "Western European" versions are shipped to South Africa, and that even localised handsets are built on models with these capabilities.

Of course, even the Central European characters do nothing for Venda, which needs ḓḽṅṋṱḒḼṄṊṰ...

Situation in Japan

I guess that because Japan evolved its own mobile phone system for a while, users in Japan (including me) got a better deal.

Although there is some kind of horrible short messaging service (THAT SEEMS TO DIAL UP THE OTHER PHONE!), all phones in Japan come with an email address, and I have seen no real, practical limit on length.

It is also terrible to see that, as the article correctly points out, sending an SMS is "free" for the operator (it uses a signaling channel between the base stations and the phone which remains mostly idle), yet operators charge a crazy amount for the privilege of sending one... I still wonder how they managed not to get hit by some kind of world-wide anti-trust investigation.

funky characters

I think you may be mixing up more than one issue here:
1) ability to transmit accented characters through GSM,
2) ability to display them using a cellphone,
3) ability to enter them using your cellphone.

Let's start with the first one: it has been solved for quite some time. Most phones have supported Unicode in text messages for years now (especially Nokias, who I think did that as far back as 2000). For example, I can send my friend a message with any Lithuanian character, and it's (almost) guaranteed to be processed correctly. There is a downside to this: since messages are sent in UTF-16, instead of squeezing 160 characters into a single message, you can now only fit 70, thus making each letter cost you more.
I think it also requires support from the GSM operator, but I'm not an expert on that, and at least here in Lithuania, all operators support UCS messages.
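
The 160 versus 70 figures follow from simple arithmetic. Here is a small Python sketch, assuming the usual 140-octet payload of a single, non-concatenated SMS; the names PAYLOAD_BITS and chars_per_message are made up for the example.

```python
# Assumption: one SMS carries 140 octets of user data.
PAYLOAD_BITS = 140 * 8  # 1120 bits


def chars_per_message(bits_per_char):
    """How many fixed-width characters fit into a single message."""
    return PAYLOAD_BITS // bits_per_char


if __name__ == "__main__":
    default_alphabet = chars_per_message(7)   # 7-bit default alphabet
    unicode_message = chars_per_message(16)   # 16-bit code units
    print(default_alphabet, unicode_message)
    # Relative cost per character when a message switches to 16-bit units.
    print(round(default_alphabet / unicode_message, 2))
```

So a single accented letter outside the 7-bit set roughly doubles the price per character for the whole message.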

Now the second issue: display. I just used Nokia PC Suite to paste the text "šŠ ë ýïû ḓḽṅṋṱḒḼṄṊṰ" into a message and saved it as a draft. Only the Venda characters ("ḓḽṅṋṱḒḼṄṊṰ") failed to display (I got squares instead), while all the others were perfectly correct. And don't forget that this phone targets the Lithuanian market. If manufacturers cared to put support for different characters in their African variants, this would not be a problem.

And the last thing is entering those symbols – this is basically just a part of the phone's localization, easy to implement etc.

So, the actual problem is not GSM standards – it's in the market size, as you correctly noticed in the last paragraph of this post. However, problems like that actually aren't carved in stone. If you really want them solved, you may start by contacting local representatives of Nokia and other manufacturers. They surely have enough voice to promote the problem upstream. We tried it here, and it works. ;)

Character Sets

At least from my experiences with Sony Ericsson handsets and SMS, a couple of Unicode encodings should be available for SMS usage. That's not to say that support is universal, but I can imagine most modern handsets providing some level of Unicode support.
