Linux Myanmar Sorting


This is a quick guide to sorting Burmese (မြန်မာ အက္ခရာစဉ်). If you use Unicode for Burmese, you can sort Burmese text and files automatically on Linux distributions and in the OpenOffice suite on any OS. You can also develop applications that support Burmese sorting with the ICU library. Even so, there are times when you cannot rely on machines, such as when you are working with non-Unicode fonts or on paper. Here is a shortcut to help you quickly memorize how Burmese sorting works.

Burmese Consonants

  • က ခ ဂ ဃ င
  • စ ဆ ဇ ဈ ဉ ည
  • ဋ ဌ ဍ ဎ ဏ
  • တ ထ ဒ ဓ န
  • ပ ဖ ဗ ဘ မ
  • ယ ရ လ ဝ သ
  • ဟ ဠ အ

Dependent Vowels

  • အ အာ အိ အီ အု အူ အေ အဲ အော အော် အံ အို

Independent Vowels

  • ဣ ဤ ဥ ဦ ဧ ဩ ဪ

Medials

  • ျ ြ ွ ှ (ပင့်ရစ်ဆွဲထိုး)

See the complete list of Burmese characters in the Unicode chart.

The Burmese Sorting Formula

(1) Consonant* + Vowel**

(2) Consonant* + Vowel*** + (Consonant+Asat)**

(3) Consonant* + Medial** +

– (a) Vowel***

– (b) Vowel**** + (Consonant+Asat)***

– (c) (another) Medial*** + Vowel****

– (d) (another) Medial*** + Vowel***** + (Consonant+Asat)****

– (e) (another) Medial*** + (another) Medial**** + Vowel***** (There is only one example of this: မြွှာ)

– (f) (another) Medial*** + (another) Medial**** + Vowel****** + (Consonant+Asat)***** (There is only one example of this: မြွှင်း)

#The fewer the stars (*), the higher the priority.
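The star priorities can be read as a tuple sort key: consonant first, then medials, then the final consonant (Consonant+Asat), then the vowel. The Python sketch below is one possible reading of the formula, not a reference implementation. It assumes one Unicode-encoded syllable per string, only the twelve vowels listed earlier, and that a syllable with fewer medials or no final consonant sorts first; the names `split_syllable` and `sort_key` are made up for this illustration.

```python
# Sketch only: a sort key for single Burmese syllables, following the star
# priorities (consonant, medials, final consonant + asat, vowel, tone mark).
CONSONANTS = [chr(cp) for cp in range(0x1000, 0x1022)]  # က (U+1000) .. အ (U+1021)
MEDIALS = "\u103B\u103C\u103D\u103E"                    # ျ ြ ွ ှ
ASAT = "\u103A"                                         # ်
TONES = "\u1037\u1038"                                  # dot below ့, visarga း
# Assumed tone order, taken from the example က ကာ ကာ့ ကား ကာ့း in the notes.
TONE_ORDER = ["", "\u1037", "\u1038", "\u1037\u1038"]
VOWELS = [
    "",                    # အ (inherent vowel)
    "\u102C",              # အာ
    "\u102D",              # အိ
    "\u102E",              # အီ
    "\u102F",              # အု
    "\u1030",              # အူ
    "\u1031",              # အေ
    "\u1032",              # အဲ
    "\u1031\u102C",        # အော
    "\u1031\u102C\u103A",  # အော်
    "\u1036",              # အံ
    "\u102D\u102F",        # အို
]


def split_syllable(syllable):
    """Split one syllable into (consonant, medials, final, vowel, tone)."""
    consonant, rest = syllable[0], syllable[1:]
    n = 0
    while n < len(rest) and rest[n] in MEDIALS:
        n += 1
    medials, rest = rest[:n], rest[n:]
    # Pull the tone marks out first so they cannot hide a final consonant.
    tone = "".join(sorted(ch for ch in rest if ch in TONES))
    rest = "".join(ch for ch in rest if ch not in TONES)
    final = ""
    if len(rest) >= 2 and rest[-1] == ASAT and rest[-2] in CONSONANTS:
        final, rest = rest[-2], rest[:-2]
    vowel = rest.replace("\u102B", "\u102C")  # treat tall ါ like ာ
    return consonant, medials, final, vowel, tone


def sort_key(syllable):
    """Tuple key: fewer stars in the formula means an earlier tuple position."""
    consonant, medials, final, vowel, tone = split_syllable(syllable)
    return (
        CONSONANTS.index(consonant),
        tuple(MEDIALS.index(m) for m in medials),
        CONSONANTS.index(final) + 1 if final else 0,  # 0 = open syllable
        VOWELS.index(vowel),  # raises ValueError for vowels outside the list
        TONE_ORDER.index(tone),
    )
```

With this key, `sorted(["ကောက်", "ကာ", "က", "ကျ", "ကို", "ကက်", "ခ"], key=sort_key)` gives က, ကာ, ကို, ကက်, ကောက်, ကျ, ခ: open syllables first, then syllables with a final consonant, then syllables with a medial, then the next consonant.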

 

Notes:

(1) Asat and Virama are treated as equal. E.g. စက္က and စက်က are the same. Kinzi (ကင်းစီး) is equal to Nga+Asat (ငသတ်). E.g. အင်္ဂလိပ် is equal to အင်ဂလိပ်.

(2) ည (Double Nya or Nya) comes after ဉ (Nya lay). ယျ (Double Ya) comes after ယ (Ya).

(3) Dot below (အောက်မြစ်) and Visarga (ဝစ္စပေါက်) come after the related vowels. E.g. က ကာ ကာ့ ကား ကာ့း.

(4) Independent vowels are equal to A (အ) + dependent vowels. E.g. ဧ = အေ, ဦ = အူ, ဥုမ် = အုမ် = အုံ etc. မ် (Ma+Asat) is equal to Anusvara (သေးသေးတင်).

(5) There is no special mention of number sorting, so numbers come before consonants, as they do in other languages.
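Notes (1) and (4) amount to a normalization pass that can run before any comparison. The sketch below is a minimal illustration under assumptions: the function name `normalize_for_sorting` is hypothetical, the mappings for independent vowels other than ဧ and ဦ are filled in from the general rule in note (4), and the Ma+Asat and ဥုမ် equivalences are left out.

```python
# Sketch only: fold the equivalences from notes (1) and (4) before sorting.
KINZI = "\u1004\u103A\u1039"   # င + asat + virama, written above the next letter
NGA_ASAT = "\u1004\u103A"      # င်
VIRAMA = "\u1039"              # ္ (stacks the following consonant)
ASAT = "\u103A"                # ်

# Independent vowel -> အ + dependent vowel(s).
INDEPENDENT_VOWELS = {
    "\u1023": "\u1021\u102D",              # ဣ -> အိ
    "\u1024": "\u1021\u102E",              # ဤ -> အီ
    "\u1025": "\u1021\u102F",              # ဥ -> အု
    "\u1026": "\u1021\u1030",              # ဦ -> အူ
    "\u1027": "\u1021\u1031",              # ဧ -> အေ
    "\u1029": "\u1021\u1031\u102C",        # ဩ -> အော
    "\u102A": "\u1021\u1031\u102C\u103A",  # ဪ -> အော်
}


def normalize_for_sorting(text):
    """Return text with kinzi, virama and independent vowels folded."""
    # Kinzi must be handled before the generic virama rule, otherwise
    # it would turn into Nga followed by two asat marks.
    text = text.replace(KINZI, NGA_ASAT)
    text = text.replace(VIRAMA, ASAT)
    # ဦ is sometimes typed as ဥ + ီ; fold that spelling into U+1026 first.
    text = text.replace("\u1025\u102E", "\u1026")
    return "".join(INDEPENDENT_VOWELS.get(ch, ch) for ch in text)
```

For example, `normalize_for_sorting("အင်္ဂလိပ်")` returns အင်ဂလိပ် and `normalize_for_sorting("စက္က")` returns စက်က, so each pair compares as equal after normalization.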

#See the original reference in the Burmese dictionary of the Myanmar Language Commission.

#See the pre-sorted Burmese orthography in Wiktionary.

· · · ◊ ◊ ◊ · · ·

I’ve been learning IPA symbols for a while in order to use them in Wikipedia and Wiktionary. IPA stands for International Phonetic Alphabet. Most people think IPA is quite difficult and only suited to linguists and geeks. That may be true of learning IPA for professional purposes, but translating IPA to Burmese is not that difficult, since Burmese has its own phonetic writing system. Burmese syllables (ဝဏ္ဏ) are limited in number, which means Burmese phonetics is easy enough to learn. Here is the link to the Burmese phonetic guide by the Myanmar Language Commission, which was scanned from the Burmese dictionary.

There are only a few sources from which I can collect IPA samples for Burmese sounds, so English Wikipedia articles about Burma written by Wiki Project Burma members and entries from the Sealang Burmese dictionary were my main sources. Whenever the sources conflicted, I listened to sample voice clips of IPA so that I could be confident in my decision to define an IPA-Burmese pair. Last weekend I tried to compile all possible syllables in Burmese and to develop a converter between Burmese and IPA. Though the logic was not difficult, developing a JavaScript converter was a challenge for a non-coder like me. Burmese to IPA is working well now, while IPA to Burmese still has a lot of things to fix. Moreover, Burmese doesn’t have enough syllables to represent all IPA combinations.

Burmese Phonetics – IPA Converter

  • The Burmese phonetic input needs to follow the standard set by the Myanmar Language Commission.
  • Add a dash (-) in place of each half-sound (အသံဝက်). This is not part of the standard; it just saves me some coding.
  • The full Burmese Phonetic – IPA list that I compiled is available here.
  • The Burmese phonetic guide is available here. You can easily find it in the front pages of any Burmese dictionary.

Examples:

  • သှ-ဒင်းဇ-ဂါး (သတင်းစကား)
  • သက်ကျာ့မုနိ (သကျမုနိ)
  • မင်ဂ-လာဇောင် (မင်္ဂလာဆောင်)
  • အေယာဝ-ဒီမျစ် (ဧရာဝတီမြစ်)
  • နိုင်ငန်ဒေါ်ဝင်ဂျီးဂျုတ် (နိုင်ငံတော်ဝန်ကြီးချုပ်)
  • ပ-ယွက်ဆိတ် (ပုရွက်ဆိတ်)
  • အ-ယှုတ်အ-ယှင်း (အရှုပ်အရှင်း)
  • ဗော်လုန်းဗွဲ (ဘောလုံးပွဲ)
  • ဗွတ်ဇင်ဂါ့သုတ် (ဗောဇ္ဈင်ဂသုတ်)
  • ဗ-မာ့တတ်ပေါင်းဇုတတ်မ-ဒေါ် (ဗမာ့တပ်ပေါင်းစုတပ်မတော်)

##And, of course, the input text has to be in Unicode encoding.
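To make the dash convention concrete, here is a small Python sketch of how phonetic input could be split into syllables, with each syllable that is followed by a dash flagged as a half-sound. It is not the converter's actual code, which is not shown here. The function name `split_phonetic` and the syllable-boundary rule (a new syllable starts at any consonant that is not part of a Consonant+Asat final) are assumptions, and the sketch handles one word at a time without spaces or parentheses.

```python
import re

# Assumed boundary rule: a consonant (U+1000..U+1021) starts a new syllable
# unless it is stacked under a virama or is itself a final (followed by asat,
# optionally with a dot below in between).
SYLLABLE_START = re.compile(
    "(?<!\u1039)[\u1000-\u1021](?!\u1037?[\u103A\u1039])"
)


def split_phonetic(text):
    """Return a list of (syllable, is_half_sound) pairs for one word."""
    syllables = []
    chunks = text.split("-")
    for n, chunk in enumerate(chunks):
        if not chunk:
            continue
        # Always start a part at position 0 so no leading text is dropped.
        starts = sorted({0} | {m.start() for m in SYLLABLE_START.finditer(chunk)})
        bounds = starts + [len(chunk)]
        parts = [chunk[a:b] for a, b in zip(bounds, bounds[1:])]
        followed_by_dash = n < len(chunks) - 1
        for k, part in enumerate(parts):
            # Only the syllable directly before a dash is a half-sound.
            is_half = followed_by_dash and k == len(parts) - 1
            syllables.append((part, is_half))
    return syllables
```

For the first example, `split_phonetic("သှ-ဒင်းဇ-ဂါး")` yields သှ (half-sound), ဒင်း, ဇ (half-sound) and ဂါး. A table lookup from each syllable to IPA could then run on the pairs.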

· · · ◊ ◊ ◊ · · ·