Friday, February 17, 2006

Unicode

Unicode as I learnt

I couldn’t resist writing about Unicode as I learnt it. I already knew that it’s a way of representing the characters of all the world’s languages by giving each character a unique number (called a code point). Following are some of the interesting terms I came across when I started learning Unicode.

  • Grapheme – the smallest unit of a writing system in any language
  • Glyph – a shape representing a character, punctuation mark or other symbol in a script
  • Phoneme – the smallest unit of sound in a spoken language. In writing, it’s represented by one or more graphemes.
  • Digraph – two graphemes that together represent a single phoneme. For example, the word ship contains four graphemes (s, h, i, and p) but only three phonemes, because sh is a digraph.
  • Trigraph – three graphemes that together represent a single phoneme, such as the tch in itch

Confused? See here - http://en.wikipedia.org/wiki/Grapheme
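
To make the "unique number per character" idea from the opening paragraph concrete, here is a minimal Python sketch. The helper name `describe` and the sample characters are my own choices for illustration; `ord` and `chr` are Python built-ins that convert between a character and its code point.

```python
def describe(ch):
    """Return the conventional 'U+XXXX' label for a single character."""
    # ord() gives the Unicode code point as an integer.
    return f"U+{ord(ch):04X}"


# A Latin letter, an accented letter, a Katakana letter and an emoji.
for ch in ["A", "é", "ア", "😀"]:
    print(ch, describe(ch))

# chr() goes the other way, from code point back to character.
print(chr(0x30A2))
```

Each character maps to exactly one number, regardless of how it is later stored as bytes.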

Of all the encodings available right now, UTF-8 is the most widely used, followed by ISO 8859-1. UTF-8 is a variable-width encoding that uses one to four bytes per character and can represent every character in Unicode. It’s becoming the common standard in web, email and storage applications. Following are the resources I referred to while reading about Unicode and encodings:
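
Going back to the variable-width point about UTF-8 above, a small sketch in Python shows it in practice. The function name `utf8_length` and the sample characters are assumptions made for this example; `str.encode` is the standard way to turn text into bytes.

```python
def utf8_length(ch):
    """Return how many bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))


samples = ["A", "é", "ア", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # bytes.hex() shows the raw byte values as hexadecimal text.
    print(f"{ch!r}: {utf8_length(ch)} byte(s), hex {encoded.hex()}")

# ISO 8859-1 uses one byte per character and only covers 256 characters,
# so a Katakana letter cannot be encoded with it.
try:
    "ア".encode("iso-8859-1")
except UnicodeEncodeError:
    print("'ア' is not representable in ISO 8859-1")
```

The four samples need one, two, three and four bytes respectively, which is the whole range UTF-8 uses.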
