The Unicode Character Set

Unicode was originally designed as a 16-bit character set, and many languages, Java among them, still represent text as 16-bit code units. The standard has since grown beyond 16 bits: its code space runs from U+0000 to U+10FFFF, which leaves room for 1,114,112 code points rather than the 65,536 that a single 16-bit value can hold. Almost all modern programming languages support it. Unicode includes the ASCII character set and adds characters for the world's writing systems, including those used for Asian languages.

Unicode is not only used in recent programming languages such as Java. It also covers scientific and mathematical symbols, and even historic scripts that are no longer in use.

Unicode is maintained by the Unicode Consortium, a group of companies such as Apple, Microsoft, HP, Digital and IBM. The consortium was incorporated in 1991, and in 1993 version 1.1 of the standard was aligned with the ISO/IEC 10646 standard. Their aim was to produce a single, universal standard. The Windows NT operating system also uses Unicode as its native character set.

In the original 16-bit design, every character occupies the same amount of space: two bytes. (Characters added later, above U+FFFF, take two 16-bit units in the UTF-16 encoding.) The first 256 values are the same as those of the ISO Latin-1 (ISO 8859-1) character set, which forms the basis of the code pages used by earlier operating systems such as Windows 3.1 and Windows 95.
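The two claims above can be checked with a short Python sketch. It uses only the standard codecs; the variable names and the three sample characters are chosen for illustration and are not part of any standard.

```python
# Check 1: each of the first 256 Unicode code points is the same
# character as the corresponding ISO Latin-1 byte value.
latin1_matches = all(
    bytes([value]).decode("latin-1") == chr(value) for value in range(256)
)

# Check 2: characters in the 16-bit range each take exactly two bytes
# in UTF-16. Sample characters (illustrative): A, e-acute, a CJK ideograph.
sample = ["A", "\u00e9", "\u4e2d"]
widths = [len(ch.encode("utf-16-be")) for ch in sample]

print(latin1_matches)
print(widths)
```

The "utf-16-be" codec is used rather than "utf-16" so that no byte-order mark is added to the byte count.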

In addition to the characters of the ASCII character set, the early version of Unicode described here defines 34,168 distinct coded characters, and later versions define many more. Unicode encodes each character only once, no matter how many languages use it, and assigns it a unique name and a code value. It also defines combining accent characters, which attach to the base character that needs to be modified.
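As a small illustration of names, code values and combining accents, the following Python sketch uses the standard `unicodedata` module. The choice of "é" as the sample character, and the use of normalization (NFD and NFC) to split and rejoin it, are assumptions made for this example only.

```python
import unicodedata

# Sample character (illustrative): e with acute accent.
ch = "\u00e9"

# Every character has a unique name and a code value.
name = unicodedata.name(ch)
code = f"U+{ord(ch):04X}"

# Decomposing (NFD) splits the character into a base character
# followed by a combining accent.
decomposed = unicodedata.normalize("NFD", ch)
parts = [unicodedata.name(c) for c in decomposed]

# Composing (NFC) joins base + combining accent back into one character.
recomposed = unicodedata.normalize("NFC", "e\u0301")

print(name, code)
print(parts)
print(recomposed == ch)
```

Here the base character "e" comes first and the combining acute accent (U+0301) follows it, which is the order Unicode uses for combining sequences.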
