Acknowledgments.
Unicode Consortium Members and Directors.
Figures.
Tables.
Preface.
1. Introduction.
Coverage.Standards Coverage.New Characters.Design Goals.Text
Handling.Interpreting Characters.Text Elements.The Unicode Standard
and ISO/IEC 10646.The Unicode Consortium.The Unicode Technical
Committee.Submitting New Characters.
2. General Structure.
Architectural Context.Basic Text Processes.Text Elements,
Characters, and Text Processes.Text Processes and Encoding.Unicode
Design Principles.Universality.Efficiency.Characters, Not
Glyphs.Semantics.Plain Text.Logical Order.Unification.Dynamic
Composition.Equivalent Sequences.Convertibility.Compatibility
Characters.Compatibility Characters.Compatibility Decomposable
Characters.Mapping Compatibility Characters.Code Points and
Characters.Types of Code Points.Encoding
Forms.UTF-32.UTF-16.UTF-8.Comparison of the Advantages of UTF-32,
UTF-16, and UTF-8.Encoding Schemes.Unicode Strings.Unicode
Allocation.Planes.Allocation Areas and Character Blocks.Details of
Allocation.Assignment of Code Points.Writing Direction.Combining
Characters.Sequence of Base Characters and Diacritics.Multiple
Combining Characters.Ligated Multiple Base Characters.Spacing
Clones of European Diacritical Marks."Characters" and Grapheme
Clusters.Special Characters and Noncharacters.Byte Order Mark
(BOM).Special Noncharacter Code Points.Layout and Format Control
Characters.The Replacement Character.Control Codes.Conforming to
the Unicode Standard.Supported Subsets.Related Publications.
3. Conformance.
Versions of the Unicode Standard.Stability.Version
Numbering.Errata, Corrigenda, and Future Updates.References to the
Unicode Standard.References to Unicode Character
Properties.References to Unicode Algorithms.Conformance
Requirements.Byte Ordering.Unassigned Code
Points.Interpretation.Modification.Character Encoding
Forms.Character Encoding Schemes.Bidirectional Text.Normalization
Forms.Normative References.Unicode Algorithms.Default Casing
Operations.Unicode Standard Annexes.Semantics.Definitions.Character
Identity and Semantics.Characters and Encoding.Properties.Normative
and Informative Properties.Simple and Derived Properties.Property
Aliases.Default Property Values.Private
Use.Combination.Decomposition.Compatibility Decomposition.Canonical
Decomposition.Surrogates.Unicode Encoding
Forms.UTF-32.UTF-16.UTF-8.Encoding Form Conversion.Unicode Encoding
Schemes.Canonical Ordering Behavior.Application of Combining
Marks.Combining Classes.Canonical Ordering.Canonical Ordering and
Collation.Conjoining Jamo Behavior.Hangul Syllable
Boundaries.Standard Korean Syllables.Hangul Syllable
Composition.Hangul Syllable Decomposition.Hangul Syllable
Names.Default Case Operations.Definitions.Case Conversion of
Strings.Case Detection for Strings.Caseless Matching.
4. Character Properties.
Unicode Character Database.Case-Normative.Case Mapping.Combining
Classes-Normative.Directionality-Normative.General
Category-Normative.Numeric Value-Normative.Ideographic Numeric
Values.Bidi Mirrored-Normative.Unicode 1.0 Names.Letters,
Alphabetic, and Ideographic.Boundary Control.Characters with
Unusual Properties.
5. Implementation Guidelines.
Transcoding to Other Standards.Issues.Multistage Tables.ANSI/ISO C
wchar_t.Unknown and Missing Characters.Reserved and Private-Use
Character Codes.Interpretable but Unrenderable Characters.Default
Property Values.Default Ignorable Code Points.Interacting with
Downlevel Systems.Handling Surrogate Pairs in UTF-16.Handling
Numbers.Normalization.Compression.Newline
Guidelines.Definitions.Background.Recommendations.Regular
Expressions.Language Information in Plain Text.Requirements for
Language Tagging.Language Tags and Han Unification.Editing and
Selection.Consistent Text Elements.Strategies for Handling
Nonspacing Marks.Keyboard Input.Truncation.Rendering Nonspacing
Marks.Canonical Equivalence.Positioning Methods.Locating Text
Element Boundaries.Identifiers.Property-Based Identifier
Syntax.Syntactic Rule.Alternative Recommendation.Sorting and
Searching.Culturally Expected Sorting and
Searching.Language-Insensitive Sorting.Searching.Sublinear
Searching.Binary Order.UTF-8 in UTF-16 Order.UTF-16 in UTF-8
Order.Case Mappings.Complications for Case
Mapping.Reversibility.Caseless Matching.Normalization.Unicode
Security.Default Ignorable Code Points.
6. Writing Systems and Punctuation.
Writing Systems.General Punctuation.Punctuation:
U+0020-U+00BF.General Punctuation: U+2000-U+206F.CJK Symbols and
Punctuation: U+3000-U+303F.CJK Compatibility Forms:
U+FE30-U+FE4F.Small Form Variants: U+FE50-U+FE6F.
7. European Alphabetic Scripts.
Latin.Letters of Basic Latin: U+0041-U+007A.Letters of the Latin-1
Supplement: U+00C0-U+00FF.Latin Extended-A: U+0100-U+017F.Latin
Extended-B: U+0180-U+024F.IPA Extensions: U+0250-U+02AF.Phonetic
Extensions: U+1D00-U+1D6A.Latin Extended Additional:
U+1E00-U+1EFF.Latin Ligatures: U+FB00-U+FB06.Greek.Greek:
U+0370-U+03FF.Greek Extended: U+1F00-U+1FFF.Cyrillic.Cyrillic:
U+0400-U+04FF.Cyrillic Supplement: U+0500-U+052F.Armenian.Armenian:
U+0530-U+058F.Georgian.Georgian: U+10A0-U+10FF.Modifier
Letters.Spacing Modifier Letters: U+02B0-U+02FF.Combining
Marks.Combining Diacritical Marks: U+0300-U+036F.Combining Marks
for Symbols: U+20D0-U+20FF.Combining Half Marks: U+FE20-U+FE2F.
8. Middle Eastern Scripts.
Hebrew.Hebrew: U+0590-U+05FF.Alphabetic Presentation Forms:
U+FB1D-U+FB4F.Arabic.Arabic: U+0600-U+06FF.Cursive
Joining.Ligatures.Arabic Presentation Forms-A: U+FB50-U+FDFF.Arabic
Presentation Forms-B: U+FE70-U+FEFF.Syriac.Syriac:
U+0700-U+074F.Syriac Shaping.Syriac Cursive
Joining.Ligatures.Thaana.Thaana: U+0780-U+07BF.
9. South Asian Scripts.
Devanagari.Devanagari: U+0900-U+097F.Bengali.Bengali:
U+0980-U+09FF.Gurmukhi.Gurmukhi: U+0A00-U+0A7F.Gujarati.Gujarati:
U+0A80-U+0AFF.Oriya.Oriya: U+0B00-U+0B7F.Tamil.Tamil:
U+0B80-U+0BFF.Telugu.Telugu: U+0C00-U+0C7F.Kannada.Kannada:
U+0C80-U+0CFF.Malayalam.Malayalam: U+0D00-U+0D7F.Sinhala.Sinhala:
U+0D80-U+0DFF.Tibetan.Tibetan: U+0F00-U+0FFF.Limbu.Limbu:
U+1900-U+194F.
10. Southeast Asian Scripts.
Thai.Thai: U+0E00-U+0E7F.Lao.Lao: U+0E80-U+0EFF.Myanmar.Myanmar:
U+1000-U+109F.Khmer.Khmer: U+1780-U+17FF.Khmer Symbols:
U+19E0-U+19FF.Tai Le.Tai Le: U+1950-U+197F.Philippine
Scripts.Tagalog: U+1700-U+171F.Hanunoo: U+1720-U+173F.Buhid:
U+1740-U+175F.Tagbanwa: U+1760-U+177F.
11. East Asian Scripts.
Han.CJK Unified Ideographs.CJK Unified Ideographs Ext. B:
U+20000-U+2A6D6.CJK Compatibility Ideographs: U+F900-U+FAFF.CJK
Compatibility Supplement: U+2F800-U+2FA1D.Kanbun: U+3190-U+319F.CJK
and KangXi Radicals: U+2E80-U+2FD5.Ideographic Description:
U+2FF0-U+2FFB.Bopomofo.Bopomofo: U+3100-U+312F.Hiragana and
Katakana.Hiragana: U+3040-U+309F.Katakana: U+30A0-U+30FF.Katakana
Phonetic Extensions: U+31F0-U+31FF.Halfwidth and Fullwidth Forms:
U+FF00-U+FFEF.Hangul.Hangul Jamo: U+1100-U+11FF.Hangul
Compatibility Jamo: U+3130-U+318F.Hangul Syllables:
U+AC00-U+D7A3.Yi.Yi: U+A000-U+A4CF.
12. Additional Modern Scripts.
Ethiopic.Ethiopic: U+1200-U+137F.Mongolian.Mongolian:
U+1800-U+18AF.Osmanya.Osmanya: U+10480-U+104AF.Cherokee.Cherokee:
U+13A0-U+13FF.Canadian Aboriginal Syllabics.Canadian Aboriginal
Syllabics: U+1400-U+167F.Deseret.Deseret:
U+10400-U+1044F.Shavian.Shavian: U+10450-U+1047F.
13. Archaic Scripts.
Ogham.Ogham: U+1680-U+169F.Old Italic.Old Italic:
U+10300-U+1032F.Runic.Runic: U+16A0-U+16F0.Gothic.Gothic:
U+10330-U+1034F.Ugaritic.Ugaritic: U+10380-U+1039F.Linear B.Linear
B Syllabary: U+10000-U+1007F.Linear B Ideograms:
U+10080-U+100FF.Aegean Numbers: U+10100-U+1013F.Cypriot
Syllabary.Cypriot Syllabary: U+10800-U+1083F.
14. Symbols.
Currency Symbols.Currency Symbols: U+20A0-U+20CF.Letterlike
Symbols.Letterlike Symbols: U+2100-U+214F.Math Alphanumeric
Symbols: U+1D400-U+1D7FF.Mathematical Alphabets.Fonts Used for
Mathematical Alphabets.Number Forms.Number Forms:
U+2150-U+218F.Superscripts and Subscripts:
U+2070-U+209F.Mathematical Symbols.Mathematical Operators:
U+2200-U+22FF.Supplements to Mathematical Symbols and
Arrows.Supplemental Math Operators: U+2A00-U+2AFF.Miscellaneous
Math Symbols-A: U+27C0-U+27EF.Miscellaneous Math Symbols-B:
U+2980-U+29FF.Arrows: U+2190-U+21FF.Supplemental
Arrows.Standardized Variants of Mathematical Symbols.Technical
Symbols.Control Pictures: U+2400-U+243F.Miscellaneous Technical:
U+2300-U+23FF.Optical Character Recognition:
U+2440-U+245F.Geometrical Symbols.Box Drawing: U+2500-U+257F.Block
Elements: U+2580-U+259F.Geometric Shapes:
U+25A0-U+25FF.Miscellaneous Symbols and Dingbats.Miscellaneous
Symbols: U+2600-U+26FF.Dingbats: U+2700-U+27BF.Yijing Hexagram
Symbols: U+4DC0-U+4DFF.Tai Xuan Jing Symbols:
U+1D300-U+1D356.Enclosed and Square.Enclosed Alphanumerics:
U+2460-U+24FF.Enclosed CJK Letters and Months: U+3200-U+32FF.CJK
Compatibility: U+3300-U+33FF.Braille.Braille Patterns:
U+2800-U+28FF.Byzantine Musical Symbols.Byzantine Musical Symbols:
U+1D000-U+1D0FF.Western Musical Symbols.Musical Symbols:
U+1D100-U+1D1FF.
15. Special Areas and Format Characters.
Control Codes.Layout Controls.Invisible Operators.Deprecated Format
Characters.Deprecated Format Characters: U+206A-U+206F.Surrogates
Area.Surrogates Area: U+D800-U+DFFF.Variation Selectors.Private-Use
Characters.Private Use Area: U+E000-U+F8FF.Supplementary Private
Use Areas.Noncharacters.Noncharacters: U+FFFE, U+FFFF, and
Others.Specials.Specials: U+FEFF, U+FFF0-U+FFFD.Tag Characters.Tag
Characters: U+E0000-U+E007F.
16. Code Charts.
Character Names List.Images in the Code Charts and Character
Lists.Character Names.Aliases.Cross References.Information About
Languages.Case Mappings.Decompositions.Reserved
Characters.Noncharacters.Subheads.CJK Unified Ideographs.Hangul
Syllables.
17. Han Radical-Stroke Index.
A. Han Unification History.
B. Abstracts of Unicode Technical Reports.
Unicode Standard Annexes.UAX #9: The Bidirectional Algorithm.UAX
#11: East Asian Width.UAX #14: Line Breaking Properties.UAX #15:
Unicode Normalization Forms.UAX #24: Script Names.UAX #29: Text
Boundaries.Unicode Technical Standards.UTS #6: A Standard
Compression Scheme for Unicode.UTS #10: Unicode Collation
Algorithm.Unicode Technical Reports.UTR #16: UTF-EBCDIC.UTR #17:
Character Encoding Model.UTR #18: Unicode Regular Expression
Guidelines.UTR #20: Unicode in XML and Other Markup Languages.UTR
#22: Character Mapping Markup Language (CharMapML).UTR #26:
Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8).Other
Unicode References.Unicode Technical Notes.FAQ (Frequently Asked
Questions).Charts.Conferences.Policies.Updates and
Errata.Versions.Where Is My Character?
C. Relationship to ISO/IEC 10646.
History.Unicode 1.0.Unicode 2.0.Unicode 3.0.Unicode 4.0.Encoding
Forms in ISO/IEC 10646.Zero Extending.UCS Transformation
Formats.UTF-8.UTF-16.Synchronization of the
Standards.Identification of Features for the Unicode
Standard.Character Names.Character Functional Specifications.
D. Changes from Unicode Version 3.0.
Versions of the Unicode Standard.Changes from Unicode Version 3.0
to Version 3.1.New Characters Added.Unicode Character Database
Changes.Changes Affecting Conformance.Unicode Standard
Annexes.Changes from Unicode Version 3.1 to Version 3.2.New
Characters Added.Unicode Character Database Changes.Changes
Affecting Conformance.Unicode Standard Annexes.Changes from Unicode
Version 3.2 to Version 4.0.New Characters Added.Unicode Character
Database Changes.Changes Affecting Conformance.Unicode Standard
Annexes.Errata.
G. Glossary.
R. References.
Source Standards and Specifications.Source Dictionaries for Han
Unification.Other Sources for the Unicode Standard.Selected
Resources: Technical.Selected Resources: Scripts and Languages.
I. Indices.
Unicode Names Index.General Index.
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English, no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.
Unicode is changing all that! The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys, and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, and WML. It is supported in many operating systems, all modern browsers, and many other products.
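As a rough illustration of the "one number per character" idea described above, the following Python sketch prints the code point of a few characters and the bytes that two of the encoding forms named in the table of contents (UTF-8 and UTF-16) use to store it. The helper name `code_point_label` and the sample characters are assumptions chosen for this example, not something taken from the book.

```python
# Minimal sketch: every character has one code point (a number),
# regardless of platform, program, or language. The helper name
# code_point_label is hypothetical, introduced only for illustration.

def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    return f"U+{ord(ch):04X}"


# Sample characters written as escapes so the source file's own
# encoding cannot affect the result: A, e-acute, euro sign, a Han ideograph.
samples = ["A", "\u00e9", "\u20ac", "\u4e2d"]

for ch in samples:
    utf8_hex = ch.encode("utf-8").hex()       # variable length: 1 to 4 bytes
    utf16_hex = ch.encode("utf-16-be").hex()  # 2 bytes here, 4 for supplementary characters
    print(ch, code_point_label(ch), utf8_hex, utf16_hex)
```

The number assigned to a character stays the same in every row; only the byte sequence differs between encoding forms.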
The Unicode Consortium is a non-profit organization founded to develop, extend, and promote the use of the Unicode Standard. The membership of the Consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. The Unicode Consortium actively cooperates with many of the leading standards development organizations, including ISO/IEC JTC1, W3C, IETF, and ECMA.