/****************************************************************************
**
** Copyright (C) 2016 The Qt Company Ltd.
** Contact: https://www.qt.io/licensing/
**
** This file is part of the documentation of the Qt Toolkit.
**
** $QT_BEGIN_LICENSE:FDL$
** Commercial License Usage
** Licensees holding valid commercial Qt licenses may use this file in
** accordance with the commercial license agreement provided with the
** Software or, alternatively, in accordance with the terms contained in
** a written agreement between you and The Qt Company. For licensing terms
** and conditions see https://www.qt.io/terms-conditions. For further
** information use the contact form at https://www.qt.io/contact-us.
**
** GNU Free Documentation License Usage
** Alternatively, this file may be used under the terms of the GNU Free
** Documentation License version 1.3 as published by the Free Software
** Foundation and appearing in the file included in the packaging of
** this file. Please review the following information to ensure
** the GNU Free Documentation License version 1.3 requirements
** will be met: https://www.gnu.org/licenses/fdl-1.3.html.
** $QT_END_LICENSE$
**
****************************************************************************/

/*!
    \page codec-big5.html
    \title Big5 Text Codec
    \ingroup codecs

    The Big5 codec provides conversion to and from the Big5 encoding.
    The code was originally contributed by Ming-Che Chuang
    \<mingche@cobra.ee.ntu.edu.tw\> for the Big-5+ encoding, and was
    included in Qt with the author's permission, and the grateful
    thanks of the Qt team. (Note: Ming-Che's code is QPL'd, as
    per an email to qt-info@nokia.com.)

    However, since Big-5+ was never formally approved, and was never
    used by anyone, the Taiwan Free Software community and the Li18nux
    Big5 Standard Subgroup agree that the de-facto standard Big5-ETen
    (zh_TW.Big5 or zh_TW.TW-Big5) be used instead.

    The Big5 codec is currently implemented as a pure subset of the
    Big5-HKSCS codec, so more fine-tuning is needed to make it
    identical to the standard Big5 mapping as determined by
    Li18nux-Big5.  See \l{http://www.autrijus.org/xml/} for the draft
    Big5 (2002) standard.

    James Su \<suzhe@turbolinux.com.cn\> \<suzhe@gnuchina.org\>
    generated the Big5-HKSCS-to-Unicode tables with a very
    space-efficient algorithm. He generously donated his code to glibc
    in May 2002.  Subsequently, James has kindly allowed Anthony Fok
    \<anthony@thizlinux.com\> \<foka@debian.org\> to adapt the code
    for Qt.

    \sa{Text Codecs: Big5, Big5-HKSCS}
*/

/*!
    \page codec-big5hkscs.html
    \title Big5-HKSCS Text Codec
    \ingroup codecs

    The Big5-HKSCS codec provides conversion to and from the
    Big5-HKSCS encoding.

    The codec grew out of the QBig5Codec originally contributed by
    Ming-Che Chuang \<mingche@cobra.ee.ntu.edu.tw\>.  James Su
    \<suzhe@turbolinux.com.cn\> \<suzhe@gnuchina.org\> and Anthony Fok
    \<anthony@thizlinux.com\> \<foka@debian.org\> implemented HKSCS-1999
    QBig5hkscsCodec for Qt-2.3.x, but it was too late in the Qt development
    schedule to be officially included in the Qt-2.3.x series.

    Wu Yi \<wuyi@hancom.com\> ported the HKSCS-1999 QBig5hkscsCodec to
    Qt-3.0.1 in March 2002.

    With the advent of the new HKSCS-2001 standard, James Su
    \<suzhe@turbolinux.com.cn\> \<suzhe@gnuchina.org\> generated the
    Big5-HKSCS<->Unicode tables with a very space-efficient algorithm.
    He generously donated his code to glibc in May 2002.  Subsequently,
    James has kindly allowed Anthony Fok to adapt the code for
    Qt-3.0.5.

    Currently, the Big5-HKSCS tables are generated from the following
    sources, and with the Euro character added:
    \list 1
    \li \l{http://www.microsoft.com/typography/unicode/950.txt}
    \li \l{http://www.info.gov.hk/digital21/chi/hkscs/download/big5-iso.txt}
    \li \l{http://www.info.gov.hk/digital21/chi/hkscs/download/big5cmp.txt}
    \endlist

    The QBig5hkscsCodec may need more fine-tuning to maximize its
    compatibility with the standard Big5 (2002) mapping as determined by
    the Li18nux Big5 Standard Subgroup.  See \l{http://www.autrijus.org/xml/}
    for the various Big5 CharMapML tables.

    \sa{Text Codecs: Big5, Big5-HKSCS}
*/

/*!
    \page codec-eucjp.html
    \title EUC-JP Text Codec
    \ingroup codecs

    The EUC-JP codec provides conversion to and from EUC-JP, the main
    legacy encoding for Unix machines in Japan.

    The environment variable \c UNICODEMAP_JP can be used to
    fine-tune the JIS, Shift-JIS, and EUC-JP codecs. The \l{ISO
    2022-JP (JIS) Text Codec} documentation describes how to use this
    variable.

    Most of the code here was written by Serika Kurusugawa,
    a.k.a. Junji Takagi, and is included in Qt with the author's
    permission and the grateful thanks of the Qt team.

    \sa{Text Codec: EUC-JP}
*/

/*!
    \page codec-euckr.html
    \title EUC-KR Text Codec
    \ingroup codecs

    The EUC-KR codec provides conversion to and from EUC-KR, the
    main legacy encoding for Unix machines in Korea.

    It was largely written by Mizi Research Inc. Here is the
    copyright statement for the code as it was at the point of
    contribution. The subsequent modifications are covered by
    the usual copyright for Qt.

    \sa{Text Codec: EUC-KR}
*/

/*!
    \page codec-gbk.html
    \title GBK Text Codec
    \ingroup codecs

    The GBK codec provides conversion to and from the Chinese
    GB18030/GBK/GB2312 encoding.

    GBK, formally the Chinese Internal Code Specification, is a commonly
    used extension of GB 2312-80.  Microsoft Windows uses it under the
    name codepage 936.

    GBK has been superseded by the new Chinese national standard
    GB 18030-2000, which added a 4-byte encoding while remaining
    compatible with GB2312 and GBK.  The new GB 18030-2000 may be described
    as a special encoding of Unicode 3.x and ISO-10646-1.
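
    As an illustration of how the 4-byte form coexists with the older
    1-byte and 2-byte forms, the following sketch reports the length of
    the sequence that starts at a given position. It is not Qt's
    implementation: the function name \c gb18030SequenceLength is
    hypothetical, and the byte ranges are the ones commonly quoted for
    GB 18030-2000.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Qt: returns the length in bytes (1, 2 or 4)
// of the GB 18030 sequence starting at data[pos], or 0 if the bytes there do
// not form a complete, well-formed sequence.
inline int gb18030SequenceLength(const std::string &data, std::size_t pos)
{
    // Returns the byte at index i as 0..255, or -1 past the end of the input.
    auto byteAt = [&](std::size_t i) -> int {
        return i < data.size() ? static_cast<unsigned char>(data[i]) : -1;
    };
    auto isLead = [](int b) { return b >= 0x81 && b <= 0xFE; };
    auto isDigit = [](int b) { return b >= 0x30 && b <= 0x39; };

    const int b1 = byteAt(pos);
    if (b1 < 0)
        return 0;   // nothing left to read
    if (b1 <= 0x7F)
        return 1;   // single byte, same values as ASCII
    if (!isLead(b1))
        return 0;   // 0x80 and 0xFF are not lead bytes

    const int b2 = byteAt(pos + 1);
    if ((b2 >= 0x40 && b2 <= 0x7E) || (b2 >= 0x80 && b2 <= 0xFE))
        return 2;   // two-byte form, shared with GBK
    if (isDigit(b2) && isLead(byteAt(pos + 2)) && isDigit(byteAt(pos + 3)))
        return 4;   // four-byte form added by GB 18030
    return 0;
}
```

    The second byte is what separates the two multi-byte forms: a value
    in the range 0x30-0x39 can only belong to a 4-byte sequence, which is
    why existing GBK text keeps its meaning.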

    Special thanks to charset gurus Markus Scherer (IBM),
    Dirk Meyer (Adobe Systems) and Ken Lunde (Adobe Systems) for publishing
    an excellent GB 18030-2000 summary and specification on the Internet.
    Some must-read documents are:

    \list
    \li \l{ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf}
    \li \l{http://oss.software.ibm.com/cvs/icu/~checkout~/charset/source/gb18030/gb18030.html}
    \li \l{http://oss.software.ibm.com/cvs/icu/~checkout~/charset/data/xml/gb-18030-2000.xml}
    \endlist

    The GBK codec was contributed to Qt by
    Justin Yu \<justiny@turbolinux.com.cn\> and
    Sean Chen \<seanc@turbolinux.com.cn\>.  They may also be reached at
    Yu Mingjian \<yumj@sun.ihep.ac.cn\>, \<yumingjian@china.com\> and
    Chen Xiangyang \<chenxy@sun.ihep.ac.cn\>.

    The GB18030 codec Qt functions were contributed to Qt by
    James Su \<suzhe@gnuchina.org\>, \<suzhe@turbolinux.com.cn\>
    who pioneered much of GB18030 development on GNU/Linux systems.

    The GB18030 codec was contributed to Qt by
    Anthony Fok \<anthony@thizlinux.com\>, \<foka@debian.org\>
    using a Perl script to generate C++ tables from gb-18030-2000.xml
    while merging contributions from James Su, Justin Yu and Sean Chen.
    A copy of the source Perl script is available at
    \l{http://people.debian.org/~foka/gb18030/gen-qgb18030codec.pl}.

    \sa{Text Codec: GBK}
*/

/*!
    \page codecs-jis.html
    \title ISO 2022-JP (JIS) Text Codec
    \ingroup codecs

    The JIS codec provides conversion to and from ISO 2022-JP.

    The environment variable \c UNICODEMAP_JP can be used to
    fine-tune the JIS, Shift-JIS, and EUC-JP codecs. The mapping
    names follow the Japanese XML working group's
    \l{XML Japanese Profile},
    which names and explains all the
    widely used mappings. Here are brief descriptions, written by
    Serika Kurusugawa:

    \list

    \li "unicode-0.9" or "unicode-0201" for Unicode style. This assumes
    JISX0201 for 0x00-0x7f. (0.9 is a table version of jisx02xx mapping
    used for Unicode 1.1.)

    \li "unicode-ascii" This assumes US-ASCII for 0x00-0x7f; some
    chars (JISX0208 0x2140 and JISX0212 0x2237) are different from
    Unicode 1.1 to avoid conflict.

    \li "open-19970715-0201" ("open-0201" for convenience) or
    "jisx0221-1995" for JISX0221-JISX0201 style. JIS X 0221 is the JIS
    version of Unicode, but a few chars (0x5c, 0x7e, 0x2140, 0x216f,
    0x2131) are different from Unicode 1.1. This is used when 0x5c is
    treated as YEN SIGN.

    \li "open-19970715-ascii" ("open-ascii" for convenience) for
    JISX0221-ASCII style. This is used when 0x5c is treated as REVERSE
    SOLIDUS.

    \li "open-19970715-ms" ("open-ms" for convenience) or "cp932" for
    Microsoft Windows style. Windows Code Page 932. Some chars (0x2140,
    0x2141, 0x2142, 0x215d, 0x2171, 0x2172) are different from Unicode
    1.1.

    \li "jdk1.1.7" for Sun's JDK style. Same as Unicode 1.1, except that
    JIS 0x2140 is mapped to U+FF3C. Either ASCII or JISX0201 can be used
    for 0x00-0x7f.

    \endlist

    In addition, the extensions "nec-vdc", "ibm-vdc" and "udc" are
    supported.

    For example, if you want to use Unicode style conversion but with
    NEC's extension, set \c UNICODEMAP_JP to \c {unicode-0.9,
    nec-vdc}. (You will probably need to quote that in a shell
    command.)
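
    A program can also set the variable for its own process. The sketch
    below is an illustration only: \c joinMappingNames and
    \c setUnicodeMapJp are hypothetical helpers, the POSIX \c setenv()
    call stands in for what a Qt program would more likely do with
    \c qputenv(), and the value is written without a space after the
    comma on the assumption that the names are matched exactly.

```cpp
#include <stdlib.h>   // setenv() is POSIX, declared in <stdlib.h>
#include <string>
#include <vector>

// Hypothetical helper, not part of Qt: joins mapping names into the
// comma-separated form used for UNICODEMAP_JP.
inline std::string joinMappingNames(const std::vector<std::string> &names)
{
    std::string value;
    for (const std::string &name : names) {
        if (!value.empty())
            value += ',';
        value += name;
    }
    return value;
}

// Hypothetical helper: sets UNICODEMAP_JP for the current process and
// returns true on success. Assumption: the call is made before the
// Japanese codecs are first used, so that they see the new value.
inline bool setUnicodeMapJp(const std::vector<std::string> &names)
{
    const std::string value = joinMappingNames(names);
    return setenv("UNICODEMAP_JP", value.c_str(), 1) == 0;
}
```

    Calling \c setUnicodeMapJp with the names "unicode-0.9" and "nec-vdc"
    selects Unicode style conversion with NEC's extension, matching the
    example above.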

    Most of the code here was written by Serika Kurusugawa,
    a.k.a. Junji Takagi, and is included in Qt with the author's
    permission and the grateful thanks of the Qt team.

    \sa{Text Codec: ISO 2022-JP (JIS)}
*/

/*!
    \page codec-sjis.html
    \title Shift-JIS Text Codec
    \ingroup codecs

    The Shift-JIS codec provides conversion to and from Shift-JIS, an
    encoding of JIS X 0201 Latin, JIS X 0201 Kana and JIS X 0208.
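
    The three character sets occupy separate byte ranges, which the
    following sketch makes explicit. It is an illustration, not code
    taken from Qt's codec: \c SjisByteKind and \c classifySjisByte are
    hypothetical names, and the ranges are those of plain Shift-JIS
    without vendor extensions.

```cpp
// Hypothetical classification of a byte that starts a Shift-JIS character.
enum class SjisByteKind {
    Jisx0201Latin,  // single byte, 0x00-0x7F
    Jisx0201Kana,   // single byte, 0xA1-0xDF (half-width katakana)
    Jisx0208Lead,   // first byte of a two-byte JIS X 0208 character
    Other           // unassigned in plain Shift-JIS or vendor-specific
};

inline SjisByteKind classifySjisByte(unsigned char b)
{
    if (b <= 0x7F)
        return SjisByteKind::Jisx0201Latin;
    if (b >= 0xA1 && b <= 0xDF)
        return SjisByteKind::Jisx0201Kana;
    if ((b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xEF))
        return SjisByteKind::Jisx0208Lead;
    return SjisByteKind::Other;
}
```

    A decoder built on this classification would consume one more byte
    after each \c Jisx0208Lead byte and treat every other kind as a
    complete character.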

    The environment variable \c UNICODEMAP_JP can be used to
    fine-tune the codec. The \l{ISO 2022-JP (JIS) Text Codec}
    documentation describes how to use this variable.

    Most of the code here was written by Serika Kurusugawa, a.k.a.
    Junji Takagi, and is included in Qt with the author's permission
    and the grateful thanks of the Qt team. Here is the
    copyright statement for the code as it was at the point of
    contribution. The subsequent modifications are covered by
    the usual copyright for Qt.

    \sa{Text Codec: Shift-JIS}
*/

/*!
    \page codec-tscii.html
    \title TSCII Text Codec
    \ingroup codecs

    The TSCII codec provides conversion to and from the Tamil TSCII
    encoding.

    TSCII, formally the Tamil Script Code for Information Interchange
    specification, is a commonly used character set for Tamil. The
    official page for the standard is at
    \l{http://www.tamil.net/tscii/}.

    This codec uses the mapping table found at
    \l{http://www.geocities.com/Athens/5180/tsciiset.html}.
    Tamil uses composed Unicode, which might cause some
    problems if you are using Unicode fonts instead of TSCII fonts.

    Most of the code was written by Hans Petter Bieker and is
    included in Qt with the author's permission and the grateful
    thanks of the Qt team.

    \sa{Text Codec: TSCII}
*/