Document QString's UTF-8 conversion behaviors

We haven't handled the Unicode non-characters specially since Qt 5.2
(since commit 9327bc87c3abf58bb471693b5448cd78e3db1b46), so this part of
the documentation was stale.

Since Qt 5.3 (since 8dd47e34b9b96ac27a99cdcf10b8aec506882fc2), QString
will insert one replacement character for each byte that can't be
decoded properly.

[ChangeLog][Important Behavior Changes][UTF-8 decoding] The QString
UTF-8 decoder changed behavior slightly: when it encounters invalid
sequences, it will insert one replacement character per byte that is
invalid, instead of one replacement character for the whole invalid
length.

Change-Id: Ia4ec78afded9445bbe937311d6be80f71bd1a55f
Reviewed-by: Richard J. Moore <rich@kde.org>
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>
Reviewed-by: Lars Knoll <lars.knoll@digia.com>
author: Thiago Macieira <thiago.macieira@intel.com> 2014-04-04 10:34:15 -0700
committer: The Qt Project <gerrit-noreply@qt-project.org> 2014-04-24 10:48:03 +0200
commit: bbf37b61d00a6470349b728a4e6982a359e0931c (patch)
tree: 25f17afa01732c1e836f553c8737b65e860d31e2 /src
parent: f56ef579ba5b1d3adda060fa9c0707e37f9f1baa (diff)
1 files changed, 4 insertions, 20 deletions
diff --git a/src/corelib/tools/qstring.cpp b/src/corelib/tools/qstring.cpp
index 79365b11b1..aac9c493c3 100644
--- a/src/corelib/tools/qstring.cpp
+++ b/src/corelib/tools/qstring.cpp
@@ -4331,14 +4331,6 @@ QByteArray QString::toLocal8Bit_helper(const QChar *data, int size)
     UTF-8 is a Unicode codec and can represent all characters in a Unicode
     string like QString.
 
-    However, in the Unicode range, there are certain codepoints that are not
-    considered characters. The Unicode standard reserves the last two
-    codepoints in each Unicode Plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF,
-    U+2FFFE, etc.), as well as 32 codepoints in the range U+FDD0..U+FDEF,
-    inclusive, as non-characters. If any of those appear in the string, they
-    may be discarded and will not appear in the UTF-8 representation, or they
-    may be replaced by one or more replacement characters.
-
     \sa fromUtf8(), toLatin1(), toLocal8Bit(), QTextCodec
 */
 
@@ -4493,10 +4485,10 @@ QString QString::fromLocal8Bit_helper(const char *str, int size)
     sequences, non-characters, overlong sequences or surrogate codepoints
     encoded into UTF-8.
 
-    Non-characters are codepoints that the Unicode standard reserves and must
-    not be used in text interchange. They are the last two codepoints in each
-    Unicode Plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, etc.), as well
-    as 32 codepoints in the range U+FDD0..U+FDEF, inclusive.
+    This function can be used to process incoming data incrementally as long as
+    all UTF-8 characters are terminated within the incoming data. Any
+    unterminated characters at the end of the string will be replaced or
+    suppressed. In order to do stateful decoding, please use \l QTextDecoder.
 
     \sa toUtf8(), fromLatin1(), fromLocal8Bit()
 */
@@ -9517,14 +9509,6 @@ QByteArray QStringRef::toLocal8Bit() const
     UTF-8 is a Unicode codec and can represent all characters in a Unicode
     string like QString.
 
-    However, in the Unicode range, there are certain codepoints that are not
-    considered characters. The Unicode standard reserves the last two
-    codepoints in each Unicode Plane (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF,
-    U+2FFFE, etc.), as well as 16 codepoints in the range U+FDD0..U+FDDF,
-    inclusive, as non-characters. If any of those appear in the string, they
-    may be discarded and will not appear in the UTF-8 representation, or they
-    may be replaced by one or more replacement characters.
-
     \sa toLatin1(), toLocal8Bit(), QTextCodec
 */
 QByteArray QStringRef::toUtf8() const
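
The per-byte replacement rule from the ChangeLog entry can be illustrated with a small standalone sketch in standard C++. This is not Qt's decoder: `decodeUtf8PerByte` is a hypothetical helper written only for illustration, and it assumes that overlong sequences, encoded surrogates, and truncated sequences are all treated as invalid. On any failure it emits one U+FFFD and advances by exactly one byte, so every undecodable byte produces its own replacement character.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Qt): decodes UTF-8 to UTF-32 and emits
// one U+FFFD per byte that cannot be decoded.
std::u32string decodeUtf8PerByte(const std::string &in)
{
    const char32_t replacement = 0xFFFD;
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::size_t extra = 0;   // continuation bytes expected after the lead byte
        char32_t cp = 0;         // code point being assembled
        char32_t minValue = 0;   // smallest code point that needs this length

        if (b < 0x80) {
            cp = b;
        } else if ((b & 0xE0) == 0xC0) {
            extra = 1; cp = b & 0x1F; minValue = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            extra = 2; cp = b & 0x0F; minValue = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            extra = 3; cp = b & 0x07; minValue = 0x10000;
        } else {
            // Stray continuation byte or invalid lead byte.
            out.push_back(replacement);
            ++i;
            continue;
        }

        // The whole sequence must fit in the input.
        bool ok = (i + extra < in.size());
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                ok = false;
            else
                cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and values beyond U+10FFFF.
        if (ok && (cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;

        if (ok) {
            out.push_back(cp);
            i += extra + 1;
        } else {
            // Replace only the current byte; the following bytes are retried.
            out.push_back(replacement);
            ++i;
        }
    }
    return out;
}
```

With this sketch, the truncated sequence `E2 82` followed by `A` yields two replacement characters and then `A`, rather than a single replacement for the whole invalid run. In a Qt program the equivalent input would simply be passed to QString::fromUtf8().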