path: root/src/corelib/io/qurlrecode.cpp
Commit message (author, date, files changed, lines removed/added)
* Update license headers and add new license files (Matti Paaso, 2014-09-24, 1 file, -18/+10)
  - Renamed LICENSE.LGPL to LICENSE.LGPLv21
  - Added LICENSE.LGPLv3
  - Removed LICENSE.GPL

  Change-Id: Iec3406e3eb3f133be549092015cefe33d259a3f2
  Reviewed-by: Iikka Eklund <iikka.eklund@digia.com>
* Use the new UTF-8 codec in QUrl and QUrlQuery (Thiago Macieira, 2014-01-09, 1 file, -120/+66)
  The new code is based on what QUrl already had, so this should have no net effect on performance.

  Change-Id: Ibb2fabd5a108e99a44e0e6e3f713ce2f8b26e4d7
  Reviewed-by: Lars Knoll <lars.knoll@digia.com>
* Initialize variable to fix build [-Werror=uninitialized] (Sergio Martins, 2013-10-19, 1 file, -2/+1)
  The complaining compiler is: gcc version 4.6.3 (crosstool-NG hg+default-ddc327ebaef2)

  Change-Id: Iae488a89d75492e76a39a326b2db36548f8894d0
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Allow non-character codes in utf8 strings (Kurt Pattyn, 2013-10-17, 1 file, -1/+1)
  Changed the handling of non-character code points in the UTF-8 codec. Non-character code points are now accepted in QStrings, QUrls and QJson strings. Unit tests were adapted accordingly.

  For more info about non-character code points, see: http://www.unicode.org/versions/corrigendum9.html

  [ChangeLog][QtCore][QUtf8] UTF-8 now accepts non-character Unicode code points; these are no longer replaced by the replacement character.
  [ChangeLog][QtCore][QUrl] QUrl now fully accepts non-character Unicode code points; they are percent-encoded and can also be pretty-decoded.
  [ChangeLog][QtCore][QJson] The Writer and the Parser now fully accept non-character Unicode code points.

  Change-Id: I77cf4f0e6210741eac8082912a0b6118eced4f77
  Task-number: QTBUG-33229
  Reviewed-by: Lars Knoll <lars.knoll@digia.com>
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
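To make "percent-encoded" concrete for non-characters, here is a minimal sketch in standard C++. It is not Qt's code, and percentEncodeUtf8 is a hypothetical helper. It assumes that a non-character such as U+FFFE is UTF-8 encoded like any other valid code point and each resulting byte is written as %XX, instead of being swapped for U+FFFD.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not Qt API): UTF-8 encode one code point and
// percent-encode every resulting byte. Non-characters such as U+FFFE are
// treated like any other valid code point. Surrogates and values above
// U+10FFFF have no UTF-8 form, so they yield an empty string.
std::string percentEncodeUtf8(char32_t cp) {
    unsigned char bytes[4];
    int n = 0;
    if (cp < 0x80) {
        bytes[n++] = static_cast<unsigned char>(cp);
    } else if (cp < 0x800) {
        bytes[n++] = static_cast<unsigned char>(0xC0 | (cp >> 6));
        bytes[n++] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return {};  // lone surrogate: not encodable
        bytes[n++] = static_cast<unsigned char>(0xE0 | (cp >> 12));
        bytes[n++] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        bytes[n++] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        bytes[n++] = static_cast<unsigned char>(0xF0 | (cp >> 18));
        bytes[n++] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
        bytes[n++] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
        bytes[n++] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
    } else {
        return {};  // beyond the last valid code point
    }
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (int i = 0; i < n; ++i) {
        out += '%';
        out += hex[bytes[i] >> 4];
        out += hex[bytes[i] & 0xF];
    }
    return out;
}
```

Under these assumptions U+FFFE comes out as %EF%BF%BE and survives a round trip, while a lone surrogate produces nothing because it has no UTF-8 representation.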
* Make sure that QUrl::FullyDecoded mode uses U+FFFD for bad UTF-8 (Thiago Macieira, 2013-08-04, 1 file, -0/+23)
  It is good practice to always replace bad UTF-8 sequences with the replacement character. Not doing so could also be considered a security issue.

  Change-Id: I9e7d72e4c4102cdb8334449b5e7f882228a9048f
  Reviewed-by: David Faure (KDE) <faure@kde.org>
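Here is a minimal sketch of the replacement step, assuming the percent-decoded bytes are then interpreted as UTF-8. decodeUtf8WithReplacement is a hypothetical helper, not the function this commit changed. It rejects overlong forms, surrogates and values above U+10FFFF, and emits U+FFFD for each bad sequence. For simplicity it resynchronises one byte after the failing lead byte, so a truncated multi-byte sequence can produce more than one U+FFFD. The Unicode Standard's recommended "maximal subpart" practice would produce fewer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not Qt's implementation): decode bytes as UTF-8 into
// UTF-32, substituting U+FFFD for every malformed sequence. Simplified policy:
// on any error, emit one U+FFFD and resume at the byte after the lead byte.
std::u32string decodeUtf8WithReplacement(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;   // continuation bytes expected after the lead
        char32_t minValue = 0;   // smallest code point for this length (rejects overlongs)
        if (lead < 0x80) {
            out += static_cast<char32_t>(lead);
            ++i;
            continue;
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            cp = lead & 0x1F; extra = 1; minValue = 0x80;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            cp = lead & 0x0F; extra = 2; minValue = 0x800;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            cp = lead & 0x07; extra = 3; minValue = 0x10000;
        } else {
            out += U'\uFFFD';    // stray continuation byte or invalid lead byte
            ++i;
            continue;
        }
        bool ok = i + extra < in.size();  // enough bytes left?
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            ok = (c & 0xC0) == 0x80;
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlongs, surrogates and values beyond U+10FFFF.
        // Non-characters such as U+FFFE are deliberately accepted.
        ok = ok && cp >= minValue && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
        if (ok) {
            out += cp;
            i += 1 + extra;
        } else {
            out += U'\uFFFD';
            ++i;
        }
    }
    return out;
}
```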
* QUrlQuery: update our understanding of delimiters (Thiago Macieira, 2013-08-04, 1 file, -16/+5)
  This commit is similar to the previous one, which changed the behavior of QUrl, but applies to QUrlQuery.

  We can now remove a section of qt_urlDecode that is no longer used: there is no "decode everything" mode anymore.

  Task-number: QTBUG-31660
  Change-Id: I66cfbfd290eeba5b04688cd5ffd615dd57cc6309
  Reviewed-by: David Faure (KDE) <faure@kde.org>
* QUrl: update our understanding of the encoding of delimiters (Thiago Macieira, 2013-08-04, 1 file, -57/+0)
  The longer explanation can be found in the comment in qurl.cpp. The short version is as follows:

  Up to now, we considered that every character could be replaced with its percent-encoded equivalent and vice versa, so long as the parsing of the URL did not change. For example, x:/path+path and x:/path%2Bpath were the same. However, to do this and yet be compliant with most URL uses in the real world, we had to add exceptions:
  - "/" and "%2F" were not the same in the path, despite the delimiter being behind (the rationale was the complex definition of path)
  - "+" and "%2B" were not the same in the query, so we ended up not transforming any sub-delim in the query at all

  Now, we change our understanding based on the following line from RFC 3986 section 2.2:

    URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent.

  From now on, QUrl will not replace any sub-delim or gen-delim ("reserved character"), except where such a character could not exist in the first place. This simplifies the code and removes all exceptions.

  As a side effect, this has also changed the behaviour of the "{" and "}" characters, which we previously allowed to remain decoded.

  [ChangeLog][Important Behavior Changes][QUrl and QUrlQuery] QUrl no longer considers all delimiter characters equivalent to their percent-encoded forms. Now, both classes always keep all delimiters exactly as they were in the original URL text.
  [ChangeLog][Important Behavior Changes][QUrl and QUrlQuery] QUrl no longer decodes %7B and %7D to "{" and "}" in the output of toString()

  Task-number: QTBUG-31660
  Change-Id: Iba0b5b31b269635ac2d0adb2bb0dfb74c139e08c
  Reviewed-by: David Faure (KDE) <faure@kde.org>
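RFC 3986 (section 2.3) treats percent-encoded unreserved characters as equivalent to their literal forms, but section 2.2 says the same does not hold for reserved characters. Below is a minimal normaliser that respects both rules, written in standard C++ with hypothetical names. It is not the logic in qurl.cpp. It decodes %XX only when the result is ALPHA, DIGIT, "-", ".", "_" or "~", and leaves every other triplet untouched, including %2B, %2F, %7B and %7D.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace {
// Value of one hexadecimal digit, or -1 if c is not a hex digit.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// RFC 3986 unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
bool isUnreserved(int c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
        || c == '-' || c == '.' || c == '_' || c == '~';
}
}  // namespace

// Hypothetical helper: decode only triplets that stand for unreserved
// characters; reserved delimiters and everything else stay percent-encoded.
std::string decodeUnreservedOnly(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0 && isUnreserved(hi * 16 + lo)) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```

With this rule, x:/path%2Bpath keeps its %2B while %41 and %7e become "A" and "~".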
* Make the URL Recode function to fix bad input in FullyDecoded mode too (Thiago Macieira, 2013-07-20, 1 file, -3/+10)
  So far, this function hasn't been used for input coming in from the user, so it wasn't necessary. But we may want to do it, or we may already be doing it accidentally somewhere that isn't triggering the failed assertions during unit testing. So let's be on the safe side and allow it. And test it too.

  Change-Id: Ib63addd8da468ad6908278d07a4829f1bdc26a07
  Reviewed-by: David Faure (KDE) <faure@kde.org>
* Remove use of 'register' from Qt (Stephen Kelly, 2013-06-17, 1 file, -2/+2)
  It is deprecated and clang is starting to warn about it.

  Patch mostly generated by clang itself, with some careful grep and sed for the platform-specific parts.

  Change-Id: I8058e6db0f1b41b33a9e8f17a712739159982450
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Change copyrights from Nokia to Digia (Iikka Eklund, 2012-09-22, 1 file, -23/+23)
  Change copyrights and license headers from Nokia to Digia

  Change-Id: If1cc974286d29fd01ec6c19dd4719a67f4c3f00e
  Reviewed-by: Lars Knoll <lars.knoll@digia.com>
  Reviewed-by: Sergio Ahumada <sergio.ahumada@digia.com>
* Fix decoding of QByteArray in the deprecated "encoded" setters in QUrl (Thiago Macieira, 2012-08-20, 1 file, -0/+50)
  The asymmetry is intentional: the getters can use toLatin1() because the called functions, with a QUrl::FullyEncoded parameter, return ASCII only. This is slightly faster than running the UTF-8 encoder.

  However, the data passed to setters could contain non-ASCII binary data in addition to percent-encoded data. We can't use fromUtf8 because the data is binary, and we can't use toPercentEncoded because the data is already encoded.

  Change-Id: I5ecdb49be5af51ac86fd9764eb3a6aa96385f512
  Reviewed-by: David Faure <faure@kde.org>
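One way to handle such mixed input is sketched below. This is an assumption, not a description of what this commit does. The idea is to percent-encode only the bytes at or above 0x80 and leave ASCII alone, including existing %XX escapes. The result is pure ASCII and can be parsed as already-encoded text without running a UTF-8 decoder over binary data. percentEncodeNonAscii is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the code from this commit): make a byte string
// that mixes raw binary with existing %XX escapes safe to treat as ASCII.
// Only bytes >= 0x80 are percent-encoded; ASCII bytes, including '%' and
// the hex digits of existing escapes, are copied unchanged.
std::string percentEncodeNonAscii(const std::string& bytes) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size());
    for (char ch : bytes) {
        const unsigned char b = static_cast<unsigned char>(ch);
        if (b >= 0x80) {
            out += '%';
            out += hex[b >> 4];
            out += hex[b & 0xF];
        } else {
            out += ch;
        }
    }
    return out;
}
```

Because invalid UTF-8 such as the bytes FF FE is encoded byte by byte, nothing is lost or replaced along the way.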
* Add the QUrl::FullyDecoded flag to the component formatting (Thiago Macieira, 2012-05-22, 1 file, -0/+47)
  This allows the QUrl component getters to return fully decoded data, as they did in Qt 4. This is necessary for some use cases where components such as the user name, password or path are used outside the context of a URL. In those contexts, percent-encoded data makes no sense, and losing information that only a URL could represent is acceptable.

  Also take the opportunity to expand the documentation of those getter methods, explaining what the options argument does.

  Discussed-on: http://lists.qt-project.org/pipermail/development/2012-May/003811.html
  Change-Id: I89f743cde78c02f169c88314bff0768714341419
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
  Reviewed-by: David Faure <faure@kde.org>
  Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
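The following hypothetical sketch in standard C++ (not Qt's code) shows what "fully decoded" gives up. It decodes every well-formed %XX triplet, delimiters included, and copies malformed "%" sequences through unchanged. Once decoded, a%2Fb and a/b become indistinguishable, which is the information loss the message calls acceptable outside a URL context.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode every well-formed %XX triplet, reserved
// delimiters included, as a "fully decoded" getter would present a
// component. The result is raw bytes (UTF-8 in practice); malformed '%'
// sequences are copied through unchanged.
std::string percentDecodeAll(const std::string& in) {
    auto hexValue = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        return -1;
    };
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '%' && i + 2 < in.size()) {
            const int hi = hexValue(in[i + 1]);
            const int lo = hexValue(in[i + 2]);
            if (hi >= 0 && lo >= 0) {
                out += static_cast<char>(hi * 16 + lo);
                i += 2;  // skip the two hex digits just consumed
                continue;
            }
        }
        out += in[i];
    }
    return out;
}
```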
* QChar: add isSurrogate() and isNonCharacter() to the public API (Konstantin Ritt, 2012-05-16, 1 file, -14/+1)
  + QChar::LastValidCodePoint enum value that supersedes the UNICODE_LAST_CODEPOINT macro.

  Replace uses of hardcoded values with the new API; remove leftovers.

  Change-Id: I1395c9840b85fcb6b08e241b131794a98773c952
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
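The Unicode definitions behind these predicates can be written as standalone functions. The sketch below uses hypothetical free functions modelled on QChar::isSurrogate() and QChar::isNonCharacter(); it is not Qt's source, and the constant only mirrors the meaning of QChar::LastValidCodePoint (U+10FFFF).

```cpp
#include <cassert>

// Mirrors the meaning of QChar::LastValidCodePoint; not Qt's declaration.
constexpr char32_t LastValidCodePoint = 0x10FFFF;

// UTF-16 surrogate range U+D800..U+DFFF (high and low surrogates).
constexpr bool isSurrogate(char32_t ucs4) {
    return ucs4 >= 0xD800 && ucs4 <= 0xDFFF;
}

// Unicode non-characters: U+FDD0..U+FDEF plus the last two code points of
// every plane (U+xxFFFE and U+xxFFFF), 66 code points in total.
constexpr bool isNonCharacter(char32_t ucs4) {
    return ucs4 <= LastValidCodePoint
        && ((ucs4 >= 0xFDD0 && ucs4 <= 0xFDEF) || (ucs4 & 0xFFFE) == 0xFFFE);
}
```

The mask test (ucs4 & 0xFFFE) == 0xFFFE matches both U+xxFFFE and U+xxFFFF in a single comparison, which avoids listing all 17 planes.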
* Fix compiler warning (Olivier Goffart, 2012-05-04, 1 file, -3/+2)
  qurlrecode.cpp:481:24: warning: ‘action’ may be used uninitialized in this function [-Wmaybe-uninitialized]

  Change-Id: I638b65218d1875667e2c60a5720ecda87202b82f
  Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Change the component formatting enum values so the default is zero (Thiago Macieira, 2012-04-11, 1 file, -6/+33)
  By having the default value equal to zero, we follow the principle of least surprise. For example, if we had

      url.path()

  and we refactored to

      url.path(QUrl::DecodeSpaces)

  then instead of ensuring spaces are decoded, we would make spaces the only thing decoded (Unicode, delimiters and reserved characters would be encoded).

  Besides, modifying the default can only be used to encode something that wasn't encoded previously, so naming the enum values Encode makes more sense.

  As a side effect, toEncoded() does not support any extra encoding options.

  Change-Id: I2624ec446e65c2d979e9ca2f81bd3db22b00bb13
  Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
* Introduce QUrl::DecodeReserved and reorder the enums (Thiago Macieira, 2012-04-11, 1 file, -1/+122)
  DecodeReserved applies to all characters between 0x21 and 0x7E that aren't unreserved, a delimiter, or the percent sign itself.

  Change-Id: Ie64bddb6b814dfa3bb8380e3aa24de1bb3645a65
  Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
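The character set this describes can be computed directly. The sketch below is standard C++ with a hypothetical isDecodeReservedChar, not Qt's lookup table. It uses the RFC 3986 definitions of unreserved characters, gen-delims and sub-delims.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical classifier (not Qt's table): printable ASCII 0x21..0x7E minus
// RFC 3986 unreserved characters, gen-delims, sub-delims and '%' itself.
bool isDecodeReservedChar(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    if (u < 0x21 || u > 0x7E)
        return false;                         // outside the range the message names
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        return false;                         // unreserved: ALPHA / DIGIT
    // Remaining exclusions: unreserved marks, gen-delims, sub-delims, percent.
    static const char excluded[] = "-._~" ":/?#[]@" "!$&'()*+,;=" "%";
    return std::strchr(excluded, c) == nullptr;
}
```

Under those definitions, exactly nine characters qualify: " < > \ ^ ` { | }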
* Merge QUrl::DecodeAllDelimiters and QUrl::DecodeUnambiguousDelimiters (Thiago Macieira, 2012-04-11, 1 file, -1/+1)
  There's little value in having the DecodeUnambiguousDelimiters option, since neither QUrl nor QUrlQuery can ever return values that are ambiguous in that particular context.

  This option could be used to encode a character if, when placed in a URL, it would need to be encoded. Such cases are hashes (#) or question marks (?) in the path component, or slashes (/) and at signs (@) in the userinfo. However, we don't need two enums for that, since there are no other characters that can appear in either form.

  Still, leave two bits for this enum. In the future, if we want to split the gen-delims from the sub-delims, we will be able to.

  Change-Id: If5416b524680eb67dd4abbe7d072ca0ef7218506
  Reviewed-by: Shane Kearns <shane.kearns@accenture.com>
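The following hypothetical sketch shows the encoding this option would perform. It covers only the examples given in the message: "#" and "?" in the path, "/" and "@" in the userinfo. A complete implementation would need to cover more characters, for instance "?" and "#" in the userinfo as well. The names are illustrative, not Qt API.

```cpp
#include <cassert>
#include <string>

// Hypothetical component selector for this sketch only.
enum class Component { Path, UserInfo };

// Percent-encode the delimiters that would change how the URL parses if they
// appeared literally in the given component (message examples only).
std::string encodeAmbiguousDelimiters(const std::string& value, Component where) {
    static const char hex[] = "0123456789ABCDEF";
    const std::string special = (where == Component::Path) ? "#?" : "/@";
    std::string out;
    for (char c : value) {
        if (special.find(c) != std::string::npos) {
            const unsigned char u = static_cast<unsigned char>(c);
            out += '%';
            out += hex[u >> 4];
            out += hex[u & 0xF];
        } else {
            out += c;
        }
    }
    return out;
}
```

The same character can be harmless in one component and ambiguous in another: "/" stays literal in a path but must be encoded in the userinfo.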
* Refactor the URL recoder a little (Thiago Macieira, 2012-03-30, 1 file, -77/+80)
  Change it to operate on QChar pointers, which improves performance slightly. This also avoids unnecessary detaching in the QString source.

  In addition, make it append the output to an existing QString. This will be useful later when we're reconstructing a URL from its components.

  Change-Id: I7e2f64028277637bd329af5f98001ace253a50c7
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
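The calling convention described here reads from a pointer range and appends to an existing string. The function name, signature and return value below are assumptions for illustration, not Qt's actual recoder, and the only "recoding" it does is percent-encoding spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch: read the half-open range [begin, end) without copying
// or modifying the source, and append the recoded result to appendTo.
// Returns the number of characters appended.
std::size_t appendEncodingSpaces(std::string& appendTo, const char* begin, const char* end) {
    const std::size_t before = appendTo.size();
    for (const char* p = begin; p != end; ++p) {
        if (*p == ' ')
            appendTo += "%20";
        else
            appendTo += *p;
    }
    return appendTo.size() - before;
}
```

Appending lets a caller assemble a URL component by component into a single buffer, instead of building temporary strings and concatenating them.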
* Remove the tolerant parsing function and make the recoder tolerant (Thiago Macieira, 2012-03-30, 1 file, -106/+95)
  The reason for this change is that there was little point in having a strict parser: what would the recoder do if it were passed an invalid string? I believe that the tolerant recoder is more efficient than the correcting code followed by the strict recoder.

  This makes the recoder more complex and probably a little less efficient, but it's better in the common case (tolerant input that doesn't need fixes) and in the worst case (input that needs fixes).

  Change-Id: I68a0c9fda6765de05914cbd6ba7d3cea560a7cd6
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>
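The sketch below shows what "tolerant" could mean for a single case. It is an assumption about one typical repair, not a description of every fix Qt performs. A "%" that is not followed by two hexadecimal digits cannot start a valid escape, so the hypothetical fixStrayPercents encodes it as %25 and copies valid escapes unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: repair stray '%' characters in a single pass.
// A '%' followed by two hex digits is a valid escape and is copied as-is;
// any other '%' is replaced by its own encoding, "%25".
std::string fixStrayPercents(const std::string& in) {
    auto isHex = [](char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    };
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool validEscape = in[i] == '%' && i + 2 < in.size()
                                 && isHex(in[i + 1]) && isHex(in[i + 2]);
        if (in[i] == '%' && !validEscape)
            out += "%25";
        else
            out += in[i];
    }
    return out;
}
```

Doing the repair inside the same pass that recodes avoids a separate validation pass in the common case where the input is already well-formed.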
* Add the code that recodes URLs (Thiago Macieira, 2012-03-30, 1 file, -0/+504)
  This one function is an all-in-one:
  - UTF-8 encoder
  - UTF-8 decoder
  - percent encoder
  - percent decoder

  The next step is to add the ability to modify the behaviour, by telling the function what else it must encode or decode and what it should leave untouched.

  Change-Id: I997eccfd2f9ad8487305670b18d6c806f4cf6717
  Reviewed-by: Lars Knoll <lars.knoll@nokia.com>