UTF-8: always store the SIMD result, even if invalid

For ASCII content, this improves the throughput because the conditional is no longer on the codepath to storing, so the processor can perform the store at the same time as it's doing the movemask operation. However, the gain is mostly theoretical: benchmarking with mostly ASCII content shows the algorithm running within 0.5% of the previous result (which is noise).

For non-ASCII content, we're comparing the cost of doing a 16-byte store (which may be completely overwritten) with the loop that copies one byte and shifts the mask right on each iteration. Benchmarking shows a slight gain of a few percent.

Change-Id: I28ef0021dffc725a922c539cc5976db367f36e78
Reviewed-by: Allan Sandfeld Jensen <allan.jensen@digia.com>
author: Thiago Macieira <thiago.macieira@intel.com> 2014-02-21 16:26:32 -0800
committer: The Qt Project <gerrit-noreply@qt-project.org> 2014-05-27 09:48:32 +0200
commit: d0c38291eb8c58b85c5df21afe6c3b729b3cb492 (patch)
tree: 673e3549c4754b7911183428fba67b5db3e71f40 /src
parent: 8917179b47cf2ead645aaacfb680d261308cd866 (diff)
1 files changed, 8 insertions, 11 deletions
diff --git a/src/corelib/codecs/qutfcodec.cpp b/src/corelib/codecs/qutfcodec.cpp
index c5f580e13d..4fb32dcc59 100644
--- a/src/corelib/codecs/qutfcodec.cpp
+++ b/src/corelib/codecs/qutfcodec.cpp
@@ -74,25 +74,22 @@ static inline bool simdEncodeAscii(uchar *&dst, const ushort *&nextAscii, const
         __m128i packed = _mm_packus_epi16(data1, data2);
         __m128i nonAscii = _mm_cmpgt_epi8(packed, _mm_setzero_si128());
 
+        // store, even if there are non-ASCII characters here
+        _mm_storeu_si128((__m128i*)dst, packed);
+
         // n will contain 1 bit set per character in [data1, data2] that is non-ASCII (or NUL)
         ushort n = ~_mm_movemask_epi8(nonAscii);
         if (n) {
-            // copy the front part that is still ASCII
-            while (!(n & 1)) {
-                *dst++ = *src++;
-                n >>= 1;
-            }
-
             // find the next probable ASCII character
             // we don't want to load 32 bytes again in this loop if we know there are non-ASCII
             // characters still coming
-            n = _bit_scan_reverse(n);
-            nextAscii = src + n + 1;
+            nextAscii = src + _bit_scan_reverse(n) + 1;
+
+            n = _bit_scan_forward(n);
+            dst += n;
+            src += n;
             return false;
         }
-
-        // pack
-        _mm_storeu_si128((__m128i*)dst, packed);
     }
     return src == end;
 }
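
The pointer arithmetic in the patched branch can be checked with a scalar model of one 16-character step. This is only a sketch, not Qt's simdEncodeAscii: BlockResult and encodeBlock are hypothetical names, the inner loop stands in for _mm_packus_epi16 plus _mm_movemask_epi8, and the two while loops stand in for _bit_scan_forward and _bit_scan_reverse. It assumes dst has room for 16 bytes, since the whole block is written before the mask is inspected.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for one 16-character step (not part of Qt).
struct BlockResult {
    bool allAscii;         // true if every unit was in 0x01..0x7F
    std::size_t advance;   // how far dst and src move forward
    std::size_t nextAscii; // offset just past the last flagged unit (0 if allAscii)
};

// Scalar stand-in for the patched loop body: store first, test afterwards.
static BlockResult encodeBlock(unsigned char *dst, const char16_t *src)
{
    unsigned n = 0; // bit i set: unit i is non-ASCII (or NUL)
    for (std::size_t i = 0; i < 16; ++i) {
        const unsigned c = src[i];
        // Emulate PACKUSWB, which saturates a *signed* 16-bit input:
        // 0x0100..0x7FFF becomes 0xFF, 0x8000..0xFFFF (negative) becomes 0x00.
        unsigned char packed;
        if (c >= 0x8000u)
            packed = 0x00;
        else if (c > 0xFFu)
            packed = 0xFF;
        else
            packed = static_cast<unsigned char>(c);

        dst[i] = packed; // unconditional store, even if the unit is not ASCII

        // Emulate the signed "> 0" byte compare: 0x00 and 0x80..0xFF fail it.
        if (packed == 0 || packed >= 0x80)
            n |= 1u << i;
    }

    if (n == 0)
        return {true, 16, 0};

    std::size_t first = 0;            // stands in for _bit_scan_forward(n)
    while (!((n >> first) & 1u))
        ++first;
    std::size_t last = 15;            // stands in for _bit_scan_reverse(n)
    while (!((n >> last) & 1u))
        --last;
    assert(first <= last);

    // Bytes at dst[first..15] were written but are not counted as output;
    // whatever handles the non-ASCII character next overwrites them.
    return {false, first, last + 1};
}
```

For a block whose units 5 and 9 are non-ASCII, the model advances dst and src by 5 and places nextAscii at offset 10, matching `dst += n; src += n;` and `nextAscii = src + _bit_scan_reverse(n) + 1` in the diff (where nextAscii is computed before src is advanced).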