summaryrefslogtreecommitdiffstats
path: root/src/gui/painting/qdrawhelper.cpp
Commit message (Collapse)AuthorAgeFilesLines
* Long live QColorSpace and friendsAllan Sandfeld Jensen2019-02-081-11/+11
| | | | | | | | | | | | | | | | | | Adds QColorSpace and QColorTransform classes, and parsing of a common subset of ICC profiles found in images, and also parses the ICC profiles in PNG and JPEGs. For backwards compatibility no automatic color handling is done by this patch. [ChangeLog][QtGui] A QColorSpace class has been added, and color spaces are now parsed from PNG and JPEG images. No automatic color space conversion is done however, and applications must request it. Change-Id: Ic09935f84640a716467fa3a9ed1e73c02daf3675 Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* Add AVX2 version of ARGB->ARGB32PMThiago Macieira2019-01-091-2/+15
| | | | | | | | Similar to the previous commit. This also removes the SSE4 implementations from Qt builds that use AVX2 throughout. Change-Id: I251f00d706d646ed87b4fffd1577f96ed52a4cf4 Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* Add AVX2 version of the ARGB32->RGBA64PM codeThiago Macieira2019-01-091-0/+9
| | | | | Change-Id: I251f00d706d646ed87b4fffd1577f84854e358a4 Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* Optimize ARGB32->RGBA64PM betterAllan Sandfeld Jensen2019-01-081-20/+28
| | | | | | | | This conversion is critical for ARGB32 painting, and no compiler optimized the premultiplication efficiently. Change-Id: Iee137c2f7020246478d09e880a7a1bf2ed3c6fd4 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Add Grayscale16 Image FormatAllan Sandfeld Jensen2018-12-121-3/+86
| | | | | | | | | | [ChangeLog][QtGui][QImage] Added support for 16-bit grayscale format. Together-with: Aaron Linville<aaron@linville.org> Task-number: QTBUG-41176 Change-Id: I5fe4f54a55ebe1413aa71b882c19627fe22362ac Reviewed-by: Nick D'Ademo <nickdademo@gmail.com> Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Add an SSSE3 implementation of qt_memfill24Thiago Macieira2018-12-121-0/+9
| | | | | | | | | | | | Brought to you by the PSHUFB instruction, introduced in SSSE3 (implementation also uses PALIGNR just because we can). The tail functionality makes use of the fact that the low half of "mval2" ends in the correct content, so we overwrite up to 8 bytes close to the end. Change-Id: Iba4b5c183776497d8ee1fffd15646a620829c335 Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* Add a qt_memfill24 implementationAllan Sandfeld Jensen2018-12-121-0/+37
| | | | | | | | | | This function gets called from qt_rectfill_quint24, which is used by the RGB666, ARGB6666_Premultiplied, ARGB8555_Premultiplied, and RGB888 formats. Together-with: Thiago Macieira <thiago.macieira@intel.com> Change-Id: Iba4b5c183776497d8ee1fffd1564585fdee835c2 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* Add AVX2 versions of qt_memfill32 and qt_memfill64Thiago Macieira2018-12-111-0/+10
| | | | | | | | | | The implementation is almost the same 4-way-unrolled loop, but because of the wider registers, we fill 128 bytes per loop. Unlike the SSE2 implementation, the AVX2 version uses unaligned stores and won't try to align in the prologue, matching glibc's __memset_avx2 (also unaligned). Change-Id: Iba4b5c183776497d8ee1fffd15637ccb2a7b83bc Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* Add SSE2 qt_memfill64Thiago Macieira2018-12-111-0/+2
| | | | | | | | Implemented by merging with the qt_memfill32 implementation in a non-inlining function. Change-Id: I343f2beed55440a7ac0bfffd15636f8ba995a2bd Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* Merge remote-tracking branch 'origin/5.12' into devLiang Qi2018-12-041-32/+32
|\ | | | | | | | | | | | | Conflicts: src/gui/painting/qdrawhelper.cpp Change-Id: I4916e07b635e1d3830e9b46ef7914f99bec3098e
| * Fix alignment of temporary QRgba64 buffers on win32Allan Sandfeld Jensen2018-11-301-32/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The alignment of long long on 32-bit windows is only 32-bits, but we should be safe to assume it is aligned at 64-bit. Since QRgba64 is a public type, we can't fix the alignment there without breaking ABI, so instead fix it where temporary buffers are allocated. At the same time be consistent about using QRgba64 so we can switch to it enforcing alignment in Qt6. Change-Id: Ie15c305bc867c62a13df8eb2b1678e92174e1f97 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* | Generalize fill span optimizationAllan Sandfeld Jensen2018-11-161-16/+52
| | | | | | | | | | | | | | | | | | | | This was added to RGB64 rendering because pixel conversions could be expensive, but when painting to non-standard imageformats, the same can also be expensive in the generic 32-bit backend, and thus benefit from the optimization as well. Change-Id: I747a398670b1d4dbd844a772e7aafce3c8dbef20 Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* | Cleanup gradient blendingAllan Sandfeld Jensen2018-11-161-132/+112
| | | | | | | | | | | | | | | | | | Moving it to a separate routine like blendTexture, so DrawHelper only has solid color routines, and generalizing the vertical gradient optimization. Change-Id: I54bd59eba7e95247b9a365a3738d02c4f8cc2631 Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* | Move qt_memfill32-based implementation of qt_memfill16 to genericThiago Macieira2018-11-111-16/+9
| | | | | | | | | | | | | | | | | | | | The SSE2 implementation and the one in qdrawhelper.cpp are almost identical. And if we make it so qt_memfill16 can tail-call to qt_memfill32, there's no need for inlining, so there's no need to keep it in qdrawhelper_sse2.cpp Change-Id: I343f2beed55440a7ac0bfffd15637027771c2254 Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* | Use qsizetype for qt_memfill functionsThiago Macieira2018-11-111-3/+3
| | | | | | | | | | | | | | Just in case the image is larger than 2 GB (512 megapixels). Change-Id: I343f2beed55440a7ac0bfffd15636cbc68dfa13d Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* | Merge the qt_memfill{,_template} functionsThiago Macieira2018-11-111-39/+4
| | | | | | | | | | | | | | | | | | | | | | | | We had two copies of the Duff's Device implementation, one in the .cpp and one in the header. One of the two implementations had a protection against zero counts, the other didn't. So move the .cpp implementation to the header and use it in the functions that were declared there. Fixes: QTBUG-16104 Patch-By: Allan Sandfeld Jensen <allan.jensen@qt.io> Change-Id: Iba4b5c183776497d8ee1fffd156456cc3502946e Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* | Remove QT_MEMFILL_xxx macrosThiago Macieira2018-11-081-2/+2
| | | | | | | | | | | | | | | | They were just calling a function, may as well just call said function directly. Change-Id: I343f2beed55440a7ac0bfffd15636b183c1a420f Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* | Remove unnecessary Q_STATIC_TEMPLATE_FUNCTION macroThiago Macieira2018-11-061-1/+1
|/ | | | | | | It expands to the same thing in all three branches. Change-Id: I343f2beed55440a7ac0bfffd15636a8bfd8fd21c Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* Add NEON optimized ARGB32 unpremultiply routinesAllan Sandfeld Jensen2018-10-091-0/+9
| | | | | | | | Mirroring similar routines recently added for SSE4.1 Change-Id: Ibb9d10cc34655ce1dc0e97fdff4e4f6a81d47d05 Reviewed-by: Erik Verbruggen <erik.verbruggen@qt.io> Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* Merge remote-tracking branch 'origin/5.11' into 5.12Liang Qi2018-09-271-0/+2
|\ | | | | | | | | | | | | | | Conflicts: src/corelib/global/qconfig-bootstrapped.h src/widgets/util/qcompleter.cpp Change-Id: I4f44f0f074982530f2f2e750ce696230b2754cf3
| * Disable RGB64 backend for ARGB32 when it will be very slowAllan Sandfeld Jensen2018-09-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | Fixes a speed regression on ARGB32 painting on low end hardware introduced when it was switched to using the RGB64 raster routines. It turns out several of our embedded QPA targets use ARGB32 as native format. Task-number: QTBUG-69724 Change-Id: I6d7993c12da46a85b8354eb905930dae9602b5e6 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* | Add optimized fetch ARGB32 routines for NEONAllan Sandfeld Jensen2018-09-121-0/+6
| | | | | | | | | | | | | | | | After convert and fetch were split, only convert was still NEON vectorized, while fetch is the more commonly used version. Change-Id: Iea2af7ccee6589b3d6e9908afeaae2d1ad2753be Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* | Merge remote-tracking branch 'origin/5.11' into 5.12Qt Forward Merge Bot2018-09-071-9/+49
|\| | | | | | | Change-Id: I66c7f18a2abd13601da0947919436f7da3549ae9
| * Avoid conversion over RGBA64 for RGB32 LCD text blendingAllan Sandfeld Jensen2018-08-301-9/+49
| | | | | | | | | | | | | | | | Short-cuts the case where there is no gamma correction to avoid a conversion over RGBA64 and back. Change-Id: I100697a9f7a4b94283557b2c0eaa45e0eff81785 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* | clangcl: Fix QtGui link error (missing fetchPixelsBPP24_ssse3)Friedemann Kleint2018-08-231-3/+3
| | | | | | | | | | | | | | | | | | Do not try using SSSE3 if the compiler do not support it. (see 648ee7aa020d04b160ec56187f49f761ffab93cc). Task-number: QTBUG-50804 Change-Id: I489b0bbacfde0c647c9d5b71ca3de992d7dc8878 Reviewed-by: Friedemann Kleint <Friedemann.Kleint@qt.io>
* | Smooth image scaling for 64bit imagesAllan Sandfeld Jensen2018-08-221-37/+0
| | | | | | | | | | | | | | | | Adds support for smooth scaling 64bit images. Task-number: QTBUG-45858 Change-Id: If46030fb8e7d684159f852a3b8266a74e5e6700c Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* | Fix big-endian buildAllan Sandfeld Jensen2018-08-211-0/+13
| | | | | | | | | | | | | | | | | | Declare rbSwap<QImage::Format_RGBA8888>, so we don't end up in a fallback definition not used by little-endian. Task-number: QTBUG-69951 Change-Id: I8512bba76da7d59a27593d37c70283d881c3e8fc Reviewed-by: Shawn Rutledge <shawn.rutledge@qt.io>
* | Implement support for 16bpc image formatsAllan Sandfeld Jensen2018-08-111-328/+907
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for 16bit per color image formats in QImage. This makes it possible to read and write 16bpc PNGs, and take full advantage of the 16bpc paint engine. [ChangeLog][QtGui][QImage] QImage now supports 64bit image formats with 16 bits per color channel, compatible with 16bpc PNG or RGBA16 OpenGL formats. Task-number: QTBUG-45858 Change-Id: Icd28bd5868a6efcf65cb5bd56031d42941e04099 Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* | Merge remote-tracking branch 'origin/5.11' into devQt Forward Merge Bot2018-08-071-0/+61
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: .qmake.conf src/corelib/doc/src/objectmodel/signalsandslots.qdoc src/plugins/platforms/cocoa/qcocoamenuloader.mm src/plugins/platforms/xcb/qxcbconnection.cpp src/plugins/platforms/xcb/qxcbconnection.h src/plugins/platforms/xcb/qxcbconnection_xi2.cpp src/plugins/platforms/xcb/qxcbwindow.cpp tests/auto/gui/image/qimage/tst_qimage.cpp Done-with: Gatis Paeglis <gatis.paeglis@qt.io> Change-Id: I9bd24ee9b00d4f26c8f344ce3970aa6e93935ff5
| * Add missing optimization for loading RGB32 to RGBA64 using NEONAllan Sandfeld Jensen2018-08-031-0/+61
| | | | | | | | | | | | | | | | | | | | The rest of the RGB64 routines were optimized, but the loading of RGB32 was not as it was originally not used much, but with ARGB32 using the RGB64 backend, it is essential for decent performance. Task-number: QTBUG-69724 Change-Id: I1c02411ed29d3d993427afde44dfa83689d117e0 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* | SIMD: Add a haswell sub-architecture selection to our supportThiago Macieira2018-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the comment says, Haswell is a nice divider and is a good optimization target. I'm using -march=core-avx2 instead of -march=haswell because the latter form was only added to GCC 4.9 but we still support 4.7 and that has support for AVX2. This commit changes the AVX2-optimized code in QtGui to Haswell- optimized instead. That means, for example, that qdrawhelper_avx2.cpp can now use the FMA instructions. Change-Id: If025d476890745368955fffd153129c1716ba006 Reviewed-by: Lars Knoll <lars.knoll@qt.io> Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
* | Merge "Merge remote-tracking branch 'origin/5.11' into dev" into ↵Liang Qi2018-06-081-2/+2
|\ \ | | | | | | | | | refs/staging/dev
| * | Merge remote-tracking branch 'origin/5.11' into devLiang Qi2018-06-071-2/+2
| |\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: .qmake.conf src/corelib/kernel/qeventdispatcher_cf.mm src/gui/kernel/qguiapplication_p.h src/gui/kernel/qwindowsysteminterface.cpp src/gui/kernel/qwindowsysteminterface.h src/plugins/platforms/cocoa/qcocoawindow.mm src/plugins/platforms/cocoa/qnswindowdelegate.mm src/plugins/platforms/ios/qioseventdispatcher.mm src/plugins/platforms/windows/qwindowsdrag.h src/plugins/platforms/windows/qwindowsinternalmimedata.h src/plugins/platforms/windows/qwindowsmime.cpp src/plugins/platforms/winrt/qwinrtscreen.cpp Change-Id: Ic817f265c2386e83839d2bb9ef7419cb29705246
| | * Reduce recent performance regressionAllan Sandfeld Jensen2018-05-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The change to fix 16-bit integer overflow used two floor operations when only one is necessary. With floor being rather expensive on x86 without SSE4.1 this caused a performance regression in ARGB32 smooth perspective transforms. This eliminates one of the floor operations which is unnecessary as the number is always positive in this case and thus truncation will yield the same result faster. Change-Id: Iaae76820d4bc2f368e49ed143130b5075fc760a2 Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* | | Optimize fetchTransformed and removed specialized versionsAllan Sandfeld Jensen2018-06-071-662/+124
|/ / | | | | | | | | | | | | | | Optimize the general transformed fetcher and use it to replace several specialized copies of it that are now slower than the general one. Change-Id: Ife79ec99227f0df4f127303c7a340b2f7d174dff Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* | Merge remote-tracking branch 'origin/5.11' into devLiang Qi2018-05-241-4/+4
|\| | | | | | | | | | | | | | | | | | | Conflicts: mkspecs/features/qt_common.prf src/corelib/tools/qstring.cpp src/plugins/platforms/windows/qwindowsmousehandler.cpp src/widgets/widgets/qmainwindowlayout_p.h Change-Id: I5df613008f6336f69b257d08e49a133d033a9d65
| * Fix potential 16-bit integer overflowAllan Sandfeld Jensen2018-05-231-4/+4
| | | | | | | | | | | | | | | | | | | | | | When multiplying a float in [0;1[ with (1<<16), with rounding, it might end up being rounded to 65536 even if the input was under 1. This patch uses a floor operation to make sure the value can be in a ushort, and cleans up the surrounding code so it is clearer what it does. Task-number: QTBUG-68360 Change-Id: I2d566586765db3d68e8e7e5fb2fd1df20dabd922 Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* | Optimize unpremultiply using SSE rcpAllan Sandfeld Jensen2018-05-231-0/+4
| | | | | | | | | | Change-Id: I255031d354b0fde7abe8366ea2c86a35f9f24afd Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* | Reapply SSE4 acceleration to ARGB32->ARGB32PM conversionAllan Sandfeld Jensen2018-05-201-0/+6
| | | | | | | | | | | | | | | | | | After the merger of fetch and convert, we were missing the hook to the accelerated merged version of ARGB32->ARGB32PM conversion, causing a minor performance regression. Change-Id: I3965d1a95f2305306005db09640f2775aa645d2e Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
* | Merge drawhelper convert-from and storeAllan Sandfeld Jensen2018-05-021-481/+473
| | | | | | | | | | | | | | Avoids using an intermediate buffer on store and simplifies the code. Change-Id: I2dc4e735eb770f90dc99fe0f513b4df3b35ee793 Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* | Merge remote-tracking branch 'origin/5.11' into devLars Knoll2018-04-121-53/+43
|\| | | | | | | Change-Id: I9f802cb9b4d9ccba77ca39428a5cb1afd2d01642
| * Remove last uses of interpolate_4_pixels_16 on SSE2 and NEONAllan Sandfeld Jensen2018-04-111-53/+43
| | | | | | | | | | | | | | | | | | | | | | | | With SSE2 or NEON interpolate_4_pixels is faster than interpolate_4_pixels_16, and using it saves a branch of duplicated code. Similar changes had already been done other places it was used, those have been updated to follow a similar logic. Change-Id: I040d96480f7f925f659602f66f931d28b59312a5 Reviewed-by: Lars Knoll <lars.knoll@qt.io> Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* | Remove bit details from QPixelLayoutAllan Sandfeld Jensen2018-04-031-93/+134
| | | | | | | | | | | | | | | | | | They were only used for rgb swap and checking for the presence of an alpha channel. Change-Id: I013aa9035ccf4362fa3d9ecda41723e4ec5a44cb Reviewed-by: Qt CI Bot <qt_ci_bot@qt-project.org> Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
* | Avoid destStore64 when pixels are just repeatingAllan Sandfeld Jensen2018-03-221-0/+57
| | | | | | | | | | | | | | | | | | | | The destStore64 operation can be expensive, so when the pixels are just repeating, avoid using it and the composition, and just repeat the first generated part. Change-Id: I6e21594a9abecdc245010b956acbaa60e3fb21a3 Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io> Reviewed-by: Erik Verbruggen <erik.verbruggen@qt.io>
* | Merge remote-tracking branch 'origin/5.11' into devQt Forward Merge Bot2018-03-151-0/+2
|\| | | | | | | Change-Id: I8b5a10d897a926078895ae41f48cdbd2474902b8
| * Fix performance regression in simple a8 non-gamma correctedAllan Sandfeld Jensen2018-03-141-0/+2
| | | | | | | | | | | | | | Avoid doing the conversion over QRgba64 when we don't need it. Change-Id: Ic2f82bef0a80b17ef7803eedcdb0600eeac96489 Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* | Add generic optimized rectfill methodsAllan Sandfeld Jensen2018-03-131-9/+27
| | | | | | | | | | | | | | Also makes the qt_rectfill_quint16 actually work with any uint16 format Change-Id: Ibb3deed54ee1a0a86b14d5349c95f106ced057f7 Reviewed-by: Laszlo Agocs <laszlo.agocs@qt.io>
* | Remove RGB16 specific bilinear transformAllan Sandfeld Jensen2018-03-131-194/+12
| | | | | | | | | | | | | | The generic one is better optimized and faster at this point. Change-Id: Ie7eef2402265183ef4d27a7f0eab5dc801beba7a Reviewed-by: Laszlo Agocs <laszlo.agocs@qt.io>
* | Use simple scaling for downscaling less than 2xAllan Sandfeld Jensen2018-03-071-160/+161
|/ | | | | | | | | | | | | | The simple scaling that only samples every input pixel once, can be used with downscaling < 2x as well if we just handle the case where the input can't be in the intermediate buffer. At the same time the handling of the intermediate buffer has been moved out of simple scale helper functions so the code can be shared and the AVX2 optimizations also used for non-argb32pm formats. Change-Id: I98d225ef8d4f2978480d09110c959b556c563b57 Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io> Reviewed-by: Lars Knoll <lars.knoll@qt.io>
* Unalias some core drawhelper loopsAllan Sandfeld Jensen2018-02-061-58/+54
| | | | | | | | | | Some compilers will assume src and buffer are different and only vectorize the unaliased case and take a slow path when they are equal. In our case they are as often equal, so we need to manually unalias the variables to make sure both cases are fully optimized. Change-Id: I6ec86171dd179844facdf45376253c55980d9e36 Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>