qt/qtbase.git - Qt Base (Core, Gui, Widgets, Network, ...)

	Commit message	Author	Age	Files	Lines
*	Use VPMASKMOV in the epilogue ARGB->ARGB{32,64} AVX2 epilogues	Thiago Macieira	2019-01-23	1	-97/+47
	Instead of stepping down to 4 pixels, then 2 px, then 1, with essentially the same code, let's use maskload and maskstore to only load and store the effective portions (instructions new in AVX2). The secondary loop gets run at most twice, since there can be at most 7 pixels left. This fixes an off-by-4 bug in the previous implementation (lines 1041 and 1186 should have had 7 instead of 3).
	Change-Id: I4d4dadb709f1482fa8ccfffd157862e77ac508f6
	Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
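The masked epilogue described above can be modelled in portable C++. This is a scalar sketch, not the Qt code: `kLanes`, `tailMask`, `convertPixel` and `convertMasked` are hypothetical names, the per-pixel operation is a placeholder, and the lane array stands in for the `__m256i` mask that `_mm256_maskload_epi32` and `_mm256_maskstore_epi32` would take.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8; // 8 x 32-bit pixels per 256-bit register

// Lane i is active (all bits set) only while i < remaining. The masked
// load/store instructions look at the top bit of each 32-bit lane.
std::array<std::uint32_t, kLanes> tailMask(std::size_t remaining)
{
    std::array<std::uint32_t, kLanes> mask{};
    for (std::size_t i = 0; i < kLanes; ++i)
        mask[i] = i < remaining ? 0xffffffffu : 0u;
    return mask;
}

// Placeholder per-pixel operation (an assumption for the sketch): force opaque alpha.
inline std::uint32_t convertPixel(std::uint32_t p) { return p | 0xff000000u; }

// Full groups of 8 first, then a single masked group for the 0..7 leftovers,
// instead of separate 4-, 2- and 1-pixel steps.
void convertMasked(std::uint32_t *dst, const std::uint32_t *src, std::size_t count)
{
    std::size_t i = 0;
    for (; i + kLanes <= count; i += kLanes)
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            dst[i + lane] = convertPixel(src[i + lane]);

    if (i < count) {
        const auto mask = tailMask(count - i);
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            if (mask[lane] >> 31) // inactive lanes are neither read nor written
                dst[i + lane] = convertPixel(src[i + lane]);
        }
    }
}
```

With 11 input pixels, the main loop handles the first 8 and the masked step touches only lanes 0 to 2, so memory past the end of either buffer is left alone.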
*	Fix the AVX2 ARGB->ARGB64 conversion code	Thiago Macieira	2019-01-15	1	-5/+14
	Commit c8c5ff19de1c34a99b8315e59015d115957b3584 introduced the solution as a simple scaling up of the code in qdrawhelper_sse4.cpp, but it's bad due to the way that the 256-bit unpack instructions work: the unpack-low instruction unpacks the lower half of each half of the 256-bit register. So we fix it up by inserting a permute4x64 that swaps the middle two quarters of the 256-bit register (permute8x32 requires a __m256i parameter, instead of an immediate). This introduces an instruction that costs 3 cycles in each loop, but since the AVX2 code has double the throughput compared to SSE4 code, it should still be faster. This problem does not affect the ARGB->ARGB32 code because that repacks at the end.
	Change-Id: I4d4dadb709f1482fa8ccfffd1578620b45166a4f
	Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
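The lane behaviour behind this fix can be reproduced with a byte-level model. The sketch below is portable C++ rather than intrinsics; `Reg256`, `unpackLo8`, `swapMiddleQuarters` and `sourcePixelOfSlot` are hypothetical names that model `_mm256_unpacklo_epi8` and `_mm256_permute4x64_epi64(v, _MM_SHUFFLE(3, 1, 2, 0))`. It is meant only to show which source pixel lands in which 64-bit output slot.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Reg256 = std::array<std::uint8_t, 32>; // byte 0 is the least significant

// Model of _mm256_unpacklo_epi8: interleaves the low 8 bytes of EACH
// 128-bit lane, not the low 16 bytes of the whole register.
Reg256 unpackLo8(const Reg256 &a, const Reg256 &b)
{
    Reg256 r{};
    for (std::size_t lane = 0; lane < 2; ++lane) {
        for (std::size_t i = 0; i < 8; ++i) {
            r[lane * 16 + 2 * i]     = a[lane * 16 + i];
            r[lane * 16 + 2 * i + 1] = b[lane * 16 + i];
        }
    }
    return r;
}

// Model of permute4x64 with control (3, 1, 2, 0): swap the two middle
// 64-bit quarters, leave the outer ones in place.
Reg256 swapMiddleQuarters(const Reg256 &v)
{
    Reg256 r = v;
    for (std::size_t i = 0; i < 8; ++i) {
        r[8 + i]  = v[16 + i];
        r[16 + i] = v[8 + i];
    }
    return r;
}

// Eight 32-bit pixels, each byte holding its pixel index, are widened to
// 64 bits per pixel with unpackLo8(v, v). Returns the index of the source
// pixel found in 64-bit output slot `slot` (0..3).
int sourcePixelOfSlot(bool withPermute, std::size_t slot)
{
    Reg256 v{};
    for (std::size_t i = 0; i < 32; ++i)
        v[i] = static_cast<std::uint8_t>(i / 4);
    if (withPermute)
        v = swapMiddleQuarters(v);
    const Reg256 widened = unpackLo8(v, v);
    return widened[slot * 8];
}
```

Without the permute the four output slots hold pixels 0, 1, 4, 5; with it they hold 0, 1, 2, 3, which is the order a straight scale-up of the 128-bit code expects.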
*	Add AVX2 version of ARGB->ARGB32PM	Thiago Macieira	2019-01-09	1	-0/+138
	Similar to the previous commit. This also removes the SSE4 implementations from Qt builds that use AVX2 throughout.
	Change-Id: I251f00d706d646ed87b4fffd1577f96ed52a4cf4
	Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
*	Add AVX2 version of the ARGB32->RGBA64PM code	Thiago Macieira	2019-01-09	1	-0/+134
	Change-Id: I251f00d706d646ed87b4fffd1577f84854e358a4
	Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
*	Work around GCC bug in generating 64-bit population of SSE register	Thiago Macieira	2018-12-12	1	-1/+12
	We know what code we want it to generate, so I just replaced the _mm_set1_epi64x() with the code we want it to generate. Except that GCC sees through and tries to "optimize" my code... so that asm() statement makes it separate the two operations. This generates optimal code for both 32- and 64-bit.
	64-bit:
	    vmovq %rdi, %xmm0
	    vpbroadcastq %xmm0, %ymm0
	32-bit:
	    vmovq 8(%esp), %xmm0
	    vpbroadcastq %xmm0, %ymm0
	See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80820 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87976
	Change-Id: I42a48bd64ccc41aebf84fffd15664109b97fe42b
	Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
*	Use Q_DECL_VECTORCALL in a few more places	Thiago Macieira	2018-12-11	1	-6/+11
	There were a few functions that passed vectors in parameters but were not marked as vectorcall. I've taken the opportunity to de-macroify one macro, but I'm not going to do it for the rest.
	Change-Id: I42a48bd64ccc41aebf84fffd1564bfc21faa2a14
	Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
*	Add AVX2 versions of qt_memfill32 and qt_memfill64	Thiago Macieira	2018-12-11	1	-1/+52
	The implementation is almost the same 4-way-unrolled loop, but because of the wider registers, we fill 128 bytes per loop. Unlike the SSE2 implementation, the AVX2 version uses unaligned stores and won't try to align in the prologue, matching glibc's __memset_avx2 (also unaligned).
	Change-Id: Iba4b5c183776497d8ee1fffd15637ccb2a7b83bc
	Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
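The loop shape described above (four 32-byte stores per iteration, no alignment prologue) can be sketched without intrinsics. This is an illustrative model only: `memfill32Sketch`, `storeVector` and `kVecElems` are hypothetical names, and `std::memcpy` stands in for an unaligned 256-bit store such as `_mm256_storeu_si256`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kVecElems = 8; // one 32-byte store holds 8 x uint32_t

// Stand-in for one unaligned 256-bit store; memcpy has no alignment requirement.
inline void storeVector(std::uint32_t *dst, const std::uint32_t (&pattern)[kVecElems])
{
    std::memcpy(dst, pattern, sizeof(pattern));
}

void memfill32Sketch(std::uint32_t *dst, std::uint32_t value, std::size_t count)
{
    std::uint32_t pattern[kVecElems];
    for (std::uint32_t &p : pattern)
        p = value; // models broadcasting the value to every lane

    std::size_t i = 0;
    // 4-way unrolled main loop: 4 stores x 32 bytes = 128 bytes per iteration.
    // There is no prologue that walks up to an aligned address first.
    for (; i + 4 * kVecElems <= count; i += 4 * kVecElems) {
        storeVector(dst + i, pattern);
        storeVector(dst + i + kVecElems, pattern);
        storeVector(dst + i + 2 * kVecElems, pattern);
        storeVector(dst + i + 3 * kVecElems, pattern);
    }
    // Scalar tail for the remaining 0..31 elements.
    for (; i < count; ++i)
        dst[i] = value;
}
```

How the real implementation finishes the tail is not stated in the commit message; the scalar loop here is simply the most direct way to keep the sketch correct.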
*	Optimize intermediate_adder_avx2	Allan Sandfeld Jensen	2018-05-07	1	-6/+7
	Use 16-bit multiplication as it is twice as fast as 32-bit multiplication.
	Change-Id: I64b529eaaed4ce2c59c64a0120e93cd132724156
	Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
*	Use simple scaling for downscaling less than 2x	Allan Sandfeld Jensen	2018-03-07	1	-27/+29
	The simple scaling that only samples every input pixel once can be used with downscaling < 2x as well, if we just handle the case where the input can't be in the intermediate buffer. At the same time, the handling of the intermediate buffer has been moved out of the simple scale helper functions so the code can be shared and the AVX2 optimizations can also be used for non-argb32pm formats.
	Change-Id: I98d225ef8d4f2978480d09110c959b556c563b57
	Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
	Reviewed-by: Lars Knoll <lars.knoll@qt.io>
*	Fix broken rendering of RGB30 and ARGB32 on machines with AVX2	Allan Sandfeld Jensen	2018-01-27	1	-2/+2
	Two small changes late in the review process were flawed.
	Change-Id: I4b1f6e3fdb8e17000a2e11bc30aae1b29d9f43a9
	Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
*	Add AVX2 optimized versions of the most basic RGB64 compositions	Allan Sandfeld Jensen	2018-01-04	1	-0/+165
	Speeds up RGB30 and ARGB32-unpremul painting.
	Change-Id: I419afdf5c26ceffc0f7557b8f196035056178c9a
	Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
*	Improve readability of code that uses the Qt signed size type (v5.10.0-rc2)	Simon Hausmann	2017-11-28	1	-1/+1
	During the container BoF session at the Qt Contributor Summit 2017 the name of the signed size type became a subject of discussion in the context of readability of code using this type and the intention of using it for all length, size and count properties throughout the entire framework in future versions of Qt. This change proposes qsizetype as new name for qssize_t to emphasize the readability of code over POSIX compatibility, the former being potentially more relevant than the latter to the majority of users of Qt.
	Change-Id: Idb99cb4a8782703c054fa463a9e5af23a918e7f3
	Reviewed-by: Samuel Gaist <samuel.gaist@edeltech.ch>
	Reviewed-by: David Faure <david.faure@kdab.com>
*	Fix handling of mirroring upscaling in simple bilinear upscaler	Allan Sandfeld Jensen	2017-08-10	1	-7/+10
	Calculates the correct offsets and coordinate transforms for the intermediate buffer. This means we can conceptually simplify our path switches instead of having downscale routines handling mirrored upscaling.
	Change-Id: I60efa7feaba80165672ca0ce064515fdf620869d
	Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
*	Allow QImage with more than 2GByte of image data	Allan Sandfeld Jensen	2017-07-08	1	-1/+1
	Changes internal data-size and pointer calculations to qssize_t. Adds new sizeInBytes() accessor to read byte size, and marks the old one deprecated.
	Task-number: QTBUG-50912
	Change-Id: Idf0c2010542b0ec1c9abef8afd02d6db07f43e6d
	Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
*	Add AVX2 optimized bilinear texture transform	Allan Sandfeld Jensen	2017-02-28	1	-0/+414
	Implement AVX2 versions of the three optimized paths of bilinear texture transform.
	Change-Id: Ie7199ef7dcce1e3457535fee35822d76afc0e8ba
	Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
*	Manually vectorize ARGB32toARGB32PM for SSE4.1 and NEON	Allan Sandfeld Jensen	2017-01-31	1	-13/+0
	Manually vectorizing is significantly faster because we can optimize for common cases like long stretches of opaque or transparent pixels. This is both smaller and faster than the auto-vectorized version. It is also much faster than the auto-vectorized version for AVX2, which can then be removed.
	Change-Id: I0fa80ce273a8387cc6cd084879822ad9bade385c
	Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
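The fast paths for opaque and transparent stretches can be illustrated with a scalar sketch. This is not the SSE4.1 or NEON code: `premultiplyPixel` and `premultiplyRun` are hypothetical names, and the group of four pixels merely mimics what fits in one 128-bit register. The per-pixel arithmetic is the usual rounded `channel * alpha / 255` done on packed channels.

```cpp
#include <cstddef>
#include <cstdint>

// Premultiply one ARGB32 pixel: each colour channel becomes
// round(channel * alpha / 255), alpha is kept.
inline std::uint32_t premultiplyPixel(std::uint32_t p)
{
    const std::uint32_t a = p >> 24;
    std::uint32_t rb = (p & 0x00ff00ffu) * a;              // red and blue together
    rb = (rb + ((rb >> 8) & 0x00ff00ffu) + 0x00800080u) >> 8;
    rb &= 0x00ff00ffu;
    std::uint32_t g = ((p >> 8) & 0xffu) * a;              // green
    g = (g + ((g >> 8) & 0xffu) + 0x80u) & 0xff00u;
    return (a << 24) | rb | g;
}

// Premultiply a run in place, skipping the arithmetic for groups of four
// pixels that are entirely opaque or entirely transparent.
void premultiplyRun(std::uint32_t *buf, std::size_t count)
{
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        const std::uint32_t all = buf[i] & buf[i + 1] & buf[i + 2] & buf[i + 3];
        const std::uint32_t any = buf[i] | buf[i + 1] | buf[i + 2] | buf[i + 3];
        if ((all >> 24) == 0xffu)
            continue;                                      // all opaque: unchanged
        if ((any >> 24) == 0u) {
            buf[i] = buf[i + 1] = buf[i + 2] = buf[i + 3] = 0u; // all transparent
            continue;
        }
        for (std::size_t j = 0; j < 4; ++j)
            buf[i + j] = premultiplyPixel(buf[i + j]);
    }
    for (; i < count; ++i)
        buf[i] = premultiplyPixel(buf[i]);
}
```

An auto-vectorized loop has no way to express these early exits, which is the advantage the commit message attributes to the manual version.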
*	Fix blending of RGB32 on RGB32 with partial opacity	Allan Sandfeld Jensen	2016-12-03	1	-5/+3
	The alpha channel of an RGB32 image was not properly ignored when doing blending with partial opacity. Now the alpha value is properly ignored, which is both more correct and faster. This also makes the SSE2 and AVX2 implementations match NEON, which was already doing the right thing (though it had dead code for doing it wrong).
	Change-Id: I4613b8d70ed8c2e36ced10baaa7a4a55bd36a940
	Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
*	Avoid auto-vectorization of epilogues of manual vectorization	Allan Sandfeld Jensen	2016-10-11	1	-4/+4
	Defines a structure that tells the compiler in no uncertain terms the maximum number of times a loop can be run. This reduces the size of qdrawhelper_avx2.o from 22kbytes to 11kbytes.
	Change-Id: Ie3d6281b04b4be3332497c15f3dfe9f185e20507
	Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
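One way to state such a bound is to give the epilogue loop a second counter capped by a compile-time constant. The sketch below assumes that approach; `boundedEpilogue` and `addToAll` are hypothetical names, and the construct actually used in the Qt sources may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Runs body(i) for i in [start, length), but never more than MaxTail times.
// Because MaxTail is a constant, the optimizer can see that this loop is
// short and has no reason to vectorize it again.
template <int MaxTail, typename Body>
inline std::size_t boundedEpilogue(std::size_t start, std::size_t length, Body body)
{
    std::size_t i = start;
    for (int n = 0; n < MaxTail && i < length; ++i, ++n)
        body(i);
    return i;
}

// Example: groups of 8 in the main loop, so the tail has at most 7 elements.
void addToAll(std::uint32_t *data, std::size_t length, std::uint32_t delta)
{
    std::size_t i = 0;
    for (; i + 8 <= length; i += 8) {
        for (std::size_t lane = 0; lane < 8; ++lane) // stands in for one 256-bit operation
            data[i + lane] += delta;
    }
    boundedEpilogue<7>(i, length, [&](std::size_t k) { data[k] += delta; });
}
```

A plain `for (; i < length; ++i)` tail carries no such bound, so a compiler may emit a full vectorized loop plus its own epilogue for it, which is the code-size cost the commit removes.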
*	Fix qt_blend_rgb32_on_rgb32_avx2	Allan Sandfeld Jensen	2016-09-30	1	-1/+1
	The order of the arguments to testc was wrong; it should have been the other way around. Replaced with testz to also get rid of setzero.
	Change-Id: Iff968c140f9ca34c6bd7c7f04a3623fd8ec42e1c
	Reviewed-by: Eskil Abrahamsen Blomfeldt <eskil.abrahamsen-blomfeldt@qt.io>
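Why operand order matters for testc but not for testz can be shown with a small bitwise model. The sketch uses a 64-bit integer as a stand-in for the 256-bit register; `Bits`, `testz`, `testc` and `kAlphaMask` are names chosen for the illustration, and the exact expression that was in the blend function before the fix is not given in the commit message, so the calls in the usage note are only representative.

```cpp
#include <cstdint>

using Bits = std::uint64_t; // stand-in for a 256-bit register (two ARGB32 pixels)

// Zero flag of PTEST/VPTEST: true when (a AND b) has no bits set.
// Symmetric in its operands.
constexpr bool testz(Bits a, Bits b) { return (a & b) == 0; }

// Carry flag of PTEST/VPTEST: true when ((NOT a) AND b) has no bits set.
// NOT symmetric: swapping the operands changes the question being asked.
constexpr bool testc(Bits a, Bits b) { return (~a & b) == 0; }

// Alpha byte of each of the two pixels.
constexpr Bits kAlphaMask = 0xff000000ff000000ull;
```

With `alpha = pixels & kAlphaMask`, `testc(0, alpha)` is true only when every alpha is zero, whereas `testc(alpha, 0)` is true for any input. `testz(pixels, kAlphaMask)` asks the intended question directly and needs neither the zero register nor the separate masking step.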
*	Add AVX2 versions of the fast blending functions	Allan Sandfeld Jensen	2016-09-18	1	-1/+304
	This patch adds AVX2 versions of the fast blending functions that we already have SSE2 versions of.
	Change-Id: Ifd1a22f7891b6208cb74929ad26095d12c5a1efb
	Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
*	Cleanup conversion parameters	Allan Sandfeld Jensen	2016-04-11	1	-2/+2
	Removes the now unused QPixelLayout parameter, simplifies the colorTable passing and prepares for adding dithering.
	Change-Id: Iaf7698b248b857804d8921bf118e7cfabbabff87
	Reviewed-by: Gunnar Sletta <gunnar@sletta.org>
*	Updated license headers	Jani Heikkinen	2016-01-15	1	-14/+20
	From Qt 5.7 -> LGPL v2.1 isn't an option anymore, see http://blog.qt.io/blog/2016/01/13/new-agreement-with-the-kde-free-qt-foundation/
	Updated license headers to use new LGPL header instead of LGPL21 one (in those files which will be under LGPL v3)
	Change-Id: I046ec3e47b1876cd7b4b0353a576b352e3a946d9
	Reviewed-by: Lars Knoll <lars.knoll@theqtcompany.com>
*	Add AVX2 autovectorized versions of premultiply	Allan Sandfeld Jensen	2015-03-10	1	-0/+54
	Following up on using GCC's autovectorizing for faster SSE4.1 premultiply, this patch adds specialized autovectorized versions of premultiply for AVX2, giving another almost doubling in speed. To make the speedup for AVX2 and also SSE4_1 available to non-GCC compilers, the target-specific methods have been moved to separate files.
	Change-Id: I97ce05be67f4adeeb9a096eef80fd5fb662099f3
	Reviewed-by: Gunnar Sletta <gunnar@sletta.org>