author | Thiago Macieira <thiago.macieira@intel.com> | 2019-01-09 20:24:32 -0800 |
---|---|---|
committer | Allan Sandfeld Jensen <allan.jensen@qt.io> | 2019-01-15 21:52:46 +0000 |
commit | f7a7a49f9235c9375fc515a3062341f285f3c2c3 (patch) | |
tree | df90884c90d1d129f05d3ec9657d487c9bab1218 /tests/testserver/vsftpd/vsftpd.sh | |
parent | cacf2ad9229a6842dbc0e002ed8ba4d04db026ae (diff) |
Fix the AVX2 ARGB->ARGB64 conversion code
Commit c8c5ff19de1c34a99b8315e59015d115957b3584 introduced the solution
as a simple scaling up of the code in qdrawhelper_sse4.cpp, but that is
incorrect because of how the 256-bit unpack instructions work: the
unpack-low instruction unpacks the lower half of each 128-bit half of the
256-bit register, not the lower half of the whole register.
So we fix it up by inserting a permute4x64 that swaps the middle two
quarters of the 256-bit register (permute8x32 requires a __m256i
parameter, instead of an immediate).
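
The lane behaviour described above can be sketched with a scalar model. This is not the code from qdrawhelper; it only mimics how the 256-bit byte unpack instructions and permute4x64 move data, so it runs without AVX2 hardware. The function names are hypothetical, and the immediate `0xD8` is an assumption: it is the encoding that swaps the middle two 64-bit quarters, which the message describes but does not spell out.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes256 = std::array<std::uint8_t, 32>;

// Scalar model of a 256-bit byte unpack: interleaves 8 bytes of a and b
// from each 128-bit lane (the low 8 bytes, or the high 8 if high is set).
Bytes256 unpackModel(const Bytes256 &a, const Bytes256 &b, bool high)
{
    Bytes256 r{};
    const std::size_t offset = high ? 8 : 0;
    for (std::size_t lane = 0; lane < 2; ++lane) {
        for (std::size_t i = 0; i < 8; ++i) {
            r[lane * 16 + 2 * i]     = a[lane * 16 + offset + i];
            r[lane * 16 + 2 * i + 1] = b[lane * 16 + offset + i];
        }
    }
    return r;
}

// Scalar model of permute4x64: 64-bit quarter q of the result is the
// quarter of v selected by bits [2q+1:2q] of the immediate.
Bytes256 permute4x64Model(const Bytes256 &v, unsigned imm)
{
    Bytes256 r{};
    for (std::size_t q = 0; q < 4; ++q) {
        const std::size_t src = (imm >> (2 * q)) & 3;
        for (std::size_t i = 0; i < 8; ++i)
            r[q * 8 + i] = v[src * 8 + i];
    }
    return r;
}

// Returns the order in which the 32 source bytes come out when each byte
// is widened by unpacking the vector with itself (low result, then high).
std::array<int, 32> widenedSourceOrder(bool applyPermute)
{
    Bytes256 v{};
    for (std::size_t i = 0; i < 32; ++i)
        v[i] = static_cast<std::uint8_t>(i);
    if (applyPermute)
        v = permute4x64Model(v, 0xD8); // swap the middle two quarters
    const Bytes256 lo = unpackModel(v, v, false);
    const Bytes256 hi = unpackModel(v, v, true);
    std::array<int, 32> order{};
    for (std::size_t i = 0; i < 16; ++i) {
        order[i] = lo[2 * i];
        order[16 + i] = hi[2 * i];
    }
    return order;
}
```

Without the permute, the model yields source bytes 0..7, 16..23, 8..15, 24..31, which means the pixels come out in the wrong order. With the permute, the bytes come out as 0..31.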
This introduces an instruction that costs 3 cycles in each loop
iteration, but since the AVX2 code has double the throughput compared to
SSE4 code, it should still be faster.
This problem does not affect the ARGB->ARGB32 code because that repacks
at the end.
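
Why repacking hides the problem can be illustrated with another scalar model. It assumes the repack is done with a 256-bit pack instruction such as packus_epi16, which also works within each 128-bit lane; the commit message does not name the instruction. The function names are hypothetical, and widening against zero stands in for whatever arithmetic the real code performs between unpack and pack.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Bytes256 = std::array<std::uint8_t, 32>;
using Words256 = std::array<std::uint16_t, 16>;

// Model of a 256-bit byte unpack against zero: zero-extends 8 bytes from
// each 128-bit lane (low or high half of the lane) to 16-bit words.
Words256 widenHalfModel(const Bytes256 &v, bool high)
{
    Words256 r{};
    const std::size_t offset = high ? 8 : 0;
    for (std::size_t lane = 0; lane < 2; ++lane)
        for (std::size_t i = 0; i < 8; ++i)
            r[lane * 8 + i] = v[lane * 16 + offset + i];
    return r;
}

// Saturates a word, read as signed 16-bit, to the range 0..255.
std::uint8_t saturateToByte(std::uint16_t w)
{
    const int s = static_cast<std::int16_t>(w);
    return static_cast<std::uint8_t>(s < 0 ? 0 : (s > 255 ? 255 : s));
}

// Model of 256-bit packus_epi16: per 128-bit lane, the 8 words of a are
// packed into the low 8 bytes and the 8 words of b into the high 8 bytes.
Bytes256 packusModel(const Words256 &a, const Words256 &b)
{
    Bytes256 r{};
    for (std::size_t lane = 0; lane < 2; ++lane) {
        for (std::size_t i = 0; i < 8; ++i) {
            r[lane * 16 + i]     = saturateToByte(a[lane * 8 + i]);
            r[lane * 16 + 8 + i] = saturateToByte(b[lane * 8 + i]);
        }
    }
    return r;
}

// Unpack low and high, then pack again: the per-lane shuffles cancel out.
Bytes256 roundTripModel(const Bytes256 &v)
{
    return packusModel(widenHalfModel(v, false), widenHalfModel(v, true));
}
```

Because both steps split the register the same way, the round trip restores the original byte order and no permute is needed on that path.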
Change-Id: I4d4dadb709f1482fa8ccfffd1578620b45166a4f
Reviewed-by: Allan Sandfeld Jensen <allan.jensen@qt.io>
Diffstat (limited to 'tests/testserver/vsftpd/vsftpd.sh')
0 files changed, 0 insertions, 0 deletions