path: root/lib/Headers/avx512vlbwintrin.h
Commit message (author, date, files changed, lines removed/added)
* Move the builtin headers to use the new license file header. (Chandler Carruth, 2019-04-08, 1 file, -17/+3)
  Summary: These all had somewhat custom file headers with different text from the ones I searched for previously, and so I missed them. Thanks to Hal and Kristina and others who prompted me to fix this, and sorry it took so long.
  Reviewers: hfinkel
  Subscribers: mcrosier, javed.absar, cfe-commits
  Tags: #clang
  Differential Revision: https://reviews.llvm.org/D60406
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@357941 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86] Add explicit alignment to __m128/__m128i/__m128d/etc. to allow matching of MSVC behavior with #pragma pack. (Craig Topper, 2019-02-08, 1 file, -8/+8)
  Summary: With MSVC, #pragma pack is ignored when there is explicit alignment. This differs from gcc. Clang emulates this difference when compiling for Windows.
  It appears that MSVC and its headers consider the __m128/__m128i/__m128d/etc. types to be explicitly aligned and ignore #pragma pack for them. Since we don't have explicit alignment on them in our headers, we don't match the MSVC behavior here.
  This patch adds explicit alignment to match this behavior. I'm hoping this won't cause any problems when we're not emulating MSVC. But if someone knows of something that would be different, we can switch to conditionally adding the alignment based on _MSC_VER.
  I had to add explicitly unaligned types as well so we could use them in the loadu/storeu intrinsics, which use __attribute__((__packed__)). Using the now explicitly aligned types wouldn't produce align 1 accesses when targeting Windows.
  Reviewers: rnk, erichkeane, spatel, RKSimon
  Subscribers: cfe-commits
  Tags: #clang
  Differential Revision: https://reviews.llvm.org/D57961
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@353555 91177308-0d34-0410-b5e6-96231b3b80d8
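The unaligned-type idea in the entry above can be illustrated with a scalar sketch. It assumes a GCC- or Clang-compatible compiler; the names `u32_u`, `unaligned_u32`, and `load_u32_unaligned` are hypothetical and do not come from the header, which applies the same combination (an explicitly unaligned type inside a packed struct) to vector types.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical explicitly unaligned type: on a typedef, aligned(1)
 * lowers the alignment requirement to a single byte. */
typedef uint32_t u32_u __attribute__((__aligned__(1)));

/* Hypothetical wrapper: packed removes padding and alignment,
 * may_alias permits reading memory of any type through it. */
struct unaligned_u32 {
    u32_u v;
} __attribute__((__packed__, __may_alias__));

/* Load 4 bytes from an address with no alignment guarantee. */
static uint32_t load_u32_unaligned(const void *p)
{
    return ((const struct unaligned_u32 *)p)->v;
}

/* Portable reference implementation, used only for comparison. */
static uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Calling `load_u32_unaligned(buf + 1)` on a byte buffer should give the same value as the `memcpy` reference, whatever the alignment of `buf`.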
* [X86] Add more intrinsics to match icc. (Craig Topper, 2018-10-20, 1 file, -1/+74)
  This adds:
  _mm_loadu_epi8, _mm256_loadu_epi8, _mm512_loadu_epi8
  _mm_loadu_epi16, _mm256_loadu_epi16, _mm512_loadu_epi16
  _mm_storeu_epi8, _mm256_storeu_epi8, _mm512_storeu_epi8
  _mm_storeu_epi16, _mm256_storeu_epi16, _mm512_storeu_epi16
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@344862 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86] Lowering integer truncation intrinsics to native IR (Mikhail Dvoretckii, 2018-07-10, 1 file, -4/+4)
  This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to native IR in cases where the result's length is less than 128 bits. The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for 128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps) instructions are generated instead.
  Differential Revision: https://reviews.llvm.org/D48712
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336643 91177308-0d34-0410-b5e6-96231b3b80d8
* [Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 intrinsics with it (Craig Topper, 2018-07-09, 1 file, -310/+312)
  This is part of an ongoing attempt at making 512-bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targeting a CPU that supports 512-bit vectors. This is similar to what happens with an AVX2 CPU: the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.
  Of course, if the user uses explicit vector code in their source code, we need to not split those operations, especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.
  To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be responsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.
  To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h: either as an always_inline function containing calls to builtins (which can be target specific or target independent) or vector extension code, or as a macro wrapper around a target specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.
  To support the always_inline function case, this patch adds __attribute__((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.
  To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per-builtin basis and avoid any impact to target independent builtins.
  There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.
  Special thanks to Chandler, who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher, who has had many conversations with me about this issue.
  Differential Revision: https://reviews.llvm.org/D48617
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336583 91177308-0d34-0410-b5e6-96231b3b80d8
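A minimal sketch of the always_inline function case described above. It assumes a compiler with GCC-style vector extensions; the macro `FN_ATTRS128`, the type `v4si`, and the function `add_epi32_sketch` are hypothetical names, not the ones in the header. The attribute is applied only when the compiler reports support for it, so the code still builds elsewhere.

```c
/* Fall back gracefully on compilers without the feature-test macro. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical attribute macro: tags a function as needing at least
 * 128-bit vectors to be legal, when the attribute is available. */
#if __has_attribute(__min_vector_width__)
#define FN_ATTRS128 __attribute__((__min_vector_width__(128)))
#else
#define FN_ATTRS128
#endif

/* Generic 128-bit vector of four ints (GCC/Clang vector extension). */
typedef int v4si __attribute__((__vector_size__(16)));

/* A 128-bit operation written with vector extension code and tagged
 * with the width it requires. */
static inline FN_ATTRS128 v4si add_epi32_sketch(v4si a, v4si b)
{
    return a + b;
}
```

A caller that inlines `add_epi32_sketch` would, per the commit message, end up with the merged minimum width of everything it uses; that merging happens in the compiler and is not shown here.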
* [X86] Correct the width of mask arguments in intrinsic headers and tests. (Craig Topper, 2018-06-30, 1 file, -3/+3)
  All of these were found by grepping through IR from the builtin tests for extra trunc and zext/sext instructions that shouldn't have been there.
  Some of these were real bugs where we lost bits from the user input:
  _mm512_mask_broadcast_f32x8
  _mm512_maskz_broadcast_f32x8
  _mm512_mask_broadcast_i32x8
  _mm512_maskz_broadcast_i32x8
  _mm256_mask_cvtusepi16_storeu_epi8
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@336042 91177308-0d34-0410-b5e6-96231b3b80d8
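The class of bug described above (a mask narrower than the number of lanes) can be shown with a scalar model. This is only an illustration under the assumption of a 16-lane masked byte store; the function names are hypothetical and the code is not taken from the header.

```c
#include <stdint.h>

/* Scalar model of a masked 16-lane byte store: lane i is written
 * only if bit i of the 16-bit mask is set. */
static void mask_store_16(uint8_t *dst, uint16_t mask, const uint8_t *src)
{
    for (int i = 0; i < 16; ++i)
        if ((mask >> i) & 1u)
            dst[i] = src[i];
}

/* The bug class: narrowing the mask to 8 bits silently drops the
 * bits that control lanes 8..15. */
static void mask_store_16_buggy(uint8_t *dst, uint16_t mask, const uint8_t *src)
{
    mask_store_16(dst, (uint8_t)mask, src);
}
```

With mask `0x8001`, the correct version writes lanes 0 and 15, while the narrowed version writes lane 0 only.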
* [X86] Remove masking from dbpsadbw builtins, use select builtin instead. (Craig Topper, 2018-06-11, 1 file, -24/+16)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@334385 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86] Remove __extension__ from macro intrinsics when it's not needed. (Craig Topper, 2018-05-31, 1 file, -68/+68)
  I think this is a holdover from when we used to declare variables inside the macros. And then it's been copied and pasted forward for years every time a new macro intrinsic gets added.
  Interestingly, this caused some tests for IRGen to be slightly more optimized. We now return a zeroinitializer directly instead of going through a store+load.
  It also removed a bogus error message on another test.
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333613 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86] Reduce the number of setzero intrinsics to just the set defined by the Intel Intrinsics Guide. (Craig Topper, 2018-05-30, 1 file, -19/+14)
  We had quite a few for different element sizes of integers, sometimes with strange target features attached to them. We only need a single version for each of __m128i, __m256i, and __m512i with the target feature that first introduced those types.
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333568 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86] Merge the 3 different flavors of masked vpermi2var/vpermt2var builtins to a single version without masking. Use select builtins with appropriate operand instead. (Craig Topper, 2018-05-29, 1 file, -48/+38)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@333387 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86] Remove mask arguments from permvar builtins/intrinsics. Use a select in IR instead. (Craig Topper, 2018-05-20, 1 file, -24/+14)
  Someday maybe we'll use selects for all the builtins.
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@332825 91177308-0d34-0410-b5e6-96231b3b80d8
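The "unmasked operation plus select" pattern that this and many later entries refer to can be modeled in scalar C. This is a sketch under the assumption of eight 16-bit lanes; all function names are hypothetical, and the real header expresses the same idea with vector builtins rather than loops.

```c
#include <stdint.h>

enum { LANES = 8 };

/* Unmasked operation: dst[i] = a[idx[i] mod LANES]. */
static void permvar_epi16(int16_t *dst, const uint16_t *idx, const int16_t *a)
{
    int16_t tmp[LANES];
    for (int i = 0; i < LANES; ++i)
        tmp[i] = a[idx[i] & (LANES - 1)];
    for (int i = 0; i < LANES; ++i)
        dst[i] = tmp[i];
}

/* Select: lane i comes from on_true if mask bit i is set,
 * otherwise from on_false. */
static void select_epi16(int16_t *dst, uint8_t k,
                         const int16_t *on_true, const int16_t *on_false)
{
    for (int i = 0; i < LANES; ++i)
        dst[i] = ((k >> i) & 1u) ? on_true[i] : on_false[i];
}

/* Merge masking: unselected lanes keep the value from src. */
static void mask_permvar_epi16(int16_t *dst, const int16_t *src, uint8_t k,
                               const uint16_t *idx, const int16_t *a)
{
    int16_t full[LANES];
    permvar_epi16(full, idx, a);
    select_epi16(dst, k, full, src);
}

/* Zero masking: unselected lanes become zero. */
static void maskz_permvar_epi16(int16_t *dst, uint8_t k,
                                const uint16_t *idx, const int16_t *a)
{
    static const int16_t zero[LANES] = {0};
    int16_t full[LANES];
    permvar_epi16(full, idx, a);
    select_epi16(dst, k, full, zero);
}
```

The design point is that only one unmasked operation needs a dedicated builtin; both masked flavors fall out of the same select helper with a different fallback operand.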
* [X86] Fix a bad cast from mask16 to mask8 in _mm256_mask_cvtepi16_epi8 introduced in r332266. (Craig Topper, 2018-05-18, 1 file, -2/+2)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@332738 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86] Use __builtin_convertvector to replace some of the avx512 truncate builtins. (Craig Topper, 2018-05-14, 1 file, -9/+7)
  As long as the destination type is a 256- or 128-bit vector with the same number of elements, we can use __builtin_convertvector to directly generate a trunc IR instruction, which will be handled natively by the backend.
  Differential Revision: https://reviews.llvm.org/D46742
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@332266 91177308-0d34-0410-b5e6-96231b3b80d8
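A minimal sketch of element-wise truncation with `__builtin_convertvector`, assuming Clang or GCC 9 or later. The generic vector types `v8hi` and `v8qi` and the function `trunc_epi16_epi8` are hypothetical stand-ins chosen so the example builds without AVX-512 target flags; the header works on its own vector types.

```c
/* Eight 16-bit lanes (128 bits) and eight 8-bit lanes (64 bits). */
typedef short v8hi __attribute__((__vector_size__(16)));
typedef signed char v8qi __attribute__((__vector_size__(8)));

/* Convert lane by lane, as a C cast would. Source and destination
 * must have the same number of elements; each lane keeps only its
 * low 8 bits. */
static v8qi trunc_epi16_epi8(v8hi a)
{
    return __builtin_convertvector(a, v8qi);
}
```

For example, a lane holding `0x1234` becomes `0x34`, and a lane holding `256` becomes `0`.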
* [X86] test/testn intrinsics lowering to IR. clang side (Uriel Korach, 2017-11-13, 1 file, -40/+27)
  Change the header files of the intrinsics to lower the test and testn intrinsics to IR code. Removed the test and testn builtins from clang.
  Differential Revision: https://reviews.llvm.org/D38737
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@318035 91177308-0d34-0410-b5e6-96231b3b80d8
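For readers unfamiliar with test and testn, the following scalar model shows the semantics being lowered: an AND followed by a per-lane comparison against zero. It assumes sixteen 8-bit lanes; the function names are hypothetical and this is not the header's code.

```c
#include <stdint.h>

/* test: mask bit i is set when (a[i] & b[i]) is non-zero. */
static uint16_t test_epi8_mask(const uint8_t *a, const uint8_t *b)
{
    uint16_t k = 0;
    for (int i = 0; i < 16; ++i)
        if ((a[i] & b[i]) != 0)
            k |= (uint16_t)(1u << i);
    return k;
}

/* testn: mask bit i is set when (a[i] & b[i]) is zero. */
static uint16_t testn_epi8_mask(const uint8_t *a, const uint8_t *b)
{
    uint16_t k = 0;
    for (int i = 0; i < 16; ++i)
        if ((a[i] & b[i]) == 0)
            k |= (uint16_t)(1u << i);
    return k;
}
```

Because both reduce to an AND plus a compare, no dedicated builtin is needed, which is consistent with the commit removing the test and testn builtins.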
* [X86] Replace the mask cmpeq/cmple/cmplt/cmpgt/cmpge/cmpneq intrinsics with macros that just pass the right comparison predicate value to the regular cmp intrinsic. Remove mask cmpeq/cmpgt builtins that are now unused. (Craig Topper, 2017-11-06, 1 file, -639/+263)
  This shortens the intrinsic headers a little and allows us to get rid of the cmpeq and cmpgt handling from CGBuiltin.cpp.
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@317506 91177308-0d34-0410-b5e6-96231b3b80d8
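The macro approach described above can be sketched in scalar form: one general compare that takes a predicate, plus thin macros that fix the predicate. The enum values below follow the `_MM_CMPINT_*` encoding as I understand it (an assumption worth checking against the Intel Intrinsics Guide); the names `cmp_epi8_mask_sketch` and the `CMP*_SKETCH` macros are hypothetical.

```c
#include <stdint.h>

/* Predicate encoding, assumed to mirror _MM_CMPINT_EQ/LT/LE/NE/NLT/NLE. */
enum cmp_pred {
    PRED_EQ = 0,
    PRED_LT = 1,
    PRED_LE = 2,
    PRED_NE = 4,
    PRED_GE = 5, /* "not less than" */
    PRED_GT = 6  /* "not less than or equal" */
};

/* General signed compare over sixteen 8-bit lanes; returns a mask
 * with bit i set when the predicate holds for lane i. */
static uint16_t cmp_epi8_mask_sketch(const int8_t *a, const int8_t *b,
                                     enum cmp_pred p)
{
    uint16_t k = 0;
    for (int i = 0; i < 16; ++i) {
        int hit;
        switch (p) {
        case PRED_EQ: hit = a[i] == b[i]; break;
        case PRED_LT: hit = a[i] <  b[i]; break;
        case PRED_LE: hit = a[i] <= b[i]; break;
        case PRED_NE: hit = a[i] != b[i]; break;
        case PRED_GE: hit = a[i] >= b[i]; break;
        case PRED_GT: hit = a[i] >  b[i]; break;
        default:      hit = 0;            break;
        }
        if (hit)
            k |= (uint16_t)(1u << i);
    }
    return k;
}

/* Named comparisons are macros that only supply the predicate. */
#define CMPEQ_SKETCH(a, b)  cmp_epi8_mask_sketch((a), (b), PRED_EQ)
#define CMPLT_SKETCH(a, b)  cmp_epi8_mask_sketch((a), (b), PRED_LT)
#define CMPLE_SKETCH(a, b)  cmp_epi8_mask_sketch((a), (b), PRED_LE)
#define CMPNEQ_SKETCH(a, b) cmp_epi8_mask_sketch((a), (b), PRED_NE)
#define CMPGE_SKETCH(a, b)  cmp_epi8_mask_sketch((a), (b), PRED_GE)
#define CMPGT_SKETCH(a, b)  cmp_epi8_mask_sketch((a), (b), PRED_GT)
```

Six named comparisons then cost six one-line macros instead of six separate function bodies per element type and vector width.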
* Lowering Mask Set1 intrinsics to LLVM IR (Jina Nahias, 2017-09-19, 1 file, -26/+24)
  This patch, together with a matching llvm patch (https://reviews.llvm.org/D37669), implements the lowering of X86 mask set1 intrinsics to IR.
  Differential Revision: https://reviews.llvm.org/D37668
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@313624 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Replace masked 16-bit element variable shift builtins with new unmasked versions and selects. (Craig Topper, 2016-11-18, 1 file, -108/+60)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@287313 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86] Remove extra escaped new lines in intrinsic headers left over from an earlier conversion away from a macro. NFC (Craig Topper, 2016-11-13, 1 file, -48/+48)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@286756 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Remove many of the masked 128/256-bit shift builtins and replace them with unmasked builtins and selects. (Craig Topper, 2016-10-31, 1 file, -120/+130)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@285539 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Remove masked 128/256-bit builtins for vpmaddwd and vpmaddubsw. Replace with unmasked builtins and select. (Craig Topper, 2016-10-30, 1 file, -42/+33)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@285516 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Remove 128/256-bit masked pmulhrsw/pmulhuw/pmulhw builtins and use unmasked builtins and select instead. (Craig Topper, 2016-10-29, 1 file, -64/+49)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@285505 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Replace masked 128/256-bit byte, word, and dword min/max builtins with selects and the older unmasked builtins. (Craig Topper, 2016-10-23, 1 file, -174/+126)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@284954 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Remove masked 128/256-bit packss/packus builtins and replace with selects and the older unmasked builtins. (Craig Topper, 2016-10-23, 1 file, -85/+64)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@284935 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Replace masked 128/256-bit pavg builtins with select and older unmasked builtins. (Craig Topper, 2016-10-22, 1 file, -44/+32)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@284929 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Replace masked 128/256-bit saturating add/sub builtins with select and older unmasked builtins. (Craig Topper, 2016-10-22, 1 file, -177/+129)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@284928 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Replace masked 128/256-bit vpmovzx/vpmovsx builtins with native IR. (Craig Topper, 2016-10-22, 1 file, -35/+31)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@284927 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Remove masked 128/256-bit pshufb builtins. Replace with a select and the older unmasked builtins. (Craig Topper, 2016-10-22, 1 file, -22/+16)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@284925 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Remove builtins for 128/256-bit pabsb/pabsw. We can use a select and the older non-masked versions instead. (Craig Topper, 2016-10-22, 1 file, -31/+31)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@284924 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Add typecasts to alignr intrinsics that were modified in r284920. (Craig Topper, 2016-10-22, 1 file, -8/+8)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@284923 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Remove masked 128/256-bit palignr builtins. We can just use a select in the header file with the older unmasked versions instead. (Craig Topper, 2016-10-22, 1 file, -16/+12)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@284920 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Remove masked integer mullo builtins and replace with native IR. (Craig Topper, 2016-09-03, 1 file, -22/+16)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@280597 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX-512] Remove masked integer add/sub builtins and replace with native IR. (Craig Topper, 2016-09-03, 1 file, -88/+65)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@280596 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86] Block pbroadcastq instructions on 32-bit targets instead of pbroadcastb. (Craig Topper, 2016-07-24, 1 file, -2/+0)
  Thanks to Simon Pilgrim for catching the mistake.
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@276564 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86] Add missing __x86_64__ qualifiers on a bunch of intrinsics that assume 64-bit GPRs are available. (Craig Topper, 2016-07-21, 1 file, -0/+2)
  Using these intrinsics in a 32-bit build results in assertions in the backend.
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@276249 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86][AVX512] Converted the VBROADCAST intrinsics to generic IR (Simon Pilgrim, 2016-07-05, 1 file, -24/+24)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@274544 91177308-0d34-0410-b5e6-96231b3b80d8
* [Clang][BuiltIn][AVX512] adding _mm{|256|512}_mask_cvt{s|us|}epi16_storeu_epi8 intrinsics (Michael Zuckerman, 2016-07-05, 1 file, -0/+36)
  Differential Revision: http://reviews.llvm.org/D21729
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@274532 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX512] Replace masked unpack builtins with shufflevector and selects. (Craig Topper, 2016-06-23, 1 file, -88/+64)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@273533 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX512] Use a __v8hi vector inside of _mm_setzero_hi to match its name. Probably no real functional change. (Craig Topper, 2016-06-22, 1 file, -1/+1)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@273389 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX512] Add missing typecasts to intrinsics. (Craig Topper, 2016-06-22, 1 file, -4/+4)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@273386 91177308-0d34-0410-b5e6-96231b3b80d8
* [X86] Add explicit typecasts to some intrinsics. (Craig Topper, 2016-06-11, 1 file, -4/+6)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@272466 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX512] Implement 512-bit and masked shufflelo and shufflehi intrinsics directly with __builtin_shufflevector and __builtin_ia32_select. Also improve the formatting of the AVX2 version. (Craig Topper, 2016-06-11, 1 file, -29/+24)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@272452 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX512] Emit select instruction instead of using x86 specific intrinsics. (Igor Breger, 2016-06-08, 1 file, -37/+33)
  This will allow us to remove the x86 intrinsics from the backend.
  Differential Revision: http://reviews.llvm.org/D21060
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@272141 91177308-0d34-0410-b5e6-96231b3b80d8
* [AVX512][Builtin] Fix palignr intrinsic for avx512vlbw. The immediate should not be multiplied by 8. (Craig Topper, 2016-05-27, 1 file, -4/+4)
  The 512-bit version was fixed recently but this was missed.
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@270970 91177308-0d34-0410-b5e6-96231b3b80d8
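Why multiplying the immediate by 8 is wrong can be seen in a scalar model of a 128-bit palignr, whose shift count is expressed in bytes, not bits. This is a sketch with a hypothetical function name, not the header's implementation.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 128-bit byte align: concatenate a (high half)
 * with b (low half), shift right by imm BYTES, keep the low 16 bytes.
 * Bytes shifted in from beyond the 32-byte value are zero. */
static void alignr_epi8_sketch(uint8_t *dst, const uint8_t *a,
                               const uint8_t *b, unsigned imm)
{
    uint8_t cat[32];
    uint8_t out[16];
    memcpy(cat, b, 16);      /* low 16 bytes come from b  */
    memcpy(cat + 16, a, 16); /* high 16 bytes come from a */
    for (unsigned i = 0; i < 16; ++i) {
        unsigned j = i + imm;
        out[i] = (j < 32) ? cat[j] : 0;
    }
    memcpy(dst, out, 16);
}
```

With an intended shift of 4 bytes, passing `4 * 8` instead shifts by 32 bytes and the result is all zeros.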
* [AVX512] Add parentheses around macro arguments in AVX512VLBW intrinsics. Remove leading underscores from macro argument names. Add explicit typecasts to all macro arguments and return values. And finally reformat after all the adjustments. (Craig Topper, 2016-05-17, 1 file, -240/+184)
  This is a mostly mechanical change accomplished with a script. I tried to split out any changes to the typecasts that already existed into separate commits.
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@269743 91177308-0d34-0410-b5e6-96231b3b80d8
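A small, generic illustration of why macro arguments need parentheses when a typecast is applied to them. The macros `TO_U8_BAD` and `TO_U8_GOOD` are hypothetical and unrelated to any specific intrinsic in the header.

```c
/* Without parentheses the cast binds only to the first token of the
 * argument: TO_U8_BAD(250 + 10) expands to ((unsigned char)250 + 10). */
#define TO_U8_BAD(x) ((unsigned char)x)

/* With parentheses the whole argument is evaluated first, then cast. */
#define TO_U8_GOOD(x) ((unsigned char)(x))

/* Returns 260: only 250 was cast, then 10 was added as an int. */
static int bad_result(void)
{
    return TO_U8_BAD(250 + 10);
}

/* Returns 4: 260 is reduced modulo 256 by the cast. */
static int good_result(void)
{
    return TO_U8_GOOD(250 + 10);
}
```

The same precedence problem applies to any operator inside a macro body, which is why the parenthesized form is the safe default for macro intrinsics.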
* [Clang][avx512][Builtin] Adding intrinsics for cvtw2mask{128|256|512} instruction set (Michael Zuckerman, 2016-05-03, 1 file, -0/+12)
  Differential Revision: http://reviews.llvm.org/D19766
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@268385 91177308-0d34-0410-b5e6-96231b3b80d8
* [clang][AVX512][Builtin] Adding intrinsics for the SAD instruction set. (Michael Zuckerman, 2016-04-28, 1 file, -0/+48)
  Differential Revision: http://reviews.llvm.org/D19591
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@267942 91177308-0d34-0410-b5e6-96231b3b80d8
* [Clang][BuiltIn][AVX512] Adding intrinsics for align{d|q} and palignr instruction set (Michael Zuckerman, 2016-04-28, 1 file, -0/+34)
  Differential Revision: http://reviews.llvm.org/D19588
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@267876 91177308-0d34-0410-b5e6-96231b3b80d8
* [Clang][AVX512][BuiltIn] Adding support for intrinsics of the VPERMD and VPERMW instruction set (Michael Zuckerman, 2016-04-25, 1 file, -0/+55)
  Differential Revision: http://reviews.llvm.org/D19195
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@267380 91177308-0d34-0410-b5e6-96231b3b80d8
* [Clang][AVX512][Builtin] Adding support for VBROADCAST and VPBROADCASTB/W/D/Q instruction set (Michael Zuckerman, 2016-04-13, 1 file, -3/+96)
  Differential Revision: http://reviews.llvm.org/D19012
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@266195 91177308-0d34-0410-b5e6-96231b3b80d8
* [Clang][AVX512][Builtin] Adding support for intrinsics of cvt{b|d|q}2mask{128|256|512} and cvtmask2{b|d|q}{128|256|512} instruction set. (Michael Zuckerman, 2016-04-13, 1 file, -0/+36)
  Differential Revision: http://reviews.llvm.org/D19009
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@266188 91177308-0d34-0410-b5e6-96231b3b80d8