Diffstat (limited to 'docs/OpenMPSupport.rst')
-rw-r--r-- docs/OpenMPSupport.rst 83
1 files changed, 38 insertions, 45 deletions
diff --git a/docs/OpenMPSupport.rst b/docs/OpenMPSupport.rst
index 04a9648ca2..a8bfddce63 100644
--- a/docs/OpenMPSupport.rst
+++ b/docs/OpenMPSupport.rst
@@ -17,60 +17,50 @@
OpenMP Support
==================
-Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
-PPC64[LE] and has `basic support for Cuda devices`_.
-
-Standalone directives
-=====================
-
-* #pragma omp [for] simd: :good:`Complete`.
-
-* #pragma omp declare simd: :partial:`Partial`. We support parsing/semantic
- analysis + generation of special attributes for X86 target, but still
- missing the LLVM pass for vectorization.
-
-* #pragma omp taskloop [simd]: :good:`Complete`.
-
-* #pragma omp target [enter|exit] data: :good:`Complete`.
-
-* #pragma omp target update: :good:`Complete`.
-
-* #pragma omp target: :good:`Complete`.
+Clang supports the following OpenMP 5.0 features:
-* #pragma omp declare target: :good:`Complete`.
+* The `reduction`-based clauses in the `task` and `target`-based directives.
-* #pragma omp teams: :good:`Complete`.
+* Support for relational-op != (not-equal) as one of the canonical forms of
+ random access iterator loops.
-* #pragma omp distribute [simd]: :good:`Complete`.
+* Support for mapping of lambdas in target regions.
-* #pragma omp distribute parallel for [simd]: :good:`Complete`.
+* Parsing/semantic analysis for the `requires` directive.
-Combined directives
-===================
+* Nested declare target directives.
-* #pragma omp parallel for simd: :good:`Complete`.
+* Implicit mapping of the `this` pointer as `map(this[:1])`.
-* #pragma omp target parallel: :good:`Complete`.
+* The `close` *map-type-modifier*.
-* #pragma omp target parallel for [simd]: :good:`Complete`.
-
-* #pragma omp target simd: :good:`Complete`.
-
-* #pragma omp target teams: :good:`Complete`.
-
-* #pragma omp teams distribute [simd]: :good:`Complete`.
-
-* #pragma omp target teams distribute [simd]: :good:`Complete`.
-
-* #pragma omp teams distribute parallel for [simd]: :good:`Complete`.
-
-* #pragma omp target teams distribute parallel for [simd]: :good:`Complete`.
+Clang fully supports OpenMP 4.5. Clang supports offloading to X86_64, AArch64,
+PPC64[LE] and has `basic support for Cuda devices`_.
-Clang does not support any constructs/updates from OpenMP 5.0 except
-for `reduction`-based clauses in the `task` and `target`-based directives.
+* #pragma omp declare simd: :partial:`Partial`. We support parsing/semantic
+ analysis and generation of special attributes for the X86 target, but the
+ LLVM pass for vectorization is still missing.
In addition, the LLVM OpenMP runtime `libomp` supports the OpenMP Tools
-Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and mac OS.
+Interface (OMPT) on x86, x86_64, AArch64, and PPC64 on Linux, Windows, and macOS.
+
+General improvements
+--------------------
+- New collapse clause scheme to avoid expensive remainder operations.
+ Compute loop index variables after collapsing a loop nest via the
+ collapse clause by replacing the expensive remainder operation with
+ multiplications and additions.
+
+- The default schedules for the `distribute` and `for` constructs in a
+ parallel region and in SPMD mode have changed to ensure coalesced
+ accesses. For the `distribute` construct, a static schedule is used
+ with a chunk size equal to the number of threads per team (default
+ value of threads or as specified by the `thread_limit` clause if
+ present). For the `for` construct, the schedule is static with chunk
+ size of one.
+
+- Simplified SPMD code generation for `distribute parallel for` when
+ the new default schedules are applicable.
.. _basic support for Cuda devices:
@@ -111,7 +101,7 @@ between the threads in the parallel regions.
Collapsed loop nest counter
---------------------------
-When using the collapse clause on a loop nest the default behaviour is to
+When using the collapse clause on a loop nest the default behavior is to
automatically extend the representation of the loop counter to 64 bits for
the cases where the sizes of the collapsed loops are not known at compile
time. To prevent this conservative choice and use at most 32 bits,
@@ -134,5 +124,8 @@ Features not supported or with limited support for Cuda devices
- Automatic translation of math functions in target regions to device-specific
math functions is not implemented yet.
-- Debug information for OpenMP target regions is not supported yet.
+- Debug information for OpenMP target regions is supported, but sometimes it may
+ be required to manually specify the address class of the inspected variables.
+ In some cases the local variables are actually allocated in global memory,
+ but the debug info may not be aware of it.
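
The "General improvements" entry in the diff above says that loop index variables of a collapsed loop nest are now computed without the remainder operation. The following is only a minimal sketch of that idea for a two-level nest, not the code Clang actually emits. The helper names `index_with_remainder`, `index_without_remainder`, and `schemes_agree` are hypothetical, and the sketch assumes a zero-based, unit-stride nest with `n_inner > 0`; it rewrites the remainder as a multiplication and a subtraction (an addition of the negated product).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: recover the (outer, inner) indices of a collapsed
// two-level loop nest from the flat iteration number `iv`, using the
// straightforward division + remainder formulation.
std::pair<std::int64_t, std::int64_t>
index_with_remainder(std::int64_t iv, std::int64_t n_inner) {
  return {iv / n_inner, iv % n_inner};
}

// Hypothetical helper: same result, but the remainder is replaced by a
// multiplication and a subtraction, reusing the quotient already computed.
std::pair<std::int64_t, std::int64_t>
index_without_remainder(std::int64_t iv, std::int64_t n_inner) {
  const std::int64_t outer = iv / n_inner;
  const std::int64_t inner = iv - outer * n_inner;
  return {outer, inner};
}

// Checks that both formulations agree for every iteration of an
// n_outer x n_inner nest (assumes both extents are positive).
bool schemes_agree(std::int64_t n_outer, std::int64_t n_inner) {
  for (std::int64_t iv = 0; iv < n_outer * n_inner; ++iv) {
    if (index_with_remainder(iv, n_inner) !=
        index_without_remainder(iv, n_inner)) {
      return false;
    }
  }
  return true;
}
```

For example, with `n_inner = 4`, flat iteration 7 maps to outer index 1 and inner index 3 under both formulations. How the real implementation handles deeper nests, non-unit strides, or non-zero lower bounds is not covered by this sketch.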