[CUDA] Expand upon --cuda-gpu-arch flag in CompileCudaWithLLVM doc.

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@280848 91177308-0d34-0410-b5e6-96231b3b80d8
author: Justin Lebar <jlebar@google.com> 2016-09-07 20:09:46 +0000
committer: Justin Lebar <jlebar@google.com> 2016-09-07 20:09:46 +0000
commit: 78e95faf4b44ef6178211b2405271412ae5bea8c (patch)
tree: 6d7678b846dee214788be7b0af4732bbabb2dbf7 /docs/CompileCudaWithLLVM.rst
parent: a804c5a9aac31c25a5345a240919ee0d70671ce6 (diff)
1 files changed, 7 insertions, 0 deletions
diff --git a/docs/CompileCudaWithLLVM.rst b/docs/CompileCudaWithLLVM.rst
index f57839cec961..85aab5dda0f2 100644
--- a/docs/CompileCudaWithLLVM.rst
+++ b/docs/CompileCudaWithLLVM.rst
@@ -119,6 +119,13 @@ your GPU <https://developer.nvidia.com/cuda-gpus>`_. For example, if you want
 to run your program on a GPU with compute capability of 3.5, you should specify
 ``--cuda-gpu-arch=sm_35``.
 
+Note: You cannot pass ``compute_XX`` as an argument to ``--cuda-gpu-arch``;
+only ``sm_XX`` is currently supported.  However, clang always includes PTX in
+its binaries, so a binary compiled with, e.g., ``--cuda-gpu-arch=sm_30`` would
+be forwards-compatible with ``sm_35`` GPUs.
+
+You can pass ``--cuda-gpu-arch`` multiple times to compile for multiple archs.
+
 Detecting clang vs NVCC
 =======================