Commit message (Author, Age, Files, Lines)
* [mlir] Start moving some builtin type formats to the dialect [upstream/users/zero9178/simplify-builtin-parsing] (Markus Böck, 2024-02-02, 11 files, -289/+249)
| Most types and attributes in the builtin dialect are parsed and printed using special-purpose printers and parsers for that type. They also use the low-level `Printer` rather than the `AsmPrinter`, making the implementations inconsistent compared to all other dialects in MLIR.
|
| This PR starts moving some builtin types to be parsed using the usual `print` and `parse` methods like all other MLIR dialects. This has the following advantages:
| * The implementation now looks like any other dialect's types
| * It is now possible to use `assemblyFormat` for builtin types and attributes
| * The code can be easily moved to other dialects if desired
| * Arguably better layering and less code
| * As a side-effect, it is now also possible to write `!builtin.<type>` for any types moved
|
| A future benefit would include being able to print types and attributes in stripped format as well (e.g. `<f32>` vs `complex<f32>`), just like all other dialect types and attributes. This is currently explicitly disabled as it causes a LOT of changes in IR syntax and I believe some ambiguities in the parser.
|
| For the purpose of reviewing and incremental development, this PR only moves `tuple`, `tensor`, `none`, `memref` and `complex`. The plan is to eventually move all attributes and types where the current syntax can be implemented within the dialect.
|
| For backwards compatibility with the existing syntax, the builtin dialect is special-cased in the printer where the `builtin.` prefix is omitted.
* clang-format [upstream/users/zero9178/qualified-trait] (Markus Böck, 2024-02-02, 1 file, -1/+2)
|
* [mlir] Add `Print(Attr|Type)Qualified` trait (Markus Böck, 2024-02-02, 8 files, -13/+80)
| This PR adds a new trait to attributes and types that forces the use of the qualified syntax for attributes and types. More concretely, any attribute or type with the trait must be parsed and printed with the `dialect.mnemonic` prefix.
|
| The motivation for this PR is the dependent PR where it is used to retain backwards-compatibility of syntax, but downstream projects may also use the trait if they subjectively prefer the verbose syntax.
* [Clang][AArch64] Add ACLE macros for FEAT_PAuth_LR (#80163) (Lucas Duarte Prates, 2024-02-01, 3 files, -0/+43)
| This updates clang's target defines to include the ACLE changes covering the FEAT_PAuth_LR architecture extension. The changes include:
| * The new `__ARM_FEATURE_PAUTH_LR` feature macro, which is set to 1 when FEAT_PAuth_LR is available in the target.
| * A new bit field for the existing `__ARM_FEATURE_PAC_DEFAULT` macro, indicating the use of PC as a diversifier for Pointer Authentication (from -mbranch-protection=pac-ret+pc).
|
| The approved changes to the ACLE spec can be found here:
| https://github.com/ARM-software/acle/pull/292
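As a rough illustration of how user code might consume these macros, the sketch below decodes a `__ARM_FEATURE_PAC_DEFAULT`-style mask. The bit positions (bit 0 for the A key, bit 1 for the B key, bit 2 for leaf functions, bit 3 for the PC diversifier) are my reading of the ACLE and should be checked against the linked spec change; `PacDefault`, `decodePacDefault` and `builtWithPcDiversifier` are hypothetical names.

```cpp
// Decoded view of a __ARM_FEATURE_PAC_DEFAULT-style bit mask.
// Bit positions are an assumption; verify them against the ACLE spec.
struct PacDefault {
  bool keyA;          // bit 0: return addresses signed with the A key
  bool keyB;          // bit 1: return addresses signed with the B key
  bool leaf;          // bit 2: leaf functions are protected as well
  bool pcDiversifier; // bit 3: PC used as a diversifier (pac-ret+pc)
};

// Hypothetical helper: split the mask into its individual bits.
constexpr PacDefault decodePacDefault(unsigned mask) {
  return PacDefault{(mask & 1u) != 0, (mask & 2u) != 0, (mask & 4u) != 0,
                    (mask & 8u) != 0};
}

// True only when this translation unit was compiled for a target that defines
// the macro with the PC-diversifier bit set; false everywhere else.
constexpr bool builtWithPcDiversifier() {
#if defined(__ARM_FEATURE_PAC_DEFAULT)
  return decodePacDefault(__ARM_FEATURE_PAC_DEFAULT).pcDiversifier;
#else
  return false;
#endif
}
```

Keeping the decoding in a plain function means the bit layout can be checked on any host, while only `builtWithPcDiversifier` depends on the target defines.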
* [flang][HLFIR] Relax verifiers of intrinsic operations (#80132) (Tom Eccles, 2024-02-01, 3 files, -29/+90)
| The verifiers are currently very strict: requiring intrinsic operations to be used only in cases where the Fortran standard permits the intrinsic to be used.
|
| There have now been a lot of cases where these verifiers have caused bugs in corner cases. In a recent ticket, @jeanPerier pointed out that it could be useful for future optimizations if somewhat invalid uses of these operations could be allowed in dead code. See this comment: https://github.com/llvm/llvm-project/issues/79995#issuecomment-1918118234
|
| In response to all of this, I have decided to relax the intrinsic operation verifiers. The intention is now to only disallow operation uses that are likely to crash the compiler. Other checks are still available under `-strict-intrinsic-verifier`.
|
| The disadvantage of this approach is that IR can now represent intrinsic invocations which are incorrect. The lowering and implementation of these intrinsic functions is unlikely to do the right thing in all of these cases, and as they should mostly be impossible to generate using normal Fortran code, these edge cases will see very little testing, before some new optimization causes them to become more common.
|
| Fixes #79995
* [bazel] Fix a typo from e7d40a87ff230528131541f6ac17a2e1a7dc78e1 (Benjamin Kramer, 2024-02-01, 1 file, -1/+1)
|
* [llvm-exegesis] Replace --num-repetitions with --min-instructions (#77153) (Aiden Grossman, 2024-02-01, 10 files, -35/+46)
| This patch replaces --num-repetitions with --min-instructions to make it more clear that the value refers to the minimum number of instructions in the final assembled snippet rather than the number of repetitions of the snippet. This patch also refactors some llvm-exegesis internal variable names to reflect the name change.
|
| Fixes #76890.
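To make the distinction concrete, here is a minimal sketch of how a minimum instruction count could translate into a repetition count for a snippet of a given size. It only illustrates the arithmetic; `repetitionsFor` is a hypothetical helper, not llvm-exegesis code, and the tool's actual rounding may differ.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: smallest number of whole snippet repetitions whose
// total instruction count is at least minInstructions (ceiling division).
std::size_t repetitionsFor(std::size_t minInstructions,
                           std::size_t snippetSize) {
  assert(snippetSize > 0 && "a snippet must contain at least one instruction");
  return (minInstructions + snippetSize - 1) / snippetSize;
}
```

Under these assumptions, a 3-instruction snippet with a minimum of 10000 instructions needs 3334 repetitions (10002 instructions in total), which is why naming the option after repetitions was misleading.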
* [bazel] Put back the pieces of TableGenGlobalISel that unittests depend on (Benjamin Kramer, 2024-02-01, 1 file, -13/+42)
| This is a mess and needs to be cleaned up some day.
* [bazel] Merge TableGenGlobalISel into the tablegen target (Benjamin Kramer, 2024-02-01, 1 file, -28/+2)
| These two are intertwined enough so it doesn't really make sense to have it standalone and hack around it by putting headers into both.
* [bazel] Add missing header for 7ec996d4c5c30083b070be4898140440094e6b97 (Benjamin Kramer, 2024-02-01, 1 file, -0/+1)
|
* [mlir][EmitC] Add func, call and return operations and conversions (#79612) (Marius Brehler, 2024-02-01, 18 files, -31/+815)
| This adds a `func`, `call` and `return` operation to the EmitC dialect, closely related to the corresponding operations of the Func dialect. In contrast to the operations of the Func dialect, the EmitC operations do not support multiple results. The `emitc.func` op features a `specifiers` argument that for example allows, with corresponding support in the emitter, to emit `inline static` functions.
|
| Furthermore, this adds patterns and a pass to convert the Func dialect to EmitC. A `func.func` op that is `private` is converted to `emitc.func` with a `"static"` specifier.
* [Clang][test] Limit library search when linking shared lib (#80253) (Wei Wang, 2024-02-01, 1 file, -1/+1)
| Don't search for unnecessary libs when linking the shared lib. This allows the test to run in a chroot environment.
* [flang][NFC] Cache derived type translation in lowering (#80179) (jeanPerier, 2024-02-01, 3 files, -18/+25)
| Derived type translation is proving expensive in modern Fortran apps with many big derived types with dozens of components and parents.
|
| Extending the cache that prevents recursion is proving to have little cost on apps with small derived types and significant gain (can divide compile time by 2) on modern Fortran apps.
|
| It is legal since the cache lifetime is less than the MLIRContext lifetime that owns the cached mlir::Type.
|
| Doing so also exposed that the current caching was incorrect: the type symbol is the same for kind parametrized derived types regardless of the kind parameters. Instances with different kinds should lower to different MLIR types. See added test. Using the type scopes fixes the problem.
* [mlir][Transforms] `GreedyPatternRewriteDriver`: Hash ops separately (#78312) (Matthias Springer, 2024-02-01, 3 files, -21/+29)
| The greedy pattern rewrite driver has multiple "expensive checks" to detect invalid rewrite pattern API usage. As part of these checks, it computes fingerprints for every op that is in scope, and compares the fingerprints before and after an attempted pattern application.
|
| Until now, each computed fingerprint took into account all nested operations. That is quite expensive because it walks the entire IR subtree. It is also redundant in the expensive checks because we already compute a fingerprint for every op.
|
| This commit significantly improves the running time of the "expensive checks" in the greedy pattern rewrite driver.
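The cost difference can be illustrated on a toy IR tree. The sketch below counts how many ops are visited when every op's fingerprint covers its whole subtree versus when each op is hashed on its own. `ToyOp` and all functions here are hypothetical stand-ins, not the driver's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an operation that may contain nested operations.
struct ToyOp {
  std::vector<std::unique_ptr<ToyOp>> nested;
};

// Number of ops in the subtree rooted at op, including op itself.
std::size_t subtreeSize(const ToyOp &op) {
  std::size_t n = 1;
  for (const auto &child : op.nested)
    n += subtreeSize(*child);
  return n;
}

// Visits needed if each op's fingerprint walks its entire subtree:
// every op pays again for all of its descendants.
std::size_t visitsWithNestedFingerprints(const ToyOp &root) {
  std::size_t visits = subtreeSize(root);
  for (const auto &child : root.nested)
    visits += visitsWithNestedFingerprints(*child);
  return visits;
}

// Visits needed if each op is hashed separately: exactly one per op.
std::size_t visitsWithSeparateFingerprints(const ToyOp &root) {
  return subtreeSize(root);
}

// Builds a chain of `depth` ops, each nested inside the previous one.
std::unique_ptr<ToyOp> makeChain(std::size_t depth) {
  assert(depth > 0);
  std::unique_ptr<ToyOp> root(new ToyOp());
  ToyOp *cur = root.get();
  for (std::size_t i = 1; i < depth; ++i) {
    cur->nested.push_back(std::unique_ptr<ToyOp>(new ToyOp()));
    cur = cur->nested.back().get();
  }
  return root;
}
```

For a chain of three nested ops the nested scheme performs 3 + 2 + 1 = 6 visits while the separate scheme performs 3, and the gap grows with nesting depth.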
* Skip two WatchpointAlgorithm tests for 32-bit lldb's (Jason Molenda, 2024-01-31, 1 file, -0/+6)
| After iterating with the arm-ubuntu CI bot, I found the crash (a std::bad_alloc exception being thrown) was caused by these two entries when built on a 32-bit machine. I probably have an assumption about size_t being 64 bits in WatchpointAlgorithms, and we have a problem when it's actually 32 bits and we're dealing with a real 64-bit address.
|
| All of the cases where the address can be represented in the low 32 bits of the addr_t work correctly, so for now I'm skipping these two unit tests when building lldb on a 32-bit host until I can review that method and possibly switch to explicit uint64_t's.
* Done iterating with arm-ubuntu bot, I see the problem test. (Jason Molenda, 2024-01-31, 1 file, -2/+0)
| Go back to the original form of this file before I add a temporary workaround.
* [clang] Use StringRef::starts_with (NFC) (Kazu Hirata, 2024-01-31, 5 files, -9/+6)
|
* [llvm] Use StringRef::starts_with (NFC) (Kazu Hirata, 2024-01-31, 7 files, -7/+7)
|
* [IR] Use range-based for loops (NFC) (Kazu Hirata, 2024-01-31, 6 files, -16/+14)
|
* [GlobalISel][TableGen] Support Intrinsics in MIR Patterns (#79278) (Pierre van Houtryve, 2024-02-01, 14 files, -22/+313)
|
* [mlir] Use `create` instead of `createOrFold` for ConstantOp as folding has no effect (NFC) (#80129) (Hugo Trachino, 2024-01-31, 5 files, -13/+13)
| This aims to clean up confusing uses of builder.createOrFold<ConstantOp> since folding of constants fails.
* Trying to refine which test is crashing on arm-ubuntu. (Jason Molenda, 2024-01-31, 1 file, -11/+2)
|
* [clang][Interp] Support GenericSelectionExprs (Timm Bäder, 2024-02-01, 4 files, -0/+9)
| Just delegate to the resulting expression.
* [clang][Interp][NFC] Remove unused RecordScope (Timm Bäder, 2024-02-01, 1 file, -2/+0)
|
* [clang][Interp] Protect Inc/Dec ops against dummy pointers (Timm Bäder, 2024-02-01, 3 files, -0/+17)
| We create them more often in C, so it's more likely to happen there.
* Uncomment the 2GB max tests and see if that works on arm-ubuntu (Jason Molenda, 2024-01-31, 1 file, -2/+0)
|
* [X86][CodeGen] Set mayLoad = 1 for LZCNT/POPCNT/TZCNTrm_(EVEX|NF) (Shengchen Kan, 2024-02-01, 1 file, -5/+6)
| Promoted and NF LZCNT/POPCNT/TZCNT were supported in #79954. Because null_frag is used in the patterns for these variants, TableGen cannot infer mayLoad = 1 for them.
|
| This can be tested by MCA tests, which will be added after -mcpu=<cpu_with_apx> is supported.
* [clang][Interp] Handle imaginary literals (#79130) (Timm Baeder, 2024-02-01, 4 files, -3/+40)
| Initialize the first element to 0 and the second element to the value of the subexpression.
* [Github] Build stage2-clang-bolt target for CI container (Aiden Grossman, 2024-01-31, 1 file, -1/+1)
| Only the stage2-distribution target is built by default for the stage2 distribution installation target. This means that we don't get a BOLT optimized binary. This patch explicitly builds the stage2-clang-bolt target before the distribution installation target so that the clang binary is optimized before it gets installed.
* [clang][Interp] complex binary operators aren't always initializing (Timm Bäder, 2024-02-01, 2 files, -1/+18)
| The added test case would trigger the removed assertion.
* Skip 2 of the three test sets to narrow down the arm-ubuntu (Jason Molenda, 2024-01-31, 1 file, -0/+5)
| CI bot crash when running this unittest. The printfs aren't printing into the CI log output.
* [clang-format] Allow decltype in requires clause (#78847) (Emilia Kond, 2024-02-01, 2 files, -5/+23)
| If clang-format is not sure whether a `requires` keyword starts a requires clause or a requires expression, it looks ahead to see if any token disqualifies it from being a requires clause. Among these tokens was `decltype`, since it fell through the switch.
|
| This patch allows decltype to exist in a requires clause.
|
| I'm not 100% sure this change won't have repercussions, but that just means we need more test coverage!
|
| Fixes https://github.com/llvm/llvm-project/issues/78645
* [clang-tidy] Add AllowStringArrays option to modernize-avoid-c-arrays (#71701) (Piotr Zegar, 2024-02-01, 6 files, -6/+61)
| Add AllowStringArrays option, enabling the exclusion of array types with deduced sizes constructed from string literals. This includes only var declarations of array of characters constructed directly from c-strings.
|
| Closes #59475
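A small sketch of the kind of declaration this option is about. Which declarations the check reports is inferred from the description above rather than from the check's test suite, so treat the comments as assumptions; the names are made up for illustration.

```cpp
#include <cstddef>

// A character array whose size is deduced from a string literal: the case
// AllowStringArrays is described as exempting when the option is enabled.
const char kGreeting[] = "hello";

// Assumption: an array with an explicit size and no string-literal
// initializer is still reported by modernize-avoid-c-arrays.
int gCounters[4] = {0, 0, 0, 0};

// The deduced size includes the terminating null character.
constexpr std::size_t kGreetingSize = sizeof(kGreeting);
```

Deduced-size string arrays are a common idiom for constants because the compiler keeps the size in sync with the literal, which is presumably why an opt-out was requested.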
* [C++20] [Modules] Introduce -fskip-odr-check-in-gmf (#79959) (Chuanqi Xu, 2024-02-01, 15 files, -30/+167)
| Close https://github.com/llvm/llvm-project/issues/79240
|
| Cite the comment from @mizvekov in //github.com/llvm/llvm-project/issues/79240:
|
| > There are two kinds of bugs / issues relevant here:
| >
| > Clang bugs that this change hides
| > Here we can add a Frontend flag that disables the GMF ODR check, just so we can keep tracking, testing and fixing these issues.
| > The Driver would just always pass that flag.
| > We could add that flag in this current issue.
| > Bugs in user code:
| > I don't think it's worth adding a corresponding Driver flag for controlling the above Frontend flag, since we intend it's behavior to become default as we fix the problems, and users interested in testing the more strict behavior can just use the Frontend flag directly.
|
| This patch follows the suggestion:
| - Introduce the CC1 flag `-fskip-odr-check-in-gmf`, which is off by default, so that every existing test will still be tested with checking ODR violations.
| - Pass `-fskip-odr-check-in-gmf` in the driver to keep the behavior we intended.
| - Edit the document to tell users who are still interested in stricter checks that they can use `-Xclang -fno-skip-odr-check-in-gmf` to get the existing behavior.
* [X86][NFC] Simplify the code for memory fold (Shengchen Kan, 2024-02-01, 2 files, -25/+11)
|
* Add debug prints to diagnose a crash on arm-ubuntu bot (Jason Molenda, 2024-01-31, 1 file, -0/+9)
|
* [llvm-gsymutil] Print one-time DWO file missing warning under --quiet flag (#79882) (Wanyi, 2024-02-01, 2 files, -7/+582)
| FileCheck test added
| ```
| ./bin/llvm-lit -sv llvm/test/tools/llvm-gsymutil/X86/elf-dwo.yaml
| ```
|
| Manual test steps:
| - Create binary with split-dwarf:
| ```
| clang++ -g -gdwarf-4 -gsplit-dwarf main.cpp -o main_split
| ```
| - Remove or rename the dwo file to a different name so llvm-gsymutil can't find it
| ```
| mv main_split-main.dwo main_split-main__.dwo
| ```
| - Now run llvm-gsymutil conversion, it should print out warning with and without the `--quiet` flag
| ```
| $ ./bin/llvm-gsymutil --convert=./main_split
| Input file: ./main_split
| Output file (x86_64): ./main_split.gsym
| warning: Unable to retrieve DWO .debug_info section for main_split-main.dwo
| Loaded 0 functions from DWARF.
| Loaded 12 functions from symbol table.
| Pruned 0 functions, ended with 12 total
| ```
| ```
| $ ./bin/llvm-gsymutil --convert=./main_split --quiet
| Input file: ./main_split
| Output file (x86_64): ./main_split.gsym
| warning: Unable to retrieve DWO .debug_info section for some object files. (Remove the --quiet flag for full output)
| Pruned 0 functions, ended with 12 total
| ```
* [SelectOpt] Print instruction instead of pointer (wangpc, 2024-02-01, 1 file, -1/+1)
| Pull Request: https://github.com/llvm/llvm-project/pull/80125
* [gn build] Port 147d7a64f849 (LLVM GN Syncbot, 2024-02-01, 1 file, -0/+1)
|
* [lldb] Add support for large watchpoints in lldb (#79962) (Jason Molenda, 2024-01-31, 18 files, -40/+695)
| This patch is the next piece of work in my Large Watchpoint proposal, https://discourse.llvm.org/t/rfc-large-watchpoint-support-in-lldb/72116
|
| This patch breaks a user's watchpoint into one or more WatchpointResources which reflect what the hardware registers can cover. This means we can watch objects larger than 8 bytes, and we can watch unaligned address ranges. On a typical 64-bit target with 4 watchpoint registers you can watch 32 bytes of memory if the start address is doubleword aligned.
|
| Additionally, if the remote stub implements AArch64 MASK style watchpoints (e.g. debugserver on Darwin), we can watch any power-of-2 size region of memory up to 2GB, aligned to that same size.
|
| I updated the Watchpoint constructor and CommandObjectWatchpoint to create a CompilerType of Array<UInt8> when the size of the watched region is greater than pointer-size and we don't have a variable type to use. For pointer-size and smaller, we can display the watched granule as an integer value; for larger-than-pointer-size we will display as an array of bytes.
|
| I have `watchpoint list` now print the WatchpointResources used to implement the watchpoint.
|
| I added a WatchpointAlgorithms class which has a top-level static method that takes an enum flag mask WatchpointHardwareFeature and a user address and size, and returns a vector of WatchpointResources covering the request. It does not take into account the number of watchpoint registers the target has, or the number still available for use. Right now there is only one algorithm, which monitors power-of-2 regions of memory. For up to pointer-size, this is what Intel hardware supports. AArch64 Byte Address Select watchpoints can watch any number of contiguous bytes in a pointer-size memory granule; that is not currently supported, so if you ask to watch bytes 3-5, the algorithm will watch the entire doubleword (8 bytes). The newly default "modify" style means we will silently ignore modifications to bytes outside the watched range.
|
| I've temporarily skipped TestLargeWatchpoint.py for all targets. It was only run on Darwin when using the in-tree debugserver, which was a proxy for "debugserver supports MASK watchpoints". I'll be adding the aforementioned feature flag from the stub and enabling full mask watchpoints when a debugserver with that feature is enabled, and re-enable this test.
|
| I added a new TestUnalignedLargeWatchpoint.py which only has one test but it's a great one, watching a 22-byte range that is unaligned and requires four 8-byte watchpoints to cover.
|
| I also added a unit test, WatchpointAlgorithmsTests, which has a number of simple tests against WatchpointAlgorithms::PowerOf2Watchpoints. I think there are interesting alternative approaches to how we cover these; I note in the unit test that a user requesting a watch on address 0x12e0 of 120 bytes will be covered by two watchpoints today, 128-byte watchpoints at 0x1280 and at 0x1300. But it could be done with a 16-byte watchpoint at 0x12e0 and a 128-byte at 0x1300, which would have fewer false positives/private stops. As we try refining this one, it's helpful to have a collection of tests to make sure things don't regress.
|
| I tested this on arm64 macOS, (genuine) x86_64 macOS, and AArch64 Ubuntu. I have not modified the Windows process plugins yet. I might try that as a standalone patch; I'd be making the change blind, but the necessary changes (see ProcessGDBRemote::EnableWatchpoint) are pretty small so it might be obvious enough that I can change it and see what the Windows CI thinks.
|
| There isn't yet a packet (or a qSupported feature query) for the gdb remote serial protocol stub to communicate its watchpoint capabilities to lldb. I'll be doing that in a patch right after this is landed, having debugserver advertise its capability of AArch64 MASK watchpoints, and have ProcessGDBRemote add eWatchpointHardwareArmMASK to WatchpointAlgorithms so we can watch larger than 32-byte requests on Darwin.
|
| I haven't yet tackled WatchpointResource *sharing* by multiple Watchpoints. This is all part of the goal, especially when we may be watching a larger memory range than the user requested: if they then add another watchpoint next to their first request, it may be covered by the same WatchpointResource (hardware watchpoint register). Also one "read" watchpoint and one "write" watchpoint on the same memory granule need to be handled, making the WatchpointResource cover all requests.
|
| As WatchpointResources aren't shared among multiple Watchpoints yet, there's no handling of running the conditions/commands/etc on multiple Watchpoints when their shared WatchpointResource is hit. The goal beyond "large watchpoint" is to unify (much more) the Watchpoint and Breakpoint behavior and commands. I have a feeling I may be slowly chipping away at this for a while.
|
| Re-landing this patch after fixing two undefined behaviors in WatchpointAlgorithms found by UBSan and by failures on different CI bots.
|
| rdar://108234227
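The two coverage examples in this message (120 bytes at 0x12e0, and an unaligned 22-byte range needing four 8-byte watchpoints) can be reproduced with a simplified sketch of covering a range with aligned power-of-2 blocks. This is not lldb's `WatchpointAlgorithms::PowerOf2Watchpoints`: `Region`, `roundUpToPowerOf2` and `coverWithPowerOf2Blocks` are hypothetical, there is no minimum block size, and no attempt is made to use a single larger block, so the results will differ from lldb in other cases.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one hardware watchpoint region.
struct Region {
  std::uint64_t addr;
  std::uint64_t size;
};

// Smallest power of two that is >= n, for 0 < n <= 2^63.
std::uint64_t roundUpToPowerOf2(std::uint64_t n) {
  assert(n != 0 && n <= (std::uint64_t{1} << 63));
  std::uint64_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Simplified sketch: cover [addr, addr + size) with equally sized, naturally
// aligned power-of-2 blocks no larger than maxBlock (itself a power of two).
// It ignores how many watchpoint registers exist and does not handle ranges
// that wrap around the top of the address space.
std::vector<Region> coverWithPowerOf2Blocks(std::uint64_t addr,
                                            std::uint64_t size,
                                            std::uint64_t maxBlock) {
  assert(size != 0 && maxBlock != 0 && (maxBlock & (maxBlock - 1)) == 0);
  std::uint64_t block = roundUpToPowerOf2(size);
  if (block > maxBlock)
    block = maxBlock;
  std::vector<Region> regions;
  const std::uint64_t end = addr + size; // one past the last watched byte
  for (std::uint64_t start = addr & ~(block - 1); start < end; start += block)
    regions.push_back(Region{start, block});
  return regions;
}
```

With a 2GB maximum, the request at 0x12e0 for 120 bytes yields 128-byte blocks at 0x1280 and 0x1300; with an 8-byte maximum, a 22-byte request at 0x1005 (an arbitrary unaligned address chosen for illustration) yields four 8-byte blocks.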
* [clang][dataflow] Display line numbers in the HTML logger timeline. (#80130) (martinboehme, 2024-02-01, 2 files, -0/+11)
| This makes it easier to count how many iterations an analysis takes to complete. It also makes it easier to compare how a change to the analysis code affects the timeline.
|
| Here's a sample screenshot:
|
| ![image](https://github.com/llvm/llvm-project/assets/29098113/b3f44b4d-7037-4f28-9532-5418663250e1)
* [clang][dataflow][NFC] Rename a confusingly named variable. (#80182) (martinboehme, 2024-02-01, 1 file, -9/+9)
|
* [clang-format] Simplify the AfterPlacementOperator option (#79796) (Owen Pan, 2024-01-31, 6 files, -130/+38)
| Change AfterPlacementOperator to a boolean and deprecate SBPO_Never, which meant never inserting a space except when after new/delete.
|
| Fixes #78892.
* [clang][NFC] Move isSimpleTypeSpecifier() from Sema to Token (#80101) (Owen Pan, 2024-01-31, 6 files, -49/+50)
| So that it can be used by clang-format.
* [llvm-objcopy][test] Use llvm-readelf instead for clearer visualization (NFC) (#79874) (Yi Kong, 2024-02-01, 1 file, -50/+10)
|
* [mlir][arith] Improve `truncf` folding (#80206) (Jakub Kuderski, 2024-01-31, 2 files, -16/+25)
| * Use APFloat conversion function instead of going through double to check if fold results in information loss.
| * Support folding vector constants.
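What "information loss" means for such a fold can be shown with native floating-point types. The sketch below uses a plain round trip through `float`, which is the kind of check the commit moves away from in favor of APFloat's conversion (whose `convert` call reports loss through an out parameter); `truncationIsLossless` is a hypothetical helper and only covers the double-to-float case.

```cpp
// Illustrative check with native types: truncating a double to float is
// treated as lossless when converting back yields the original value.
// NaN inputs compare unequal and are therefore reported as lossy here.
constexpr bool truncationIsLossless(double value) {
  return static_cast<double>(static_cast<float>(value)) == value;
}
```

For example, 0.5 survives the round trip unchanged, while 0.1 does not because its double-precision bit pattern has no exact single-precision counterpart.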
* [mlir][Vector] Add support for sub-byte transpose emulation (#80110) (Diego Caballero, 2024-01-31, 5 files, -2/+80)
| This PR adds patterns to convert a sub-byte vector transpose into a sequence of instructions that perform the transpose on i8 vector elements. Whereas this rewrite may not lead to the absolute peak performance, it should ensure correctness when dealing with sub-byte transposes.
* [libc] Fix read under msan (#80203) (michaelrj-google, 2024-01-31, 2 files, -1/+5)
| The read function wasn't properly unpoisoning its result under msan, causing test failures downstream when I tried to roll it out. This patch adds the msan unpoison call that fixes the issue.
* [libc][docs] fix stdbit.h docs (#80070) (Nick Desaulniers, 2024-01-31, 1 file, -7/+8)
| Fix rst comment, add checks for recently implemented functions+macro.
* [RISCV] Use Zacas for AtomicRMWInst::Nand i32 and XLen. (#80119) (Craig Topper, 2024-01-31, 3 files, -352/+832)
| We don't have an AMO instruction for Nand, so with the A extension we use an LR/SC loop. If we have Zacas we can use a CAS loop instead.
|
| According to the Zacas spec, a CAS loop scales to highly parallel systems better than LR/SC.
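At the source level, the shape of such a CAS loop looks like the sketch below, written with `std::atomic` since it has no fetch-nand operation. This shows the general technique only, not the code the RISC-V backend emits; `fetchNand` is a hypothetical helper.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomic NAND built from a compare-and-swap loop.
// Returns the value that was stored before the update.
std::uint32_t fetchNand(std::atomic<std::uint32_t> &target,
                        std::uint32_t operand) {
  std::uint32_t expected = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads `expected` with the current
  // value, so each retry recomputes the NAND from fresh data.
  while (!target.compare_exchange_weak(expected, ~(expected & operand),
                                       std::memory_order_seq_cst,
                                       std::memory_order_relaxed)) {
  }
  return expected;
}
```

Starting from 0xF0 with operand 0x3C, the call returns 0xF0 and leaves 0xFFFFFFCF in the atomic, i.e. the complement of 0xF0 & 0x3C.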