path: root/lib/Analysis/CloneDetection.cpp
Commit message, author, age, files, lines
* [AST] Add TableGen for StmtDataCollectors (Johannes Altmanninger, 2017-09-06; 1 file, -1/+1)
  Summary: This adds an option "-gen-clang-data-collectors" to the Clang TableGen that is used to generate StmtDataCollectors.inc.
  Reviewers: arphaman, teemperor!
  Subscribers: mgorny, cfe-commits
  Differential Revision: https://reviews.llvm.org/D37383
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@312634 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] MinComplexityConstraint now early exits and only does one macro stack lookup (Raphael Isemann, 2017-09-03; 1 file, -8/+10)
  Summary: This patch contains performance improvements for the `MinComplexityConstraint`. It reduces the constraint time when running on the SQLite codebase by around 43% (from 0.085s down to 0.049s).

  The patch essentially does two things:
  * It lets the complexity computation exit early once it reaches the limit we are checking for. As soon as we notice that the current clone is larger than the limit the user has set, we exit immediately and no longer traverse the tree or do further expensive lookups in the macro stack.
  * It also removes half of the macro stack lookups we did so far. Previously we always checked the start and the end location of a Stmt for macros, which was only a middle way between checking all locations of the Stmt and checking just one location. In practice I rarely found cases where it really matters whether we check start/end or just the start of a statement, as code with lots of macros that somehow produce just half a statement is very rare.

  Reviewers: NoQ
  Subscribers: cfe-commits, xazax.hun, v.g.vassilev
  Differential Revision: https://reviews.llvm.org/D34361
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@312440 91177308-0d34-0410-b5e6-96231b3b80d8
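The early exit described in the commit above can be illustrated with a minimal sketch. It assumes a hypothetical `Node` type standing in for Clang's `Stmt` and leaves out macro lookups entirely; `complexityUpTo` and `meetsMinComplexity` are illustrative names, not functions from CloneDetection.cpp. It needs C++17 because `Node` holds a vector of itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a statement node; not Clang's Stmt class.
struct Node {
  std::vector<Node> Children;
};

// Counts the nodes below N, but stops as soon as the running total reaches
// Limit. Visited (if non-null) records how many nodes were actually touched.
std::size_t complexityUpTo(const Node &N, std::size_t Limit,
                           std::size_t *Visited = nullptr) {
  assert(Limit > 0 && "the limit must be positive");
  std::size_t Count = 1; // this node
  if (Visited)
    ++*Visited;
  for (const Node &Child : N.Children) {
    if (Count >= Limit)
      break; // early exit: the answer can no longer change
    Count += complexityUpTo(Child, Limit - Count, Visited);
  }
  return Count;
}

// True if the tree rooted at N has at least Limit nodes.
bool meetsMinComplexity(const Node &N, std::size_t Limit) {
  return complexityUpTo(N, Limit) >= Limit;
}
```

With a root that has four leaf children and a limit of 3, only three nodes are visited before the traversal stops.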
* std::function -> llvm::function_ref. NFC. (Benjamin Kramer, 2017-09-01; 1 file, -1/+2)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@312336 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Performance optimizations for the CloneChecker (Raphael Isemann, 2017-08-31; 1 file, -6/+22)
  Summary: This patch aims at optimizing the CloneChecker for larger programs. Before this patch we took around 102 seconds to analyze sqlite3 with a complexity value of 50. After this patch we take 2.1 seconds to analyze sqlite3.

  The biggest performance optimization is that we now put the constraint for group size before the constraint for the complexity. The group size constraint is much faster than the complexity constraint, as it only does a simple integer comparison. The complexity constraint, on the other hand, traverses each Stmt and even checks the macro stack, so it cannot handle large numbers of incoming clones efficiently. The new order quickly filters out all the single-clone groups that the type II constraint generates before passing the fewer remaining clones to the complexity constraint. This reduced runtime by around 95%.

  The other change is that we also move the verification part of the type II clones further back in the chain of constraints. This required splitting the constraint into two parts, a verification constraint and a hash constraint (which also makes it more similar to the original design of the clone detection algorithm). The reasoning is the same as before: the verification constraint has to traverse many statements and shouldn't be at the start of the constraint chain. However, as the type II hashing has to be the first step in our algorithm, we have no other choice but to split this constraint into two different ones. Now our group size and complexity constraints filter out a chunk of the clones before they reach the slow verification step, which reduces the runtime by around 8%.

  I also kept the full type II constraint around (it now just calls its two sub-constraints) in case someone doesn't care about the performance benefits of doing this.

  Reviewers: NoQ
  Reviewed By: NoQ
  Subscribers: klimek, v.g.vassilev, xazax.hun, cfe-commits
  Differential Revision: https://reviews.llvm.org/D34182
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@312222 91177308-0d34-0410-b5e6-96231b3b80d8
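The effect of the constraint reordering in the commit above can be sketched roughly as follows. `CloneGroup`, `Constraint`, `applyConstraints` and `expensiveCalls` are hypothetical names chosen for this illustration, and the "expensive" constraint only counts its invocations instead of traversing statements; this is not the CloneDetector API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using CloneGroup = std::vector<int>; // hypothetical: IDs of clones in a group
using Constraint = std::function<bool(const CloneGroup &)>;

// Applies the constraints in order; a group is dropped by the first
// constraint it fails and is never seen by the later ones.
std::vector<CloneGroup> applyConstraints(std::vector<CloneGroup> Groups,
                                         const std::vector<Constraint> &Chain) {
  for (const Constraint &C : Chain) {
    std::vector<CloneGroup> Kept;
    for (CloneGroup &G : Groups)
      if (C(G))
        Kept.push_back(std::move(G));
    Groups = std::move(Kept);
  }
  return Groups;
}

// Returns how often the expensive constraint ran for the given ordering.
std::size_t expensiveCalls(bool CheapFirst) {
  std::size_t Calls = 0;
  Constraint MinGroupSize = [](const CloneGroup &G) { return G.size() >= 2; };
  Constraint Expensive = [&Calls](const CloneGroup &G) {
    ++Calls; // stands in for a full statement traversal
    return !G.empty();
  };
  std::vector<CloneGroup> Groups = {{1}, {2}, {3, 4}, {5}};
  std::vector<Constraint> Chain =
      CheapFirst ? std::vector<Constraint>{MinGroupSize, Expensive}
                 : std::vector<Constraint>{Expensive, MinGroupSize};
  std::vector<CloneGroup> Result = applyConstraints(Groups, Chain);
  assert(Result.size() == 1 && "both orderings keep the same group");
  return Calls;
}
```

Both orderings keep the same single group, but the cheap-first chain runs the expensive constraint once instead of four times.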
* [analyzer] Make StmtDataCollector customizable (Johannes Altmanninger, 2017-08-23; 1 file, -44/+68)
  Summary: This moves the data collection macro calls for Stmt nodes to lib/AST/StmtDataCollectors.inc.

  Users can subclass ConstStmtVisitor and include StmtDataCollectors.inc to define visitor methods for each Stmt subclass. This also makes it possible to customize the visit methods, as exemplified in lib/Analysis/CloneDetection.cpp.

  Move helper methods for data collection to a new module, AST/DataCollection.

  Add data collection for DeclRefExpr, MemberExpr and some literals.

  Reviewers: arphaman, teemperor!
  Subscribers: mgorny, xazax.hun, cfe-commits
  Differential Revision: https://reviews.llvm.org/D36664
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@311569 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Faster hashing of subsequences in CompoundStmts. (Raphael Isemann, 2017-07-09; 1 file, -9/+20)
  Summary: This patch improves the hashing of subsequences in CompoundStmts by incrementally hashing all subsequences with the same starting position. This reduces the time for this constraint while running over SQLite from 1.10 seconds to 0.55 seconds (-50%).
  Reviewers: NoQ
  Reviewed By: NoQ
  Subscribers: cfe-commits, xazax.hun, v.g.vassilev
  Differential Revision: https://reviews.llvm.org/D34364
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@307509 91177308-0d34-0410-b5e6-96231b3b80d8
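The incremental scheme from the commit above might look like the sketch below. It assumes each child statement already has a 64-bit hash and uses an FNV-1a-style combiner as a placeholder; the names `combine`, `SubsequenceHash` and `hashSubsequences` are invented for this example and the real checker uses its own hashing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Placeholder combiner in the style of FNV-1a (an assumption, not the
// checker's hash function).
inline std::uint64_t combine(std::uint64_t Seed, std::uint64_t Value) {
  return (Seed ^ Value) * 1099511628211ULL;
}

struct SubsequenceHash {
  std::size_t Begin, End; // half-open range [Begin, End) of child statements
  std::uint64_t Hash;
};

// Hashes every subsequence of length >= 2. For a fixed Begin, the running
// hash is extended by one statement per step instead of being recomputed
// from scratch for every End.
std::vector<SubsequenceHash>
hashSubsequences(const std::vector<std::uint64_t> &StmtHashes) {
  std::vector<SubsequenceHash> Result;
  for (std::size_t Begin = 0; Begin < StmtHashes.size(); ++Begin) {
    std::uint64_t Running = 14695981039346656037ULL;
    for (std::size_t End = Begin; End < StmtHashes.size(); ++End) {
      Running = combine(Running, StmtHashes[End]);
      if (End > Begin) // skip single-statement ranges
        Result.push_back({Begin, End + 1, Running});
    }
  }
  return Result;
}
```

For a compound statement with n children this performs one combine step per (Begin, End) pair, so identical subsequences at different positions still receive identical hashes.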
* [analyzer] Make StmtDataCollector part of the CloneDetection API (Raphael Isemann, 2017-07-09; 1 file, -185/+3)
  Summary: We probably want to use these useful templates in other pieces of code (e.g. the one from D34329), so we should make them public.
  Reviewers: NoQ
  Reviewed By: NoQ
  Subscribers: cfe-commits, xazax.hun, v.g.vassilev, johannes
  Differential Revision: https://reviews.llvm.org/D34880
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@307501 91177308-0d34-0410-b5e6-96231b3b80d8
* Changed wording in comment (Raphael Isemann, 2017-06-21; 1 file, -2/+1)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@305878 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Teach CloneDetection about Qt Meta-Object Compiler to filter auto generated files (Leslie Zhai, 2017-06-20; 1 file, -1/+1)
  Reviewers: v.g.vassilev, teemperor
  Reviewed By: teemperor
  Differential Revision: https://reviews.llvm.org/D34353
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@305774 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Teach CloneDetection about Qt Meta-Object Compiler (Leslie Zhai, 2017-06-19; 1 file, -1/+18)
  Reviewers: v.g.vassilev, zaks.anna, NoQ, teemperor
  Reviewed By: v.g.vassilev, zaks.anna, NoQ, teemperor
  Differential Revision: https://reviews.llvm.org/D31320
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@305659 91177308-0d34-0410-b5e6-96231b3b80d8
* Fix unused lambda capture. Follow up to r299653. (Ivan Krasin, 2017-04-06; 1 file, -1/+1)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@299671 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Reland r299544 "Add a modular constraint system to the CloneDetector" (Artem Dergachev, 2017-04-06; 1 file, -519/+368)
  Hopefully fix crashes by unshadowing the variable.

  Original commit message:

  A big part of the clone detection code is functionality for filtering clones and clone groups based on different criteria. So far this filtering process was hardcoded into the CloneDetector class, which made it hard to understand and, ultimately, to extend.

  This patch splits the CloneDetector's logic into a sequence of reusable constraints that are used for filtering clone groups. These constraints can be turned on and off and reordered at will, and new constraints are easy to implement if necessary.

  Unit tests are added for the new constraint interface.

  This is a refactoring patch - no functional change intended.

  Patch by Raphael Isemann!

  Differential Revision: https://reviews.llvm.org/D23418
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@299653 91177308-0d34-0410-b5e6-96231b3b80d8
* Revert "[analyzer] Add a modular constraint system to the CloneDetector" (Artem Dergachev, 2017-04-05; 1 file, -368/+519)
  This reverts commit r299544.

  Crashes on tests on some buildbots.

  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@299550 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Add a modular constraint system to the CloneDetector (Artem Dergachev, 2017-04-05; 1 file, -519/+368)
  A big part of the clone detection code is functionality for filtering clones and clone groups based on different criteria. So far this filtering process was hardcoded into the CloneDetector class, which made it hard to understand and, ultimately, to extend.

  This patch splits the CloneDetector's logic into a sequence of reusable constraints that are used for filtering clone groups. These constraints can be turned on and off and reordered at will, and new constraints are easy to implement if necessary.

  Unit tests are added for the new constraint interface.

  This is a refactoring patch - no functional change intended.

  Patch by Raphael Isemann!

  Differential Revision: https://reviews.llvm.org/D23418
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@299544 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Re-apply r283094 "Improve CloneChecker diagnostics" (Artem Dergachev, 2016-10-08; 1 file, -11/+15)
  The parent commit (r283092) was reverted before and now finally landed.

  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@283661 91177308-0d34-0410-b5e6-96231b3b80d8
* Revert r283106, "Wdocumentation fix" (NAKAMURA Takumi, 2016-10-04; 1 file, -1/+1)
  It should depend on r283094 and r283182.

  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@283195 91177308-0d34-0410-b5e6-96231b3b80d8
* Revert "[analyzer] Improve CloneChecker diagnostics" as it depends on reverted r283092 (Vitaly Buka, 2016-10-04; 1 file, -14/+10)
  This reverts commit r283094.

  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@283182 91177308-0d34-0410-b5e6-96231b3b80d8
* Wdocumentation fix (Simon Pilgrim, 2016-10-03; 1 file, -1/+1)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@283106 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Improve CloneChecker diagnostics (Artem Dergachev, 2016-10-03; 1 file, -10/+14)
  Highlight code clones referenced by the warning message with the help of the extra notes feature recently introduced in r283092.

  Change the warning text to be more clang-ish. Remove suggestions from the copy-paste error checker diagnostics, because currently our suggestions are strictly 50% wrong (we do not know which of the two code clones contains the error), and for that reason we should not sound as if we're actually suggesting this. Hopefully a better solution will bring them back.

  Make sure the suspicious clone pair structure always mentions the correct variable for the second clone.

  Differential Revision: https://reviews.llvm.org/D24916
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@283094 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Fix CloneDetector crash on calling methods of class templates. (Artem Dergachev, 2016-08-23; 1 file, -4/+3)
  If a call expression represents a method call of a class template, and the method itself isn't templated, then the method may be considered to be a template instantiation without template specialization arguments. No longer crash when we could not find template specialization arguments.

  Patch by Raphael Isemann!

  Differential Revision: https://reviews.llvm.org/D23780
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@279529 91177308-0d34-0410-b5e6-96231b3b80d8
* Wdocumentation fix (Simon Pilgrim, 2016-08-20; 1 file, -2/+2)
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@279382 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Use faster hashing (MD5) in CloneDetector. (Artem Dergachev, 2016-08-20; 1 file, -64/+189)
  This replaces the old approach of fingerprinting every AST node into a string, which avoided collisions and was simple to implement, but turned out to be extremely inefficient with respect to both performance and memory.

  The collisions are now dealt with in a separate pass, which no longer causes performance problems because collisions are rare.

  Patch by Raphael Isemann!

  Differential Revision: https://reviews.llvm.org/D22515
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@279378 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Make CloneDetector consider macro expansions. (Artem Dergachev, 2016-08-20; 1 file, -4/+67)
  So far macro-generated code was treated by the CloneDetector as normal code. As a result, some macros were reported as false-positive clones, because large chunks of code coming from otherwise concise macro expansions were treated as copy-pasted code.

  This patch ensures that macros are treated in the same way as literals/function calls. This prevents macros that expand into multiple statements from being reported as clones.

  Patch by Raphael Isemann!

  Differential Revision: https://reviews.llvm.org/D23316
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@279367 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Make CloneDetector consider template arguments. (Artem Dergachev, 2016-08-20; 1 file, -2/+20)
  For example, code samples `isa<Stmt>(S)' and `isa<Expr>(S)' are no longer considered to be clones.

  Patch by Raphael Isemann!

  Differential Revision: https://reviews.llvm.org/D23555
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@279366 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Teach CloneDetector to find clones that look like copy-paste errors. (Artem Dergachev, 2016-08-18; 1 file, -20/+123)
  The original clone checker tries to find copy-pasted code that is exactly identical to the original code, up to minor details.

  As an example, if the copy-pasted code has all references to variable 'a' replaced with references to variable 'b', it is still considered to be an exact clone.

  The new check finds copy-pasted code in which exactly one variable seems out of place compared to the original code, which likely indicates a copy-paste error (someone forgot to rename a variable in one place).

  Patch by Raphael Isemann!

  Differential Revision: https://reviews.llvm.org/D23314
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@279056 91177308-0d34-0410-b5e6-96231b3b80d8
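One way to picture the "exactly one variable out of place" check from the commit above is the sketch below. It assumes each clone has been reduced to the list of variable names it references, in order; `occurrenceIds`, `countPatternDifferences` and `looksLikeCopyPasteError` are hypothetical helpers written for this example, not the code from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Replaces every variable occurrence by the index at which that variable
// first appeared, which makes the result independent of the actual names.
std::vector<std::size_t>
occurrenceIds(const std::vector<std::string> &Occurrences) {
  std::unordered_map<std::string, std::size_t> FirstSeen;
  std::vector<std::size_t> Ids;
  for (const std::string &Name : Occurrences)
    Ids.push_back(FirstSeen.emplace(Name, FirstSeen.size()).first->second);
  return Ids;
}

// Counts the positions at which the two clones use "a different variable"
// once names have been abstracted away.
std::size_t countPatternDifferences(const std::vector<std::string> &A,
                                    const std::vector<std::string> &B) {
  assert(A.size() == B.size() && "clones must reference equally many variables");
  std::vector<std::size_t> IdsA = occurrenceIds(A), IdsB = occurrenceIds(B);
  std::size_t Differences = 0;
  for (std::size_t I = 0; I < IdsA.size(); ++I)
    if (IdsA[I] != IdsB[I])
      ++Differences;
  return Differences;
}

// Exactly one mismatch suggests a variable that was not renamed.
bool looksLikeCopyPasteError(const std::vector<std::string> &A,
                             const std::vector<std::string> &B) {
  return countPatternDifferences(A, B) == 1;
}
```

For instance, the occurrences `a, b, a, b` and `c, d, c, c` differ in one position and would be flagged, while `a, b, a, b` and `c, d, c, d` are a plain renamed clone.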
* [analyzer] Fix a crash in CloneDetector when calling functions by pointers. (Artem Dergachev, 2016-08-10; 1 file, -2/+5)
  CallExpr may have a null direct callee when the callee function is not known at compile time. Do not try to take the callee name in this case.

  Patch by Raphael Isemann!

  Differential Revision: https://reviews.llvm.org/D23320
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@278238 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Try to fix coverity CID 1360469. (Vassil Vassilev, 2016-08-09; 1 file, -1/+1)
  Patch by Raphael Isemann!

  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@278110 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Make CloneDetector recognize different variable patterns. (Artem Dergachev, 2016-08-04; 1 file, -3/+143)
  CloneDetector should be able to detect clones with renamed variables. However, if variables are referenced multiple times around the code sample, the usage patterns need to be recognized.

  For example, (x < y ? y : x) and (y < x ? y : x) are no longer clones, however (a < b ? b : a) is still a clone of the former.

  Variable patterns are computed and compared during a separate filtering pass.

  Patch by Raphael Isemann!

  Differential Revision: https://reviews.llvm.org/D22982
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@277757 91177308-0d34-0410-b5e6-96231b3b80d8
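The ternary example in the commit above can be reproduced with a small sketch. It assumes the variable references of an expression have already been extracted in source order; `variablePattern` and `samePattern` are illustrative names only, and the patch's own data structures are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Maps each variable occurrence to the index of that variable's first
// appearance, so the names no longer matter but the usage order does.
std::vector<std::size_t>
variablePattern(const std::vector<std::string> &Occurrences) {
  std::unordered_map<std::string, std::size_t> FirstSeen;
  std::vector<std::size_t> Pattern;
  for (const std::string &Name : Occurrences) {
    // emplace leaves an existing entry untouched and returns its iterator.
    auto It = FirstSeen.emplace(Name, FirstSeen.size()).first;
    Pattern.push_back(It->second);
  }
  return Pattern;
}

// Two code samples can only be clones if their variable patterns agree.
bool samePattern(const std::vector<std::string> &A,
                 const std::vector<std::string> &B) {
  return variablePattern(A) == variablePattern(B);
}
```

Here `(x < y ? y : x)` yields the occurrences `x, y, y, x` and the pattern `0, 1, 1, 0`; `(a < b ? b : a)` yields the same pattern, while `(y < x ? y : x)` yields `0, 1, 0, 1` and is rejected.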
* [analyzer] Respect statement-specific data in CloneDetection. (Artem Dergachev, 2016-08-02; 1 file, -3/+165)
  So far the CloneDetector only respected the kind of each statement when searching for clones. This patch refines the way the CloneDetector collects data from each statement by providing methods for each statement kind that read the kind-specific attributes.

  For example, statements 'a < b' and 'a > b' are no longer considered to be clones, because they differ in operation code, which is an attribute specific to the BinaryOperator statement kind.

  Patch by Raphael Isemann!

  Differential Revision: https://reviews.llvm.org/D22514
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@277449 91177308-0d34-0410-b5e6-96231b3b80d8
* [analyzer] Add basic capabilities to detect source code clones. (Artem Dergachev, 2016-07-26; 1 file, -0/+277)
  This patch adds the CloneDetector class which allows searching source code for clones.

  For every statement or group of statements within a compound statement, CloneDetector computes a hash value, and finds clones by detecting identical hash values.

  This initial patch only provides a simple hashing mechanism that hashes the kind of each sub-statement.

  This patch also adds CloneChecker - a simple static analyzer checker that uses CloneDetector to report copy-pasted code.

  Patch by Raphael Isemann!

  Differential Revision: https://reviews.llvm.org/D20795
  git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@276782 91177308-0d34-0410-b5e6-96231b3b80d8
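The kind-only hashing described in the commit above might be sketched as follows. The `Kind` enum, the `Stmt` struct and the functions `hashStmt` and `findCloneGroups` are simplified stand-ins invented for this example (they are not Clang's classes), the combiner is an FNV-1a-style placeholder, and only single top-level statements are compared. C++17 is assumed because `Stmt` holds a vector of itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, heavily reduced set of statement kinds.
enum class Kind : std::uint64_t { Compound, If, Return, BinaryOp, DeclRef };

// Hypothetical statement node; only the kind and the children are kept.
struct Stmt {
  Kind K;
  std::vector<Stmt> Children;
};

// Hashes the kind of a statement together with the hashes of its children.
std::uint64_t hashStmt(const Stmt &S) {
  std::uint64_t H = 14695981039346656037ULL;
  H = (H ^ static_cast<std::uint64_t>(S.K)) * 1099511628211ULL;
  for (const Stmt &Child : S.Children)
    H = (H ^ hashStmt(Child)) * 1099511628211ULL;
  return H;
}

// Returns groups of indices into Stmts whose hashes are identical; groups
// with a single member are not clones and are left out.
std::vector<std::vector<std::size_t>>
findCloneGroups(const std::vector<Stmt> &Stmts) {
  std::map<std::uint64_t, std::vector<std::size_t>> ByHash;
  for (std::size_t I = 0; I < Stmts.size(); ++I)
    ByHash[hashStmt(Stmts[I])].push_back(I);
  std::vector<std::vector<std::size_t>> Groups;
  for (const auto &Entry : ByHash)
    if (Entry.second.size() > 1)
      Groups.push_back(Entry.second);
  return Groups;
}
```

Because only kinds are hashed, two binary operators over two variable references land in the same group regardless of operator or variable names, which is the imprecision the later commits in this log address.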