diff options
author | Sean Silva <silvas@purdue.edu> | 2012-12-18 02:15:05 +0000 |
---|---|---|
committer | Sean Silva <silvas@purdue.edu> | 2012-12-18 02:15:05 +0000 |
commit | b52cc89e1bd43457f1f370670f85e518ae5329b0 (patch) | |
tree | a3564c2f96752fc1fa9dd6c95324e85775317298 /docs/analyzer | |
parent | 8a0086cfd21e5a89f96715d7a504178c5fc0ca26 (diff) |
docs: Nuke AnalyzerRegions.rst.
As per Ted's advice. It can be brought back from version control if
needed.
This also fixes a Sphinx warning.
git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@170401 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/analyzer')
-rw-r--r-- | docs/analyzer/AnalyzerRegions.rst | 259 |
1 files changed, 0 insertions, 259 deletions
diff --git a/docs/analyzer/AnalyzerRegions.rst b/docs/analyzer/AnalyzerRegions.rst deleted file mode 100644 index 80b3882bc9..0000000000 --- a/docs/analyzer/AnalyzerRegions.rst +++ /dev/null @@ -1,259 +0,0 @@ -=============================================== -Static Analyzer Design Document: Memory Regions -=============================================== - -Authors: Ted Kremenek, ``kremenek at apple``, -Zhongxing Xu, ``xuzhongzhing at gmail`` - -Introduction -============ - -The path-sensitive analysis engine in libAnalysis employs an extensible -API for abstractly modeling the memory of an analyzed program. This API -employs the concept of "memory regions" to abstractly model chunks of -program memory such as program variables and dynamically allocated -memory such as those returned from 'malloc' and 'alloca'. Regions are -hierarchical, with subregions modeling subtyping relationships, field -and array offsets into larger chunks of memory, and so on. - -The region API consists of two components: - -- A taxonomy and representation of regions themselves within the - analyzer engine. The primary definitions and interfaces are described - in ``MemRegion.h``. At the root of the region hierarchy is the class - ``MemRegion`` with specific subclasses refining the region concept - for variables, heap allocated memory, and so forth. -- The modeling of binding of values to regions. For example, modeling - the value stored to a local variable ``x`` consists of recording the - binding between the region for ``x`` (which represents the raw memory - associated with ``x``) and the value stored to ``x``. This binding - relationship is captured with the notion of "symbolic stores." - -Symbolic stores, which can be thought of as representing the relation -``regions -> values``, are implemented by subclasses of the -``StoreManager`` class (``Store.h``). A particular StoreManager -implementation has complete flexibility concerning the following: - -- *How* to model the binding between regions and values -- *What* bindings are recorded - -Together, both points allow different StoreManagers to tradeoff between -different levels of analysis precision and scalability concerning the -reasoning of program memory. Meanwhile, the core path-sensitive engine -makes no assumptions about either points, and queries a StoreManager -about the bindings to a memory region through a generic interface that -all StoreManagers share. If a particular StoreManager cannot reason -about the potential bindings of a given memory region (e.g., -'``BasicStoreManager``' does not reason about fields of structures) then -the StoreManager can simply return 'unknown' (represented by -'``UnknownVal``') for a particular region-binding. This separation of -concerns not only isolates the core analysis engine from the details of -reasoning about program memory but also facilities the option of a -client of the path-sensitive engine to easily swap in different -StoreManager implementations that internally reason about program memory -in very different ways. - -The rest of this document is divided into two parts. We first discuss -region taxonomy and the semantics of regions. We then discuss the -StoreManager interface, and details of how the currently available -StoreManager classes implement region bindings. - -Memory Regions and Region Taxonomy -================================== - -Pointers --------- - -Before talking about the memory regions, we would talk about the -pointers since memory regions are essentially used to represent pointer -values. - -The pointer is a type of values. Pointer values have two semantic -aspects. One is its physical value, which is an address or location. The -other is the type of the memory object residing in the address. - -Memory regions are designed to abstract these two properties of the -pointer. The physical value of a pointer is represented by MemRegion -pointers. The rvalue type of the region corresponds to the type of the -pointee object. - -One complication is that we could have different view regions on the -same memory chunk. They represent the same memory location, but have -different abstract location, i.e., MemRegion pointers. Thus we need to -canonicalize the abstract locations to get a unique abstract location -for one physical location. - -Furthermore, these different view regions may or may not represent -memory objects of different types. Some different types are semantically -the same, for example, 'struct s' and 'my\_type' are the same type. - -:: - - struct s; - typedef struct s my_type; - -But ``char`` and ``int`` are not the same type in the code below: - -:: - - void *p; - int *q = (int*) p; - char *r = (char*) p; - -Thus we need to canonicalize the MemRegion which is used in binding and -retrieving. - -Regions -------- - -Region is the entity used to model pointer values. A Region has the -following properties: - -- Kind -- ObjectType: the type of the object residing on the region. -- LocationType: the type of the pointer value that the region - corresponds to. Usually this is the pointer to the ObjectType. But - sometimes we want to cache this type explicitly, for example, for a - CodeTextRegion. -- StartLocation -- EndLocation - -Symbolic Regions ----------------- - -A symbolic region is a map of the concept of symbolic values into the -domain of regions. It is the way that we represent symbolic pointers. -Whenever a symbolic pointer value is needed, a symbolic region is -created to represent it. - -A symbolic region has no type. It wraps a SymbolData. But sometimes we -have type information associated with a symbolic region. For this case, -a TypedViewRegion is created to layer the type information on top of the -symbolic region. The reason we do not carry type information with the -symbolic region is that the symbolic regions can have no type. To be -consistent, we don't let them to carry type information. - -Like a symbolic pointer, a symbolic region may be NULL, has unknown -extent, and represents a generic chunk of memory. - -.. note:: - We plan not to use loc::SymbolVal in RegionStore and remove it - gradually. - -Symbolic regions get their rvalue types through the following ways: - -- Through the parameter or global variable that points to it, e.g.: - - :: - - void f(struct s* p) { - ... - } - - The symbolic region pointed to by ``p`` has type ``struct s``. - -- Through explicit or implicit casts, e.g.: - - :: - - void f(void* p) { - struct s* q = (struct s*) p; - ... - } - -We attach the type information to the symbolic region lazily. For the -first case above, we create the ``TypedViewRegion`` only when the -pointer is actually used to access the pointee memory object, that is -when the element or field region is created. For the cast case, the -``TypedViewRegion`` is created when visiting the ``CastExpr``. - -The reason for doing lazy typing is that symbolic regions are sometimes -only used to do location comparison. - -Pointer Casts -------------- - -Pointer casts allow people to impose different 'views' onto a chunk of -memory. - -Usually we have two kinds of casts. One kind of casts cast down with in -the type hierarchy. It imposes more specific views onto more generic -memory regions. The other kind of casts cast up with in the type -hierarchy. It strips away more specific views on top of the more generic -memory regions. - -We simulate the down casts by layering another ``TypedViewRegion`` on -top of the original region. We simulate the up casts by striping away -the top ``TypedViewRegion``. Down casts is usually simple. For up casts, -if the there is no ``TypedViewRegion`` to be stripped, we return the -original region. If the underlying region is of the different type than -the cast-to type, we flag an error state. - -For toll-free bridging casts, we return the original region. - -We can set up a partial order for pointer types, with the most general -type ``void*`` at the top. The partial order forms a tree with ``void*`` -as its root node. - -Every ``MemRegion`` has a root position in the type tree. For example, -the pointee region of ``void *p`` has its root position at the root node -of the tree. ``VarRegion`` of ``int x`` has its root position at the -'int type' node. - -``TypedViewRegion`` is used to move the region down or up in the tree. -Moving down in the tree adds a ``TypedViewRegion``. Moving up in the -tree removes a ``TypedViewRegion``. - -Do we want to allow moving up beyond the root position? This happens -when: - -:: - - int x; void *p = &x; - -The region of ``x`` has its root position at 'int\*' node. the cast to -void\* moves that region up to the 'void\*' node. I propose to not allow -such casts, and assign the region of ``x`` for ``p``. - -Another non-ideal case is that people might cast to a non-generic -pointer from another non-generic pointer instead of first casting it -back to the generic pointer. Direct handling of this case would result -in multiple layers of TypedViewRegions. This enforces an incorrect -semantic view to the region, because we can only have one typed view on -a region at a time. To avoid this inconsistency, before casting the -region, we strip the TypedViewRegion, then do the cast. In summary, we -only allow one layer of TypedViewRegion. - -Region Bindings ---------------- - -The following region kinds are boundable: VarRegion, -CompoundLiteralRegion, StringRegion, ElementRegion, FieldRegion, and -ObjCIvarRegion. - -When binding regions, we perform canonicalization on element regions and -field regions. This is because we can have different views on the same -region, some of which are essentially the same view with different sugar -type names. - -To canonicalize a region, we get the canonical types for all -TypedViewRegions along the way up to the root region, and make new -TypedViewRegions with those canonical types. - -For Objective-C and C++, perhaps another canonicalization rule should be -added: for FieldRegion, the least derived class that has the field is -used as the type of the super region of the FieldRegion. - -All bindings and retrievings are done on the canonicalized regions. - -Canonicalization is transparent outside the region store manager, and -more specifically, unaware outside the Bind() and Retrieve() method. We -don't need to consider region canonicalization when doing pointer cast. - -Constraint Manager ------------------- - -The constraint manager reasons about the abstract location of memory -objects. We can have different views on a region, but none of these -views changes the location of that object. Thus we should get the same -abstract location for those regions. |