1 | // Copyright 2013 The Go Authors. All rights reserved. |
---|---|
2 | // Use of this source code is governed by a BSD-style |
3 | // license that can be found in the LICENSE file. |
4 | |
5 | /* |
6 | Package pointer implements Andersen's analysis, an inclusion-based |
7 | pointer analysis algorithm first described in (Andersen, 1994). |
8 | |
9 | A pointer analysis relates every pointer expression in a whole program |
10 | to the set of memory locations to which it might point. This |
11 | information can be used to construct a call graph of the program that |
12 | precisely represents the destinations of dynamic function and method |
13 | calls. It can also be used to determine, for example, which pairs of |
14 | channel operations operate on the same channel. |
15 | |
16 | The package allows the client to request a set of expressions of |
17 | interest for which the points-to information will be returned once the |
18 | analysis is complete. In addition, the client may request that a |
19 | callgraph is constructed. The example program in example_test.go |
20 | demonstrates both of these features. Clients should not request more |
21 | information than they need since it may increase the cost of the |
22 | analysis significantly. |
23 | |
24 | # CLASSIFICATION |
25 | |
26 | Our algorithm is INCLUSION-BASED: the points-to sets for x and y will |
27 | be related by pts(y) ⊇ pts(x) if the program contains the statement |
28 | y = x. |
29 | |
30 | It is FLOW-INSENSITIVE: it ignores all control flow constructs and the |
31 | order of statements in a program. It is therefore a "MAY ALIAS" |
32 | analysis: its facts are of the form "P may/may not point to L", |
33 | not "P must point to L". |
34 | |
35 | It is FIELD-SENSITIVE: it builds separate points-to sets for distinct |
36 | fields, such as x and y in struct { x, y *int }. |
37 | |
38 | It is mostly CONTEXT-INSENSITIVE: most functions are analyzed once, |
39 | so values can flow in at one call to the function and return out at |
40 | another. Only some smaller functions are analyzed with consideration |
41 | of their calling context. |
42 | |
43 | It has a CONTEXT-SENSITIVE HEAP: objects are named by both allocation |
44 | site and context, so the objects returned by two distinct calls to f: |
45 | |
46 | func f() *T { return new(T) } |
47 | |
48 | are distinguished up to the limits of the calling context. |
49 | |
50 | It is a WHOLE PROGRAM analysis: it requires SSA-form IR for the |
51 | complete Go program and summaries for native code. |
52 | |
53 | See the (Hind, PASTE'01) survey paper for an explanation of these terms. |
54 | |
55 | # SOUNDNESS |
56 | |
57 | The analysis is fully sound when invoked on pure Go programs that do not |
58 | use reflection or unsafe.Pointer conversions. In other words, if there |
59 | is any possible execution of the program in which pointer P may point to |
60 | object O, the analysis will report that fact. |
61 | |
62 | # REFLECTION |
63 | |
64 | By default, the "reflect" library is ignored by the analysis, as if all |
65 | its functions were no-ops, but if the client enables the Reflection flag, |
66 | the analysis will make a reasonable attempt to model the effects of |
67 | calls into this library. However, this comes at a significant |
68 | performance cost, and not all features of that library are yet |
69 | implemented. In addition, some simplifying approximations must be made |
70 | to ensure that the analysis terminates; for example, reflection can be |
71 | used to construct an infinite set of types and values of those types, |
72 | but the analysis arbitrarily bounds the depth of such types. |
73 | |
74 | Most but not all reflection operations are supported. |
75 | In particular, addressable reflect.Values are not yet implemented, so |
76 | operations such as (reflect.Value).Set have no analytic effect. |
77 | |
78 | # UNSAFE POINTER CONVERSIONS |
79 | |
80 | The pointer analysis makes no attempt to understand aliasing between the |
81 | operand x and result y of an unsafe.Pointer conversion: |
82 | |
83 | y = (*T)(unsafe.Pointer(x)) |
84 | |
85 | It is as if the conversion allocated an entirely new object: |
86 | |
87 | y = new(T) |
88 | |
89 | # NATIVE CODE |
90 | |
91 | The analysis cannot model the aliasing effects of functions written in |
92 | languages other than Go, such as runtime intrinsics in C or assembly, or |
93 | code accessed via cgo. The result is as if such functions are no-ops. |
94 | However, various important intrinsics are understood by the analysis, |
95 | along with built-ins such as append. |
96 | |
97 | The analysis currently provides no way for users to specify the aliasing |
98 | effects of native code. |
99 | |
100 | ------------------------------------------------------------------------ |
101 | |
102 | # IMPLEMENTATION |
103 | |
104 | The remaining documentation is intended for package maintainers and |
105 | pointer analysis specialists. Maintainers should have a solid |
106 | understanding of the referenced papers (especially those by H&L and PKH) |
107 | before making making significant changes. |
108 | |
109 | The implementation is similar to that described in (Pearce et al, |
110 | PASTE'04). Unlike many algorithms which interleave constraint |
111 | generation and solving, constructing the callgraph as they go, this |
112 | implementation for the most part observes a phase ordering (generation |
113 | before solving), with only simple (copy) constraints being generated |
114 | during solving. (The exception is reflection, which creates various |
115 | constraints during solving as new types flow to reflect.Value |
116 | operations.) This improves the traction of presolver optimisations, |
117 | but imposes certain restrictions, e.g. potential context sensitivity |
118 | is limited since all variants must be created a priori. |
119 | |
120 | # TERMINOLOGY |
121 | |
122 | A type is said to be "pointer-like" if it is a reference to an object. |
123 | Pointer-like types include pointers and also interfaces, maps, channels, |
124 | functions and slices. |
125 | |
126 | We occasionally use C's x->f notation to distinguish the case where x |
127 | is a struct pointer from x.f where is a struct value. |
128 | |
129 | Pointer analysis literature (and our comments) often uses the notation |
130 | dst=*src+offset to mean something different than what it means in Go. |
131 | It means: for each node index p in pts(src), the node index p+offset is |
132 | in pts(dst). Similarly *dst+offset=src is used for store constraints |
133 | and dst=src+offset for offset-address constraints. |
134 | |
135 | # NODES |
136 | |
137 | Nodes are the key datastructure of the analysis, and have a dual role: |
138 | they represent both constraint variables (equivalence classes of |
139 | pointers) and members of points-to sets (things that can be pointed |
140 | at, i.e. "labels"). |
141 | |
142 | Nodes are naturally numbered. The numbering enables compact |
143 | representations of sets of nodes such as bitvectors (or BDDs); and the |
144 | ordering enables a very cheap way to group related nodes together. For |
145 | example, passing n parameters consists of generating n parallel |
146 | constraints from caller+i to callee+i for 0<=i<n. |
147 | |
148 | The zero nodeid means "not a pointer". For simplicity, we generate flow |
149 | constraints even for non-pointer types such as int. The pointer |
150 | equivalence (PE) presolver optimization detects which variables cannot |
151 | point to anything; this includes not only all variables of non-pointer |
152 | types (such as int) but also variables of pointer-like types if they are |
153 | always nil, or are parameters to a function that is never called. |
154 | |
155 | Each node represents a scalar part of a value or object. |
156 | Aggregate types (structs, tuples, arrays) are recursively flattened |
157 | out into a sequential list of scalar component types, and all the |
158 | elements of an array are represented by a single node. (The |
159 | flattening of a basic type is a list containing a single node.) |
160 | |
161 | Nodes are connected into a graph with various kinds of labelled edges: |
162 | simple edges (or copy constraints) represent value flow. Complex |
163 | edges (load, store, etc) trigger the creation of new simple edges |
164 | during the solving phase. |
165 | |
166 | # OBJECTS |
167 | |
168 | Conceptually, an "object" is a contiguous sequence of nodes denoting |
169 | an addressable location: something that a pointer can point to. The |
170 | first node of an object has a non-nil obj field containing information |
171 | about the allocation: its size, context, and ssa.Value. |
172 | |
173 | Objects include: |
174 | - functions and globals; |
175 | - variable allocations in the stack frame or heap; |
176 | - maps, channels and slices created by calls to make(); |
177 | - allocations to construct an interface; |
178 | - allocations caused by conversions, e.g. []byte(str). |
179 | - arrays allocated by calls to append(); |
180 | |
181 | Many objects have no Go types. For example, the func, map and chan type |
182 | kinds in Go are all varieties of pointers, but their respective objects |
183 | are actual functions (executable code), maps (hash tables), and channels |
184 | (synchronized queues). Given the way we model interfaces, they too are |
185 | pointers to "tagged" objects with no Go type. And an *ssa.Global denotes |
186 | the address of a global variable, but the object for a Global is the |
187 | actual data. So, the types of an ssa.Value that creates an object is |
188 | "off by one indirection": a pointer to the object. |
189 | |
190 | The individual nodes of an object are sometimes referred to as "labels". |
191 | |
192 | For uniformity, all objects have a non-zero number of fields, even those |
193 | of the empty type struct{}. (All arrays are treated as if of length 1, |
194 | so there are no empty arrays. The empty tuple is never address-taken, |
195 | so is never an object.) |
196 | |
197 | # TAGGED OBJECTS |
198 | |
199 | An tagged object has the following layout: |
200 | |
201 | T -- obj.flags ⊇ {otTagged} |
202 | v |
203 | ... |
204 | |
205 | The T node's typ field is the dynamic type of the "payload": the value |
206 | v which follows, flattened out. The T node's obj has the otTagged |
207 | flag. |
208 | |
209 | Tagged objects are needed when generalizing across types: interfaces, |
210 | reflect.Values, reflect.Types. Each of these three types is modelled |
211 | as a pointer that exclusively points to tagged objects. |
212 | |
213 | Tagged objects may be indirect (obj.flags ⊇ {otIndirect}) meaning that |
214 | the value v is not of type T but *T; this is used only for |
215 | reflect.Values that represent lvalues. (These are not implemented yet.) |
216 | |
217 | # ANALYSIS ABSTRACTION OF EACH TYPE |
218 | |
219 | Variables of the following "scalar" types may be represented by a |
220 | single node: basic types, pointers, channels, maps, slices, 'func' |
221 | pointers, interfaces. |
222 | |
223 | Pointers: |
224 | |
225 | Nothing to say here, oddly. |
226 | |
227 | Basic types (bool, string, numbers, unsafe.Pointer): |
228 | |
229 | Currently all fields in the flattening of a type, including |
230 | non-pointer basic types such as int, are represented in objects and |
231 | values. Though non-pointer nodes within values are uninteresting, |
232 | non-pointer nodes in objects may be useful (if address-taken) |
233 | because they permit the analysis to deduce, in this example, |
234 | |
235 | var s struct{ ...; x int; ... } |
236 | p := &s.x |
237 | |
238 | that p points to s.x. If we ignored such object fields, we could only |
239 | say that p points somewhere within s. |
240 | |
241 | All other basic types are ignored. Expressions of these types have |
242 | zero nodeid, and fields of these types within aggregate other types |
243 | are omitted. |
244 | |
245 | unsafe.Pointers are not modelled as pointers, so a conversion of an |
246 | unsafe.Pointer to *T is (unsoundly) treated equivalent to new(T). |
247 | |
248 | Channels: |
249 | |
250 | An expression of type 'chan T' is a kind of pointer that points |
251 | exclusively to channel objects, i.e. objects created by MakeChan (or |
252 | reflection). |
253 | |
254 | 'chan T' is treated like *T. |
255 | *ssa.MakeChan is treated as equivalent to new(T). |
256 | *ssa.Send and receive (*ssa.UnOp(ARROW)) and are equivalent to store |
257 | |
258 | and load. |
259 | |
260 | Maps: |
261 | |
262 | An expression of type 'map[K]V' is a kind of pointer that points |
263 | exclusively to map objects, i.e. objects created by MakeMap (or |
264 | reflection). |
265 | |
266 | map K[V] is treated like *M where M = struct{k K; v V}. |
267 | *ssa.MakeMap is equivalent to new(M). |
268 | *ssa.MapUpdate is equivalent to *y=x where *y and x have type M. |
269 | *ssa.Lookup is equivalent to y=x.v where x has type *M. |
270 | |
271 | Slices: |
272 | |
273 | A slice []T, which dynamically resembles a struct{array *T, len, cap int}, |
274 | is treated as if it were just a *T pointer; the len and cap fields are |
275 | ignored. |
276 | |
277 | *ssa.MakeSlice is treated like new([1]T): an allocation of a |
278 | |
279 | singleton array. |
280 | |
281 | *ssa.Index on a slice is equivalent to a load. |
282 | *ssa.IndexAddr on a slice returns the address of the sole element of the |
283 | slice, i.e. the same address. |
284 | *ssa.Slice is treated as a simple copy. |
285 | |
286 | Functions: |
287 | |
288 | An expression of type 'func...' is a kind of pointer that points |
289 | exclusively to function objects. |
290 | |
291 | A function object has the following layout: |
292 | |
293 | identity -- typ:*types.Signature; obj.flags ⊇ {otFunction} |
294 | params_0 -- (the receiver, if a method) |
295 | ... |
296 | params_n-1 |
297 | results_0 |
298 | ... |
299 | results_m-1 |
300 | |
301 | There may be multiple function objects for the same *ssa.Function |
302 | due to context-sensitive treatment of some functions. |
303 | |
304 | The first node is the function's identity node. |
305 | Associated with every callsite is a special "targets" variable, |
306 | whose pts() contains the identity node of each function to which |
307 | the call may dispatch. Identity words are not otherwise used during |
308 | the analysis, but we construct the call graph from the pts() |
309 | solution for such nodes. |
310 | |
311 | The following block of contiguous nodes represents the flattened-out |
312 | types of the parameters ("P-block") and results ("R-block") of the |
313 | function object. |
314 | |
315 | The treatment of free variables of closures (*ssa.FreeVar) is like |
316 | that of global variables; it is not context-sensitive. |
317 | *ssa.MakeClosure instructions create copy edges to Captures. |
318 | |
319 | A Go value of type 'func' (i.e. a pointer to one or more functions) |
320 | is a pointer whose pts() contains function objects. The valueNode() |
321 | for an *ssa.Function returns a singleton for that function. |
322 | |
323 | Interfaces: |
324 | |
325 | An expression of type 'interface{...}' is a kind of pointer that |
326 | points exclusively to tagged objects. All tagged objects pointed to |
327 | by an interface are direct (the otIndirect flag is clear) and |
328 | concrete (the tag type T is not itself an interface type). The |
329 | associated ssa.Value for an interface's tagged objects may be an |
330 | *ssa.MakeInterface instruction, or nil if the tagged object was |
331 | created by an instrinsic (e.g. reflection). |
332 | |
333 | Constructing an interface value causes generation of constraints for |
334 | all of the concrete type's methods; we can't tell a priori which |
335 | ones may be called. |
336 | |
337 | TypeAssert y = x.(T) is implemented by a dynamic constraint |
338 | triggered by each tagged object O added to pts(x): a typeFilter |
339 | constraint if T is an interface type, or an untag constraint if T is |
340 | a concrete type. A typeFilter tests whether O.typ implements T; if |
341 | so, O is added to pts(y). An untagFilter tests whether O.typ is |
342 | assignable to T,and if so, a copy edge O.v -> y is added. |
343 | |
344 | ChangeInterface is a simple copy because the representation of |
345 | tagged objects is independent of the interface type (in contrast |
346 | to the "method tables" approach used by the gc runtime). |
347 | |
348 | y := Invoke x.m(...) is implemented by allocating contiguous P/R |
349 | blocks for the callsite and adding a dynamic rule triggered by each |
350 | tagged object added to pts(x). The rule adds param/results copy |
351 | edges to/from each discovered concrete method. |
352 | |
353 | (Q. Why do we model an interface as a pointer to a pair of type and |
354 | value, rather than as a pair of a pointer to type and a pointer to |
355 | value? |
356 | A. Control-flow joins would merge interfaces ({T1}, {V1}) and ({T2}, |
357 | {V2}) to make ({T1,T2}, {V1,V2}), leading to the infeasible and |
358 | type-unsafe combination (T1,V2). Treating the value and its concrete |
359 | type as inseparable makes the analysis type-safe.) |
360 | |
361 | Type parameters: |
362 | |
363 | Type parameters are not directly supported by the analysis. |
364 | Calls to generic functions will be left as if they had empty bodies. |
365 | Users of the package are expected to use the ssa.InstantiateGenerics |
366 | builder mode when building code that uses or depends on code |
367 | containing generics. |
368 | |
369 | reflect.Value: |
370 | |
371 | A reflect.Value is modelled very similar to an interface{}, i.e. as |
372 | a pointer exclusively to tagged objects, but with two generalizations. |
373 | |
374 | 1. a reflect.Value that represents an lvalue points to an indirect |
375 | (obj.flags ⊇ {otIndirect}) tagged object, which has a similar |
376 | layout to an tagged object except that the value is a pointer to |
377 | the dynamic type. Indirect tagged objects preserve the correct |
378 | aliasing so that mutations made by (reflect.Value).Set can be |
379 | observed. |
380 | |
381 | Indirect objects only arise when an lvalue is derived from an |
382 | rvalue by indirection, e.g. the following code: |
383 | |
384 | type S struct { X T } |
385 | var s S |
386 | var i interface{} = &s // i points to a *S-tagged object (from MakeInterface) |
387 | v1 := reflect.ValueOf(i) // v1 points to same *S-tagged object as i |
388 | v2 := v1.Elem() // v2 points to an indirect S-tagged object, pointing to s |
389 | v3 := v2.FieldByName("X") // v3 points to an indirect int-tagged object, pointing to s.X |
390 | v3.Set(y) // pts(s.X) ⊇ pts(y) |
391 | |
392 | Whether indirect or not, the concrete type of the tagged object |
393 | corresponds to the user-visible dynamic type, and the existence |
394 | of a pointer is an implementation detail. |
395 | |
396 | (NB: indirect tagged objects are not yet implemented) |
397 | |
398 | 2. The dynamic type tag of a tagged object pointed to by a |
399 | reflect.Value may be an interface type; it need not be concrete. |
400 | |
401 | This arises in code such as this: |
402 | |
403 | tEface := reflect.TypeOf(new(interface{}).Elem() // interface{} |
404 | eface := reflect.Zero(tEface) |
405 | |
406 | pts(eface) is a singleton containing an interface{}-tagged |
407 | object. That tagged object's payload is an interface{} value, |
408 | i.e. the pts of the payload contains only concrete-tagged |
409 | objects, although in this example it's the zero interface{} value, |
410 | so its pts is empty. |
411 | |
412 | reflect.Type: |
413 | |
414 | Just as in the real "reflect" library, we represent a reflect.Type |
415 | as an interface whose sole implementation is the concrete type, |
416 | *reflect.rtype. (This choice is forced on us by go/types: clients |
417 | cannot fabricate types with arbitrary method sets.) |
418 | |
419 | rtype instances are canonical: there is at most one per dynamic |
420 | type. (rtypes are in fact large structs but since identity is all |
421 | that matters, we represent them by a single node.) |
422 | |
423 | The payload of each *rtype-tagged object is an *rtype pointer that |
424 | points to exactly one such canonical rtype object. We exploit this |
425 | by setting the node.typ of the payload to the dynamic type, not |
426 | '*rtype'. This saves us an indirection in each resolution rule. As |
427 | an optimisation, *rtype-tagged objects are canonicalized too. |
428 | |
429 | Aggregate types: |
430 | |
431 | Aggregate types are treated as if all directly contained |
432 | aggregates are recursively flattened out. |
433 | |
434 | Structs: |
435 | |
436 | *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset. |
437 | |
438 | *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create |
439 | |
440 | simple edges for each struct discovered in pts(x). |
441 | |
442 | The nodes of a struct consist of a special 'identity' node (whose |
443 | type is that of the struct itself), followed by the nodes for all |
444 | the struct's fields, recursively flattened out. A pointer to the |
445 | struct is a pointer to its identity node. That node allows us to |
446 | distinguish a pointer to a struct from a pointer to its first field. |
447 | |
448 | Field offsets are logical field offsets (plus one for the identity |
449 | node), so the sizes of the fields can be ignored by the analysis. |
450 | |
451 | (The identity node is non-traditional but enables the distinction |
452 | described above, which is valuable for code comprehension tools. |
453 | Typical pointer analyses for C, whose purpose is compiler |
454 | optimization, must soundly model unsafe.Pointer (void*) conversions, |
455 | and this requires fidelity to the actual memory layout using physical |
456 | field offsets.) |
457 | |
458 | *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset. |
459 | |
460 | *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create |
461 | |
462 | simple edges for each struct discovered in pts(x). |
463 | |
464 | Arrays: |
465 | |
466 | We model an array by an identity node (whose type is that of the |
467 | array itself) followed by a node representing all the elements of |
468 | the array; the analysis does not distinguish elements with different |
469 | indices. Effectively, an array is treated like struct{elem T}, a |
470 | load y=x[i] like y=x.elem, and a store x[i]=y like x.elem=y; the |
471 | index i is ignored. |
472 | |
473 | A pointer to an array is pointer to its identity node. (A slice is |
474 | also a pointer to an array's identity node.) The identity node |
475 | allows us to distinguish a pointer to an array from a pointer to one |
476 | of its elements, but it is rather costly because it introduces more |
477 | offset constraints into the system. Furthermore, sound treatment of |
478 | unsafe.Pointer would require us to dispense with this node. |
479 | |
480 | Arrays may be allocated by Alloc, by make([]T), by calls to append, |
481 | and via reflection. |
482 | |
483 | Tuples (T, ...): |
484 | |
485 | Tuples are treated like structs with naturally numbered fields. |
486 | *ssa.Extract is analogous to *ssa.Field. |
487 | |
488 | However, tuples have no identity field since by construction, they |
489 | cannot be address-taken. |
490 | |
491 | # FUNCTION CALLS |
492 | |
493 | There are three kinds of function call: |
494 | 1. static "call"-mode calls of functions. |
495 | 2. dynamic "call"-mode calls of functions. |
496 | 3. dynamic "invoke"-mode calls of interface methods. |
497 | |
498 | Cases 1 and 2 apply equally to methods and standalone functions. |
499 | |
500 | Static calls: |
501 | |
502 | A static call consists three steps: |
503 | - finding the function object of the callee; |
504 | - creating copy edges from the actual parameter value nodes to the |
505 | P-block in the function object (this includes the receiver if |
506 | the callee is a method); |
507 | - creating copy edges from the R-block in the function object to |
508 | the value nodes for the result of the call. |
509 | |
510 | A static function call is little more than two struct value copies |
511 | between the P/R blocks of caller and callee: |
512 | |
513 | callee.P = caller.P |
514 | caller.R = callee.R |
515 | |
516 | Context sensitivity: Static calls (alone) may be treated context sensitively, |
517 | i.e. each callsite may cause a distinct re-analysis of the |
518 | callee, improving precision. Our current context-sensitivity |
519 | policy treats all intrinsics and getter/setter methods in this |
520 | manner since such functions are small and seem like an obvious |
521 | source of spurious confluences, though this has not yet been |
522 | evaluated. |
523 | |
524 | Dynamic function calls: |
525 | |
526 | Dynamic calls work in a similar manner except that the creation of |
527 | copy edges occurs dynamically, in a similar fashion to a pair of |
528 | struct copies in which the callee is indirect: |
529 | |
530 | callee->P = caller.P |
531 | caller.R = callee->R |
532 | |
533 | (Recall that the function object's P- and R-blocks are contiguous.) |
534 | |
535 | Interface method invocation: |
536 | |
537 | For invoke-mode calls, we create a params/results block for the |
538 | callsite and attach a dynamic closure rule to the interface. For |
539 | each new tagged object that flows to the interface, we look up |
540 | the concrete method, find its function object, and connect its P/R |
541 | blocks to the callsite's P/R blocks, adding copy edges to the graph |
542 | during solving. |
543 | |
544 | Recording call targets: |
545 | |
546 | The analysis notifies its clients of each callsite it encounters, |
547 | passing a CallSite interface. Among other things, the CallSite |
548 | contains a synthetic constraint variable ("targets") whose |
549 | points-to solution includes the set of all function objects to |
550 | which the call may dispatch. |
551 | |
552 | It is via this mechanism that the callgraph is made available. |
553 | Clients may also elect to be notified of callgraph edges directly; |
554 | internally this just iterates all "targets" variables' pts(·)s. |
555 | |
556 | # PRESOLVER |
557 | |
558 | We implement Hash-Value Numbering (HVN), a pre-solver constraint |
559 | optimization described in Hardekopf & Lin, SAS'07. This is documented |
560 | in more detail in hvn.go. We intend to add its cousins HR and HU in |
561 | future. |
562 | |
563 | # SOLVER |
564 | |
565 | The solver is currently a naive Andersen-style implementation; it does |
566 | not perform online cycle detection, though we plan to add solver |
567 | optimisations such as Hybrid- and Lazy- Cycle Detection from (Hardekopf |
568 | & Lin, PLDI'07). |
569 | |
570 | It uses difference propagation (Pearce et al, SQC'04) to avoid |
571 | redundant re-triggering of closure rules for values already seen. |
572 | |
573 | Points-to sets are represented using sparse bit vectors (similar to |
574 | those used in LLVM and gcc), which are more space- and time-efficient |
575 | than sets based on Go's built-in map type or dense bit vectors. |
576 | |
577 | Nodes are permuted prior to solving so that object nodes (which may |
578 | appear in points-to sets) are lower numbered than non-object (var) |
579 | nodes. This improves the density of the set over which the PTSs |
580 | range, and thus the efficiency of the representation. |
581 | |
582 | Partly thanks to avoiding map iteration, the execution of the solver is |
583 | 100% deterministic, a great help during debugging. |
584 | |
585 | # FURTHER READING |
586 | |
587 | Andersen, L. O. 1994. Program analysis and specialization for the C |
588 | programming language. Ph.D. dissertation. DIKU, University of |
589 | Copenhagen. |
590 | |
591 | David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Efficient |
592 | field-sensitive pointer analysis for C. In Proceedings of the 5th ACM |
593 | SIGPLAN-SIGSOFT workshop on Program analysis for software tools and |
594 | engineering (PASTE '04). ACM, New York, NY, USA, 37-42. |
595 | http://doi.acm.org/10.1145/996821.996835 |
596 | |
597 | David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Online |
598 | Cycle Detection and Difference Propagation: Applications to Pointer |
599 | Analysis. Software Quality Control 12, 4 (December 2004), 311-337. |
600 | http://dx.doi.org/10.1023/B:SQJO.0000039791.93071.a2 |
601 | |
602 | David Grove and Craig Chambers. 2001. A framework for call graph |
603 | construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6 |
604 | (November 2001), 685-746. |
605 | http://doi.acm.org/10.1145/506315.506316 |
606 | |
607 | Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: fast |
608 | and accurate pointer analysis for millions of lines of code. In |
609 | Proceedings of the 2007 ACM SIGPLAN conference on Programming language |
610 | design and implementation (PLDI '07). ACM, New York, NY, USA, 290-299. |
611 | http://doi.acm.org/10.1145/1250734.1250767 |
612 | |
613 | Ben Hardekopf and Calvin Lin. 2007. Exploiting pointer and location |
614 | equivalence to optimize pointer analysis. In Proceedings of the 14th |
615 | international conference on Static Analysis (SAS'07), Hanne Riis |
616 | Nielson and Gilberto Filé (Eds.). Springer-Verlag, Berlin, Heidelberg, |
617 | 265-280. |
618 | |
619 | Atanas Rountev and Satish Chandra. 2000. Off-line variable substitution |
620 | for scaling points-to analysis. In Proceedings of the ACM SIGPLAN 2000 |
621 | conference on Programming language design and implementation (PLDI '00). |
622 | ACM, New York, NY, USA, 47-56. DOI=10.1145/349299.349310 |
623 | http://doi.acm.org/10.1145/349299.349310 |
624 | */ |
625 | package pointer // import "golang.org/x/tools/go/pointer" |
626 |
Members