Static Analysis for Efficient Hybrid Information-Flow Control

Hybrid information-flow monitors use a combination of static analysis and dynamic mechanisms to provide precise strong information security guarantees. However, unlike purely static mechanisms for information security, hybrid information-flow monitors incur run-time overhead. We show how static analyses can be used to make hybrid information-flow monitors more efficient, in two ways. First, a simple static analysis can determine when it is sound for a monitor to stop tracking the security level of certain variables. This potentially reduces run-time overhead of the monitor, particularly in applications where sensitive (i.e., confidential or untrusted) data is infrequently introduced to the system. Second, we derive sufficient conditions for soundly incorporating a wide range of memory abstractions into information-flow monitors. This allows the selection of a memory abstraction that gives an appropriate tradeoff between efficiency and precision. It also facilitates the development of innovative and sound memory abstractions that use run-time security information maintained by the monitor. We present and prove our results by extending the information-flow monitor of Russo and Sabelfeld (2010). These results bring us closer to efficient, sound, and precise enforcement of information security.


I. INTRODUCTION
Tracking and controlling the flow of information in computer systems can be used to enforce strong, precise, application-specific information security [1]. Information-flow control can be achieved through static or dynamic mechanisms. Static mechanisms (exemplified by security type systems, e.g., [2,3,4]) analyze a program before execution to determine whether all possible executions are secure. Dynamic mechanisms (e.g., [5,6,7]) monitor or instrument program execution to determine whether a particular execution is secure. Dynamic mechanisms can thus be more precise (since they accept or reject a single program execution, instead of an entire program), but unlike static mechanisms, they incur runtime overhead.
This technical report is an extended version of the paper of the same name appearing in Proceedings of the 24th IEEE Computer Security Foundations Symposium, 2011. New material includes the addition of an Appendix containing key proofs.
Recent work considers hybrid information-flow control [8,9,10,11], which combines both static and dynamic mechanisms to enforce information-flow security guarantees. Static mechanisms can reason more precisely than purely dynamic mechanisms about certain kinds of information flows, a result recently proved by Russo and Sabelfeld [11]. Hybrid mechanisms can accept or reject a single program execution, but also reason precisely about information flow.
In this work, we show how static analysis can be used to make hybrid information-flow monitors more efficient in two ways. First, we show that a straightforward security-type system can reduce runtime overhead of information-flow monitors by determining when it is sound for a monitor to stop tracking certain variables. Second, we derive sufficient conditions for soundly incorporating a wide range of memory abstractions into information-flow monitors. This allows the selection of a memory abstraction that gives an appropriate tradeoff between efficiency and precision. We present and prove these results by extending the fail-stop information-flow monitor of Russo and Sabelfeld [11].
Selective tracking. Consider the pseudo-code in Figure 1, which models the processing of a web application request. Depending on whether the current user is a guest or an authenticated user, data from the request is regarded as either untrusted or trusted. If the data is untrusted, it is appended to an audit log. After some computation, the data may be used in a database update, which is a dangerous operation that should depend only on trusted data. Suppose the program executes with a monitor that tracks the flow of information in the program in order to prevent security violations, such as a database update depending on untrusted data.
There are at least three ways in which this information-flow monitor may waste effort at runtime. First, information in variable untrustedLog never affects whether a security violation occurs, even though it stores untrusted data. Thus, a monitor that does not track variable untrustedLog will still correctly prevent security violations, and so effort spent tracking this variable is wasted. Second, variable d contains untrusted information only on some executions. Depending on the usage of the web application, it may contain trusted data on the majority of executions. In such cases there is no need for the monitor to track variable d. Third, even on executions in which d contains untrusted information, if control reaches line 11 or line 13, then the contents of d can never affect whether a security violation occurs, so there is no need to track it anymore.
There are several opportunities for reducing the number of variables tracked by the monitor, which will, for many monitors, reduce runtime overhead. Even more opportunities exist for a monitor that can dynamically start and stop tracking variables. (We are developing an inlined monitor that dynamically generates instrumented code to track a subset of program variables, and can thus dynamically start and stop tracking variables.) We present a static analysis that can determine when a program variable is no longer a security concern and show how this analysis can be incorporated into an information-flow monitor. The modified monitor provides exactly the same security guarantee, but with potentially reduced runtime overhead. For the example program above, the analysis enables the monitor to never track variable untrustedLog and to stop tracking variable d when line 11 or 13 is reached. The analysis can either be performed prior to execution or on-the-fly, allowing its use with dynamic languages [12].
Memory abstractions. Practical information-flow control must deal with realistic language features, including dynamically allocated memory and first-class memory references. Consider the program fragment in Figure 2, which models a web application in which some users are secret agents. If the current user is a secret agent then events are logged in a special log to avoid revealing secret activities. Variable pLog points to either secretLog or normalLog and indicates which log will be updated. It is set based on secret information. At line 11, an entry is added to the log. Suppose the current user is a secret agent. Then the location normalLog will not be updated. However, an observer of normalLog may notice that the log is unchanged, and thus learn that the current user is a secret agent. This information flow occurs because the pointer pLog depends on secret information, and updating through this pointer means that learning the value of any location that pLog might have pointed to may now reveal secret information.
Monitors can soundly track the flow of information in memory, including the heap, using appropriate memory abstractions. The choice of memory abstraction can affect the precision and efficiency of the resulting monitor. In general, fine-grained abstractions may enforce security guarantees precisely, but require an expensive analysis and higher monitor overhead (due to the need to track more abstract locations). Coarser abstractions may be more efficient, but insufficiently precise. In order to explore these tradeoffs, we must first understand the requirements for soundly incorporating a memory abstraction into an information-flow monitor. In addition, we would like to consider to what extent information about the current execution can be used to increase the precision of a memory abstraction.
We present sufficient conditions for incorporating memory abstractions and analyses into a hybrid information-flow monitor. The key insight is that the behavior of the monitor must not leak information, meaning that we must be able to reason statically and precisely about the behavior of the monitor in other possible executions of the program. This has the effect of restricting the precision of memory abstractions. However, the conditions are sufficiently lenient to allow many different memory abstractions to be soundly incorporated into information-flow monitors, such as points-to sets (e.g., [13,14,15,16]), shape graphs (e.g., [17]), and regions (e.g., [18,19]). The conditions also allow the monitor to use information from the current execution of a program.
The rest of the paper is organized as follows. We present background information in Section II, including Russo and Sabelfeld's fail-stop hybrid information-flow monitor [11]. In Section III we show how an information-flow monitor can soundly stop tracking variables when they are no longer a security concern. In Section IV we show how a wide variety of memory abstractions and analyses can be soundly incorporated into information-flow monitors. We discuss related work in Section V and conclude in Section VI.

II. BACKGROUND: INFORMATION-FLOW MONITOR
In this section, we present a simple imperative language and a hybrid information-flow monitor for the language, based closely on that of Russo and Sabelfeld [11]. We generalize their language and monitor to an arbitrary security lattice (instead of the two-point lattice they use) and modify the language and monitor slightly to facilitate extensions described in later sections. The modified monitor satisfies the same termination-insensitive noninterference condition as the original monitor, modulo generalization to an arbitrary security lattice. We state the generalized security condition and prove the modified monitor satisfies it.
The syntax for a simple imperative language with an explicit output command is given in Figure 3. Values are restricted to integers n, and expressions e are either values v, variables x, or binary expressions e_1 ⊕ e_2, where ⊕ ranges over total binary operations over integers.

A. Language
We assume a complete lattice L of security levels and use ⊔ and ⊑ respectively to denote the join operator and partial order. We write ⊤_L and ⊥_L to denote the top and bottom elements of L, respectively. Security levels may represent confidentiality levels, integrity levels, or both. Security levels mandate restrictions on the use of data. Intuitively, levels higher in the lattice mandate more restrictions. For confidentiality, higher in the lattice corresponds to more confidential; for integrity, higher means less trusted. We use the term "sensitive information" to refer to information labeled with a high security level: either confidential information or untrusted information.
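As an illustration of the lattice operations just described, the following is a minimal Python sketch, assuming a powerset lattice whose levels are sets of tags ordered by inclusion. The class name PowersetLattice and the tag names are hypothetical and chosen only for this example; any complete lattice with a join and a partial order would serve equally well.

```python
class PowersetLattice:
    """Hypothetical security lattice: levels are frozensets of tags.

    The partial order is set inclusion and the join is set union, so a level
    carrying more tags mandates more restrictions.
    """

    def __init__(self, tags):
        self.bottom = frozenset()      # least restrictive level
        self.top = frozenset(tags)     # most restrictive level

    def join(self, a, b):
        # Least upper bound of two levels.
        return a | b

    def leq(self, a, b):
        # True when level a is at most as restrictive as level b.
        return a <= b
```

Under this encoding, a level such as {"confidential"} is incomparable with {"untrusted"}, and their join carries both restrictions, which matches the reading of levels that combine confidentiality and integrity.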
Commands c are standard, with the exception of the output_ℓ(e) command, which outputs the value of expression e to channel ℓ, where ℓ is a security level and is intended to be an upper bound on information that may be learned by observing the channel. We assume, without loss of generality, that there is exactly one channel per security level.
Execution of commands introduces new syntactic forms. Terms t represent commands in the process of execution, and extend commands with sequences of terms, stop, and end. Term stop represents a command that has finished execution, and term end indicates that execution is leaving the scope of a branch. The operational semantics inserts end terms when any branch is encountered.
A program configuration is a pair ⟨t, m⟩, where t is a term and memory m is a mapping from variables to values.
The judgment ⟨e, m⟩ ⇓ v indicates that expression e evaluates to value v under memory m. We write m[x ↦ v] for the memory that maps x to v, and otherwise behaves the same as m.
The judgment ⟨t, m⟩ −α→ ⟨t′, m′⟩ indicates that configuration ⟨t, m⟩ can take a single step to configuration ⟨t′, m′⟩. As part of that step, internal event α is emitted; the monitor uses internal events to track and control execution. We describe the different internal events in the following subsection. Inference rules for the operational semantics are given in Figure 4.

B. Hybrid information-flow monitor
An information-flow monitor tracks and controls the flow of information in a program. A monitor configuration is a pair ⟨γ, σ⟩ consisting of monitor environment γ and monitor stack σ. Monitor environment γ maps program variables to security levels, tracking the security level of information currently stored in each variable. We write lev(e, γ) for the join of levels γ(x) for all variables x ∈ dom(γ) ∩ vars(e), where vars(e) is the set of all variables that occur in e. Function lev(e, γ) gives an upper bound on the information that may be learned by evaluating expression e, assuming that γ describes upper bounds of information stored in variables.
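The function lev(e, γ) can be sketched in Python as follows. This is only an illustration under stated assumptions: expressions are encoded as integers, variable names (strings), or tuples (op, e1, e2), and the lattice is the two-point lattice encoded as 0 (low) and 1 (high) with join given by max. The encoding and the helper name vars_of are hypothetical.

```python
def vars_of(e):
    """Set of variables occurring in expression e (assumed encoding)."""
    if isinstance(e, int):
        return set()
    if isinstance(e, str):
        return {e}
    _op, e1, e2 = e
    return vars_of(e1) | vars_of(e2)


def lev(e, gamma, bottom=0, join=max):
    """Join of gamma(x) over the variables of e that gamma tracks."""
    level = bottom
    for x in vars_of(e):
        if x in gamma:              # only variables in dom(gamma) contribute
            level = join(level, gamma[x])
    return level
```

Note that a variable outside dom(γ) contributes nothing to the result, which is what makes removing variables from the monitor environment meaningful in later sections.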
Monitor stack σ is used to track the security level of the program counter, and to account for implicit flows [20] (information flows due to the control structure of the program). It is a stack of pairs (ℓ, γ′) where ℓ is a security level and γ′ is a monitor environment. A pair is pushed on the stack when the program branches and is popped off the stack at the end of the branch. An upper bound on the security level of information influencing control flow can be obtained by taking the join of the first element of all pairs in the stack, denoted lev(σ). We refer to this upper bound as the program counter level. We describe the use of the monitor stack in more detail below.
A monitored program configuration is a pair of a program configuration and a monitor configuration. Monitored execution of a program requires that both the program configuration and the monitor configuration can take a step. Figure 5 shows the inference rule for monitored execution. Note that the monitored execution rule is parameterized by monitor M. This allows us to discuss the behaviors of different monitors. We use M_RS to denote Russo and Sabelfeld's fail-stop monitor [11].
The small-step judgment for monitor configurations relates a configuration ⟨γ, σ⟩ to a successor configuration ⟨γ′, σ′⟩, where M indicates which monitor is in use, α is the triggering internal event, and β is the resulting output, which is either nothing or o_ℓ(v), indicating output of value v on channel ℓ. The monitor can thus halt the execution of the program or modify output in order to enforce security. The small-step judgment for monitor configurations also takes the new term t′ and the original memory m; this information is not used by Russo and Sabelfeld's monitor but is used by our extensions.
Small-step semantics for M_RS are given in Figure 6. Event skip, generated when the program executes a skip command, is always accepted by the monitor and does not change the monitor configuration. Event assign(x, e) is generated by execution of command x := e. The monitor updates the security level of x to the join of the expression's security level lev(e, γ) with the program counter level lev(σ).
When the program enters a branch (either an if or while command), an internal event branch(e, c) is generated, where expression e is the branch test, and command c is the branch not taken. This causes the monitor to push a new pair (ℓ, γ′) onto the monitor stack, where ℓ is the join of the security level of expression e with the current program counter level lev(σ), and γ′ is the result of calling UPDATE_ℓ(c). The function UPDATE_ℓ(c) analyses command c and returns a monitor environment γ′ such that for every variable x, γ′(x) ∈ {ℓ, ⊥_L}, and if c contains an assignment to x then γ′(x) = ℓ, otherwise γ′(x) = ⊥_L.
When the end of a branch is reached, internal event join is generated, which causes the top element (ℓ, γ′) of the monitor stack to be popped, and the current monitor environment γ changed to γ ⊔ γ′ (i.e., the point-wise join of γ and γ′). This allows the monitor to track information flows that occur due to code in the branch not taken that could have executed.
Internal event output_ℓ(e, v) is generated when the program attempts to output expression e on channel ℓ. The monitor allows the output only if the information that may be learned by the evaluation of the expression (i.e., lev(e, γ)) and the information that has influenced the decision to perform the output (i.e., program counter level lev(σ)) are bounded above by ℓ, the information allowed to be output on the channel. In this paper, we consider only Russo and Sabelfeld's fail-stop monitor; they also explore monitors that behave differently on output events. Our results apply to those other monitor behaviors.
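The event handling described in this subsection can be sketched as a small Python class. This is an illustrative approximation rather than the formal semantics of Figure 6: it assumes a two-point lattice encoded as 0 (low) and 1 (high), commands encoded as tuples such as ("assign", x, e), ("seq", c1, c2), ("if", e, c1, c2), ("while", e, c), ("output", level, e), and ("skip",), and it reports a rejected output by returning False where a fail-stop monitor would halt. All class, method, and helper names are hypothetical.

```python
BOTTOM = 0  # two-point lattice: 0 = low, 1 = high; join is max, order is <=


def vars_of(e):
    """Variables of an expression: int, variable name, or (op, e1, e2)."""
    if isinstance(e, int):
        return set()
    if isinstance(e, str):
        return {e}
    return vars_of(e[1]) | vars_of(e[2])


def assigned(c):
    """Variables that command c may assign (used by the UPDATE function)."""
    tag = c[0]
    if tag == "assign":
        return {c[1]}
    if tag == "seq":
        return assigned(c[1]) | assigned(c[2])
    if tag == "if":
        return assigned(c[2]) | assigned(c[3])
    if tag == "while":
        return assigned(c[2])
    return set()  # skip and output assign nothing


class FailStopMonitor:
    def __init__(self, gamma):
        self.gamma = dict(gamma)   # monitor environment
        self.stack = []            # monitor stack of (level, environment)

    def lev(self, e):
        return max([self.gamma.get(x, BOTTOM) for x in vars_of(e)],
                   default=BOTTOM)

    def pc(self):
        # Program counter level: join of the levels on the stack.
        return max([level for level, _ in self.stack], default=BOTTOM)

    def on_assign(self, x, e):
        self.gamma[x] = max(self.lev(e), self.pc())

    def on_branch(self, test, not_taken):
        level = max(self.lev(test), self.pc())
        # UPDATE: variables assigned in the branch not taken get this level.
        self.stack.append((level, {x: level for x in assigned(not_taken)}))

    def on_join(self):
        _level, update = self.stack.pop()
        for x, level in update.items():
            self.gamma[x] = max(self.gamma.get(x, BOTTOM), level)

    def on_output(self, channel, e):
        # False means the fail-stop monitor would halt execution here.
        return max(self.lev(e), self.pc()) <= channel
```

For instance, branching on a high variable h with l := 1 as the branch not taken raises the level of l at the join point, so a later output of l on the low channel is rejected even though no assignment to l actually executed.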

C. Security
Russo and Sabelfeld's fail-stop monitor satisfies a termination-insensitive noninterference security condition. Here we state the noninterference condition, extended in a straightforward way to an arbitrary lattice of security levels, and prove the generalized monitor satisfies the condition.
We write ⟨⟨t, m⟩, ⟨γ, σ⟩⟩ −β→*_M ⟨⟨t′, m′⟩, ⟨γ′, σ′⟩⟩ when monitored program configuration ⟨⟨t, m⟩, ⟨γ, σ⟩⟩ may take zero or more steps to reach configuration ⟨⟨t′, m′⟩, ⟨γ′, σ′⟩⟩, producing the sequence of outputs β. We write ℓ(β) for the subsequence of β containing all and only events output to channels ℓ′ such that ℓ′ ⊑ ℓ. Intuitively, ℓ(β) is the output that is observable during program execution on channels at level ℓ or lower.
Given monitor environment γ, we say that two memories m_1 and m_2 are ℓ-equivalent, written m_1 =_γ m_2, if they agree on the values of all variables x such that γ(x) ⊑ ℓ.
Informally, a monitor is secure if for every command c and security level ℓ, given two memories that are ℓ-equivalent, the monitored execution of c from these two memories will produce outputs that look the same to an entity that can observe all channels at level ℓ or lower. Given output sequence β_1, output sequence β_2 looks the same if either ℓ(β_1) = ℓ(β_2), or ℓ(β_2) is a prefix of ℓ(β_1) and no additional observable events will be generated by the execution that produced ℓ(β_2). More precisely, we write ⟨⟨t, m⟩, ⟨γ, σ⟩⟩ ⇒ if for all monitored executions starting from configuration ⟨⟨t, m⟩, ⟨γ, σ⟩⟩, there are no outputs to a channel at level ℓ or lower.
Definition 1 (Security). Given monitor M and security lattice L, M is secure if for all commands c, all ℓ ∈ L, all memories m_1 and m_2, and monitor environments γ such that [...] then there exist t_2, m_2, γ_2, and [...] where the following conditions hold. 1) [...]
Due to our extension from a two-point lattice to an arbitrary lattice L, the proof of Theorem 1 given by Russo and Sabelfeld [11] does not apply. However, the theorem holds as an immediate consequence of Theorem 3 in Section IV, which proves security for a language that is a superset of the language described here.

III. SELECTIVE TRACKING
Monitored execution can be significantly slower than normal execution of a program. For example, Chandra and Franz [10] report that with their hybrid information-flow monitor, assignments are 3× slower than unmonitored execution, despite using a restricted set of security levels with an efficient representation. Newsome and Song [21] instrument binaries with information-flow tracking and report CPU-bound computation taking more than 10× longer than executing the binary in the same instrumentation framework with no tracking. Much of the runtime overhead of an information-flow monitor results from tracking the security levels of many variables.
If at some point in the program's execution, the contents of a specific variable can no longer influence the occurrence of a security violation, there is no need to track its security level. By not tracking a variable, the monitor can reduce the overhead associated both with storage for that variable and with performing join operations on its security level. This reduction may be significant for applications where sensitive data is rarely introduced to the system, or where operations that may violate security are rare.
We present a static analysis that soundly determines when a variable can no longer influence the occurrence of a security violation and show that the analysis can be incorporated into an information-flow monitor with no loss of security. We also discuss implementation issues.

A. Static analysis
We use a simple flow-sensitive security type system [22] to determine when a variable cannot cause a security violation. We use a special two-point lattice, containing elements ⊥ and ⊤, where ⊥ ⊑ ⊤ and ⊤ ⋢ ⊥. A typing environment Γ maps each variable to either ⊥ or ⊤. Intuitively, if Γ(x) = ⊤ then the value stored in variable x may have been influenced by a variable we are considering no longer tracking.
The judgment Γ ⊢ e : τ means that under typing environment Γ, information associated with expression e is at most τ. If τ = ⊤, then the evaluation of the expression may depend on a variable we are considering no longer tracking; if τ = ⊥, then evaluation is independent of such variables.
The judgment pc ⊢ Γ {t} Γ′ means that if Γ is an accurate description of the information stored in variables before t executes, and if pc ∈ {⊥, ⊤} indicates whether the decision to execute t depends on variables we are considering no longer tracking, then Γ′ will accurately describe the information stored in variables after t executes.
Figure 7. Flow-sensitive information-flow type system.
Inference rules for both judgments are given in Figure 7. We extend ⊑ pointwise to environments and write Γ ⊑ Γ′ to denote ∀x. Γ(x) ⊑ Γ′(x). We write Γ[x ↦ τ] for the typing environment that maps x to τ and otherwise behaves like Γ. More generally, we write Γ[X ↦ τ] for the environment that maps all variables in the set X to τ and otherwise behaves like Γ. Finally, we write ⊥ for the typing environment that maps every variable to ⊥.
The inference rules are standard for a flow-sensitive security type system, with the exception of the rule for command output_ℓ(e). An output command is the only command that may cause a security violation. As such, the typing rule requires that both the value output and the decision to output are independent of any variables we are considering no longer tracking: both must have level ⊥. Note that the security level of the output command is irrelevant: the type system is simply being used to determine whether any output command may be influenced by a variable we are considering no longer tracking.
Intuitively, if the judgment ⊥ ⊢ ⊥[x ↦ ⊤] {t} Γ′ holds for some Γ′, then the value of variable x just before term t executes does not influence any output that execution of t may produce: it cannot affect either the decision to output or the value output. Thus, if t is all that remains of the program to execute, then there is no need to track the security level of variable x in the rest of the program. More generally, if ⊥ ⊢ ⊥[X ↦ ⊤] {t} Γ′ holds, then none of the variables in the set X can cause a security violation, and there is no need to track the security level of any of them.

B. Monitor M PERF
We define a new monitor, M_PERF, that uses this static analysis to reduce the number of variables that the monitor must track. Monitor M_PERF has a single inference rule, given in Figure 8. We write γ \ X for the monitor environment that is undefined for variables X and otherwise behaves like monitor environment γ. Thus, dom(γ \ X) = dom(γ) \ X.
Figure 8. Monitor using static analysis to increase performance, M_PERF.
Monitor M_PERF takes a step only if monitor M_RS takes a step to monitor configuration ⟨γ′, σ′⟩, and variables X ⊆ dom(γ′) cannot influence any output in the remainder of the program; the resulting monitor environment is γ′ \ X. Recall that the small-step judgment for monitor configurations takes, in addition to the internal event, the original memory m and the term t′ that will result if monitored execution is allowed to proceed one step. Thus, term t′ is the remainder of the program.
To illustrate the behavior of monitor M_PERF, consider the example program in Figure 9. We assume a two-point lattice {L, H}, where L ⊑ H and H ⋢ L, and assume that at the beginning of the program variables x and z contain sensitive information (level H), and variable y contains non-sensitive information (level L). At line 2, the program branches on the value of z. Suppose the true branch is taken. The remainder of the program is then the term t′ ≡ y := x; output_L(x); end; output_L(1).
Clearly variable z can no longer influence any output statement (it does not occur in t′), and so the monitor can stop tracking it. Similarly, variable y can no longer influence any output statement, and the monitor can stop tracking it, even though the assignment in line 3 would otherwise have raised the security level of y to H. Variable x may affect an output produced by t′, so the monitor must continue to track its security level.
Suppose instead the false branch is taken. The remainder of the program is then the term y := z; end; output_L(1) and the monitor can immediately stop tracking all variables x, y, and z.
Figure 10 compares the monitor environments of M_RS and M_PERF after each line in the example program. We write ∅ for the monitor environment with an empty domain and write "-" when control flow does not reach the line.

C. Security
Figure 10. Monitor state after executing each line of the example program given in Figure 9.
Monitor M_PERF enforces the same termination-insensitive noninterference security condition as M_RS (stated in Definition 1). Rather than show this directly, we prove that M_PERF is behaviorally equivalent to M_RS: they allow exactly the same executions of the program. First, we say that monitor M_1 is at least as restrictive as M_2 if for every execution that M_1 allows from some initial configuration, M_2 allows an execution from the same configuration with the same sequence of outputs.
Definition 2 (At least as restrictive). Monitor M_1 is at least as restrictive as M_2 if whenever ⟨⟨t, m⟩, ⟨γ, σ⟩⟩ −β→*_{M_1} ⟨⟨t′, m′⟩, ⟨γ′, σ′⟩⟩ then there exist γ″ and σ″ such that ⟨⟨t, m⟩, ⟨γ, σ⟩⟩ −β→*_{M_2} ⟨⟨t′, m′⟩, ⟨γ″, σ″⟩⟩.
We say two monitors are behaviorally equivalent if they are both at least as restrictive as each other; that is, they allow exactly the same executions.
Definition 3 (Monitor behavioral equivalence). Monitors M_1 and M_2 are behaviorally equivalent if and only if M_1 is at least as restrictive as M_2 and M_2 is at least as restrictive as M_1.
Monitors M_PERF and M_RS are behaviorally equivalent, which allows us to easily prove that M_PERF enforces the same security condition as M_RS. Proofs of all results can be found in the appendix.

D. Implementation issues
Cost of selectively tracking variables. Our static analysis identifies variables that the monitor can safely ignore. However, this provides a performance benefit only when it reduces the work that the monitor must perform. In a naïve implementation of monitor M_PERF, selectively tracking variables may increase runtime overhead if the monitor is continually checking which variables to track.
We anticipate that performance is most likely to be improved for inlined information-flow monitors (e.g., [23,24]), where the instrumented code is specialized for tracking just a subset of variables. For example, if the monitor needs to track variables y and z but not x, then no instrumentation is required for an assignment x := y + z, thus removing the lookup of the security levels of y and z and the join operation, without requiring an explicit check to determine whether x should be tracked.
We are developing an inlined information-flow monitor with the ability to dynamically generate different versions of the same code that track different sets of variables. Because of the high overhead of dynamic code generation, this monitor will be most useful for applications in which sensitive data is infrequently introduced into the system. Executions will be lightly instrumented until sensitive information is introduced, at which time additional monitoring code will be generated to track information that may cause a security violation. The system can stop tracking variables (and return to the version of code with little instrumentation) once static analysis determines that a security violation can no longer occur. In a setting where sensitive data is frequently introduced, the benefit of reducing the number of variables that must be tracked may be less than the cost of generating several versions of code.
However, even for monitor implementations that cannot selectively start and stop tracking variables, or where the cost of selectively starting and stopping tracking variables is high, our static analysis may provide some performance benefits. If a variable can no longer affect whether a security violation occurs, it is sound to assign it any security level, as doing so will not change the behavior of the monitor. For some security lattices, join and comparison operations are less expensive for certain security levels. For example, if security levels are represented as partial functions (e.g., [25,26]) then join and comparison operations can be more efficient on the partial function with an empty domain. Instead of removing variables from the domain of the monitor environment, the monitor can set the level of these variables to a level efficient for storage and computation.
On-the-fly analysis vs. pre-execution analysis. The operational semantics for monitor M_PERF implies on-the-fly static analysis, but our analysis can also be performed prior to execution. Performing the static analysis on-the-fly allows its use with dynamic languages, in a similar manner to the use of on-the-fly static analysis by Askarov and Sabelfeld [12]. However, if used in this way, performing the static analysis on every execution step would most likely be too expensive.
Instead, some subset of execution steps should perform the static analysis. This could be determined according to a schedule (e.g., every k steps), based on certain internal events (e.g., on every branch command and on the execution of dynamically-generated code, i.e., eval commands), or at program points identified by some static analysis.
If the static analysis is performed prior to execution, then the results of the analysis must somehow be communicated to the monitor. As we discuss above, we believe inlining the monitor and specializing the inlined code may be the most efficient way to take advantage of the static analysis results. However, other mechanisms are possible, such as the creation of a data structure that allows the monitor to look up results based on the remainder of the program to execute, perhaps represented by the current value of the program counter. Note that this data structure may contain just a subset of the analysis results; for example, only for program points where the set of variables to stop tracking is above some threshold size.
Implementing the analysis. The analysis is currently phrased as a syntax-directed type system that can check whether judgment ⊥ ⊢ ⊥[X ↦ ⊤] {t} Γ′ holds for some set of variables X. However, to be useful, the analysis needs to infer the set of variables X. This corresponds to a principal typing problem [27], where given term t, we want to find a typing environment Γ such that ⊥ ⊢ Γ {t} Γ′ holds, and Γ maps as many variables as possible to ⊤. Hunt and Sands [28] present a polynomial-time algorithm for inference of principal types for flow-sensitive security-type systems. Their results can be easily adapted for our setting, giving us an efficient algorithm to implement the analysis.
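To make the inference problem concrete, the following Python sketch infers a set X by repeatedly calling a checker. It is a naive greedy procedure and is not the algorithm of Hunt and Sands; it assumes only that a hypothetical oracle typechecks(t, X) decides whether the term typechecks when the variables in X start at ⊤, and that failure is monotone in X (adding variables to X never turns a failing check into a passing one). Under that assumption the returned set is maximal.

```python
def infer_untracked(t, variables, typechecks):
    """Greedily grow a set X of variables that need not be tracked.

    `typechecks(t, X)` is an assumed oracle: it returns True when term t
    typechecks with every variable in X initially at top and all others at
    bottom. Every intermediate X is verified, so the result is sound for
    whatever checker is supplied.
    """
    chosen = set()
    for x in sorted(variables):          # sorted only for determinism
        candidate = chosen | {x}
        if typechecks(t, frozenset(candidate)):
            chosen = candidate
    return chosen
```

This procedure makes one oracle call per variable, so its cost is dominated by the checker; a dedicated principal-type inference would compute the same kind of result in a single pass over the term.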

IV. MEMORY ABSTRACTIONS
In this section, we extend the hybrid information-flow monitor to a language with dynamically allocated memory and first-class references. The extended monitor is parameterized by a sound memory abstraction. Interestingly, not all sound memory abstractions are suitable for use in an information-flow monitor. We state sufficient conditions on the information-flow monitor and memory abstraction to enforce security (and informally describe necessary conditions). Many practical memory abstractions satisfy these sufficient conditions.
By being precise about the conditions for soundly incorporating a memory abstraction into an information-flow monitor, we allow monitor implementations to find an appropriate balance between efficiency and precision. In addition, these conditions highlight opportunities for the development of novel memory abstractions for information-flow control which use run-time information, including information about the state of the monitor, to improve precision.

A. Language extensions
We extend the simple imperative language by adding the new syntactic forms and operational semantics rules given in Figure 11. Program configurations remain the same, although memories m now map both variables and concrete locations (or, simply, locations) to values. We use metavariable r to range over locations. Values in the language are now integers or locations.
Expression *e evaluates e to a location r and looks up the contents of r in the current memory. Command x := new(e) creates a new location r, evaluates e to a value v, and updates the memory so that variable x maps to r and r maps to v. Internal event new(x, e, r) is generated when x := new(e) executes. Command e₁ ← e₂ evaluates e₁ to a location r, evaluates e₂ to a value v, updates the memory so that r maps to v, and issues internal event store(e₁, e₂, r).
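The three new constructs can be sketched as a small Python interpreter fragment. This is an illustrative reading of the prose above, not a transcription of Figure 11: the tuple encoding of expressions, the `Loc` class, and the function names are assumptions, and memories are modelled as dictionaries keyed by variable names and `Loc` objects.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Loc:
    """A concrete location r (hypothetical representation)."""
    ident: int


_fresh = count()  # source of fresh location identifiers


def evaluate(e, m):
    """Evaluate expression e in memory m; expressions are tagged tuples."""
    kind = e[0]
    if kind == "int":
        return e[1]
    if kind == "var":
        return m[e[1]]
    if kind == "deref":  # *e: evaluate e to a location, then read it
        r = evaluate(e[1], m)
        if not isinstance(r, Loc):
            raise TypeError("dereference of a non-location value")
        return m[r]
    raise ValueError(f"unknown expression form: {kind}")


def exec_new(x, e, m):
    """x := new(e): returns the updated memory and the internal event."""
    r = Loc(next(_fresh))
    v = evaluate(e, m)
    m2 = dict(m)
    m2[x] = r
    m2[r] = v
    return m2, ("new", x, e, r)


def exec_store(e1, e2, m):
    """e1 <- e2: returns the updated memory and the internal event."""
    r = evaluate(e1, m)
    v = evaluate(e2, m)
    m2 = dict(m)
    m2[r] = v
    return m2, ("store", e1, e2, r)
```

Both commands return the concrete location r as part of the event, which is what later lets a monitor consult a memory abstraction about that location.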
The new language features enable new information flows. In addition to the information stored in locations, the choice of location accessed can be an information channel. When a pointer is dereferenced, the security level of the result depends on the security level of the pointer, as well as the security level of the value stored in the dereferenced location. Intuitively, a pointer dereference acts like a conditional, where the location accessed is conditional on the value of the pointer. To be sound, information-flow monitors must track these flows.
In the example program of Figure 2, line 6 outputs the contents of location a, which is always zero, reveals no sensitive information, and is thus secure. Line 7 outputs the contents of the location pointed to by y. Which location y points to (and what value is output) depends on sensitive information. Thus, the security level of the value output is sensitive, and the output is insecure.
Line 9 occurs after the value 2 is stored into the location pointed to by y. One of the locations a or b is updated, depending on the sensitive value h. Regardless of which location is updated, the value stored in location a now depends on the sensitive value h, which makes the output insecure.

B. Sound memory abstractions
A memory abstraction for a program consists of a set of abstract locations and a function points-to(•, •). During execution, concrete locations are allocated. Each concrete location is represented by one or more abstract locations, and each abstract location represents zero or more concrete locations. For a given expression e and location r, points-to(e, r) is a set of abstract locations. Intuitively, soundness of a memory abstraction requires that given a set of expressions e that evaluate to the same concrete location r, there is at least one abstract location common to all of the expressions' points-to sets. Thus, the function points-to(•, •) allows us to reason soundly about possible aliasing.
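The intuitive soundness requirement can be phrased as a check over the aliases observed in one execution. The Python sketch below is only a testing aid under assumed names (`common_abstract_locations`, `respects_soundness`); it inspects a single recorded execution and therefore cannot establish soundness for all executions.

```python
def common_abstract_locations(expressions, location, points_to):
    """Abstract locations shared by the points-to sets of all expressions."""
    sets = [set(points_to(e, location)) for e in expressions]
    if not sets:
        return set()
    return set.intersection(*sets)


def respects_soundness(observed_aliases, points_to):
    """observed_aliases maps each concrete location to the expressions that
    evaluated to it during one execution. The abstraction passes the check
    if every such group of expressions shares at least one abstract location."""
    return all(
        common_abstract_locations(expressions, location, points_to)
        for location, expressions in observed_aliases.items()
        if expressions
    )
```

For example, with a statically computed abstraction that ignores its location argument, two expressions that both evaluated to the same location must have overlapping points-to sets for the check to pass.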
We say that expression e evaluates to location r during the execution of c if either e, m ⇓ r occurs during the execution of c or e is a variable x and judgment occurs during the execution.
Definition 4 (Sound memory abstraction). A memory abstraction for program c is sound if for any execution of c and location r, if {e₁, …, eₖ} is a set of expressions that all evaluate to location r during the execution of c, then
points-to(e₁, r) ∩ … ∩ points-to(eₖ, r) ≠ ∅.
Note that a memory abstraction may ignore the location argument of points-to(•, •) if, for example, the memory abstraction is generated statically. We include the location argument to allow memory abstractions that use runtime information. For presentation purposes, we provide points-to(•, •) with only an expression and the location it evaluates to. This essentially restricts memory abstractions to flow-insensitive context-insensitive abstractions. We could generalize to allow flow-sensitive, context-sensitive, and even path-sensitive abstractions by providing additional arguments, such as the current program configuration, the current monitored program configuration, or a trace of program execution. For simplicity of presentation, we refrain from doing so.

C. Monitor M MEM
We define a new monitor, M MEM, that can soundly track information flow for the language defined above. The monitor is parameterized on a sound memory abstraction and an analysis algorithm.
Monitor configurations for M MEM remain unchanged, although the domain of monitor environments γ is extended to include abstract locations. Intuitively, M MEM records for each abstract location a the level of information that may be learned by examining the contents of any of the concrete locations that a represents.
All small-step semantics inference rules for monitor M RS are also inference rules for monitor M MEM, with the exception of the rule for internal event branch(e, c). Additional inference rules for M MEM are given in Figure 13.
When event new(x, e, r) (generated by allocation x := new(e)) is encountered, the monitor updates the level of both the variable x and the abstract locations that represent the newly allocated location r. The level of x is set to the program counter level lev(σ), since a pointer to the newly created reference reveals only that it was created. Abstract locations, on the other hand, are weakly updated to the join of the security level of expression e and the program counter level. Weak update is required because an abstract location may represent more than just one concrete location. As with other memory analyses, if it can be proved that an abstract memory location represents a single concrete location, strong update can be used (e.g., [16]).
The security level of expression e, lev(e, γ, m), is the join of the levels of all variables that occur in e and the levels of all locations that might be dereferenced when e is evaluated. That is, if expression *e′ is a subexpression of e and e′, m ⇓ r, then location r will be dereferenced; the monitor is tracking the security level of those locations using abstract locations points-to(e′, r), and so lev(e, γ, m) is at least as high as γ(a) for all abstract locations a in the points-to set of e′.
Updating a location, e₁ ← e₂, generates event store(e₁, e₂, r). The monitor updates all abstract locations that represent a location that e₁ may evaluate to: points-to(e₁, r). These locations are weakly updated with the join of the program counter level, the level of the value being stored (lev(e₂, γ, m)), and the level of the pointer to the updated location (lev(e₁, γ, m)), since which location is updated may reveal information.
We must also modify the rule for branches. Like monitor M RS, when M MEM receives event branch(e, c), indicating the program has entered a branch guarded by expression e where c is the branch not taken, it pushes an analysis of the effects of c onto the monitor stack. However, in addition to possible updates to variables, the analysis must now also reason about possible updates to locations.
Since the specifics of this analysis may vary by memory abstraction, we parameterize our monitor with an analysis algorithm ANALYZE(c, m, γ, ℓ), which returns a monitor environment that approximates the environment that would result from the monitored execution of command c, the branch that was not taken.
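The handling of allocation and store events described above can be sketched in Python as follows. This is an illustrative reading of the prose, not a transcription of Figure 13. It assumes a two-point lattice encoded as integers, expressions encoded as tagged tuples, a monitor environment that is a dictionary over both variables and abstract locations, and a memory m in which the expressions of the command are evaluated; all function names are hypothetical.

```python
LOW, HIGH = 0, 1  # two-point lattice; join is max


def join(*levels):
    return max(levels, default=LOW)


def evaluate(e, m):
    """Evaluate a tagged-tuple expression in memory m."""
    kind = e[0]
    if kind == "int":
        return e[1]
    if kind == "var":
        return m[e[1]]
    if kind == "deref":
        return m[evaluate(e[1], m)]
    raise ValueError(f"unknown expression form: {kind}")


def lev(e, gamma, m, points_to):
    """Join of the levels of variables in e and of the abstract locations
    standing for every location dereferenced while evaluating e."""
    kind = e[0]
    if kind == "int":
        return LOW
    if kind == "var":
        return gamma.get(e[1], LOW)
    if kind == "deref":
        inner = e[1]
        r = evaluate(inner, m)
        cells = [gamma.get(a, LOW) for a in points_to(inner, r)]
        return join(lev(inner, gamma, m, points_to), *cells)
    raise ValueError(f"unknown expression form: {kind}")


def on_new(x, e, r, gamma, pc, m, points_to):
    """Event new(x, e, r): x gets the pc level; abstract locations for r
    are weakly updated with the level of e joined with the pc level."""
    stored = join(lev(e, gamma, m, points_to), pc)
    g = dict(gamma)
    g[x] = pc
    for a in points_to(("var", x), r):
        g[a] = join(g.get(a, LOW), stored)  # weak update: never lowers
    return g


def on_store(e1, e2, r, gamma, pc, m, points_to):
    """Event store(e1, e2, r): weak update with pc, value and pointer levels."""
    stored = join(pc, lev(e2, gamma, m, points_to), lev(e1, gamma, m, points_to))
    g = dict(gamma)
    for a in points_to(e1, r):
        g[a] = join(g.get(a, LOW), stored)
    return g
```

Because updates are weak, storing a public value through the same pointer later does not lower the level of the abstract location; only a strong update, justified by knowing that the abstract location stands for a single concrete location, could do that.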
In order for monitor M MEM to soundly enforce security, the analysis algorithm must meet certain requirements. The key insight is that because the execution, or non-execution, of c depends on information at level ℓ, an entity that cannot observe information at level ℓ should not be able to distinguish between the monitor actually executing c (and updating the monitor configuration accordingly) and the monitor approximating the effect of executing c by using the analysis algorithm. This means that if executing c may change the security level of a variable or abstract location to ℓ or above, then ANALYZE(c, m, γ, ℓ) must return a monitor environment where the level of that variable or abstract location is also ℓ or above. This is a sufficient condition for M MEM to soundly enforce security.
γ′ where γ₀ =_{ℓ_L} γ₁ if and only if dom(γ₀) = dom(γ₁) and ∀s ∈ dom(γ₀). γ₀(s) ⊑ ℓ_L ∨ γ₁(s) ⊑ ℓ_L ⇒ γ₀(s) = γ₁(s).
This property is sufficient but not necessary. To obtain a necessary condition, it must be weakened in two ways. First, instead of quantifying over all memories m′ that are ℓ_L-equivalent to m, it is enough to quantify over memories that can be obtained by an execution of the program that, to an observer at level ℓ_L, appears equivalent to the execution that produced m. Second, monitor environments ANALYZE(c, m, γ, ℓ_H) and γ′ do not need to be equal on every variable or abstract location that either environment maps to level ℓ_L or below. Instead, they need only agree on variables and abstract locations that could later affect the output of the program. We avoid stating the necessary property here due to the additional complexity of notation that would be required, and because the sufficient property described above is weak enough to use for all the memory abstractions we consider.
Theorem 3. Monitor M MEM, when instantiated with a sound memory abstraction and sufficient analysis algorithm, is secure.
Relationship of M MEM to M RS and M PERF. The function ANALYZE(c, m, γ, ℓ) is a generalization of the function UPDATE(c) used in M RS. Modulo providing UPDATE(c) with additional arguments (the memory m and the monitor environment γ), the sufficient conditions for ANALYZE(c, m, γ, ℓ) to make M MEM soundly enforce security are also sufficient for UPDATE(c) to make M RS soundly enforce security. This would allow for more sophisticated versions of UPDATE(c). For example, an analysis could ignore the effect of dead code.
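The agreement relation on monitor environments used in the sufficient property requires equal domains and equality on every entry that either environment maps to the observer's level or below. A small Python sketch, assuming levels are integers ordered as a chain and using the hypothetical name `low_equal`:

```python
LOW, HIGH = 0, 1


def low_equal(gamma0, gamma1, observer=LOW):
    """True if both environments have the same domain and agree on every
    variable or abstract location that either one maps to a level at or
    below `observer`."""
    if gamma0.keys() != gamma1.keys():
        return False
    return all(
        gamma0[s] == gamma1[s]
        for s in gamma0
        if gamma0[s] <= observer or gamma1[s] <= observer
    )
```

On a two-level lattice the relation coincides with equality of environments; with more levels, entries that both environments place strictly above the observer may differ.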
The selective tracking technique developed in Section III and used in M PERF can also be used in M MEM by extending the typing environments to also map abstract locations to security levels.

D. An example instantiation
To illustrate the use of our framework, we now describe a sound information-flow monitor based on a unification- or inclusion-based points-to analysis (e.g., [14,13]). In this memory abstraction, abstract locations are allocation sites, that is, program points that create new locations. Each concrete location is represented by the abstract location corresponding to the allocation site at which it was created. The analysis computes points-to sets for each expression. If expression e evaluates to a concrete location r, then the allocation site of r is included in the points-to set of e, points-to(e, r). Note that points-to(e, r) ignores the second argument r, the concrete location to which e evaluates. This memory abstraction satisfies Definition 4; if a set of expressions evaluate to the same concrete location r, then their points-to sets will all include the abstract location representing the allocation site of r, and thus have a nonempty intersection.
To complete the monitor, we define ANALYZE(c, m, γ, ℓ) as the natural generalization of UPDATE(c) in the presence of references. Function ANALYZE(c, m, γ, ℓ) returns a monitor environment γ′ such that for every variable x, if c contains an assignment to x, then γ′(x) = ℓ, otherwise γ′(x) = γ(x); and for every abstract location a, if c contains a statement e₁ ← e₂ with a ∈ points-to(e₁, •), then γ′(a) = ℓ, otherwise γ′(a) = γ(a). This is a sufficient analysis algorithm (Definition 5) as it approximates the effects of the monitored execution of c. Interestingly, the set of variables and abstract locations that ANALYZE(c, m, γ, ℓ) sets to security level ℓ is (and must be) exactly the set of variables and abstract locations that would be updated in the monitor environment during monitored execution of c.
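A syntactic traversal of this kind might look as follows in Python. The sketch covers only the two cases spelled out above (assignments and stores) plus sequencing, conditionals, and loops; the tuple encoding of commands, the name `analyze`, and the one-argument `static_points_to` function are assumptions for illustration, and other command forms are rejected rather than guessed at.

```python
def analyze(c, m, gamma, level, static_points_to):
    """Approximate the monitor environment after the untaken branch c.

    `m` is accepted only to mirror the four-argument signature in the text;
    this allocation-site instantiation does not consult it.
    """
    result = dict(gamma)

    def visit(cmd):
        kind = cmd[0]
        if kind == "skip":
            return
        if kind == "assign":            # ("assign", x, e)
            result[cmd[1]] = level
        elif kind == "store":           # ("store", e1, e2)
            for a in static_points_to(cmd[1]):
                result[a] = level
        elif kind == "seq":             # ("seq", c1, c2)
            visit(cmd[1])
            visit(cmd[2])
        elif kind == "if":              # ("if", e, c1, c2)
            visit(cmd[2])
            visit(cmd[3])
        elif kind == "while":           # ("while", e, c)
            visit(cmd[2])
        else:
            raise ValueError(f"unsupported command form: {kind}")

    visit(c)
    return result
```

Variables and abstract locations that c cannot touch keep their level from γ, which is where a more precise points-to analysis translates directly into fewer spurious upgrades.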

E. Choosing a memory abstraction
No sufficient analysis algorithms. Surprisingly, there are sound memory abstractions for which there are no sufficient analysis algorithms. This shows that there are limits on which sound memory abstractions can be incorporated into secure information-flow monitors.
Consider a sound memory abstraction that for each concrete location r has an abstract location a r and points-to(e, r) = {a r }. A monitor using this memory abstraction is tracking information flow very precisely, on a per-location basis. However, no sufficient analysis algorithm exists. Consider a program where depending on sensitive information, branch c may or may not be taken. Command c performs some computation and, based on the result, decides to update one of two locations. To accurately approximate the effect of executing c, the analysis algorithm must determine which of the two locations is updated, which is undecidable, in general.
Novel memory abstractions. The statement of sufficient conditions for memory abstractions and analysis algorithms opens the possibility of developing novel memory abstractions that use security-relevant information to improve the precision and efficiency of information-flow monitoring.
For example, we can improve the precision of the example monitor in Section IV-D by tracking information flow by concrete location when updating a location in a context where both the program counter level and the pointer expression have security level ⊥ L . In this situation, the choice of location to update and the decision to update do not depend on sensitive information.
To achieve this, we extend the points-to(•, •) function to take additional arguments: the current monitor configuration γ, σ and an argument write which is True only when points-to is called to find the set of abstract locations for a concrete location that is being allocated or updated.
The new memory abstraction contains one abstract location for each concrete location. If write = True and the program counter level and the security level of the pointer expression e are both ⊥ L , then points-to(e, r, γ, σ, write) returns {a r }, the abstract location corresponding to the concrete location r; otherwise, it returns the set of abstract locations representing all concrete locations from an allocation site in the points-to set of e, as computed by the points-to analysis. This memory abstraction is sound and in some cases more precise than the memory abstraction presented in Section IV-D.
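A sketch of this extended points-to function in Python is given below. For brevity it takes the pointer level and program counter level directly instead of computing them from the monitor configuration, and it assumes a dictionary `site_of` recording the allocation site of every concrete location allocated so far; these simplifications and all names are assumptions for illustration.

```python
LOW, HIGH = 0, 1


def make_points_to(static_sites, site_of):
    """Build the extended points-to function.

    static_sites(e): allocation sites in the statically computed points-to
                     set of expression e.
    site_of:         dict from each allocated concrete location to its
                     allocation site (expected to grow during execution).
    """
    def points_to(e, r, pointer_level, pc_level, write):
        if write and pointer_level == LOW and pc_level == LOW:
            # Neither the decision to update nor the choice of location
            # depends on sensitive data: track this location individually.
            return {("abs", r)}
        sites = static_sites(e)
        # Otherwise fall back to every location from a matching site.
        return {("abs", r2) for r2, site in site_of.items() if site in sites}

    return points_to
```

Reads always use the coarse answer, so a value written precisely to one location is still found by any expression that may alias it.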
Function ANALYZE(c, m, γ, ℓ) for this monitor returns a monitor environment γ′ such that if the program counter level lev(σ) is ⊥ L , then γ′ = γ. Otherwise, for every variable x, if c contains an assignment to x, then γ′(x) = ℓ, otherwise γ′(x) = γ(x); and for every statement e₁ ← e₂ in c, every abstract location in the points-to set of e₁ is likewise set to ℓ, as in Section IV-D.
This analysis algorithm is sufficient and is able to track information flow precisely (i.e., at the granularity of single concrete locations), provided neither control flow nor the choice of which location to update depends on sensitive information. That is, explicit information flows can be tracked precisely.
Efficiency/precision tradeoffs. The memory abstraction used has a significant impact on the performance of an information-flow monitor. A more precise memory abstraction may have more abstract locations, which will increase both the storage required for monitor state and the complexity and number of security level updates.
The monitor we have defined supports a variety of memory abstractions and analysis functions. This allows us to consider trade-offs between efficient and precise memory abstractions with clear requirements for sound information-flow monitoring. At one extreme is a memory abstraction that maps all locations to a single abstract location. This will be sound, but very imprecise: a single piece of sensitive information stored in a location will irrevocably taint all memory. At the other extreme is the most precise sound memory abstraction, which is unusable in an information-flow monitor. Between these two extremes are many sound and useful memory abstractions.
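The coarse extreme can be made concrete with a toy comparison in Python. The two abstractions below, the site names, and the helper functions are hypothetical; the point is only that under a single abstract location one sensitive store raises the level of every later read, whereas a per-allocation-site abstraction confines the taint to one site.

```python
LOW, HIGH = 0, 1


def weak_store(gamma, targets, level):
    """Weakly update every abstract location in `targets` with `level`."""
    g = dict(gamma)
    for a in targets:
        g[a] = max(g.get(a, LOW), level)
    return g


def read_level(gamma, sources):
    """Level of a value read from any of the abstract locations in `sources`."""
    return max((gamma.get(a, LOW) for a in sources), default=LOW)


def coarse(site):
    return {"@all"}          # every location maps to one abstract location


def per_site(site):
    return {"@" + site}      # one abstract location per allocation site


def level_after_secret_store(abstraction):
    """Store a sensitive value at site 'secrets', then read from 'public'."""
    gamma = weak_store({}, abstraction("secrets"), HIGH)
    return read_level(gamma, abstraction("public"))
```

The per-site abstraction pays for its precision with one monitor-environment entry per allocation site instead of a single entry.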
For example, both unification- and inclusion-based pointer analyses (e.g., [14,13]) are sufficient under our framework but differ in precision and overhead. In a unification-based analysis, each allocation site belongs to a single points-to set. Thus, each points-to set can be represented with a single abstract location. This is not the case for an inclusion-based analysis, which may be more precise, but at the expense of increasing both the number of abstract locations that must be tracked and the number of join operations on security levels. Shape analysis (e.g., [17]) is yet more precise, again in exchange for increased analysis complexity and runtime overhead. Our results allow all of these analyses and memory abstractions to be used soundly.
Efficient runtime representations. Some memory abstractions are more amenable to efficient representation at runtime. Some systems that use regions (e.g., [29]) or pool allocators (e.g., [30]) implicitly represent their abstract locations at run-time and can easily compute which abstract location(s) correspond to a given concrete location. An information-flow monitor can augment the data structures used to maintain regions and pools with security levels to efficiently track security state. An inlined information-flow monitor could further reduce overhead by directly inlining references to abstract locations where lookups are performed.

V. RELATED WORK
Russo and Sabelfeld [11] show the impossibility of sound, purely dynamic, flow-sensitive information-flow control. They also present a series of hybrid information-flow monitors, which combine dynamic and static analysis to provide sound flow-sensitive information control that is more precise than either purely static or purely dynamic techniques. Their monitors differ in behavior on insecure output: either stopping execution, suppressing output, or providing a default output. We extend their work by showing how additional static analysis can reduce the runtime overhead of information-flow monitors and show how a wide range of memory abstractions can be soundly incorporated into hybrid information-flow monitors.
Chandra and Franz [10] present a hybrid information-flow monitor for the Java Virtual Machine (JVM) that we believe is unsound. While they are careful to incorporate a sound pointer analysis into their approximation of untaken branches, on explicit updates they increase only the label of the object being modified. As a result, their monitor fails to control information flows through pointer dereference; they unknowingly trade soundness for precision. This highlights the importance of considering soundness while attempting to increase the precision of memory abstractions.
Le Guernic et al. [9] also present a flow-sensitive hybrid information-flow monitor which is subsumed by the monitors of Russo and Sabelfeld [11]. Le Guernic [31] extends this work to enforce noninterference in concurrent programs by ensuring the monitor prevents synchronization in program contexts with high-security program counter levels [32].
Shroff et al. [8] consider dynamic information-flow control in a language with dynamic memory allocation. Their system discovers dependencies within a program, either dynamically over several executions or statically. To deal with aliasing, their system must discover which dereferences may depend on which store updates, and in essence, hardcodes a particular pointer analysis. We show how to soundly incorporate a variety of memory abstractions, allowing a choice in the tradeoff between precision and efficiency.
Nair et al. [33] present Trishul, a hybrid system for information-flow control in the JVM that performs static analysis to determine which locations may be modified by code that is not executed and uses the results to soundly track implicit information flows. When the locations modified by code cannot be precisely determined, Trishul uses a global taint to conservatively approximate effects, essentially a single, coarse, abstract memory location.
Austin and Flanagan [6] consider sound, purely dynamic information-flow tracking. They achieve soundness by requiring "no sensitive upgrade": non-sensitive memory locations cannot be upgraded to sensitive by assignment within a program context with a sensitive program counter level or by assignment via a sensitive pointer. They suggest modifying a program to preemptively upgrade non-sensitive locations that might otherwise require sensitive upgrade. This transformation results in similar precision to hybrid monitors that upgrade locations based on branches that could have, but were not, executed and is similar to the transformation implemented by Rifle [34]. They also introduce sparse labeling, where security labels are tracked explicitly only for data that migrates between information flow domains. They use sparse labeling to exploit label locality: the fact that items in a data structure tend to have the same security level. We believe that this complements our approach (i.e., tracking only items that may cause a security violation), and may lead to efficient representations of monitor environments in hybrid monitors.
Vachharajani et al. [34] propose Rifle, a system with architectural support for tracking information flow. Architectural support has the potential to improve the performance of information flow tracking, but is less portable than language-level approaches. Rifle tracks only explicit flows of information and handles implicit flows by performing a binary translation that makes implicit flows explicit, using a static analysis to reason about implicit information flows. No proof of soundness is given.
Newsome and Song [21] implement TaintCheck, which instruments binaries to track the flow of information within the program at byte-level granularity. Although they do not track implicit flows, they record detailed information about how data flows within the system. These detailed traces are analogous to a rich lattice of security levels. They report high overheads for their instrumented execution, particularly for CPU-bound computation, and could possibly benefit from our static analysis to reduce some of the instrumentation.
Tripp et al. [35] present Taint Analysis for Java (TAJ), a static analysis tool for detecting vulnerabilities in Java web applications. Their analysis does not consider implicit flows, but uses a novel technique of hybrid thin slicing to detect data dependencies of tainted sources.
Dynamic languages. Askarov and Sabelfeld [12] consider an information-flow monitor for dynamic languages, which can generate executable code at runtime. Their monitor uses on-the-fly static analysis of the dynamically generated code. Chugh et al. [36] consider information-flow control in JavaScript, a dynamic language, and perform a lightweight on-the-fly static analysis to determine whether dynamically generated code is secure; parameters for the on-the-fly static analysis are determined by a static analysis of the static portions of the code. Our results are applicable to dynamic languages, and our static analysis for selectively tracking variables can be performed on the fly.
Expressive language features. Russo and Sabelfeld [37] present a monitor for soundly tracking and controlling information flow due to timeouts, a mechanism for executing code snippets after a specified delay. Our selective tracking technique could be extended to this model by appropriately modifying the static analysis used. Russo et al. [38] precisely track and control information flow in dynamic tree structures. While some memory abstractions can reason quite precisely about tree structures, their monitor uses domain-specific knowledge and is thus likely more precise than ours, regardless of the memory abstraction used.
Inlining information-flow monitors. Chudnov and Naumann [23] prove that the information-flow monitor of Russo and Sabelfeld [11] can be inlined. Inlining enables compiler optimizations for the monitoring and facilitates incorporation of the monitor into existing systems. Magazinius et al. [24] present and prove sound a framework for inlining dynamic information-flow monitors that use the "no sensitive upgrade" mechanism to soundly control implicit information flows. Their framework permits on-the-fly inlining, thus providing support for dynamic languages. Venkatakrishnan et al. [39] present a program transformation that, in essence, inlines a hybrid information-flow monitor. They prove that the transformation enforces a noninterference-based security condition. Our work complements monitor inlining, and we believe the most benefit will be gained by applying our results to inlined monitors.

VI. CONCLUSION
We present two ways to use static analysis to increase the efficiency of hybrid information-flow monitors. First, we demonstrate a sound technique for selectively tracking variables during monitored program executions. Second, we derive sufficient conditions for soundly incorporating a variety of memory abstractions into a monitor for languages with dynamically allocated memory.
Selective tracking. Information-flow monitoring significantly decreases program performance. Part of this overhead is effort wasted on tracking security levels of data that cannot cause a security violation. We present a simple static analysis to soundly determine when variables can no longer influence dangerous operations and show that this analysis can be soundly incorporated into an information-flow monitor.
Memory abstractions. Practical information-flow control systems must deal with realistic language features, including dynamically allocated memory. The choice of memory abstraction used by an information-flow monitor has a large effect on both its precision and efficiency. While many information-flow control systems reason about memory, no clear requirements have been defined for permissible memory abstractions. We present sufficient conditions for incorporating memory abstractions and discuss how they apply to a variety of memory abstractions. This enables a principled exploration of tradeoffs between precision and efficiency, and opens the possibility of novel useful memory abstractions for information-flow monitors.
We are currently developing a system that dynamically generates instrumented code to enforce noninterference, guided by the results from this paper.

APPENDIX
Soundness of M MEM and M RS. The proof is based on that of Russo and Sabelfeld [11] for their hybrid information-flow monitor.
Proof: Specialization of Definition 1, observing that if γ = L γ, γ = γ.
Note. For the purposes of the proof, we assume that concrete locations r are assigned deterministically and depend only on information at the level of the program counter at allocation or below, allowing the use of strict equality to detect isomorphism between memories from different executions. Alternatively, we could explicitly construct this isomorphism, or enforce it via additional mechanisms in the information-flow monitor.
We say that expression e evaluates to location r during the execution of c if either e 1 , m ⇓ r occurs during the execution of c, or e is a variable
Proof: Corollary of Lemma 1.
Note.For the remainder of this section, we assume the simple two-level lattice {H, L}, where L H and H L. We write = γ in place of ∼ = L γ .Lemma 5 ( on typing environments).Given memories m 1 , m 2 , and m 3 , and typing environments γ, γ , and γ , the following hold: Proof: By definition of and induction on the number of elements in γ and γ .Since lev(σ) = H, by semantics, we have only on the values of x and r.Since γ (x) = H, all that remains to be shown is that ∀e such that e evaluates to r during the execution of the program ∃a ∈ points-to(e , r), γ (a) = H.By lemma 2, abstract(r) ⊆ points-to(e , r), which suffices to ensure this is true since abstract(r) ⊆ points-to(x, r) and thus ∃a ∈ abstract(r) such that γ (a) = H, and 3) holds.
By lemma 4 applied to (2), there exists γ , σ , m such that: In this case, the number of high steps is zero (since the command has a low output), and therefore cf g 1 = cf g 1 and cf g 2 = cf g 2 .We now have: stop, m 2 , cf gm .
By the semantics, cf gm 1 = cf gm 2 = cf gm and m 1 = m 1 = cf gm(1) m 2 = m 2 .All that remains to be shown is Let cf gm = γ, σ .It must be the case that lev(σ) = L or cf g 1 and cf g 2 would not be able to trigger low-events (by lemma 8).By semantics, x := e; c, m 2 , γ, σ given e, m 1 ⇓ v 1 , e, m 2 ⇓ v 2 , and = (e, γ) (σ).We must show that m In the first case, (e, γ) = H by the definition of = γ .In the second case, ∀e that evaluate to r during the execution of c, abstract(r) ⊆ points-to(e , r).By Definition 4, ∀a ∈ abstract(r) = H.As a result, lev(e, γ) = H.Therefore, in both cases m The final result follows by applying the inductive hypothesis to ( 6) and ( 8).x := new(e); c Let cf gm = γ, σ .It must be the case that lev(σ) = L or cf g 1 and cf g 2 would not be able to trigger low-events (by lemma 8).By semantics, x := new(e); c, m 2 , γ, σ given e, m 1 ⇓ v 1 , e, m 2 ⇓ v 2 , A = points-to(x, r) and = (e, γ) (σ).Since the locations r are isomorphic, we can ignore the x component of m 1 and m 2 and focus on showing that if v 1 = v 2 , then = H.The argument proceeds as in the case above, noting that points-to(e, r) has a non-empty intersection with A (points-to(x, r)) because of property 2. The final result follows by applying the inductive hypothesis to (10) and ( 12). e 1 ← e 2 ; c Let cf gm = γ, σ .It must be the case that lev(σ) = L or cf g 1 and cf g 2 would not be able to trigger low-events (by lemma 8).By semantics, e 1 ← e 2 ; c, m 2 , γ, σ given e 2 , m 1 ⇓ v 1 , e 2 , m 2 ⇓ v 2 , A = points-to(e 1 , •,) and = (e 2 , γ) lev(e 1 , γ) (σ).We consider two cases: r1 = r2 Since r1 and r2 are isomorphic, all that must be shown is that if v 1 = v 2 , then = H.The argument proceeds as in the case above, but considering e 2 and noting that both points-to(e 2 , r1) and points-to(e 2 , r2) have non-empty intersections with A (points-to(x, r)) because of Lemma 2. 
r1 = r2 If r1 and r2 are not isomorphic, we must show that for all e that evaluate to r1 or r2, ∃a1 ∈ abstract(r1), a2 ∈ abstract(r2), γ[A → ](a1) = H and γ[A → ](a2) = H.The argument proceeds as the above, but by considering e 1 .Finally, we must show again that if v 1 = v 2 , then = H.This proceeds exactly as above.

Figure 1. Example of inefficient information-flow monitoring

Figure 2. Example of information flow through pointer value

Figure 5. Semantics of monitored executions
A monitor configuration takes a step based on the internal event generated by the executing program and may produce

Lemma 1. M PERF is behaviorally equivalent to M RS.
Theorem 2. Monitor M PERF is secure.
Proof: Immediate from Lemma 1 and Theorem 1.

Figure 11. Language extensions for dynamic memory

Figure 13. Monitor with memory abstractions, M MEM
and even path-sensitive abstractions by providing additional arguments, such as the current program configuration, the current monitored program configuration, or a trace of program execution. For simplicity of presentation, we refrain from doing so.