Estimating Information Value in Collaborative Multi-Agent Planning Systems

David Sarne, Barbara J. Grosz
School of Engineering and Applied Sciences
Harvard University, Cambridge MA 02138 USA
{sarned,grosz}@eecs.harvard.edu

ABSTRACT
This paper addresses the problem of identifying the value of information held by a teammate on a distributed, multi-agent team. It focuses on a distributed scheduling task in which computer agents support people who are carrying out complex tasks in a dynamic environment. The paper presents a decision-theoretic algorithm for determining the value of information that is potentially relevant to schedule revisions but is directly available only to the person, not the computer agent. The design of a "coordination autonomy" (CA) module within a coordination-manager system provided the empirical setting for this work. By design, the CA module depends on an external scheduler module to determine the specific effect of additional information on overall system performance. The paper describes two methods for reducing the number of queries the CA issues to the scheduler, enabling it to satisfy the computational resource constraints placed on it. Experimental results indicate that the algorithm improves system performance, and they establish the exceptional efficiency, measured in the number of queries required for estimating the value of information, that the query-reducing methods achieve.

Categories and Subject Descriptors
I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search—Scheduling; H.1.1 [Information Systems]: Systems and Information Theory—Value of information

General Terms
Algorithms, Experimentation

1. INTRODUCTION
Incorporating autonomous-agent capabilities into planning and scheduling systems can significantly increase their power, especially in highly dynamic environments and in settings in which multiple agents' schedules are being coordinated [19; 22, inter alia].
Agents can suggest alternative courses of action as well as negotiate conflicts. They can help achieve team objectives more effectively in such highly dynamic environments as first-response situations [21, 13], exploration of remote planets, cleanup of hazardous sites, and military conflicts [11], because they can efficiently process the rich set of information affecting the problem. By taking responsibility for changes in task schedules, they can keep the people carrying out tasks from being unnecessarily overburdened, especially in times of crisis when plans need to be adapted to new circumstances [13, 11]. Situations characterized by agents being separated geographically, by coordinating groups having limited communication bandwidth, and by changes in events and the physical environment that are complex, dynamic, and stochastic in nature typically preclude a complete global view of the problem. They thus require localized, but coordinated, planning and scheduling decisions [20, 21]. In the remainder of this paper, we use the term "automated scheduling agent" (ASA) to refer to autonomous-agent systems that support multi-agent coordinated scheduling. ASAs must take into account various external facts that may significantly influence the choice of scheduling changes. These include changes in the environment in which tasks are being carried out and in the state of the people or other agents working on a distributed, coordinated task.

(Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright 2007 IFAAMAS.)
Information beyond the system's limited knowledge of the environment may affect constraints, either relaxing them or imposing new ones that better reflect the actual situation. For example, a driver will see changes in weather conditions that may affect route selection as they occur, while an automated navigation system will not. A scientist can infer that a colleague will be going to the school play, and thus be unavailable to meet, simply because their children attend the same school. Human-agent interaction must be managed appropriately for the system to obtain such information without overburdening other participants, especially people [13, 19]. Obtaining system-external information carries costs, including communication costs and the degradation in task performance of the person being interrupted [5, 8, 10, 22]. An ASA needs to plan its interactions to maximize the difference between the value of the information being obtained (taking into account the probability that the person being queried has this information) and these costs, deploying appropriate decision-making mechanisms [9, 4]. This paper addresses the problem of identifying the value of information held by a teammate on a distributed, multi-agent, mixed human-computer team. We focus on a scheduling task in which the computer agents are ASAs supporting people who are carrying out complex tasks in a dynamic environment. The ASA is responsible for coordinating modifications to the task schedule as the environment changes, with as few interruptions of the person as possible, thus enabling the person to focus on actual task performance. Although coordination itself is domain independent, determining the value of information requires scheduling knowledge; information is valuable to the extent it influences schedule changes, but that influence cannot be determined without task and schedule knowledge.
The design of a "coordination autonomy" (CA) module within a coordination-manager system provided the empirical setting for this work, including both constraints on system design and a testbed environment. (The work is part of the DARPA Coordinators project, which aims to construct intelligent cognitive software agents able to assist fielded military units in adapting their mission plans and schedules as their situations change.) The overall coordination-system design separates the CA module from the scheduler, which encapsulates all the scheduling-related expertise. As a result, the CA must consider not only interruptions of the person, but also the resource demands it makes on the scheduler when querying for scheduling expertise. Thus, a key challenge for CA-module design was to develop methods that would obtain information from the (human) participants in a distributed, coordinated task without interrupting them too frequently or at the wrong times, while simultaneously not consuming too many scheduler, communication, or other coordination-system resources. The paper makes two major contributions. First, it presents a decision-theoretic algorithm for determining the value of information in a multi-agent context in which a person rather than a computer agent has the information in question, and in a systems context in which the calculation of information value is complicated by the need to invoke a separate system module (in this case, the scheduler); see footnote 1. Second, it describes two novel methods for meeting the computational resource constraints on the CA. The first method filters queries depending on whether there has been a change in relevant tasks in the schedule. The second reduces query generation to a scan of the method's outcome space, aiming to minimize the number of queries issued. Experimental results demonstrate the effectiveness of the algorithm and show that these two methods increase efficiency by significantly decreasing the number of queries required for estimating the value of information.
To provide context for the empirical results, as well as for the methods we designed for determining information value, the next section of the paper briefly describes the Coordinators application domain and the basic architecture of the CA module. Section 3 discusses the principles on which the process of estimating the value of information is based, and Section 4 presents mechanisms for improving the efficiency of the scheduler-querying process. Experimental results are given in Section 5. Section 6 discusses related work, and Section 7 contains our conclusions.

2. IMPLEMENTATION BACKGROUND
The CA module and algorithms we describe in this paper were developed and tested in the Coordinators application domain [21]. In this domain, the ASAs, called "coordinators", operate in a rapidly changing environment and are intended to help maximize an overall team objective, which is measured using a hierarchical objective function. Each ASA operates on behalf of its owner (e.g., the team leader of a first-response team or a unit commander), whose schedule it manages. The actual tasks being scheduled are executed either by owners or by units they oversee, and the ASAs' responsibility is limited to scheduling these tasks. The quality of the ASAs' schedules depends on knowledge of the environments in which their owners are operating, and in many cases the owners may have more accurate information about the state of the environment than their ASAs. Owners in this domain may acquire relevant external information through various communication channels (formal and informal) they maintain with other owners. Typical sources of such information are occasional coordination meetings (e.g., those used for reporting the status of task execution), open communication the owner overhears (e.g.,
when leaving the radio open, one can hear messages associated with other teams in the area), and direct communication used for coordination with other people acting on a joint task (through which an individual often learns informally about the status and existence of actions being executed by others). An additional source of task-outcome-related information is an owner's accumulated experience. In this domain, scheduling information and constraints are distributed. Each agent receives a different view of the tasks and structures that constitute the full multi-agent problem—typically only a partial, local one. Thus, the intended context of use of coordinators precludes central management: scheduling problems must be solved distributively. As a result, each ASA needs to reason about changes in the timings of tasks with regard to their correlation with other ASAs' tasks and to adapt its owner's schedule accordingly. The time pressure of the intended applications (e.g., emergency response) requires that ASAs make decisions in real time, concurrently with their owners' (or their units') execution of tasks. The ASA-owner relationship is a collaborative one, with the ASA needing to interact with its owner to obtain task and environment information relevant to scheduling. The CA module is responsible for deciding intelligently when and how to interact with the owner to improve the ASA's distributed scheduling.

Footnote 1: The algorithm focuses on the "benefits" side of the decision-making process that determines when to interact with a user. Specifically, we focus on estimating the expected value of the information the user may have. A complete decision-making mechanism would combine the work described in this paper with a method for determining the cost of interrupting the user and the probability the user has the necessary information [17, 4, 5].
It interacts primarily with a metacognition component of the ASA, which controls the computational resources allocated to the CA and other ASA modules, and with the scheduler module of the ASA (see footnote 2). The scheduler, which is the locus of the ASA's scheduling expertise, is responsible for task analysis, and thus is a resource whenever scheduling reasoning is required for evaluating the effect of changes in scheduling-problem settings on the system's expected performance. Separating the scheduling expertise into its own module is generally good system design, because many other ASA modules need to reason about similar aspects of the schedule. Furthermore, unlike such user-focused problems as estimating the probability a user knows relevant information, the user's confidence level in the information, and the costs associated with user interactions, which naturally belong to the CA, this scheduling expertise is system-focused. However, the mechanisms described in this paper are useful even in environments in which a CA would have scheduling expertise; in such settings, the CA could also deploy heuristics that take advantage of its knowledge of the problem structure. The CA initiates interaction with the ASA when it requires scheduler resources or after interacting with the owner (for the purpose of transferring the information received to the ASA). Our basic CA architecture includes components that track the owner's interruptibility preferences and costs as well as components that manage interactions with the owner [17, 18]. The module is able to determine (future) times at which there is a high probability the owner will have new information that may affect system performance. For example, an owner is more likely to know the probability of success of a future task shortly before that task's execution than further from its execution time. If the CA evaluates the expected benefit to be positive, then an interaction with the owner is initiated.
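The CA's interaction decision just described reduces to a simple expected-benefit test. A minimal sketch (the function and parameter names are ours, not the Coordinators API; units are abstract quality points):

```python
def should_interrupt(p_owner_knows: float, info_value: float,
                     interruption_cost: float) -> bool:
    """Initiate an interaction only when the expected benefit of the
    owner's information outweighs the estimated interruption cost."""
    expected_benefit = p_owner_knows * info_value
    return expected_benefit > interruption_cost

# Example: a 60% chance the owner knows an outcome worth 10 quality points,
# against an interruption cost equivalent to 4 quality points.
print(should_interrupt(0.6, 10.0, 4.0))  # True: 6.0 > 4.0
```

The rest of the paper concentrates on estimating `info_value`; the cost and probability terms are treated elsewhere [17, 18].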
The mechanism and algorithms presented below enable the CA to efficiently activate the scheduler to estimate the value of a specific piece of task or environment information the owner has. The benefit of interrupting the owner is based on comparing this value, multiplied by the (estimated) probability the owner has this specific information, with the CA's estimate of the cost such an interaction will incur. Coordinators ASAs represent planning and scheduling problems in cTAEMS structures [2], which define multi-agent hierarchical tasks with probabilistic expectations on their outcomes. For this paper the relevant elements of cTAEMS are methods, quality measures, and interaction dependency links, which we describe briefly. Methods, which are actions executed by a single agent, are the atomic elements in cTAEMS. Figure 1 provides an example of a method definition. As shown, a method has multiple possible discrete outcomes that determine its possible durations and quality. Methods are usually constrained by "release times" (earliest possible start) and deadlines, which are set by tasks higher in the cTAEMS hierarchy to which a method belongs and are inherited hierarchically. If a method violates its temporal constraints, it is considered to have failed and yields zero quality. The probability of zero quality and the corresponding duration distribution are given in the "failure" portion of the method definition, as shown in Figure 1. The quality of execution of a method contributes to the quality of its parent task by means of a quality accumulation function (QAF), which defines how qualities are aggregated up the hierarchy. For example, a qmin QAF for a task results in a quality equal to the minimum achieved by its subtasks.

Footnote 2: Several different architectures have been suggested for ASAs in the Coordinators domain [14, 12, 20]. All of them share the same core set of modules, including the scheduler and metacognition components.
The functions qmax and qsum are defined analogously. Only methods that are executed within their constraints accrue quality, and the overall objective is to maximize quality. The non-local-effects links (NLEs) in cTAEMS indicate inter-node relationships such as enablement and facilitation. These abstract cTAEMS structures are used to represent real-world situations. For example, the situation in Figure 1 might represent a plan to go to the store to buy wine as part of a larger plan for a dinner party. The time required for this activity might be 16, 20, or 24 minutes (depending, for instance, on the probability of running into a neighbor who wants to talk, or the condition of the sidewalk after a snowstorm). The quality of the schedule (12 or 15) depends on the brand of wine obtained. The zero quality might result if the shopper forgets his or her wallet, which may have a low probability, but will affect the rest of the dinner plan. The hierarchical outcomes of cTAEMS structures may be translated into a simple tabular representation of outcomes and their probabilities, as given in the table on the right of Figure 1. Each cell in the table contains the probability associated with the outcome described by its row (quality) and column (duration). This tabular presentation is particularly useful when we investigate methods for increasing the CA mechanism's efficiency in Section 4.

Figure 1: A typical method in cTAEMS. Left, the method definition; right, the corresponding cTAEMS Method Outcome Matrix:

    (spec_method
       (label method2A)
       (agent agent2)
       (outcomes
          (default                  ; first set of outcomes
             (density 0.8)
             (quality_distribution 12 0.25 15 0.75)
             (duration_distribution 16 0.25 20 0.5 24 0.25))
          (failure                  ; second set of outcomes
             (density 0.2)
             (quality_distribution 0.0 1.0)
             (duration_distribution 16 0.25 20 0.5 24 0.25))))

    quality \ duration       16                20               24
    15                .8*.75*.25=0.15   .8*.75*.5=0.3    .8*.75*.25=0.15
    12                .8*.25*.25=0.05   .8*.25*.5=0.1    .8*.25*.25=0.05
    0 (failure)       .2*.25=0.05       .2*.5=0.1        .2*.25=0.05

cTAEMS problems are known to be difficult scheduling problems and are not easily modeled and solved by traditional solvers for planning and scheduling [6]. In Coordinators, a variety of techniques for scheduler implementations have been used [6, 12, 20]. The value-of-information methods we present do not depend on the particular scheduler technology. The only functionality they require is the ability to produce a feasible schedule given changes in the cTAEMS problem, and consistency (i.e., if the same problem arrives again, the same solution is supplied).

3. INFORMATION VALUE
In cTAEMS, the outcome of a method after it has been executed (or has failed) represents the contribution of the performance of that action (its "quality") to the overall team effort, weighted according to the method's NLEs and QAFs. Because execution outcomes are probabilistic, if the CA can obtain more accurate information about the probability distribution of a method's outcomes before method execution is completed, it can help increase solution quality significantly. Environmental information relevant to method outcomes may be helpful in deciding among possible schedules. For instance, if an owner can tell ahead of time that the method its ASA has scheduled to be executed next will definitely fail (e.g., because the weather has changed and will prevent success), early notification of the ASA will allow it to change the schedule before the owner even begins executing this method. If the ASA notifies other owners' ASAs of this change, then the schedules of any other owners whose tasks are affected by this method's outcome (because their methods either are part of common QAFs or are enabled by NLE links originating from this method) also can be changed earlier, rather than waiting for the failure to occur. Without such external information, the ASAs may lose flexibility in choosing alternative schedules as well as valuable task-execution time, because they will learn the method outcome only at the end of execution. In this section, we describe a Bayesian-decision-theoretic [16, 1, 3] algorithm for estimating the value of information an owner may have. The algorithm calculates possible method outcomes and the associated consequences of getting this information at different times; it weighs each possibility's contribution by the probability it will occur. The algorithm queries the scheduler iteratively to determine the value of information an owner may have. The CA does not know what information the owner has, only that the owner has (or may have) relevant information. Thus, unlike prior approaches that use decision theory to calculate the value of information [8, 10, 22], the algorithm must compute the value of information without knowing the actual piece of information (i.e., without knowing a priori which value the owner will provide). Furthermore, the algorithm handles the overall changes in schedule that may result when new information is obtained (essentially emulating the re-scheduling that would result), and it considers all possible outcomes of a method, not simply success or failure. This functionality is not included in the scheduler and is essential for updating the value of information appropriately.

3.1 Querying Principles
Throughout this section we consider a scenario in which the ASA has scheduled method $M$ to be executed next, and the potential outcomes of $M$ are given by the duration vector $D^M = \{d^M_1, \ldots, d^M_k\}$ and the quality vector $Q^M = \{q^M_1, \ldots, q^M_l\}$. The owner may be able to reveal (with some probability) that the actual outcome of $M$ is $(d^M_i, q^M_j)$, information not directly available to the CA. That is, prior to interacting with the owner, the CA has only the a priori probability $P(d^M, q^M)$ for each potential outcome, as given in the method's definition (and illustrated in Figure 1). The value of the information $(d^M_i, q^M_j)$ is the marginal improvement in the quality of the schedule after it is changed as a result of obtaining this information (at query time). That is, the value of the information $(d^M_i, q^M_j)$ is the difference between the performance of the ASA with this information "a priori" (i.e., when it queries the owner), denoted $E[Q \mid \text{access owner}]$, and its performance when it obtains this information only at the end of the method's execution, denoted $E[Q \mid \neg\text{access owner}]$. The value is positive only if the system changes its action depending on $(d^M_i, q^M_j)$, but it is always non-negative. In the ASA setting, to estimate the value of an interaction with the owner (i.e., to calculate $E[Q \mid \text{access owner}]$), the CA needs to weigh the value of each possible specific outcome according to the probability of that outcome, because (as noted above) even when the CA knows that the owner has certain information about a method's outcome, it does not know the outcome value the owner knows. The algorithm presented below uses hypothetical ("what-if") queries to the scheduler, which include the original cTAEMS problem settings and the potential new information from the owner, to determine the value of a specific piece of information.
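The tabular outcome representation of Figure 1 is straightforward to build: each cell is the outcome set's density multiplied by the independent quality and duration probabilities. A small sketch, assuming outcome sets are supplied as (density, quality distribution, duration distribution) triples as in the spec_method of Figure 1 (the function name is ours):

```python
from itertools import product

def outcome_table(outcome_sets):
    """Flatten cTAEMS outcome sets into {(duration, quality): probability}.
    Each set is (density, {quality: prob}, {duration: prob})."""
    table = {}
    for density, qdist, ddist in outcome_sets:
        for (q, pq), (d, pd) in product(qdist.items(), ddist.items()):
            # Cell probability = density * P(quality) * P(duration)
            table[(d, q)] = table.get((d, q), 0.0) + density * pq * pd
    return table

# The method of Figure 1: a 0.8-density default set and a 0.2-density failure set.
sets = [
    (0.8, {12: 0.25, 15: 0.75}, {16: 0.25, 20: 0.5, 24: 0.25}),
    (0.2, {0: 1.0},             {16: 0.25, 20: 0.5, 24: 0.25}),
]
t = outcome_table(sets)
print(t[(20, 15)])  # 0.8 * 0.75 * 0.5, i.e. approximately 0.3
```

The cell probabilities sum to one, matching the matrix on the right of Figure 1.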
We will use the notation $S_t(T, I)$ to denote the best schedule the scheduler can produce at time $t$, given a cTAEMS problem $T$ and the additional information $I$ (known at time $t$). The structure $I$ may be empty, may indicate a specific outcome $(d^M_i, q^M_j)$ of the method $M$, or may eliminate a specific outcome from the distribution of possible outcomes. We denote the value (quality) of a schedule $S_t(T, I)$ by $S_t(T, I).quality$. The expected quality of a schedule given updated method-status information from an interaction with the owner is

$$E[Q \mid \text{access owner}] = \sum_{i=1}^{k} \sum_{j=1}^{l} S(T, (d^M_i, q^M_j)).quality \cdot P(d^M_i, q^M_j). \quad (1)$$

Thus, estimating the improvement in quality that results from learning each possible outcome requires two types of queries to the scheduler: (1) a non-constrained query using the new information, and (2) a constrained query that incorporates the results of the re-scheduling processes that occur until an execution completion (EC) message is received.

3.2 The Querying Algorithm
Calculating the value $E[Q \mid \neg\text{access owner}]$ is a non-trivial task. Although the owner is operating according to some schedule, the quality associated with this schedule cannot be used as a baseline, because it does not reflect the re-scheduling that can occur after a method's actual outcome is known (or other outcomes are eliminated). This re-scheduling may significantly alter the quality that is actually achieved. As an example, consider a method $M$ planned to be executed at time $t$ that a priori has four duration outcomes $\{d^M_1, \ldots, d^M_4\}$ with probability $P(d^M_i) = 0.25$ for all $1 \le i \le 4$, and whose actual duration outcome is $d^M_3$. In Coordinators, each agent receives an EC message at the time a method successfully completes its execution (see footnote 3). At time $t + d^M_1$ the ASA will realize that the method has not yet completed execution, because it did not receive an EC message; thus, outcome $d^M_1$ is no longer valid for this method. As a result, the ASA and all other relevant ASAs can update their expectations for method $M$ using a Bayesian update; the method will now have three possible duration outcomes $\{d^M_2, \ldots, d^M_4\}$ with probability $P(d^M_i) = 0.33$ for all $2 \le i \le 4$. The ASAs can then re-analyze the problem and re-schedule any methods that do not start execution prior to time $t + d^M_1$. Similar re-scheduling can be performed at time $t + d^M_2$. Finally, at time $t + d^M_3$ the ASA receives the completion message, and all agents can re-schedule any methods that have not yet (i.e., prior to time $t + d^M_3$) started execution. This final re-scheduling is based on the actual duration and quality outcome of method $M$. The quality achieved at the end of this "re-scheduling" process may be very different from the expected quality of the schedule the ASAs had at the beginning of the process. Thus, to calculate the expected quality, it is necessary to capture the effect of the re-scheduling events that occur whenever a method's execution time exceeds one of the possible durations defined for the method and no EC message has been received, or when the method completes execution. Generally, if a method is still executing at time $t + d^M_j$, then its outcome distribution can be updated to $\{d^M_{j+1}, \ldots, d^M_{|D^M|}\}$, where

$$P(d^M_i) = \frac{P(d^M_i)}{1 - \sum_{k=1}^{j} P(d^M_k)} \quad \forall\, j < |D^M|,\; j < i \le |D^M|.$$

Thus, the schedule that will be used at a future time $t'$ ($t' > t$) should be constructed incrementally, based on the last re-scheduling event that would have occurred. For exposition purposes, we assume that method $M$ is supposed to start execution at time $t$. Hence,

$$S_{t'}(T, (d^M_i, q^M_j)) = S_{t'}(T_c(S_{t + d^M_i}), \neg d^M_i), \quad t' < t + d^M_i \quad (2)$$

where $T_c$ is the constrained problem generated by the re-scheduling process that occurs at time $t + d^M_i$ ($i = \arg\max_i \{d^M_i < t'\}$), when the ASA learns that duration $d^M_i$ is no longer a valid outcome. Finally, the expected value of the information the owner may have can be calculated as

$$V = \sum_{i=1}^{k} \sum_{j=1}^{l} \big( S_t(T, (d^M_i, q^M_j)).quality - S_{t + d^M_i}(T, (d^M_i, q^M_j)).quality \big) \, P(d^M_i, q^M_j) \quad (3)$$

Algorithm 1, given in pseudo-code below, embodies the principles just described to estimate the value of getting the actual outcome of a method from the owner (before the method completes execution). The core functionality it supports is: (a) the calculation of the differences in expected quality with and without the owner's information for the outcomes associated with the first (shortest) duration; and (b) if execution exceeds the first possible duration outcome, recursive execution of the algorithm on the new schedule to obtain the revised quality for the reduced problem.

Algorithm 1 GetValue(T, M) - Evaluating the benefit of interacting with the owner
Input: T - a cTAEMS problem; M - the method we are working on; t - the time when method M starts executing.
Output: V - the expected additional utility of receiving the actual outcome from the owner.
1:  if D^M.length = 0 then
2:     return 0
3:  end if
4:  Set S = GetSch(T); Value = 0
5:  Remove schedule S from cTAEMS problem T
6:  for j = 1 to Q^M.length do
7:     Set Tc = T.clone(); Tnc = T.clone(); M' = M.clone()
8:     Set D^{M'} = (d^M_1); Q^{M'} = (q^M_j); P(d^M_1) = P(q^M_j) = 1.0
9:     Replace method M in Tc and Tnc with method M'
10:    for i = 1 to S.length do
11:       if S[i].start_time < t + d^M_1 then
12:          Add S[i] to Tc
13:       end if
14:    end for
15:    Set Value = Value + P(d^M_1) P(q^M_j) (GetSch(Tnc).quality - GetSch(Tc).quality)
16: end for
17: for j = 2 to D^M.length do
18:    Set P(d^M_j) = P(d^M_j) / (1 - P(d^M_1))
19: end for
20: Set Premain = 1 - P(d^M_1)
21: Remove d^M_1 from D^M
22: Replace method M' in Tc with method M
23: return Value + Premain * GetValue(Tc, M)

Footnote 3: This has many analogues in other domains; e.g., a UPS scheduling assistant can be updated based on events recorded in the standard tracking system.
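The Bayesian renormalization that Algorithm 1 performs in steps 17-19 (and that the text above derives) can be sketched in isolation; this is our simplification for a single elimination round, not the Coordinators code:

```python
def eliminate_shortest(durations):
    """Drop the shortest duration outcome once it has elapsed with no EC
    message, renormalizing the survivors: P'(d_i) = P(d_i) / (1 - P(d_1))."""
    items = sorted(durations.items())      # [(duration, prob), ...] ascending
    (_, p1), rest = items[0], items[1:]
    return {d: p / (1.0 - p1) for d, p in rest}

# Four equiprobable durations, as in the example above.
dist = {10: 0.25, 20: 0.25, 30: 0.25, 40: 0.25}
dist = eliminate_shortest(dist)
print(dist)  # each surviving duration now has probability 1/3
```

Applying the function repeatedly reproduces the incremental re-scheduling sequence: each time a candidate duration elapses without an EC message, the distribution is conditioned on the method still running.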
The input to the algorithm is a cTAEMS problem $T$, which includes a partial schedule, and a method $M$ (with an updated outcome distribution). $M$'s actual outcome is what the CA is considering asking the user about. The recursive implementation (step 23) allows the algorithm to emulate the re-scheduling process used by the ASA, as described above. The algorithm first queries the scheduler for the best schedule that can be produced given the initial problem settings (represented by $S$ in step 4). The variable Value accumulates value over the recursion. The main loop (steps 6-16) weighs the differences in the expected quality of the schedules for the first possible duration outcome of the method $M$, by creating two new cTAEMS problems (step 7) in which method $M$ has that duration outcome and one of the possible quality outcomes with probability 1.0 (steps 8-9). In the first problem, $T_{nc}$, no scheduling constraints are imposed, whereas in the second problem, $T_c$, all the ASAs' methods that were scheduled to start executing before the ASA learns the actual outcome of method $M$ are included in the problem settings as an initial schedule that cannot be changed (steps 10-14). By removing any re-scheduling flexibility up to time $t + d^M_1$, the cTAEMS problem $T_c$ emulates the scenario in which the ASAs adjust to information about method $M$'s outcome only when the actual outcome is determined by the environment (i.e., when the method has completed execution or failed). Similarly, the cTAEMS problem $T_{nc}$ emulates the scenario in which the information is received a priori and the ASAs are free to re-plan at this earlier time using this information. Step 15 uses the function GetSch(T) to query the scheduler, which returns the optimal schedule that can be generated for the problem $T$ (taking the schedule elements it contains as constraints for scheduling).
A comparison of the quality of the schedule received for the non-constrained problem Tnc and that for the constrained problem Tc yields the incremental quality improvement M that can be obtained by knowing the outcome (dM , qj ) a priori. i The result is multiplied by the probability of getting this outcome from the owner (step 15). Therefore, upon the completion of steps 6-16 the algorithm has the weighted expected utility of outcomes associated with duration dM . To complete the process (i.e., for adding the weighted utility 1 for methods with dM > dM ), the algorithm creates a new problem 1 i in which the outcome dM is removed from M , and it updates the 1 probabilities of the remaining outcomes (steps 17-22). The new problem (Tc ) contains the scheduling constraints imposed by the inability to change any of the methods that are scheduled to start their execution prior to time t + dM . This problem is the input for 1 the recursive execution of the algorithm, which stops when there are no more duration outcomes for M . Once the CA module calculates this gain, it can multiply it by the probability representing its estimation that the owner knows the actual outcome (as detailed by Sarne and Grosz [17]) and if the result is greater than its estimated cost associated with the interruption then the interaction is being initiated. The total number of queries sent to the scheduler throughout the recursive executions of the algorithm is (2 ∗ |QM | + 1) ∗ |DM |, which is derived as follows: (a) two queries (constrained and nonconstrained) for each outcome (making a total of 2 ∗ |QM | ∗ |DM |), used for calculating the outcome’s marginal utility (see step 15); and (b) one query for generating the schedule evolvement due to realizing the elimination of each duration outcome, as time goes by (step 4) (i.e., a total of |DM |). 4. 
IMPROVING EFFICIENCY The (2 ∗ |QM | + 1) ∗ |DM | scheduler queries required by Algorithm 1 is a significant overhead, because querying the scheduler is a resource-intensive task. As a result, techniques that enable minimizing the number of queries the CA uses for its benefit estimation process can significantly improve ASA performance. In this section, we show how to reduce significantly the number of queries used for estimating the information value without compromising the accuracy of the obtained result. This decrease is achieved by an intelligent selection of the sequence of outcomes for which queries are generated. As shown in Equation 3, the expected value of information can be represented as a weighted sum of the differences between the quality associated with the non-constrained problem, Tnc , and the constrained one, Tc . By identifying outcomes for which the difference is zero, it is possible to eliminate the need for generating the two queries in such cases. Information has value at a specific time only if its receipt at that time leads an agent to change its owner’s schedule in a way that it could not have if it only received this information in an EC message or induced it from not receiving an EC message at the end of a possible method duration. Thus a zero difference occurs when no agent in the environment changes its owner’s schedule in comparison to EC-messages based schedule. We use the term “relevant change” to refer to a scheduling change the existence of which indicates that the ASA’s learning about a method’s outcome before that method finishes executing will yield positive value for the ASA. Formally, the definition of a relevant change is as follows: D EFINITION 1. Consider an initial problem T and a specific M outcome (dM , qj ) of method M . Any change between the schedi M M ules St (T, (di , qj )) and St+dM (T ) over the interval (t, t + dM ) i i is a “relevant” change associated with receiving the true outcome M (dM , qj ) at time t. 
The basic idea of the first query-reduction method is that the existence of relevant changes is all that matters for identifying zero value differences, because these changes determine whether knowing the outcome (d_i^M, q_j^M) before the completion message occurs contributes value. The difference in quality due to knowing this information prior to time t + d_i^M is necessarily positive if relevant changes between the non-constrained schedule and the EC-based schedule are found at a time prior to t + d_i^M; otherwise, the difference is necessarily zero.

Thus, the CA should first send the scheduler all its non-constrained (T_nc-based) queries, and should initiate a constrained (T_c-based) query only for those outcomes (q_j^M, d_i^M), q_j^M ∈ Q_M, d_i^M ∈ D_M, for which a relevant change was identified by inspecting the schedule sent back for the non-constrained query. If the number of outcomes that yield relevant changes is N (N < |Q_M| ∗ |D_M|), then the number of queries required using this technique is (|Q_M| + 1) ∗ |D_M| + N, derived as follows: (a) |D_M| queries for generating the schedule changes resulting from the elimination of duration outcomes, as in the original Algorithm 1; (b) |Q_M| ∗ |D_M| queries for obtaining the non-constrained schedules; and (c) N constrained queries for calculating the quality differences for those scenarios that yield relevant changes.

Although this approach significantly lessens the number of queries, it is still far from the theoretical minimum, which is 2 ∗ N + |D_M|: the |D_M| additional constrained queries are required for following the re-planning procedure, and the 2 ∗ N queries (N constrained and N non-constrained) are required for calculating the quality differences for those scenarios that yield relevant changes. A further significant reduction in the number of queries sent to the scheduler may be obtained by analyzing the possible-outcome space.
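The first query-reduction method can be sketched with a counting stub. The `CountingScheduler` class below is a hypothetical stand-in for the external scheduler module (it returns nothing useful and only counts calls), and membership in a precomputed `relevant` set stands in for actually inspecting the returned non-constrained schedule.

```python
class CountingScheduler:
    """Stand-in for the external scheduler module; only counts queries."""
    def __init__(self, relevant_outcomes):
        self.relevant = relevant_outcomes  # outcomes known to yield relevant changes
        self.queries = 0

    def solve(self, problem):
        self.queries += 1
        return problem  # the schedule itself is irrelevant to this sketch

def first_reduction_method(durations, qualities, scheduler):
    """Issue queries in the order prescribed by the first reduction method."""
    for d in durations:
        scheduler.solve(("replan", d))            # |D_M| re-planning queries
        for q in qualities:
            nc = scheduler.solve(("Tnc", d, q))   # non-constrained query, always sent
            if (d, q) in scheduler.relevant:      # relevant change spotted in nc
                scheduler.solve(("Tc", d, q))     # constrained query only then

durations, qualities = [2, 5, 9], ["q1", "q2"]
relevant = {(2, "q1"), (9, "q2")}                 # N = 2 outcomes with relevant changes
s = CountingScheduler(relevant)
first_reduction_method(durations, qualities, s)
# (|Q_M| + 1) * |D_M| + N = (2 + 1) * 3 + 2 = 11
assert s.queries == (len(qualities) + 1) * len(durations) + len(relevant)
```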
Figure 2 illustrates this space of outcomes (an analogue of the table given in Figure 1) for a method M with five possible duration outcomes and four quality outcomes. We begin our analysis with an arbitrary outcome o = (d_i, q_j) that is not associated with relevant changes. Though this outcome is not associated with relevant changes, its neighboring outcomes in the outcome space (i.e., (d_i, q_{j+k}) or (d_{i+k}, q_j), k ∈ {+1, −1}) may be associated with relevant schedule changes. Table 1 gives examples of ways in which an increase or a decrease in the duration or quality results in relevant schedule changes (both in a particular ASA's schedule and in other ASAs' schedules).

[Figure 2: A method's outcome space; quality outcomes q1-q4 plotted against duration outcomes d1-d5]

Table 1: Relevant schedule changes due to new information concerning method M's quality and duration (M′, M1 and M2 denote methods that are currently non-scheduled)

D↑ (duration increase):
- Self changes: Method M exceeds its deadline and is thus replaced by method M′.
- Changes in other ASAs: Method M1 (enabled by M) cannot start on time, and is thus replaced with M2.

D↓ (duration decrease):
- Self changes: The execution of method M can now be delayed, and a new method M′ can be scheduled before it.
- Changes in other ASAs: Method M1, enabled by M, is added to the schedule, and consequently method M2, which facilitates M1, is added at time t′ < t + d_M.

Q↑ (quality increase):
- Self changes: Not applicable.
- Changes in other ASAs: Method M1, facilitated by M, is added to the schedule, and consequently method M2, which enables M1, is added at time t′ < t + d_M.

Q↓ (quality decrease):
- Self changes: Method M is replaced by method M′ (which has a better quality).
- Changes in other ASAs: A method M1 under a q_min QAF along with M is replaced by method M2.

If an outcome o′ that is a neighbor of o and has relevant changes can be found, then relevant changes appear in every other outcome in that direction of movement in the outcome space. For example, if no relevant changes are found for outcome o = (d_i, q_j) and a relevant change is found for outcome o′ = (d_{i−1}, q_j), then relevant changes exist also in every outcome o∗ = (d_{i−k}, q_j), for all 0 < k < i.

The second query-reduction method exploits this property by scanning the outcome space column by column, probing outcomes only until this monotonicity indicates that no further relevant changes can be found. When the scanning process reaches a column that has already been scanned, or when the first outcome in the scanned column reveals no relevant schedule changes, the process is repeated from the top of the columns, moving down the quality scale. (Depending on the dimensions of the problem, it may be better to scan rows instead of columns, but the process is analogous.) This method guarantees a maximum overhead of 2 ∗ min(|D|, |Q|) unnecessary queries to the scheduler, since at most it will go over all the columns and will stop once while moving upward and once while moving downward in each column. Therefore, the number of queries generated using this method is bounded by 2 ∗ N + 2 ∗ min(|D_M|, |Q_M|) + |D_M|.
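The per-column part of the scan can be sketched as follows. This is a simplified sketch under stated assumptions: `probe(outcome)` is a hypothetical callback standing in for the (T_nc, T_c) query pair plus the relevant-change check, and the seed outcome is assumed to lie inside or adjacent to the column's band of relevant-change outcomes (the restart-from-the-top rule in the text handles the other cases).

```python
def scan_column(column_outcomes, probe, seed):
    """Scan one column of the outcome space from `seed` upward and then
    downward, exploiting the monotonicity property above: once an outcome
    without relevant changes is probed, all outcomes further in that
    direction can be skipped.

    Returns the outcomes found to have relevant changes and the number of
    probes spent; at most two probes per column are wasted on outcomes
    without relevant changes (one per direction).
    """
    relevant, probes = [], 0
    for direction in (range(seed, len(column_outcomes)),  # upward
                      range(seed - 1, -1, -1)):           # downward
        for idx in direction:
            probes += 1
            if probe(column_outcomes[idx]):
                relevant.append(column_outcomes[idx])
            else:
                break  # monotonicity: nothing relevant further out
    return relevant, probes

column = ["o0", "o1", "o2", "o3"]          # one duration column, 4 quality outcomes
truly_relevant = {"o1", "o2"}              # contiguous band, contains the seed
found, probes = scan_column(column, lambda o: o in truly_relevant, seed=1)
assert sorted(found) == ["o1", "o2"]
assert probes == len(found) + 2            # exactly two wasted probes
```

Summing over all columns gives the 2 ∗ N + 2 ∗ min(|D_M|, |Q_M|) bound on probe pairs, plus the |D_M| re-planning queries.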