Algebraic Optimization of Outerjoin Queries
Galindo-Legaria, Cesar Alejandro
MetadataShow full item record
CitationGalindo-Legaria, Cesar Alejandro. 1992. Algebraic Optimization of Outerjoin Queries. Harvard Computer Science Group Technical Report TR-12-92
AbstractAn advantage of relational database languages is that they allow "declarative" query specification: users pose queries as a set of conditions or properties on data to be retrieved, rather than by giving a procedure to obtain such data. The database system is then responsible for generating an efficient execution plan, depending on how information is physically stored. In this context, generation of efficient plans is known as database query optimization. Careful query analysis is justified during optimization, as it often brings considerable savings over the time that straightforward evaluation would take. These savings are sometimes of several orders of magnitude. Query optimization has been an active area of research since the introduction of the first relational databases, and important heuristics have been developed for specified classes of queries. In particular, when relational join is the operator used to combine tables, the problem is well understood and it has been solved with considerable success. Although joins cover a large number of common queries, database languages provide facilities that cannot be evaluated using joins only, and an increasing number of applications rely on these facilities. The approach taken by current query processors is to divide the query into various blocks that are optimized independently, but by discarding a global analysis of the query, many optimization opportunities are likely to be be missed. The purpose of this thesis is to extend join optimization techniques to queries that contain both joins and outerjoins. The benefits of query optimization are thus extended to a number of important database applications, such as federated databases, nested queries, and hierarchical views, for which outerjoin is a key component. Query optimization techniques are based on algebraic properties of relational operators, to find simplification rules and associativity identities. Our approach is comprehensive and includes, as special cases, some outerjoin optimization heuristics that have appeared in the literature. Second, we abstract the notion of feasible evaluation order for binary, join-like operators, considering associativity rules but not specific operator semantics. Combining these two parts, we show that a join/outerjoin query can be evaluated by combining relations in any given order--just as it is done on join queries, except that now we need to synthesize an operator to use at each step, rather than using always join. The purpose of changing the order of processing of relations is to reduce the size of intermediate results. Our theoretical results are converted into algorithms compatible with the architecture of conventional database optimizers. For optimizers that repeatedly transform feasible strategies, the outerjoin identities we have identified can be applied directly. Those identities are sufficient to obtain all possible orders of processing. For optimizers that generate join programs bottom-up, we give a rule to determine the operators to use at each step.
Citable link to this pagehttp://nrs.harvard.edu/urn-3:HUL.InstRepos:25235126
- FAS Scholarly Articles