WWW 2008, April 21-25, 2008, Beijing, China.
2008
978-1-60558-085-2/08/04
Efficient Vectorial Operators for Processing XML Twig Queries
H.2.8Database Applications Miscellaneous
Algorithms, Performance, Languages
Holistic Twig Join,Vectorial Operators,Subsequence Match
Categories and
Subject Descriptors
H.2.3 [Languages] Subjects: Query languages
H.2.4 [Systems] Subjects: Query processing
General Terms
Algorithms, Performance, Languages
Keywords
Holistic Twig Join,Vectorial Operators,Subsequence Match
Approaches based on structural index, numbering scheme and subsequence matching have been studied for processing XML twig queries, among which the most efficient one is holistic twig join based on numbering scheme. However, holistic twig join is suboptimal for those twig queries in which both Ancestor-Descendant (A-D) and Parent-Child (P-C) relationships are included, because it may involve huge even unbounded intermediate results. In addition, most of existing proposals are not efficient for the GTP queries [3,4], because to answer GTP queries they have to compute the result sets of all the nodes although a part of them will not contribute to the answer, and hence they have to eliminate those redundancies involved in the result.
To address these problems, we present some vectorial operators for the P-C and A-D relationships to avoid redundant intermediate results and demonstrate how to answer twig queries using these vectorial operators efficiently. To accelerate the processing of twig queries, we propose several techniques to optimize these vectorial operators. Although [1] is proposed to process GTP queries, it is constrained by the fan-out of the XML document and thus leads to inefficiency. We discuss how to efficiently answer GTP queries according to our vectorial operators. To the best of our knowledge, this is the first paper that employs vectorial operators to process twig queries and GTP queries.
We employ a sequence to represent an XML document and answer XML queries based on the sequence. For any node , we construct list , which keeps the elements w.r.t. and the positions of their occurrences in the corresponding sequence. To check whether element and element satisfy the A-D or P-C relationships, we transform and into bit-vectors and evaluate the A-D or P-C relationships on the bit-vectors.
Operator | Operand | Result | Definition |
() | []=1 if ,1 ,1 and = ; otherwise, []=0. | ||
() | []=1 if ,1 , = and = ; otherwise, []=0. | ||
() | []=1 if ,1 ,= ,= . and .leftmost; otherwise, []=0. | ||
() | []=1 if ,1, []=1 and =; otherwise, []=0. | ||
() | []=1 if , []=1 and =, but , =; otherwise, []=0. | ||
() | []=1 if ,1, ()[]=1 and .leftmosti; otherwise, []=0. | ||
(,) | , | = ( () (()1) ) | |
(,) | , | = ( 1 () | |
( ) | =1 if 1, ,.leftmost and =1; otherwise, =0. | ||
(,) | , | = () () | |
(,) | , | = ( ( (), () () ) ) |
We begin by introducing three vectorial operators, , and , to transform any input list into bit-vectors, , and , respectively as illustrated in Table 1. We propose three corresponding operators, , which are similar to and operate on any bit-vector (e.g., an intermediate bit-vector) as shown in Table 1.
We propose two vectorial operators and to
evaluate the P-C relationship as shown in Table 1. The
two operators can evaluate whether the elements in
and the elements in satisfy . In addition, we
introduce three vectorial operators , and
, to evaluate the A-D relationship, as shown in Table
1. Equations (
(2.1-10) |
We introduce how to answer twig queries according to our operators.
Given a query , let denote the
bit-vector result of node w.r.t. the sub-query rooted at node
, while let denote the bit-vector result of node
w.r.t. . We first compute from the
leaf to the root and then compute from the root to
the leaf. Thus, we can get the result set of any query node
according to Equations (
We introduce several techniques to optimize our vectorial operators. Let ,,..., be the -children (P-C) of node and ,,..., be the -children (A-D), to optimize the computation of , we extend from a binary operator to a multiple operator :
=(, , ,..., =1, if =1,1, , .leftmost and =1; otherwise, =0.
We introduce the following Equations (
To answer twig queries in holistic and process GTP queries effectively, we propose an effective algorithm by employing the vectorial operators. directly computes the results and does not require a post-processing to eliminate the irrelevant elements. only maintains several bit-vectors and the number of these bit-vectors is no more than the number of result nodes, thus will not involve large intermediate results.
We compared with state-of-the-art methods, PRIX [5], iTwigJoin [2] and TwigStack [1]. All the algorithms were coded in C++ and all the experiments were conducted on a 2.4 GHz Pentium IV processor with 1GB RAM, running Microsoft Windows XP. We used the real-world dataset DBLP and the dataset TreeBank with deep recursive structures, and employed twelve queries for our experiments. Figure 1 shows the experimental results. We can observe that outperforms PRIX, iTwigJoin and TwigStack on various queries and datasets significantly.
We propose several vectorial operators to evaluate the A-D and P-C relationships and present how to process twig queries in holistic by employing these operators. We demonstrate several effective techniques to optimize these operators for processing XML twig queries and GTP queries.
This work is partly supported by the National Natural Science Foundation of China under Grant No.60573094, the National High Technology Development 863 Program of China under Grant No.2007AA01Z152 and 2006AA01A101, the National Grand Fundamental Research 973 Program of China under Grant No.2006CB303103.