diff options
-rw-r--r-- | doc/src/sgml/arch-dev.sgml | 4164 |
1 files changed, 4083 insertions, 81 deletions
diff --git a/doc/src/sgml/arch-dev.sgml b/doc/src/sgml/arch-dev.sgml index 076952eb60a..c04e47cf0bf 100644 --- a/doc/src/sgml/arch-dev.sgml +++ b/doc/src/sgml/arch-dev.sgml @@ -1,81 +1,4083 @@ -<Chapter Id="arch-dev"> - <TITLE>Architecture</TITLE> - -<Sect1> -<Title><ProductName>Postgres</ProductName> Architectural Concepts</Title> - -<Para> - Before we continue, you should understand the basic - <ProductName>Postgres</ProductName> system architecture. Understanding how the - parts of <ProductName>Postgres</ProductName> interact will make the next chapter - somewhat clearer. - In database jargon, <ProductName>Postgres</ProductName> uses a simple "process - per-user" client/server model. A <ProductName>Postgres</ProductName> session - consists of the following cooperating UNIX processes (programs): - -<ItemizedList> -<ListItem> -<Para> - A supervisory daemon process (<Application>postmaster</Application>), -</Para> -</ListItem> -<ListItem> -<Para> - the user's frontend application (e.g., the <Application>psql</Application> program), and -</Para> -</ListItem> -<ListItem> -<Para> - the one or more backend database servers (the <Application>postgres</Application> process itself). -</Para> -</ListItem> -</ItemizedList> -</Para> - -<Para> - A single <Application>postmaster</Application> manages a given collection of - databases on a single host. Such a collection of - databases is called an installation or site. Frontend - applications that wish to access a given database - within an installation make calls to the library. - The library sends user requests over the network to the - <Application>postmaster</Application> (<XRef LinkEnd="ARCHDEV-CONNECTIONS" EndTerm="ARCHDEV-CONNECTIONS">(a)), which in turn starts a new - backend server process (<XRef LinkEnd="ARCHDEV-CONNECTIONS" EndTerm="ARCHDEV-CONNECTIONS">(b)) - -<Figure id="ARCHDEV-CONNECTIONS"> -<Title>How a connection is established</Title> -<Graphic Align="center" FileRef="connections.gif" Format="GIF"></Graphic> -</Figure> - - and connects the - frontend process to the new server (<XRef LinkEnd="ARCHDEV-CONNECTIONS" EndTerm="ARCHDEV-CONNECTIONS">(c)). From - that point on, the frontend process and the backend - server communicate without intervention by the - <Application>postmaster</Application>. Hence, the <Application>postmaster</Application> is always running, waiting - for requests, whereas frontend and backend processes - come and go. The <FileName>libpq</FileName> library allows a single - frontend to make multiple connections to backend processes. - However, the frontend application is still a - single-threaded process. Multithreaded frontend/backend - connections are not currently supported in <FileName>libpq</FileName>. - One implication of this architecture is that the - <Application>postmaster</Application> and the backend always run on the same - machine (the database server), while the frontend - application may run anywhere. You should keep this - in mind, - because the files that can be accessed on a client - machine may not be accessible (or may only be accessed - using a different filename) on the database server - machine. - You should also be aware that the <Application>postmaster</Application> and - postgres servers run with the user-id of the <ProductName>Postgres</ProductName> - "superuser." Note that the <ProductName>Postgres</ProductName> superuser does not - have to be a special user (e.g., a user named - "postgres"). Furthermore, the <ProductName>Postgres</ProductName> superuser - should - definitely not be the UNIX superuser, "root"! In any - case, all files relating to a database should belong to - this <ProductName>Postgres</ProductName> superuser. -</Para> -</sect1> -</Chapter> + <chapter id="overview"> + <title>PostgreSQL from the Developer's Point of View</title> + + <para> + This chapter gives an overview of the internal structure of the + backend of <productname>Postgres</productname>. + After having read the following sections you + should have an idea of how a query is processed. Don't expect a + detailed description here (I think such a description dealing with + all data structures and functions used within <productname>Postgres</productname> + would exceed 1000 + pages!). This chapter is intended to help understanding the general + control and data flow within the backend from receiving a query to + sending the results. + </para> + + <sect1> + <title>The Path of a Query</title> + + <para> + Here we give a short overview of the stages a query has to pass in + order to obtain a result. + </para> + + <procedure> + <step> + <para> + A connection from an application program to the <productname>Postgres</productname> + server has to be established. The application program transmits a + query to the server and receives the results sent back by the server. + </para> + </step> + + <step> + <para> + The <firstterm>parser stage</firstterm> checks the query + transmitted by the application + program (client) for correct syntax and creates + a <firstterm>query tree</firstterm>. + </para> + </step> + + <step> + <para> + The <firstterm>rewrite system</firstterm> takes + the query tree created by the parser stage and looks for + any <firstterm>rules</firstterm> (stored in the + <firstterm>system catalogs</firstterm>) to apply to + the <firstterm>querytree</firstterm> and performs the + transformations given in the <firstterm>rule bodies</firstterm>. + One application of the rewrite system is given in the realization of + <firstterm>views</firstterm>. + </para> + + <para> + Whenever a query against a view + (i.e. a <firstterm>virtual table</firstterm>) is made, + the rewrite system rewrites the user's query to + a query that accesses the <firstterm>base tables</firstterm> given in + the <firstterm>view definition</firstterm> instead. + </para> + </step> + + <step> + <para> + The <firstterm>planner/optimizer</firstterm> takes + the (rewritten) querytree and creates a + <firstterm>queryplan</firstterm> that will be the input to the + <firstterm>executor</firstterm>. + </para> + + <para> + It does so by first creating all possible <firstterm>paths</firstterm> + leading to the same result. For example if there is an index on a + relation to be scanned, there are two paths for the + scan. One possibility is a simple sequential scan and the other + possibility is to use the index. Next the cost for the execution of + each plan is estimated and the + cheapest plan is chosen and handed back. + </para> + </step> + + <step> + <para> + The executor recursively steps through + the <firstterm>plan tree</firstterm> and + retrieves tuples in the way represented by the plan. + The executor makes use of the + <firstterm>storage system</firstterm> while scanning + relations, performs <firstterm>sorts</firstterm> and <firstterm>joins</firstterm>, + evaluates <firstterm>qualifications</firstterm> and finally hands back the tuples derived. + </para> + </step> + </procedure> + + <para> + In the following sections we will cover every of the above listed items + in more detail to give a better understanding on <productname>Postgres</productname>'s internal + control and data structures. + </para> + </sect1> + + <sect1> + <title>How Connections are Established</title> + + <para> + <productname>Postgres</productname> is implemented using a simple "process per-user" + client/server model. In this model there is one <firstterm>client process</firstterm> + connected to exactly one <firstterm>server process</firstterm>. + As we don't know <foreignphrase>per se</foreignphrase> + how many connections will be made, we have to use a <firstterm>master process</firstterm> + that spawns a new server process every time a connection is + requested. This master process is called <literal>postmaster</literal> and + listens at a specified TCP/IP port for incoming connections. Whenever + a request for a connection is detected the <literal>postmaster</literal> process + spawns a new server process called <literal>postgres</literal>. The server + tasks (<literal>postgres</literal> processes) communicate with each other using + <firstterm>semaphores</firstterm> and <firstterm>shared memory</firstterm> + to ensure data integrity + throughout concurrent data access. Figure + \ref{connection} illustrates the interaction of the master process + <literal>postmaster</literal> the server process <literal>postgres</literal> and a client + application. + </para> + + <para> + The client process can either be the <application>psql</application> frontend (for + interactive SQL queries) or any user application implemented using + the <filename>libpg</filename> library. Note that applications implemented using + <application>ecpg</application> + (the <productname>Postgres</productname> embedded SQL preprocessor for C) + also use this library. + </para> + + <para> + Once a connection is established the client process can send a query + to the <firstterm>backend</firstterm> (server). The query is transmitted using plain text, + i.e. there is no parsing done in the <firstterm>frontend</firstterm> (client). The + server parses the query, creates an <firstterm>execution plan</firstterm>, + executes the plan and returns the retrieved tuples to the client + by transmitting them over the established connection. + </para> + +<!-- +\begin{figure}[ht] +\begin{center} +\epsfig{figure=figures/connection.ps} +\caption{How a connection is established} +\label{connection} +\end{center} +\end{figure} +--> + + </sect1> + + <sect1> + <title>The Parser Stage</title> + + <para> + The <firstterm>parser stage</firstterm> consists of two parts: + + <itemizedlist> + <listitem> + <para> + The <firstterm>parser</firstterm> defined in + <filename>gram.y</filename> and <filename>scan.l</filename> is + built using the UNIX tools <application>yacc</application> + and <application>lex</application>. + </para> + </listitem> + <listitem> + <para> + The <firstterm>transformation process</firstterm> does + modifications and augmentations to the data structures returned by the parser. + </para> + </listitem> + </itemizedlist> + </para> + + <sect2> + <title>Parser</title> + + <para> + The parser has to check the query string (which arrives as + plain ASCII text) for valid syntax. If the syntax is correct a + <firstterm>parse tree</firstterm> is built up and handed back otherwise an error is + returned. For the implementation the well known UNIX + tools <application>lex</application> and <application>yacc</application> + are used. + </para> + + <para> + The <firstterm>lexer</firstterm> is defined in the file + <filename>scan.l</filename> and is responsible + for recognizing <firstterm>identifiers</firstterm>, + the <firstterm>SQL keywords</firstterm> etc. For + every keyword or identifier that is found, a <firstterm>token</firstterm> + is generated and handed to the parser. + </para> + + <para> + The parser is defined in the file <filename>gram.y</filename> and consists of a + set of <firstterm>grammar rules</firstterm> and <firstterm>actions</firstterm> + that are executed + whenever a rule is fired. The code of the actions (which + is actually C-code) is used to build up the parse tree. + </para> + + <para> + The file <filename>scan.l</filename> is transformed to + the C-source file <filename>scan.c</filename> + using the program <application>lex</application> + and <filename>gram.y</filename> is transformed to + <filename>gram.c</filename> using <application>yacc</application>. + After these transformations have taken + place a normal C-compiler can be used to create the + parser. Never make any changes to the generated C-files as they will + be overwritten the next time <application>lex</application> + or <application>yacc</application> is called. + + <note> + <para> + The mentioned transformations and compilations are normally done + automatically using the <firstterm>makefiles</firstterm> + shipped with the <productname>Postgres</productname> + source distribution. + </para> + </note> + </para> + + <para> + A detailed description of <application>yacc</application> or + the grammar rules given + in <filename>gram.y</filename> would be beyond the scope of this paper. There are + many books and documents dealing with <application>lex</application> + and <application>yacc</application>. You + should be familiar with <application>yacc</application> before you start to study the + grammar given in <filename>gram.y</filename> otherwise you won't understand what + happens there. + </para> + + <para> + For a better understanding of the data structures used in <productname>Postgres</productname> + for the processing of a query we use an example to illustrate the + changes made to these data structures in every stage. + </para> + + <example id="simple-select"> + <title>A Simple Select</title> + + <para> + This example contains the following simple query that will be used in + various descriptions and figures throughout the following + sections. The query assumes that the tables given in + <citetitle>The Supplier Database</citetitle> + <!-- + XXX The above citetitle should really be an xref, + but that part has not yet been converted from Stefan's original document. + - thomas 1999-02-11 + <xref linkend="supplier" endterm="supplier"> + --> + have already been defined. + + <programlisting> + select s.sname, se.pno + from supplier s, sells se + where s.sno > 2 and + s.sno = se.sno; + </programlisting> + </para> + </example> + + <para> + Figure \ref{parsetree} shows the <firstterm>parse tree</firstterm> built by the + grammar rules and actions given in <filename>gram.y</filename> for the query + given in <xref linkend="simple-select" endterm="simple-select"> + (without the <firstterm>operator tree</firstterm> for + the <firstterm>where clause</firstterm> which is shown in figure \ref{where_clause} + because there was not enough space to show both data structures in one + figure). + </para> + + <para> + The top node of the tree is a <literal>SelectStmt</literal> node. For every entry + appearing in the <firstterm>from clause</firstterm> of the SQL query a <literal>RangeVar</literal> + node is created holding the name of the <firstterm>alias</firstterm> and a pointer to a + <literal>RelExpr</literal> node holding the name of the <firstterm>relation</firstterm>. All + <literal>RangeVar</literal> nodes are collected in a list which is attached to the field + <literal>fromClause</literal> of the <literal>SelectStmt</literal> node. + </para> + + <para> + For every entry appearing in the <firstterm>select list</firstterm> of the SQL query a + <literal>ResTarget</literal> node is created holding a pointer to an <literal>Attr</literal> + node. The <literal>Attr</literal> node holds the <firstterm>relation name</firstterm> of the entry and + a pointer to a <literal>Value</literal> node holding the name of the + <firstterm>attribute</firstterm>. + All <literal>ResTarget</literal> nodes are collected to a list which is + connected to the field <literal>targetList</literal> of the <literal>SelectStmt</literal> node. + </para> + + <para> + Figure \ref{where_clause} shows the operator tree built for the + where clause of the SQL query given in example + <xref linkend="simple-select" endterm="simple-select"> + which is attached to the field + <literal>qual</literal> of the <literal>SelectStmt</literal> node. The top node of the + operator tree is an <literal>A_Expr</literal> node representing an <literal>AND</literal> + operation. This node has two successors called <literal>lexpr</literal> and + <literal>rexpr</literal> pointing to two <firstterm>subtrees</firstterm>. The subtree attached to + <literal>lexpr</literal> represents the qualification <literal>s.sno > 2</literal> and the one + attached to <literal>rexpr</literal> represents <literal>s.sno = se.sno</literal>. For every + attribute an <literal>Attr</literal> node is created holding the name of the + relation and a pointer to a <literal>Value</literal> node holding the name of the + attribute. For the constant term appearing in the query a + <literal>Const</literal> node is created holding the value. + </para> + +<!-- +XXX merge in the figures later... - thomas 1999-01-29 + +\begin{figure}[ht] +\begin{center} +\epsfig{figure=figures/parsetree.ps} +\caption{{\it TargetList} and {\it FromList} for query of example \ref{simple_select}} +\label{parsetree} +\end{center} +\end{figure} + +\begin{figure}[ht] +\begin{center} +\epsfig{figure=figures/where_clause.ps} +\caption{{\it WhereClause} for query of example \ref{simple_select}} +\label{where_clause} +\end{center} +\end{figure} +--> + + </sect2> + + <sect2> + <title>Transformation Process</title> + + <para> + The <firstterm>transformation process</firstterm> takes the tree handed back by + the parser as input and steps recursively through it. If + a <literal>SelectStmt</literal> node is found, it is transformed + to a <literal>Query</literal> + node which will be the top most node of the new data structure. Figure + \ref{transformed} shows the transformed data structure (the part + for the transformed <firstterm>where clause</firstterm> is given in figure + \ref{transformed_where} because there was not enough space to show all + parts in one figure). + </para> + + <para> + Now a check is made, if the <firstterm>relation names</firstterm> in the + <firstterm>FROM clause</firstterm> are known to the system. For every relation name + that is present in the <firstterm>system catalogs</firstterm> a <abbrev>RTE</abbrev> node is + created containing the relation name, the <firstterm>alias name</firstterm> and + the <firstterm>relation id</firstterm>. From now on the relation ids are used to + refer to the <firstterm>relations</firstterm> given in the query. All <abbrev>RTE</abbrev> nodes + are collected in the <firstterm>range table entry list</firstterm> which is connected + to the field <literal>rtable</literal> of the <literal>Query</literal> node. If a name of a + relation that is not known to the system is detected in the query an + error will be returned and the query processing will be aborted. + </para> + + <para> + Next it is checked if the <firstterm>attribute names</firstterm> used are + contained in the relations given in the query. For every + attribute} that is found a <abbrev>TLE</abbrev> node is created holding a pointer + to a <literal>Resdom</literal> node (which holds the name of the column) and a + pointer to a <literal>VAR</literal> node. There are two important numbers in the + <literal>VAR</literal> node. The field <literal>varno</literal> gives the position of the + relation containing the current attribute} in the range + table entry list created above. The field <literal>varattno</literal> gives the + position of the attribute within the relation. If the name + of an attribute cannot be found an error will be returned and + the query processing will be aborted. + </para> + +<!-- +\begin{figure}[ht] +\begin{center} +\epsfig{figure=figures/transform.ps} +\caption{Transformed {\it TargetList} and {\it FromList} for query of example \ref{simple_select}} +\label{transformed} +\end{center} +\end{figure} + +\noindent Figure \ref{transformed_where} shows the transformed {\it where +clause}. Every {\tt A\_Expr} node is transformed to an {\tt Expr} +node. The {\tt Attr} nodes representing the attributes are replaced by +{\tt VAR} nodes as it has been done for the {\it targetlist} +above. Checks if the appearing {\it attributes} are valid and known to the +system are made. If there is an {\it attribute} used which +is not known an error will be returned and the {\it query +processing} will be aborted. \\ +\\ +The whole {\it transformation process} performs a transformation of +the data structure handed back by the {\it parser} to a more +comfortable form. The character strings representing the {\it +relations} and {\it attributes} in the original tree are replaced by +{\it relation ids} and {\tt VAR} nodes whose fields are referring to +the entries of the {\it range table entry list}. In addition to the +transformation, also various checks if the used {\it relation} +and {\it attribute} names (appearing in the query) are valid in the +current context are performed. + +\begin{figure}[ht] +\begin{center} +\epsfig{figure=figures/transform_where.ps} +\caption{Transformed {\it where clause} for query of example \ref{simple_select}} +\label{transformed_where} +\end{center} +\end{figure} + +\pagebreak +\clearpage + +\begin{figure}[ht] +\begin{center} +\epsfig{figure=figures/plan.ps} +\caption{{\it Plan} for query of example \ref{simple_select}} +\label{plan} +\end{center} +\end{figure} +--> + + </sect1> + + <sect1> + <title>The <productname>Postgres</productname> Rule System</title> + + <para> + <productname>Postgres</productname> supports a powerful + <firstterm>rule system</firstterm> for the specification + of <firstterm>views</firstterm> and ambiguous <firstterm>view updates</firstterm>. + Originally the <productname>Postgres</productname> + rule system consisted of two implementations: + + <itemizedlist> + <listitem> + <para> + The first one worked using <firstterm>tuple level</firstterm> processing and was + implemented deep in the <firstterm>executor</firstterm>. The rule system was + called whenever an individual tuple had been accessed. This + implementation was removed in 1995 when the last official release + of the <productname>Postgres</productname> project was transformed into + <productname>Postgres95</productname>. + </para> + </listitem> + + <listitem> + <para> + The second implementation of the rule system is a technique + called <firstterm>query rewriting</firstterm>. + The <firstterm>rewrite system</firstterm>} is a module + that exists between the <firstterm>parser stage</firstterm> and the + <firstterm>planner/optimizer</firstterm>. This technique is still implemented. + </para> + </listitem> + </itemizedlist> + </para> + + <para> + For information on the syntax and creation of rules in the + <productname>Postgres</productname> system refer to + <citetitle>The PostgreSQL User's Guide</citetitle>. + </para> + + <sect2> + <title>The Rewrite System</title> + + <para> + The <firstterm>query rewrite system</firstterm> is a module between + the parser stage and the planner/optimizer. It processes the tree handed + back by the parser stage (which represents a user query) and if + there is a rule present that has to be applied to the query it + rewrites the tree to an alternate form. + </para> + + <sect3 id="view-impl"> + <title>Techniques To Implement Views</title> + + <para> + Now will sketch the algorithm of the query rewrite system. For + better illustration we show how to implement views using rules + as an example. + </para> + + <para> + Let the following rule be given: + + <programlisting> + create rule view_rule + as on select + to test_view + do instead + select s.sname, p.pname + from supplier s, sells se, part p + where s.sno = se.sno and + p.pno = se.pno; + </programlisting> + </para> + + <para> + The given rule will be <firstterm>fired</firstterm> whenever a select + against the relation <literal>test_view</literal> is detected. Instead of + selecting the tuples from <literal>test_view</literal> the select statement + given in the <firstterm>action part</firstterm> of the rule is executed. + </para> + + <para> + Let the following user-query against <literal>test_view</literal> be given: + + <programlisting> + select sname + from test_view + where sname <> 'Smith'; + </programlisting> + </para> + + <para> + Here is a list of the steps performed by the query rewrite + system whenever a user-query against <literal>test_view</literal> appears. (The + following listing is a very informal description of the algorithm just + intended for basic understanding. For a detailed description refer + to <xref linkend="STON89-full" endterm="STON89">). + </para> + + <procedure> + <title><literal>test_view</literal> Rewrite</title> + <step> + <para> + Take the query given in the action part of the rule. + </para> + </step> + + <step> + <para> + Adapt the targetlist to meet the number and order of + attributes given in the user-query. + </para> + </step> + + <step> + <para> + Add the qualification given in the where clause of the + user-query to the qualification of the query given in the + action part of the rule. + </para> + </step> + </procedure> + + <para> + Given the rule definition above, the user-query will be + rewritten to the following form (Note that the rewriting is done on + the internal representation of the user-query handed back by the + parser stage but the derived new data structure will represent the following + query): + + <programlisting> + select s.sname + from supplier s, sells se, part p + where s.sno = se.sno and + p.pno = se.pno and + s.sname <> 'Smith'; + </programlisting> + </para> + </sect2> + </sect1> + + <sect1> + <title>Planner/Optimizer</title> + + <para> + The task of the <firstterm>planner/optimizer</firstterm> is to create an optimal + execution plan. It first combines all possible ways of + <firstterm>scanning</firstterm> and <firstterm>joining</firstterm> + the relations that appear in a + query. All the created paths lead to the same result and it's the + task of the optimizer to estimate the cost of executing each path and + find out which one is the cheapest. + </para> + + <sect2> + <title>Generating Possible Plans</title> + + <para> + The planner/optimizer decides which plans should be generated + based upon the types of indices defined on the relations appearing in + a query. There is always the possibility of performing a + sequential scan on a relation, so a plan using only + sequential scans is always created. Assume an index is defined on a + relation (for example a B-tree index) and a query contains the + restriction + <literal>relation.attribute OPR constant</literal>. If + <literal>relation.attribute</literal> happens to match the key of the B-tree + index and <literal>OPR</literal> is anything but '<>' another plan is created using + the B-tree index to scan the relation. If there are further indices + present and the restrictions in the query happen to match a key of an + index further plans will be considered. + </para> + + <para> + After all feasible plans have been found for scanning single + relations, plans for joining relations are created. The + planner/optimizer considers only joins between every two relations for + which there exists a corresponding join clause (i.e. for which a + restriction like <literal>where rel1.attr1=rel2.attr2</literal> exists) in the + where qualification. All possible plans are generated for every + join pair considered by the planner/optimizer. The three possible join + strategies are: + + <itemizedlist> + <listitem> + <para> + <firstterm>nested iteration join</firstterm>: The right relation is scanned + once for every tuple found in the left relation. This strategy + is easy to implement but can be very time consuming. + </para> + </listitem> + + <listitem> + <para> + <firstterm>merge sort join</firstterm>: Each relation is sorted on the join + attributes before the join starts. Then the two relations are + merged together taking into account that both relations are + ordered on the join attributes. This kind of join is more + attractive because every relation has to be scanned only once. + </para> + </listitem> + + <listitem> + <para> + <firstterm>hash join</firstterm>: the right relation is first hashed on its + join attributes. Next the left relation is scanned and the + appropriate values of every tuple found are used as hash keys to + locate the tuples in the right relation. + </para> + </listitem> + </itemizedlist> + </para> + </sect2> + + <sect2> + <title>Data Structure of the Plan</title> + + <para> + Here we will give a little description of the nodes appearing in the + plan. Figure \ref{plan} shows the plan produced for the query in + example \ref{simple_select}. + </para> + + <para> + The top node of the plan is a <literal>MergeJoin</literal> node which has two + successors, one attached to the field <literal>lefttree</literal> and the second + attached to the field <literal>righttree</literal>. Each of the subnodes represents + one relation of the join. As mentioned above a merge sort + join requires each relation to be sorted. That's why we find + a <literal>Sort</literal> node in each subplan. The additional qualification given + in the query (<literal>s.sno > 2</literal>) is pushed down as far as possible and is + attached to the <literal>qpqual</literal> field of the leaf <literal>SeqScan</literal> node of + the corresponding subplan. + </para> + + <para> + The list attached to the field <literal>mergeclauses</literal> of the + <literal>MergeJoin</literal> node contains information about the join attributes. + The values <literal>65000</literal> and <literal>65001</literal> + for the <literal>varno</literal> fields in the + <literal>VAR</literal> nodes appearing in the <literal>mergeclauses</literal> list (and also in the + <literal>targetlist</literal>) mean that not the tuples of the current node should be + considered but the tuples of the next "deeper" nodes (i.e. the top + nodes of the subplans) should be used instead. + </para> + + <para> + Note that every <literal>Sort</literal> and <literal>SeqScan</literal> node appearing in figure + \ref{plan} has got a <literal>targetlist</literal> but because there was not enough space + only the one for the <literal>MergeJoin</literal> node could be drawn. + </para> + + <para> + Another task performed by the planner/optimizer is fixing the + <firstterm>operator ids</firstterm> in the <literal>Expr</literal> + and <literal>Oper</literal> nodes. As + mentioned earlier, <productname>Postgres</productname> supports a variety of different data + types and even user defined types can be used. To be able to maintain + the huge amount of functions and operators it is necessary to store + them in a system table. Each function and operator gets a unique + operator id. According to the types of the attributes used + within the qualifications etc., the appropriate operator ids + have to be used. + </para> + </sect2> + </sect1> + + <sect1> + <title>Executor</title> + + <para> + The <firstterm>executor</firstterm> takes the plan handed back by the + planner/optimizer and starts processing the top node. In the case of + our example (the query given in example \ref{simple_select}) the top + node is a <literal>MergeJoin</literal> node. + </para> + + <para> + Before any merge can be done two tuples have to be fetched (one from + each subplan). So the executor recursively calls itself to + process the subplans (it starts with the subplan attached to + <literal>lefttree</literal>). The new top node (the top node of the left subplan) is a + <literal>SeqScan</literal> node and again a tuple has to be fetched before the node + itself can be processed. The executor calls itself recursively + another time for the subplan attached to <literal>lefttree</literal> of the + <literal>SeqScan</literal> node. + </para> + + <para> + Now the new top node is a <literal>Sort</literal> node. As a sort has to be done on + the whole relation, the executor starts fetching tuples + from the <literal>Sort</literal> node's subplan and sorts them into a temporary + relation (in memory or a file) when the <literal>Sort</literal> node is visited for + the first time. (Further examinations of the <literal>Sort</literal> node will + always return just one tuple from the sorted temporary + relation.) + </para> + + <para> + Every time the processing of the <literal>Sort</literal> node needs a new tuple the + executor is recursively called for the <literal>SeqScan</literal> node + attached as subplan. The relation (internally referenced by the + value given in the <literal>scanrelid</literal> field) is scanned for the next + tuple. If the tuple satisfies the qualification given by the tree + attached to <literal>qpqual</literal> it is handed back, otherwise the next tuple + is fetched until the qualification is satisfied. If the last tuple of + the relation has been processed a <literal>NULL</literal> pointer is + returned. + </para> + + <para> + After a tuple has been handed back by the <literal>lefttree</literal> of the + <literal>MergeJoin</literal> the <literal>righttree</literal> is processed in the same way. If both + tuples are present the executor processes the <literal>MergeJoin</literal> + node. Whenever a new tuple from one of the subplans is needed a + recursive call to the executor is performed to obtain it. If a + joined tuple could be created it is handed back and one complete + processing of the plan tree has finished. + </para> + + <para> + Now the described steps are performed once for every tuple, until a + <literal>NULL</literal> pointer is returned for the processing of the + <literal>MergeJoin</literal> node, indicating that we are finished. + </para> + + </sect1> + +<!-- +********************************************************** +********************************************************** +********************************************************** +********************************************************** +********************************************************** + +\pagebreak +\clearpage + +\section{The Realization of the Having Clause} +\label{having_impl} + +The {\it having clause} has been designed in SQL to be able to use the +results of {\it aggregate functions} within a query qualification. The +handling of the {\it having clause} is very similar to the handling of +the {\it where clause}. Both are formulas in first order logic (FOL) +that have to evaluate to true for a certain object to be handed back: +% +\begin{itemize} +<step> The formula given in the {\it where clause} is evaluated for +every tuple. If the evaluation returns {\tt true} the tuple is +returned, every tuple not satisfying the qualification is ignored. +% +<step> In the case of {\it groups} the {\it having clause} is evaluated +for every group. If the evaluation returns {\tt true} the group is +taken into account otherwise it is ignored. +\end{itemize} +% +\subsection{How Aggregate Functions are Implemented} +\label{aggregates} + +Before we can describe how the {\it having clause} is implemented we +will have a look at the implementation of {\it aggregate functions} as +long as they just appear in the {\it targetlist}. Note that {\it aggregate +functions} are applied to groups so the query must contain a {\it +group clause}. +% +\begin{example} +\label{having} +Here is an example of the usage of the {\it aggregate +function} {\tt count} which counts the number of part numbers ({\tt +pno}) of every group. (The table {\tt sells} is defined in example +\ref{supplier}.) +% +\begin{verbatim} + select sno, count(pno) + from sells + group by sno; +\end{verbatim} +% +\end{example} +% +A query like the one in example \ref{having} is processed by the usual +stages: +% +\begin{itemize} +<step> the parser stage +<step> the rewrite system +<step> the planner/optimizer +<step> the executor +\end{itemize} +% +and in the following sections we will describe what every stage does +to the query in order to obtain the appropriate result. + +\subsubsection{The Parser Stage} + +\begin{figure}[ht] +\begin{center} +\epsfig{figure=figures/parse_having.ps} +\caption{{\it Querytree} built up for the query of example \ref{having}} +\label{parse_having} +\end{center} +\end{figure} + +The parser stage builds up a {\it querytree} containing the {\it +where} qualification and information about the {\it grouping} that has +to be done (i.e. a list of all attributes to group for is attached to +the field {\tt groupClause}). The main difference to {\it querytrees} +built up for queries without {\it aggregate functions} is given in the +field {\tt hasAggs} which is set to {\tt true} and in the {\it +targetlist}. The {\tt expr} field of the second {\tt TLE} node of the +{\it targetlist} shown in figure \ref{parse_having} does not point +directly to a {\tt VAR} node but to an {\tt Aggreg} node representing +the {\it aggregate function} used in the query. + +A check is made that every attribute grouped for appears only without +an {\it aggregate function} in the {\it targetlist} and that every +attribute which appears without an {\it aggregate function} in the +{\it targetlist} is grouped for. +% + +\pagebreak + +\subsubsection{The Rewrite System} + +The rewriting system does not make any changes to the {\it querytree} +as long as the query involves just {\it base tables}. If any {\it +views} are present the query is rewritten to access the tables +specified in the {\it view definition}. +% +\subsubsection{Planner/Optimizer} +Whenever an {\it aggregate function} is involved in a query (which is +indicated by the {\tt hasAggs} flag set to {\tt true}) the planner +creates a {\it plantree} whose top node is an {\tt AGG} node. The {\it +targetlist} is searched for {\it aggregate functions} and for every +function that is found, a pointer to the corresponding {\tt Aggreg} +node is added to a list which is finally attached to the field {\tt aggs} of +the {\tt AGG} node. This list is needed by the {\it executor} to know which +{\it aggregate functions} are present and have to be +handled. + +The {\tt AGG} node is followed by a {\tt GRP} node. The implementation +of the {\it grouping} logic needs a sorted table for its operation so +the {\tt GRP} node is followed by a {\tt SORT} node. The {\tt SORT} +operation gets its tuples from a kind of {\tt Scan} node (if no +indices are present this will be a simple {\tt SeqScan} node). Any +qualifications present are attached to the {\tt Scan} node. Figure +\ref{plan_having} shows the {\it plan} created for the query given in +example \ref{having}. + +\begin{figure}[ht] +\begin{center} +\epsfig{figure=figures/plan_having.ps} +\caption{{\it Plantree} for the query of example \ref{having}} +\label{plan_having} +\end{center} +\end{figure} + +Note that every node has its own {\it targetlist} which may differ from the +one of the node above or below. The field {\tt varattno} of every +{\tt VAR} node included in a {\it targetlist} contains a number representing +the position of the attribute's value in the tuple of the current node. + +\pagebreak +\clearpage + +\subsubsection{Executor} + +The {\it executor} uses the function {\tt execAgg()} to execute {\tt AGG} +nodes. As described earlier it uses one main function {\tt +ExecProcNode} which is called recursively to execute subtrees. The +following steps are performed by {\tt execAgg()}: +% +\begin{itemize} +% +<step> The list attached to the field {\tt aggs} of the {\tt AGG} node is +examined and for every {\it aggregate function} included the {\it transition +functions} are fetched from a {\it function table}. Calculating the +value of an {\it aggregate function} is done using three functions: +% +\begin{itemize} +<step> The {\it first transition function} {\tt xfn1} is called with the +current value of the attribute the {\it aggregate function} is applied +to and changes its internal state using the attribute's value given as +an argument. +% +<step> The {\it second transition function} {\tt xfn2} is called without any +arguments and changes its internal state only according to internal rules. +% +<step> The {\it final function} {\tt finalfn} takes the final states of {\tt +xfn1} and {\tt xfn2} as arguments and finishes the {\it aggregation}. +\end{itemize} +% +\begin{example} Recall the functions necessary to implement the +{\it aggregate function} {\tt avg} building the average over all +values of an attribute in a group (see section \ref{ext_agg} {\it +Extending Aggregates}): +% +\begin{itemize} +% +<step> The first transition function {\tt xfn1} has to be a function that +takes the actual value of the attribute {\tt avg} is applied to as an +argument and adds it to the internally stored sum of previous +calls. +% +<step> The second transition function {\tt xfn2} only increases an internal +counter every time it is called. +% +<step> The final function {\tt finalfn} divides the result of {\tt xfn1} by +the counter of {\tt xfn2} to calculate the average. +% +\end{itemize} +% +\end{example} +% +Note that {\tt xfn2} and {\tt finalfn} may be absent (e.g. for the +{\it aggregate function} {\tt sum} which simply sums up all values of +the given attribute within a group. \\ +\\ +{\tt execAgg()} creates an array containing one entry for every {\it +aggregate function} found in the list attached to the field {\tt +aggs}. The array will hold information needed for the execution of +every {\it aggregate function} (including the {\it transition functions} +described above). +% +<step> The following steps are executed in a loop as long as there are +still tuples returned by the subplan (i.e. as long as there are still +tuples left in the current group). When there are no tuples left +in the group a {\tt NULL} pointer is returned indicating the end of the +group. +% +\begin{itemize} +<step> A new tuple from the subplan (i.e. the {\it plan} attached to the +field {\tt lefttree}) is fetched by recursively calling {\tt +ExecProcNode()} with the subplan as argument. +% +<step> For every {\it aggregate function} (contained in the array created +before) apply the transition functions {\tt xfn1} and {\tt xfn2} to +the values of the appropriate attributes of the current tuple. +\end{itemize} +% +<step> When we get here, all tuples of the current group have been +processed and the {\it transition functions} of all {\it aggregate +functions} have been applied to the values of the attributes. We are +now ready to complete the {\it aggregation} by applying the {\it final +function} ({\tt finalfn}) for every {\it aggregate function}. +% +<step> Store the tuple containing the new values (the results of the +{\it aggregate functions}) and hand it back. +\end{itemize} +% +Note that the procedure described above only returns one tuple +(i.e. it processes just one group and when the end of the group is +detected it processes the {\it aggregate functions} and hands back one +tuple). To retrieve all tuples (i.e. to process all groups) the +function {\tt execAgg()} has to be called (returning a new tuple every +time) until it returns a {\tt NULL} pointer indicating that there are +no groups left to process. + +\subsection{How the Having Clause is Implemented} + +The basic idea of the implementation is to attach the {\it operator tree} +built for the {\it having clause} to the field {\tt qpqual} of +node {\tt AGG} (which is the top node of the query tree). Now the executor +has to evaluate the new {\it operator tree} attached to {\tt qpqual} for +every group processed. If the evaluation returns {\tt true} the group +is taken into account otherwise it is ignored and the next group will +be examined. \\ +\\ +In order to implement the {\it having clause} a variety of changes +have been made to the following stages: +% +\begin{itemize} +<step> The {\it parser stage} has been modified slightly to build up and +transform an {\it operator tree} for the {\it having clause}. +% +<step> The {\it rewrite system} has been adapted to be able to use {\it +views} with the {\it having logic}. +% +<step> The {\it planner/optimizer} now takes the {\it operator tree} of +the {\it having clause} and attaches it to the {\tt AGG} node (which +is the top node of the {\it queryplan}). +% +<step> The {\it executor} has been modified to evaluate the {\it operator tree} +(i.e. the internal representation of the {\it having +qualification}) attached to the {\tt AGG} node and the results of the +{\it aggregation} are only considered if the evaluation returns {\tt true}. +\end{itemize} +% +In the following sections we will describe the changes made to every +single stage in detail. + +\subsubsection{The Parser Stage} + +The grammar rules of the {\it parser} defined in {\tt gram.y} did not +require any changes (i.e. the rules had already been prepared for the +{\it having clause}). The {\it operator tree} built up for the {\it having +clause} is attached to the field {\tt havingClause} of the {\tt +SelectStmt} node handed back by the {\it parser}. \\ +\\ +The {\it transformation} procedures applied to the tree handed back by +the {\it parser} transform the {\it operator tree} attached to the field +{\tt havingClause} using exactly the same functions used for the +transformation of the {\it operator tree} for the {\it where clause}. This +is possible because both trees are built up by the same grammar rules +of the {\it parser} and are therefore compatible. Additional checks +which make sure that the {\it having clause} involves at least one +{\it aggregate function} etc. are performed at a later point in time +in the {\it planner/optimizer} stage. \\ +\\ +The necessary changes have been applied to the following functions +included in the file {\tt +$\ldots$/src/backend/parser/analyze.c}. Note, that only the relevant +parts of the affected code are presented instead of the whole +functions. Every added source line will be marked by a {\tt '+'} at the +beginning of the line and every changed source line will be marked by +a {\tt '!'} throughout the following code listings. Whenever a part of +the code which is not relevant at the moment is skipped, three +vertical dots are inserted instead. +% +\pagebreak +% +\begin{itemize} +<step> {\tt transformInsertStmt()} \\ +This function becomes is invoked every time a SQL {\tt insert} +statement involving a {\tt select} is used like the following example +illustrates: +% +\begin{verbatim} + insert into t2 + select x, y + from t1; +\end{verbatim} +% +Two statements have been added to this function. The first one +performs the transformation of the {\it operator tree} attached to the +field {\tt havingClause} using the function {\tt +transformWhereClause()} as done for the {\it where clause}. It is +possible to use the same function for both clauses, because they are +both built up by the same {\it grammar rules} given in {\tt gram.y} +and are therefore compatible. + +The second statement makes sure, that {\it aggregate functions} are +involved in the query whenever a {\it having clause} is used, +otherwise the query could have been formulated using only a {\it where +clause}. +% +\begin{verbatim} + static Query * + transformInsertStmt(ParseState *pstate, + InsertStmt *stmt) + { + /* make a new query tree */ + Query *qry = makeNode(Query); + . + . + . + /* fix where clause */ + qry->qual = transformWhereClause(pstate, + stmt->whereClause); + ++ /* The havingQual has a similar meaning as "qual" in ++ * the where statement. So we can easily use the ++ * code from the "where clause" with some additional ++ * traversals done in .../optimizer/plan/planner.c ++ */ ++ qry->havingQual = transformWhereClause(pstate, ++ stmt->havingClause); + . + . + . ++ /* If there is a havingQual but there are no ++ * aggregates, then there is something wrong with ++ * the query because having must contain aggregates ++ * in its expressions! Otherwise the query could ++ * have been formulated using the where clause. ++ */ ++ if((qry->hasAggs == false) && ++ (qry->havingQual != NULL)) ++ { ++ elog(ERROR,"This is not a valid having query!"); ++ return (Query *)NIL; ++ } + return (Query *) qry; + } +\end{verbatim} +% +<step> {\tt transformSelectStmt()} \\ +Exactly the same statements added to the function {\tt +transformInsertStmt()} above have been added here as well. +% +\begin{verbatim} + static Query * + transformSelectStmt(ParseState *pstate, + SelectStmt *stmt) + { + Query *qry = makeNode(Query); + + qry->commandType = CMD_SELECT; + . + . + . + qry->qual = transformWhereClause(pstate, + stmt->whereClause); + ++ /* The havingQual has a similar meaning as "qual" in ++ * the where statement. So we can easily use the ++ * code from the "where clause" with some additional ++ * traversals done in .../optimizer/plan/planner.c ++ */ ++ qry->havingQual = transformWhereClause(pstate, ++ stmt->havingClause); + . + . + . ++ /* If there is a havingQual but there are no ++ * aggregates, then there is something wrong with ++ * the query because having must contain aggregates ++ * in its expressions! Otherwise the query could ++ * have been formulated using the where clause. ++ */ ++ if((qry->hasAggs == false) && ++ (qry->havingQual != NULL)) ++ { ++ elog(ERROR,"This is not a valid having query!"); ++ return (Query *)NIL; ++ } + return (Query *) qry; + } +\end{verbatim} +% +\end{itemize} + + +\subsubsection{The Rewrite System} + +This section describes the changes to the {\it rewrite system} of +<productname>Postgres</productname> that have been necessary to support the use of {\it views} +within queries using a {\it having clause} and to support the +definition of {\it views} by queries using a {\it having clause}. + +As described in section \ref{view_impl} {\it Techniques To Implement +Views} a query rewrite technique is used to implement {\it +views}. There are two cases to be handled within the {\it rewrite +system} as far as the {\it having clause} is concerned: +% +\pagebreak +% +\begin{itemize} +<step> The {\it view definition} does not contain a {\it having clause} +but the queries evaluated against this view may contain {\it having +clauses}. +<step> The {\it view definition} contains a {\it having clause}. In this +case queries evaluated against this view must meet some +restrictions as we will describe later. +\end{itemize} +% +\paragraph{No having clause in the view definition:} First we will +look at the changes necessary to support queries using a +{\it having clause} against a {\it view} defined without +a {\it having clause}. \\ +\\ +Let the following view definition be given: +% +\begin{verbatim} + create view test_view + as select sno, pno + from sells + where sno > 2; +\end{verbatim} +% +and the following query made against <literal>test_view</literal>: +% +\begin{verbatim} + select * + from testview + where sno <> 5; +\end{verbatim} +% +The query will be rewritten to: +% +\begin{verbatim} + select sno, pno + from sells + where sno > 2 and + sno <> 5; +\end{verbatim} +% +The query given in the definition of the {\it view} <literal>test_view</literal> +is the {\it backbone} of the rewritten query. The {\it targetlist} is taken +from the user's query and also the {\it where qualification } of the +user's query is added to the {\it where qualification} of the new +query by using an {\tt AND} operation. \\ +\\ +Now consider the following query: +% +\begin{verbatim} + select sno, count(pno) + from testview + where sno <> 5 + group by sno + having count(pno) > 1; +\end{verbatim} +% +From now on it is no longer sufficient to add just the {\it where +clause} and the {\it targetlist} of the user's query to the new query. The +{\it group clause} and the {\it having qualification} also have to be +added to the rewritten query: +% +\begin{verbatim} + select sno, count(pno) + from sells + where sno > 2 and + sno <> 5 + group by sno + having count(pno) > 1; +\end{verbatim} +% +\pagebreak + +\noindent Several changes that have already been applied to the {\it +targetlist} and the {\it where clause} also have to be applied to the +{\it having clause}. Here is a collection of all {\it additional} steps that +have to be performed in order to rewrite a query using a {\it having +clause} against a simple {\it view} (i.e. a {\it view} whose +definition does not use any {\it group} and {\it having clauses}): +% +\begin{itemize} +% +<step> Rewrite the subselects contained in the {\it having clause} if any are +present. +% +<step> Adapt the {\tt varno} and {\tt varattno} fields of all {\tt +VAR} nodes contained in the {\it operator tree} representing the {\it +having clause} in the same way as it has been done for the tree +representing the {\it where clause}. The {\tt varno} fields are changed +to use the {\it base tables} given in the {\it view definition} (which +have been inserted into the {\it range table entry list} in the +meantime) instead of the {\it virtual tables}. The positions of +the attributes used in the {\it view} may differ from the positions of +the corresponding attributes in the {\it base tables}. That's why the +{\tt varattno} fields also have to be adapted. +% +<step> Adapt the {\tt varno} and {\tt varattno} fields of all {\tt +VAR} nodes contained in the {\tt groupClause} of the user's query in +the way and for the reasons described above. +% +<step> Attach the tree representing the {\it having qualification} (which is +currently attached to the field {\tt havingClause} of the {\tt Query} +node for the user's query) to the field {\tt havingClause} of the {\tt +Query} node for the new (rewritten) query. +% +<step> Attach the list representing the {\it group clause} (currently +attached to the field {\tt groupClause} of the {\tt Query} node for +the user's query) to the field {\it group clause} of the node for the +new (rewritten) query. +% +\end{itemize} + +\paragraph{The view definition contains a having clause:} Now we will +look at the problems that can arise using {\it views} that are +defined using a query involving a {\it having clause}. \\ +\\ +Let the following {\it view definition} be given: +% +\begin{verbatim} + create view test_view + as select sno, count(pno) as number + from sells + where sno > 2 + group by sno + having count(pno) > 1; +\end{verbatim} +% +Simple queries against this {\it view} will not cause any troubles: +% +\begin{verbatim} + select * + from test_view + where sno <> 5; +\end{verbatim} +% +This query can easily be rewritten by adding the {\it where +qualification} of the user's query ({\tt sno $<>$ 5}) to the {\it +where qualification} of the {\it view definition's } query. \\ +\\ +The next query is also simple but it will cause troubles when +it is evaluated against the above given {\it view definition}: +% +\begin{verbatim} + select * + from test_view + where number > 1; /* count(pno) in the view def. + * is called number here */ +\end{verbatim} +% +\pagebreak +The currently implemented techniques for query rewriting will rewrite +the query to: +% +\begin{verbatim} + select * + from sells + where sno > 2 and + count(pno) > 1 + group by sno + having count(pno) > 1; +\end{verbatim} +% +which is an invalid query because an {\it aggregate function} appears +in the {\it where clause}. \\ +\\ +Also the next query will cause troubles: +% +\begin{verbatim} + select pno, count(sno) + from test_view + group by pno; +\end{verbatim} +% +As you can see this query does neither involve a {\it where clause} +nor a {\it having clause} but it contains a {\it group clause} which +groups by the attribute {\tt pno}. The query in the definition of the +{\it view} also contains a {\it group clause} that groups by the +attribute {\tt sno}. The two {\it group clauses} are in conflict with +each other and therefore the query cannot be rewritten to a form that +would make sense.\\ +\\ +{\bf Note:} There is no solution to the above mentioned problems at the +moment and it does not make sense to put much effort into that because +the implementation of the support for queries like: +% +\begin{verbatim} + select pno_count, count(sno) + from ( select sno, count(pno) as pno_count + from sells + where sno > 2 + group by sno + having count(pno) > 1) + group by pno_count; +\end{verbatim} +% +(which is part of the SQL92 standard) will automatically also solve +these problems. \\ +\\ +In the next part of the current section we will present the changes +applied to the source code in order to realize the above described +items. Note that it is not necessary to understand the meaning of +every single source line here and therefore we will not discuss +detailed questions like "Why has the variable {\tt varno} to be +increased by 3?". Questions like that belong to a chapter dealing +with the implementation of {\it views} in <productname>Postgres</productname> and to be able to +answer them it would be necessary to know all the functions and not +only those described here. The fact important for us is to make sure, +that whatever is applied to the {\it targetlist} and the data +structures representing the {\it where clause} is also applied to the +data structures for the {\it having clause}. There are three files +affected: \\ +\\ +\indent {\tt $\ldots$/src/backend/rewrite/rewriteHandler.c} \\ +\indent {\tt $\ldots$/src/backend/rewrite/rewriteManip.c} \\ +\indent {\tt $\ldots$/src/backend/commands/view.c} \\ +\\ +Here is a description of the changes made to the functions contained +in the file {\tt $\ldots$/src/backend/rewrite/rewriteHandler.c}: +% +\pagebreak +% +\begin{itemize} +<step> {\tt ApplyRetrieveRule()} \\ +This function becomes invoked whenever a {\tt select} statement +against a {\it view} is recognized and applies the {\it rewrite rule} +stored in the {\it system catalogs}. The additional source lines given +in the listing below make sure that the functions {\tt +OffsetVarNodes()} and {\tt ChangeVarNodes()} that are invoked for the +{\it where clause} and the {\it targetlist} of the query given in the +{\it view definition} are also called for the {\it having clause} and +the {\it group clause} of the query in the {\it view +definition}. These functions adapt the {\tt varno} and {\tt varattno} +fields of the {\tt VAR} nodes involved. + +The additional source lines at the end of {\tt ApplyRetrieveRule()} +attach the data structures representing the {\it having clause} and +the {\it group clause} of the query in the {\it view definition} to +the rewritten {\it parsetree}. As mentioned earlier, a {\it view +definition} involving a {\it group clause} will cause troubles +whenever a query using a different {\it group clause} against this +{\it view} is executed. There is no mechanism preventing these +troubles included at the moment. + +Note that the functions {\tt OffsetVarNodes()} , {\tt ChangeVarNodes()} +and {\tt AddHavingQual()} appearing in {\tt ApplyRetrieveRule()} are +described at a later point in time. +% +\begin{verbatim} + static void + ApplyRetrieveRule(Query *parsetree, RewriteRule *rule, + int rt_index, int relation_level, + Relation relation, int *modified) + { + Query *rule_action = NULL; + Node *rule_qual; + List *rtable, + . + . + . + OffsetVarNodes((Node *) rule_action->targetList, + rt_length); + OffsetVarNodes(rule_qual, rt_length); + ++ OffsetVarNodes((Node *) rule_action->groupClause, ++ rt_length); ++ OffsetVarNodes((Node *) rule_action->havingQual, ++ rt_length); + . + . + . + ChangeVarNodes(rule_qual, + PRS2_CURRENT_VARNO + rt_length, + rt_index, 0); + ++ ChangeVarNodes((Node *) rule_action->groupClause, ++ PRS2_CURRENT_VARNO + rt_length, ++ rt_index, 0); ++ ChangeVarNodes((Node *) rule_action->havingQual, ++ PRS2_CURRENT_VARNO + rt_length, ++ rt_index, 0); + . + . + . +\end{verbatim} +\pagebreak +\begin{verbatim} + if (*modified && !badsql) + { + AddQual(parsetree, rule_action->qual); ++ /* This will only work if the query made to the ++ * view defined by the following groupClause ++ * groups by the same attributes or does not use ++ * groups at all! ++ */ ++ if (parsetree->groupClause == NULL) ++ parsetree->groupClause = ++ rule_action->groupClause; ++ AddHavingQual(parsetree, ++ rule_action->havingQual); ++ parsetree->hasAggs = ++ (rule_action->hasAggs || parsetree->hasAggs); ++ parsetree->hasSubLinks = ++ (rule_action->hasSubLinks || ++ parsetree->hasSubLinks); + } + } +\end{verbatim} +% +<step> {\tt QueryRewriteSubLink()} \\ +This function is called by {\tt QueryRewrite()} to process possibly +contained subqueries first. It searches for nested queries by +recursively tracing through the {\it parsetree} given as argument. The +additional statement makes sure that the {\it having clause} is also +examined. +% +\begin{verbatim} + static void + QueryRewriteSubLink(Node *node) + { + if (node == NULL) + return; + switch (nodeTag(node)) + { + case T_SubLink: + { + . + . + . + QueryRewriteSubLink((Node *) query->qual); ++ QueryRewriteSubLink((Node *) ++ query->havingQual); + . + . + . + } + . + . + . + } + return; + } +\end{verbatim} +% +\pagebreak +% +<step> {\tt QueryRewrite()} \\ +This function takes the {\it parsetree} of a query and rewrites it +using <productname>Postgres</productname>'s {\it rewrite system}. Before the query itself can be +rewritten, subqueries that are possibly part of the query have to be +processed. Therefore the function {\tt QueryRewriteSubLink()} is +called for the {\it where clause} and for the {\it having clause}. +% +\begin{verbatim} + List * + QueryRewrite(Query *parsetree) + { + QueryRewriteSubLink(parsetree->qual); ++ QueryRewriteSubLink(parsetree->havingQual); + return QueryRewriteOne(parsetree); + } +\end{verbatim} +% +\end{itemize} +% +Here we present the changes applied to the functions that are contained +in the file {\tt $\ldots$/src/backend/rewrite/rewriteManip.c}: +% +\begin{itemize} +% +<step> {\tt OffsetVarNodes()} \\ +Recursively steps through the {\it parsetree} given as the first +argument and increments the {\tt varno} and {\tt varnoold} fields of +every {\tt VAR} node found by the {\it offset} given as the second +argument. The additional statements are necessary to be able to handle +{\tt GroupClause} nodes and {\tt Sublink} nodes that may appear in the {\it +parsetree} from now on. +% +\begin{verbatim} + void + OffsetVarNodes(Node *node, int offset) + { + if (node == NULL) + return; + switch (nodeTag(node)) + { + . + . + . ++ /* This has to be done to make queries using ++ * groupclauses work on views ++ */ ++ case T_GroupClause: ++ { ++ GroupClause *group = (GroupClause *) node; ++ ++ OffsetVarNodes((Node *)(group->entry), ++ offset); ++ } ++ break; + . + . + . ++ case T_SubLink: ++ { ++ SubLink *sublink = (SubLink *) node; ++ List *tmp_oper, *tmp_lefthand; ++ +\end{verbatim} +\pagebreak +\begin{verbatim} ++ /* We also have to adapt the variables used ++ * in sublink->lefthand and sublink->oper ++ */ ++ OffsetVarNodes((Node *)(sublink->lefthand), ++ offset); ++ ++ /* Make sure the first argument of ++ * sublink->oper points to the same var as ++ * sublink->lefthand does otherwise we will ++ * run into troubles using aggregates (aggno ++ * will not be set correctly) ++ */ ++ tmp_lefthand = sublink->lefthand; ++ foreach(tmp_oper, sublink->oper) ++ { ++ lfirst(((Expr *)lfirst(tmp_oper))->args) = ++ lfirst(tmp_lefthand); ++ tmp_lefthand = lnext(tmp_lefthand); ++ } ++ } ++ break; + . + . + . + } + } +\end{verbatim} +% +<step> {\tt ChangeVarNodes()} \\ +This function is similar to the above described function {\tt +OffsetVarNodes()} but instead of incrementing the fields {\tt varno} +and {\tt varnoold} of {\it all} {\tt VAR} nodes found, it processes +only those {\tt VAR} nodes whose {\tt varno} value matches the +parameter {\tt old\_varno} given as argument and whose {\tt +varlevelsup} value matches the parameter {\tt sublevels\_up}. Whenever +such a node is found, the {\tt varno} and {\tt varnoold} fields are +set to the value given in the parameter {\tt new\_varno}. The +additional statements are necessary to be able to handle {\tt GroupClause} +and {\tt Sublink} nodes. +% +\begin{verbatim} + void + ChangeVarNodes(Node *node, int old_varno, + int new_varno, int sublevels_up) + { + if (node == NULL) + return; + switch (nodeTag(node)) + { + . + . + . ++ /* This has to be done to make queries using ++ * groupclauses work on views */ ++ case T_GroupClause: ++ { ++ GroupClause *group = (GroupClause *) node; ++ +\end{verbatim} +\pagebreak +\begin{verbatim} ++ ChangeVarNodes((Node *)(group->entry), ++ old_varno, new_varno, ++ sublevels_up); ++ } ++ break; + . + . + . + case T_Var: + { + . + . + . + /* This is a hack: Whenever an attribute + * from the "outside" query is used within + * a nested subquery, the varlevelsup will + * be >0. Nodes having varlevelsup > 0 are + * forgotten to be processed. The call to + * OffsetVarNodes() should really be done at + * another place but this hack makes sure + * that also those VAR nodes are processed. + */ ++ if (var->varlevelsup > 0) ++ OffsetVarNodes((Node *)var,3); + } + break; + . + . + . + case T_SubLink: + { + . + . + . ++ ChangeVarNodes((Node *) query->havingQual, ++ old_varno, new_varno, ++ sublevels_up); ++ ChangeVarNodes((Node *) query->targetList, ++ old_varno, new_varno, ++ sublevels_up); ++ ++ /* We also have to adapt the variables used in ++ * sublink->lefthand and sublink->oper ++ */ ++ ChangeVarNodes((Node *) (sublink->lefthand), ++ old_varno, new_varno, ++ sublevels_up); + } + break; + . + . + . + } + } +\end{verbatim} +% +<step> {\tt AddHavingQual()} \\ +This function adds the {\it operator tree} given by the parameter {\tt +havingQual} to the one attached to the field {\tt havingQual} of the +{\it parsetree} given by the parameter {\tt parsetree}. This is done +by adding a new {\tt AND} node and attaching the old and the new {\it +operator tree} as arguments to it. {\tt AddHavingQual()} has not been +existing until v6.3.2. It has been created for the {\it having logic}. +% +\begin{verbatim} + void + AddHavingQual(Query *parsetree, Node *havingQual) + { + Node *copy, *old; + + if (havingQual == NULL) + return; + + copy = havingQual; + + old = parsetree->havingQual; + if (old == NULL) + parsetree->havingQual = copy; + else + parsetree->havingQual = + (Node *) make_andclause( + makeList(parsetree->havingQual, + copy, -1)); + } +\end{verbatim} +% +<step> {\tt AddNotHavingQual()} \\ +This function is similar to the above described function {\tt +AddHavingQual()}. It also adds the {\it operator tree} given by the +parameter {\tt havingQual} but prefixes it by a {\tt NOT} node. {\tt +AddNotHavingQual()} has also not been existing until v6.3.2 and has been +created for the {\it having logic}. +% +\begin{verbatim} + void + AddNotHavingQual(Query *parsetree, + Node *havingQual) + { + Node *copy; + + if (havingQual == NULL) + return; + + copy = (Node *) make_notclause((Expr *)havingQual); + AddHavingQual(parsetree, copy); +} +\end{verbatim} +% +<step> {\tt nodeHandleViewRule()} \\ +This function is called by {\tt HandleViewRule()}. It replaces all +{\tt VAR} nodes of the {\it user query} evaluated against the {\it +view} (the fields of these {\tt VAR} nodes represent the positions of +the attributes in the {\it virtual} table) by {\tt VAR} nodes that +have already been prepared to represent the positions of the +corresponding attributes in the {\it physical} tables (given in the +{\it view definition}). The additional statements make sure that {\tt +GroupClause} nodes and {\tt Sublink} nodes are handled correctly. + +\begin{verbatim} + static void + nodeHandleViewRule(Node **nodePtr, List *rtable, + List *targetlist, int rt_index, + int *modified, int sublevels_up) + { + Node *node = *nodePtr; + if (node == NULL) + return; + switch (nodeTag(node)) + { + . + . + . ++ /* This has to be done to make queries using ++ * groupclauses work on views ++ */ ++ case T_GroupClause: ++ { ++ GroupClause *group = (GroupClause *) node; ++ nodeHandleViewRule((Node **) (&(group->entry)), ++ rtable, targetlist, rt_index, ++ modified, sublevels_up); ++ } ++ break; + . + . + . + case T_Var: + { + . + . + . + if (n == NULL) + { + *nodePtr = make_null(((Var *)node)->vartype); + } + else ++ /* This is a hack: The varlevelsup of the ++ * original variable and the new one should ++ * be the same. Normally we adapt the node ++ * by changing a pointer to point to a var ++ * contained in 'targetlist'. In the ++ * targetlist all varlevelsups are 0 so if ++ * we want to change it to the original ++ * value we have to copy the node before! ++ * (Maybe this will cause troubles with some ++ * sophisticated queries on views?) ++ */ ++ { ++ if(this_varlevelsup>0) ++ { ++ *nodePtr = copyObject(n); ++ } ++ else ++ { ++ *nodePtr = n; ++ } ++ ((Var *)*nodePtr)->varlevelsup = ++ this_varlevelsup; ++ } + *modified = TRUE; + } + break; + . + . + . + case T_SubLink: + { + . + . + . ++ nodeHandleViewRule( ++ (Node **) &(query->havingQual), ++ rtable, targetlist, rt_index, ++ modified, sublevels_up); ++ nodeHandleViewRule( ++ (Node **) &(query->targetList), ++ rtable, targetlist, rt_index, ++ modified, sublevels_up); ++ /* We also have to adapt the variables used ++ * in sublink->lefthand and sublink->oper ++ */ ++ nodeHandleViewRule( ++ (Node **) &(sublink->lefthand), ++ rtable, targetlist, rt_index, ++ modified, sublevels_up); ++ /* Make sure the first argument of ++ * sublink->oper points to the same var as ++ * sublink->lefthand does otherwise we will ++ * run into troubles using aggregates ++ * (aggno will not be set correctly!) ++ */ ++ pfree(lfirst(((Expr *) ++ lfirst(sublink->oper))->args)); ++ tmp_lefthand = sublink->lefthand; ++ foreach(tmp_oper, sublink->oper) ++ { ++ lfirst(((Expr *) lfirst(tmp_oper))->args) = ++ lfirst(tmp_lefthand); ++ tmp_lefthand = lnext(tmp_lefthand); ++ } + } + break; + . + . + . + } + } +\end{verbatim} +% +<step> {\tt HandleViewRule()} \\ +This function calls {\tt nodeHandleViewRule()} for the {\it where +clause}, the {\it targetlist}, the {\it group clause} and the {\it +having clause} of the {\it user query} evaluated against the given +{\it view}. +% +\begin{verbatim} + void + HandleViewRule(Query *parsetree, List *rtable, + List *targetlist, int rt_index, + int *modified) + { + . + . + . ++ /* The variables in the havingQual and ++ * groupClause also have to be adapted ++ */ ++ nodeHandleViewRule(&parsetree->havingQual, rtable, ++ targetlist, rt_index, ++ modified, 0); ++ nodeHandleViewRule( ++ (Node **)(&(parsetree->groupClause)), ++ rtable, targetlist, rt_index, modified, 0); + } +\end{verbatim} +% +\end{itemize} +% +The following function is contained in {\tt +$\ldots$/src/backend/commands/view.c}: + +\begin{itemize} +% +<step> {\tt UpdateRangeTableOfViewParse()} \\ +This function updates the {\it range table} of the {\it parsetree} +given by the parameter {\tt viewParse}. The additional statement makes +sure that the {\tt VAR} nodes of the {\it having clause} are modified +in the same way as the {\tt VAR} nodes of the {\it where clause} are. +% +\begin{verbatim} + static void + UpdateRangeTableOfViewParse(char *viewName, + Query *viewParse) + { + . + . + . + OffsetVarNodes(viewParse->qual, 2); + ++ OffsetVarNodes(viewParse->havingQual, 2); + . + . + . + } +\end{verbatim} +% +\end{itemize} + + +\subsubsection{Planner/Optimizer} + +The {\it planner} builds a {\it queryplan} like the one shown in +figure \ref{plan_having} and in addition to that it takes the {\it operator +tree} attached to the field {\tt havingClause} of the {\tt Query} node +and attaches is to the {\tt qpqual} field of the {\tt AGG} +node. + +Unfortunately this is not the only thing to do. Remember from section +\ref{aggregates} {\it How Aggregate Functions are Implemented} that +the {\it targetlist} is searched for {\it aggregate functions} which +are appended to a list that will be attached to the field {\tt aggs} +of the {\tt AGG} node. This was sufficient as long as {\it aggregate +functions} have only been allowed to appear within the {\it +targetlist}. Now the {\it having clause} is another source of {\it +aggregate functions}. Consider the following example: +% +\begin{verbatim} + select sno, max(pno) + from sells + group by sno + having count(pno) > 1; +\end{verbatim} +% +Here the {\it aggregate functions} {\tt max} and {\tt count} are in +use. If only the {\it targetlist} is scanned (as it was the case before the +{\it having clause} had been implemented) we will only find and process the +{\it aggregate function} {\tt max}. The second function {\tt count} is +not processed and therefore any reference to the result of {\tt count} +from within the {\it having clause} will fail. The solution to this +problem is to scan the whole {\it operator tree} representing the {\it having +clause} for {\it aggregate functions} not contained in the {\it targetlist} yet +and add them to the list of {\it aggregate functions} attached to the +field {\tt aggs} of the {\tt AGG} node. The scanning is done by the +function \mbox{{\tt check\_having\_qual\_for\_aggs()}} which steps +recursively through the tree.\\ +\\ +While scanning the {\it having clause} for {\it aggregate functions} +not contained in the {\it targetlist} yet, an additional check is made to +make sure that {\it aggregate functions} are used within +the {\it having clause} (otherwise the query could have been +formulated using the {\it where clause}). Consider the following query +which is not a valid SQL92 query: +% +\begin{verbatim} + testdb=> select sno, max(pno) + testdb-> from sells + testdb-> group by sno + testdb-> having sno > 1; + ERROR: This could have been done in a where clause!! + testdb=> +\end{verbatim} +% +There is no need to express this query using a {\it having clause}, +this kind of qualification belongs to the {\it where clause}: +% +\begin{verbatim} + select sno, max(pno) + from sells + where sno > 1 + group by sno; +\end{verbatim} +% +There is still an unsolved problem left. Consider the following query +where we want to know just the supplier numbers ({\tt sno}) of all +suppliers selling more than one part: +% +\begin{verbatim} + select sno + from sells + group by sno + having count(pno) > 1; +\end{verbatim} +% +The {\it planner} creates a {\it queryplan} (like the one shown in figure +\ref{plan_having}) where the {\it targetlists} of all nodes involved contain +only entries of those attributes listed after the {\tt select} keyword of +the query. Looking at the example above this means that the +{\it targetlists} of the {\tt AGG} node, the {\tt GRP} node the {\tt SORT} +node and the {\tt SeqScan} node contain only the entry for the +attribute {\tt sno}. As described earlier the {\it aggregation logic} +operates on attributes of the tuples returned by the subplan of +the {\tt AGG} node (i.e. the result of the {\tt GRP} node). Which +attributes are contained in the tuples handed back by a subplan is +determined by the {\it targetlist}. In the case of our example the attribute +{\tt pno} needed for the {\it aggregate function} {\tt count} is not +included and therefore the {\it aggregation} will fail. + +\pagebreak + +\noindent The solution to this problem is given in the following +steps: +\begin{itemize} +<step> Make a copy of the actual {\it targetlist} of the {\tt AGG} node. +% +<step> Search the {\it operator tree} representing the {\it having clause} +for attributes that are not contained in the {\it targetlist} of the {\tt +AGG} node yet and add them to the previously made copy. +% +<step> The extended {\it targetlist} is used to create the subplan attached to +the {\tt lefttree} field of the {\tt AGG} node. That means that the +{\it targetlists} of the {\tt GRP} node, of the {\tt SORT} node and of the +{\tt SeqScan} node will now contain an entry for the attribute {\tt +pno}. The {\it targetlist} of the {\tt AGG} node itself will not be changed +because we do not want to include the attribute {\tt pno} into the +result returned by the whole query. +% +\end{itemize} +Care has to be taken that the {\tt varattno} fields of the {\tt VAR} +nodes used in the {\it targetlists} contain the position of the +corresponding attribute in the {\it targetlist} of the subplan (i.e the +subplan delivering the tuples for further processing by the actual +node). \\ +\\ +The following part deals with the source code of the new and +changed functions involved in the planner/optimizer stage. The files +affected are: \\ +\\ +\indent {\tt $\ldots$/src/backend/optimizer/plan/setrefs.c} \\ +\indent {\tt $\ldots$/src/backend/optimizer/plan/planner.c} \\ +\\ +Since all of the functions presented here are very long and would need +very much space if presented as a whole, we just list the most +important parts. \\ +\\ +The following two functions are new and have been +introduced for the {\it having logic}. They are contained in the file +{\tt $\ldots$/src/backend/optimizer/plan/setrefs.c}: +% +\begin{itemize} +% +<step> {\tt check\_having\_qual\_for\_aggs()} \\ +This function takes the representation of a {\it having clause} given +by the parameter {\tt clause}, a {\it targetlist} given by the +parameter {\tt subplanTargetList} and a {\it group clause} given by +the parameter {\tt groupClause} as arguments and scans the +representation of the {\it having clause} recursively for {\it +aggregate functions}. If an {\it aggregate function} is found it is +attached to a list (internally called {\tt agg\_list}) and finally +returned by the function. + +Additionally the {\tt varno} field of every {\tt VAR} node found is +set to the position of the corresponding attribute in the {\it +targetlist} given by {\tt subplanTargetList}. + +If the {\it having clause} contains a subquery the function also makes +sure, that every attribute from the {\it main query} that is used +within the subquery also appears in the {\it group clause} given by +{\tt groupClause}. If the attribute cannot be found in the {\it group +clause} an error message is printed to the screen and the query +processing is aborted. +% +\begin{verbatim} + List * + check_having_qual_for_aggs(Node *clause, + List *subplanTargetList, + List *groupClause) + { + List *t, *l1; + List *agg_list = NIL; + int contained_in_group_clause = 0; + + if (IsA(clause, Var)) + { + TargetEntry *subplanVar; + + subplanVar = match_varid((Var *) clause, + subplanTargetList); + /* Change the varattno fields of the + * var node to point to the resdom->resnofields + * of the subplan (lefttree) + */ + ((Var *) clause)->varattno = + subplanVar->resdom->resno; + return NIL; + } + else + if (is_funcclause(clause) || not_clause(clause) + || or_clause(clause) || and_clause(clause)) + { + int new_length=0, old_length=0; + + /* This is a function. Recursively call this + * routine for its arguments... (i.e. for AND, + * OR, ... clauses!) + */ + foreach(t, ((Expr *) clause)->args) + { + old_length=length((List *)agg_list); + agg_list = nconc(agg_list, + check_having_qual_for_aggs(lfirst(t), + subplanTargetList, + groupClause)); + if(((new_length=length((List *)agg_list)) == + old_length) || (new_length == 0)) + { + elog(ERROR,"This could have been done + in a where clause!!"); + return NIL; + } + } + return agg_list; + } + else + if (IsA(clause, Aggreg)) + { + return lcons(clause, + check_having_qual_for_aggs( + ((Aggreg *)clause)->target, + subplanTargetList, + groupClause)); + } + else + . + . + . + } +\end{verbatim} +% +<step> {\tt check\_having\_qual\_for\_vars()} \\ +This function takes the representation of a {\it having clause} given +by the parameter {\tt clause} and the actual {\it targetlist} +given by the parameter {\tt targetlist\_so\_far} as arguments and +recursively scans the representation of the {\it having clause} for +attributes that are not included in the actual {\it targetlist} +yet. Whenever such an attribute is found it is added to the actual +{\it targetlist} which is finally returned by the function. + +Attributes contained in the {\it having clause} but not in the {\it +targetlist} show up with queries like: +% +\begin{verbatim} + select sid + from part + group by sid + having min(pid) > 1; +\end{verbatim} +% +In the above query the attribute {\tt pid} is used in the {\it having +clause} but it does not appear in the {\it targetlist} (i.e. the list +of attributes after the keyword {\tt select}). Unfortunately only +those attributes are delivered by the subplan and can therefore be +used within the {\it having clause}. To become able to handle queries +like that correctly, we have to extend the actual {\it targetlist} by +those attributes used in the {\tt having clause} but not already +appearing in the {\it targetlist}. +% +\begin{verbatim} + List * + check_having_qual_for_vars(Node *clause, + List *targetlist_so_far) + { + List *t; + + if (IsA(clause, Var)) + { + Rel tmp_rel; + + tmp_rel.targetlist = targetlist_so_far; + /* Check if the VAR is already contained in the + * targetlist + */ + if (tlist_member((Var *)clause, + (List *)targetlist_so_far) == NULL) + { + add_tl_element(&tmp_rel, (Var *)clause); + } + return tmp_rel.targetlist; + } + else + if (is_funcclause(clause) || not_clause(clause) + || or_clause(clause) || and_clause(clause)) + { + /* This is a function. Recursively call this + * routine for its arguments... + */ + foreach(t, ((Expr *) clause)->args) + { + targetlist_so_far = + check_having_qual_for_vars(lfirst(t), + targetlist_so_far); + } + return targetlist_so_far; + } + else + if (IsA(clause, Aggreg)) + { + targetlist_so_far = + check_having_qual_for_vars( + ((Aggreg *)clause)->target, + targetlist_so_far); + return targetlist_so_far; + } + . + . + . + } +\end{verbatim} +% +\end{itemize} +% +The next function is found in {\tt +$\ldots$/src/backend/optimizer/plan/planner.c}: +% +\begin{itemize} +% +<step> {\tt union\_planner()} \\ +This function creates a {\it plan} from the {\it parsetree} given to +it by the parameter {\tt parse} that can be executed by the {\it +executor}. + +If {\it aggregate functions} are present (indicated by {\tt +parse->hasAggs} set to true) the first step is to extend the {\it +targetlist} by those attributes that are used within the {\it having +clause} (if any is present) but do not appear in the {\it select list} +(Refer to the description of {\tt check\_having\_qual\_for\_vars()} +above). + +The next step is to call the function {\tt query\_planner()} creating +a {\it plan} without taking the {\it group clause}, the {\it aggregate +functions} and the {\it having clause} into account for the moment. + +Next insert a {\tt GRP} node at the top of the {\it plan} according +to the {\it group clause} of the {\it parsetree} if any is present. + +Add an {\tt AGG} node to the top of the current {\it plan} if {\it +aggregate functions} are present and if a {\it having clause} is +present additionally perform the following steps: +% +\begin{itemize} +<step> Perform various transformations to the representation of the +{\it having clause} (e.g. transform it to CNF, $\ldots$). +<step> Attach the transformed representation of the {\it having clause} to the +field {\tt plan.qual} of the just created {\tt AGG} node. +<step> Examine the whole {\it having clause} and search for {\it +aggregate functions}. This is done using the function {\tt +check\_having\_qual\_for\_aggs()} which appends every {\it aggregate +function} found to a list that is finally returned. +<step> Append the list just created to the list already attached to the +field {\tt aggs} of the {\tt AGG} node (this list contains the {\it +aggregate functions} found in the {\it targetlist}). +<step> Make sure that {\it aggregate functions} do appear in the +{\it having clause}. This is done by comparing the length of the list +attached to {\tt aggs} before and after the call to {\tt +check\_having\_qual\_for\_aggs()}. If the length has not changed, we +know that no {\it aggregate function} has been detected and that this +query could have been formulated using only a {\it where clause}. In +this case an error message is printed to the screen and the processing +is aborted. +\end{itemize} +% +\pagebreak +% +\begin{verbatim} + Plan * + union_planner(Query *parse) + { + List *tlist = parse->targetList; + ++ /* copy the original tlist, we will need the ++ * original one for the AGG node later on */ ++ List *new_tlist = new_unsorted_tlist(tlist); + . + . + . ++ if (parse->hasAggs) ++ { ++ /* extend targetlist by variables not ++ * contained already but used in the ++ * havingQual. ++ */ ++ if (parse->havingQual != NULL) ++ { ++ new_tlist = ++ check_having_qual_for_vars( ++ parse->havingQual, ++ new_tlist); ++ } ++ } + . + . + . + /* Call the planner for everything + * but groupclauses and aggregate funcs. + */ + result_plan = query_planner(parse, + parse->commandType, + new_tlist, + (List *) parse->qual); + . + . + . + /* If aggregate is present, insert the AGG node + */ + if (parse->hasAggs) + { + int old_length=0, new_length=0; + /* Create the AGG node but use 'tlist' not + * 'new_tlist' as target list because we + * don't want the additional attributes + * (only used for the havingQual, see + * above) to show up in the result. + */ + result_plan = (Plan *) make_agg(tlist, + result_plan); + . + . + . ++ /* Check every clause of the havingQual for ++ * aggregates used and append them to ++ * the list in result_plan->aggs ++ */ ++ foreach(clause, ++ ((Agg *) result_plan)->plan.qual) ++ { ++ /* Make sure there are aggregates in the ++ * havingQual if so, the list must be ++ * longer after check_having_qual_for_aggs ++ */ ++ old_length = ++ length(((Agg *) result_plan)->aggs); ++ ++ ((Agg *) result_plan)->aggs = ++ nconc(((Agg *) result_plan)->aggs, ++ check_having_qual_for_aggs( ++ (Node *) lfirst(clause), ++ ((Agg *)result_plan)-> ++ plan.lefttree->targetlist, ++ ((List *) parse->groupClause))); ++ /* Have a look at the length of the returned ++ * list. If there is no difference, no ++ * aggregates have been found and that means ++ * that the Qual belongs to the where clause ++ */ ++ if (((new_length = ++ length(((Agg *) result_plan)->aggs))== ++ old_length) || (new_length == 0)) ++ { ++ elog(ERROR,"This could have been done in a ++ where clause!!"); ++ return (Plan *)NIL; ++ } ++ } + . + . + . + } +\end{verbatim} +% +\end{itemize} + +\subsubsection{Executor} + +The {\it executor} takes the {\it queryplan} produced by the {\it +planner/optimizer} in the way just described and processes all {\it +aggregate functions} in the way described in section \ref{aggregates} +{\it The Implementation of Aggregate Functions} but before the tuple +derived is handed back the {\it operator tree} attached to the field +{\tt qpqual} is evaluated by calling the function {\tt +ExecQual()}. This function recursively steps through the {\it operator +tree} (i.e. the {\it having clause}) and evaluates the predicates +appearing there. Thanks to our changes that have been made to the {\it +planner} the values of all operands needed to evaluate the predicates +(e.g. the values of all {\it aggregate functions}) are already present +and can be accessed throughout the evaluation without any problems. + +If the evaluation of the {\it having qualification} returns {\tt true} +the tuple is returned by the function {\tt execAgg()} otherwise it is +ignored and the next group is processed. \\ +\\ +The necessary changes and enhancements have been applied to the +following function in the file {\tt +$\ldots$/src/backend/executor/nodeAgg.c}: +% +\begin{itemize} +% +<step> {\tt execAgg()} +Whenever the {\it executor} gets to an {\tt AGG} node this function is +called. Before the {\it having logic} had been implemented, all the +{\it tuples} of the current group were fetched from the {\it +subplan} and all {\it aggregate functions} were applied to these +tuples. After that, the results were handed back to the calling +function. + +Since the {\it having logic} has been implemented there is one +additional step executed. Before the results of applying the {\it +aggregate functions} are handed back, the function {\tt ExecQual()} is +called with the representation of the {\it having clause} as an +argument. If {\tt true} is returned, the results are handed back, +otherwise they are ignored and we start from the beginning for the +next group until a group meeting the restrictions given in the {\it +having clause} is found. +% +\begin{verbatim} + TupleTableSlot * + ExecAgg(Agg *node) + { + . + . + . + /* We loop retrieving groups until we find one + * matching node->plan.qual + */ ++ do ++ { + . + . + . + /* Apply *all* aggregate function to the + * tuples of the *current* group + */ + . + . + . + econtext->ecxt_scantuple = + aggstate->csstate.css_ScanTupleSlot; + resultSlot = ExecProject(projInfo, &isDone); + ++ /* As long as the retrieved group does not ++ * match the qualifications it is ignored and ++ * the next group is fetched ++ */ ++ if(node->plan.qual != NULL) ++ { ++ qual_result = ++ ExecQual(fix_opids(node->plan.qual), ++ econtext); ++ } ++ if (oneTuple) pfree(oneTuple); ++ } ++ while((node->plan.qual!=NULL) && ++ (qual_result!=true)); + return resultSlot; + } +\end{verbatim} +% +\end{itemize} + +\section{The Realization of Union, Intersect and Except} +\label{union_impl} + +SQL92 supports the well known set theoretic operations {\it union}, +{\it intersect} and {\it set difference} (the {\it set difference} is +called {\it except} in SQL92). The operators are used to connect two +or more {\tt select} statements. Every {\tt select} statement returns +a set of tuples and the operators between the {\tt select} statements +tell how to merge the returned sets of tuples into one result +relation. +% +\begin{example} +\label{simple_set_ops} +Let the following tables be given: +% +\begin{verbatim} + +XXX All these table examples have combinations of "-" and "+" which +causes problems inside an SGML comment. Mess them up to keep this +from happening for now. - thomas 1999-02-10 + + A C1|C2|C3 B C1|C2|C3 + MMPMMPMM MMPMMPMM + 1| a| b 1| a| b + 2| a| b 5| a| b + 3| c| d 3| c| d + 4| e| f 8| e| f + + C C1|C2|C3 + MMPMMPMM + 4| e| f + 8| e| f +\end{verbatim} +Now let's have a look at the results of the following queries: +\begin{verbatim} + select * from A + union + select * from B; +\end{verbatim} +derives the set theoretic {\it union} of the two tables: +% +\begin{verbatim} + C1|C2|C3 + MMPMMPMM + 1| a| b + 2| a| b + 3| c| d + 4| e| f + 5| a| b + 8| e| f +\end{verbatim} +% +The {\tt select} statements used may be more complex: +% +\begin{verbatim} + select C1, C3 from A + where C2 = 'a' + union + select C1, C2 from B + where C3 = 'b'; +\end{verbatim} +% + +\noindent will return the following table: +% +\begin{verbatim} + C1|C3 + MMPMM + 1| b + 2| b + 1| a + 5| a +\end{verbatim} +% +Note that the selected columns do not need to have identical names, +they only have to be of the same type. In the previous example we +selected for {\tt C1} and {\tt C3} in the first {\tt select} statement +and for {\tt C1} and {\tt C2} in the second one. The names of the +resulting columns are taken from the first {\tt select} statement. \\ +\\ +Let's have a look at a query using {\tt intersect}: +% +\begin{verbatim} + select * from A + intersect + select * from B; +\end{verbatim} +% +will return: +\begin{verbatim} + C1|C2|C3 + MMPMMPMM + 1| a| b + 3| c| d +\end{verbatim} +% +Here is an example using {\tt except}: +\begin{verbatim} + select * from A + except + select * from B; +\end{verbatim} +% +will return: +\begin{verbatim} + C1|C2|C3 + MMPMMPMM + 2| a| b + 4| e| f +\end{verbatim} +% +The last examples were rather simple because they only used one set +operator at a time with only two operands. Now we look at some more +complex queries involving more {\it operators}: +% +\begin{verbatim} + select * from A + union + select * from B + intersect + select * from C; +\end{verbatim} +% +will return: +% +\begin{verbatim} + C1|C2|C3 + MMPMMPMM + 4| e| f + 8| e| f +\end{verbatim} +% +The above query performs the set theoretic computation $(A \cup B) +\cap C$. When no parentheses are used, the operations are considered to +be left associative, i.e. $A \cup B \cup C \cup D$ will be treated as +$((A \cup B) \cup C) \cup D$. \\ +\\ +The same query using parenthesis can lead to a completely different +result: +% +\begin{verbatim} + select * from A + union + (select * from B + intersect + select * from C); +\end{verbatim} +% +performs $A \cup (B \cap C)$ and will return: +% +\begin{verbatim} + C1|C2|C3 + MMPMMPMM + 1| a| b + 2| a| b + 3| c| d + 4| e| f + 8| e| f +\end{verbatim} +% +\end{example} +% +\subsection{How Unions have been Realized Until Version 6.3.2} + +First we give a description of the implementation of {\it union} and +{\it union all} until version 6.3.2 because we need it to understand +the implementation of {\it intersect} and {\it except} described later. \\ +\\ +A {\it union} query is passed through the usual stages: +% +\begin{itemize} +<step> parser +<step> rewrite system +<step> planner/optimizer +<step> executor +\end{itemize} +% +and we will now describe what every single stage does to the query. +For our explanation we assume to process a simple query (i.e. a query +without {\it subselects}, {\it aggregates} and without involving {\it views}) + +\subsubsection{The Parser Stage} + +As described earlier the {\it parser stage} can be divided into two parts: +% +\begin{itemize} +<step> the {\it parser} built up by the grammar rules given in {\tt gram.y} +and +<step> the {\it transformation routines} performing a lot of +changes and analysis to the tree built up by the parser. Most of these +routines reside in {\tt analyze.c}. +\end{itemize} +% +%\pagebreak +% +A {\it union} statement consists of two or more {\it select} +statements connected by the keyword {\tt union} as the following +example shows: +% +\begin{verbatim} + select * from A + where C1=1 + union + select * from B + where C2 = 'a' + union + select * from C + where C3 = 'f' +\end{verbatim} +% +The above {\it union} statement consists of three {\it select} +statements connected by the keyword {\tt union}. We will refer to the +first {\it select} statement by A, to the second one by B and to the + third one by C for our further explanation (in the new notation our +query looks like this: \mbox{{\tt A union B union C}}).\\ +\\ +The {\it parser} (given by {\tt gram.y}) processes all three {\tt select} +statements, creates a {\tt SelectStmt} node for every {\tt select} and +attaches the {\tt where} qualifications, {\it targetlists} etc. to the +corresponding nodes. Then it creates a list of the second and the +third {\tt SelectStmt} node (of B and C) and attaches it to the field +{\tt unionClause} of the first node (of A). Finally it hands back the +first node (node A) with the list of the remaining nodes attached as +shown in figure \ref{parser_union_back}. +% +\begin{figure} +\begin{center} +\epsfig{figure=figures/parser_union_back.ps} +\caption{Data structure handed back by the {\it parser}} +\label{parser_union_back} +\end{center} +\end{figure} +\\ +\\ +The following {\it transformation routines} process the data structure +handed back by the {\it parser}. First the top node (node A) is transformed +from a {\tt SelectStmt} node to a {\tt Query} node. The {\it +targetlist}, the {\tt where} qualification etc. attached to it are +transformed as well. Next the list of the remaining nodes (attached to +{\tt unionClause} of node A) is transformed and in this step also a +check is made if the types and lengths of the {\it targetlists} of +the involved nodes are equal. The new {\tt Query} nodes are now handed +back in the same way as the {\tt SelectStmt} nodes were before +(i.e. the {\tt Query} nodes B and C are collected in a list which is +attached to {\tt unionClause} of {\tt Query} node A). + +\subsubsection{The Rewrite System} + +If any {\it rewrite rules} are present for the {\tt Query} nodes +(i.e. one of the {\it select} statements uses a {\it view}) the +necessary changes to the {\tt Query} nodes are performed (see section +\ref{view_impl} {\it Techniques To Implement Views}). Otherwise no +changes are made to the nodes in this stage. + +\subsubsection{Planner/Optimizer} + +This stage has to create a {\it plan} out of the {\it querytree} +produced by the {\it parser stage} that can be executed by the {\it +executor}. In most cases there are several ways (paths) with different +cost to get to the same result. It's the {\it planner/optimizer's} +task to find out which path is the cheapest and to create a {\it plan} +using this path. The implementation of {\it unions} in <productname>Postgres</productname> is +based on the following idea: \\ +\\ +The set derived by evaluating $A \cup B$ must contain every member of +$A$ {\bf and} every member of $B$. So if we append the members of $B$ +to the members of $A$ we are almost done. If there exist members +common to $A$ and $B$ these members are now contained twice in our new +set, so the only thing left to do is to remove these duplicates. + +\pagebreak + +\noindent In the case of our example the {\it planner} would build up the {\it +tree} shown in figure \ref{union_plan}. Every {\tt Query} node is +planned separately and results in a {\tt SeqScan} node in our +example. The three {\tt SeqScan} nodes are put together into a list +which is attached to {\tt unionplans} of an {\tt Append} node. +% +\begin{figure}[ht] +\begin{center} +\epsfig{figure=figures/union_plan.ps} +\caption{{\it Plan} for a union query} +\label{union_plan} +\end{center} +\end{figure} + +\subsubsection{Executor} + +The {\it executor} will process all the {\tt SeqScan} nodes and append all +the delivered tuples to a single {\it result relation}. Now it is +possible that duplicate tuples are contained in the {\it result relation} +which have to be removed. The removal is done by the {\tt Unique} node +and the sort is just performed to make its work easier. + +\subsection{How Intersect, Except and Union Work Together} + +The last section showed that every stage ({\it parser stage}, {\it +planner/optimizer}, {\it executor}) of <productname>Postgres</productname> has to provide +features in order to support {\it union} statements. For the +implementation of {\it intersect} and {\it except} statements (and +statements involving all {\it set operators}) we choose a different approach +based on {\it query rewriting}. \\ +\\ +The idea is based on the fact that {\it intersect} and {\it except} +statements are redundant in SQL, i.e. for every {\it intersect} or +{\it except} statement it is possible to formulate a semantically +equivalent statement without using {\it intersect} or {\it except}. +% +\begin{example} +\label{transform_intersect} +This example shows how a query using {\it intersect} can be +transformed to a semantically equivalent query without an {\it +intersect}: +% +\pagebreak +% +\begin{verbatim} + select C1, C3 from A + where C1 = 1 + intersect + select C1, C2 from B + where C2 = 'c'; +\end{verbatim} +% +is equivalent to: +% +\begin{verbatim} + select C1, C3 + from A + where C1 = 1 and + (C1, C3) in (select C1, C2 + from B + where C2 = 'c'); +\end{verbatim} +% +This example shows how an {\it except} query can be transformed to an +{\it except}-less form: +% +\begin{verbatim} + select C1, C2 from A + where C2 = 'c' + except + select C1, C3 from B + where C3 = 'f'; +\end{verbatim} +% +is equivalent to: +% +\begin{verbatim} + select C1, C2 + from A + where C2 = 'c' and + (C1, C2) not in (select C1, C3 + from B + where C3 = 'f'); +\end{verbatim} +% +\end{example} +% +The transformations used in example \ref{transform_intersect} are +always valid because they just implement the set theoretic definition +of {\it intersect} and {\it except}: +% +\begin{definition} +\label{def_intersect} +~ \\ +The {\it intersection} of two sets $A$ and $B$ is +defined as: +\begin{displaymath} +(A \cap B) := \{x \mid x \in A \wedge x \in B \} +\end{displaymath} +% +The {\it intersection} of $n$ sets $A_{1}, \ldots, A_{n}$ is defined as: +% +\begin{displaymath} +\bigcap_{i=1}^{n} A_{i} := \{x \mid \bigwedge_{i=1}^{n} x \in A_{i} \} +\end{displaymath} +% +\end{definition} +% +\begin{definition} +\label{def_except} +~ \\ +The {\it difference} of two sets $A$ and $B$ is +defined as: +\begin{displaymath} +(A \backslash B) := \{x \mid x \in A \wedge x \not\in B \} +\end{displaymath} +% +\end{definition} +% +\begin{definition} +\label{def_union} +~\\ +The {\it union} of two sets $A$ and $B$ is +defined as: +\begin{displaymath} +(A \cup B) := \{x \mid x \in A \vee x \in B \} +\end{displaymath} +% +The {\it union} of $n$ sets $A_{1}, \ldots, A_{n}$ is defined as: +% +\begin{displaymath} +\bigcup_{i=1}^{n} A_{i} := \{x \mid \bigvee_{i=1}^{n} x \in A_{i} \} +\end{displaymath} +% +\end{definition} + +\begin{definition} Disjunctive Normal Form (DNF) \\ +Let $F = C_{1} \vee \ldots \vee C_{n}$ be given where every $C_{i}$ is +of the form $(L_{i}^{1} \wedge \ldots \wedge L_{i}^{k_{i}})$ and +$L_{i}^{j}$ is a propositional variable or the negation of a +propositional variable. Now we say $F$ is in DNF. +\end{definition} +% +\begin{example} In the following example the $L_{i}^{j}$ are of the form +$x \in X$ or $\neg (x \in X)$: \\ +\indent $((x \in A \wedge \neg (x \in B) \wedge x \in C) \vee (x \in D +\wedge x \in E))$ is a formula in DNF \\ +\indent $((x \in A \vee x \in B) \wedge (x \in C \vee \neg (x \in D)))$ +is not in DNF. +\end{example} +% +The transformation of any formula in propositional logic into DNF is done by +successively applying the following rules: \\ +\\ +\indent $(R1)~~\neg (F_{1} \vee F_{2}) \Rightarrow (\neg F_{1} \wedge \neg +F_{2})$ \\ +\indent $(R2)~~\neg (F_{1} \wedge F_{2}) \Rightarrow (\neg F_{1} \vee \neg +F_{2})$ \\ +\indent $(R3)~~F_{1} \wedge (F_{2} \vee F_{3}) \Rightarrow (F_{1} \wedge +F_{2}) \vee (F_{1} \wedge F_{3})$ \\ +\indent $(R4)~~(F_{1} \vee F_{2}) \wedge F_{3} \Rightarrow (F_{1} \wedge +F_{3}) \vee (F_{2} \wedge F_{3})$ \\ +\\ +It can be shown that the transformation using the rules (R1) to +(R4) always terminates after a finite number of steps. + +\subsubsection{Set Operations as Propositional Logic Formulas} + +Using the definitions from above we can treat formulas +involving set theoretic operations as formulas of {\it propositional +logic}. As we will see later these formulas can easily be used in the +{\tt where-} and {\tt having} qualifications of the {\tt select} +statements involved in the query. +% +\begin{example} +Here are some examples: \\ \\ +% +\indent $((A \cup B) \cap C) := \{x \mid (x \in A \vee x \in B) \wedge x \in C +\}$ \\ +\indent $((A \cup B) \cap (C \cup D)) := \{x \mid (x \in A \vee x \in B) +\wedge (x \in C \vee x \in D) \}$ \\ +\indent $((A \cap B) \backslash C) := \{x \mid (x \in A \wedge x \in +B) \wedge x \not\in C \}$ \\ +\indent $(A \backslash (B \cup C)) := \{x \mid x \in A \wedge \neg (x +\in B \vee x \in C) \}$ \\ +\indent $(((A \cap B) \cup (C \backslash D)) \cap E) := \{((x \in A +\wedge x \in B) \vee (x \in C \wedge x \not\in D)) \wedge x \in E \}$ +% +\end{example} +% +\subsection{Implementing Intersect and Except Using the +Union Capabilities} +% +We want to be able to use queries involving more than just one type of +set operation (e.g. only {\it union} or only {\it intersect}) at a +time, so we have to look for a solution that supports correct handling +of queries like that. As described above there is a solution for pure +{\it union} statements implemented already, so we have to develop an +approach that makes use of these {\it union} capabilities. \\ +\\ +As figure \ref{parser_union_back} illustrates, the operands of a {\it +union} operation are just {\tt Query} nodes (the first operand is the +top node and all further operands form a list which is attached to the +field {\tt unionClause} of the top node). So our goal will be to +transform every query involving set operations into this form. (Note +that the operands to the {\it union} operation may be complex, +i.e. {\it subselects}, {\it grouping}, {\it aggregates} etc. are allowed.) + +The transformation of a query involving set operations in any order +into a query that can be accepted by the {\it union} logic is +equivalent to transforming the {\it membership formula} (see +definitions \ref{def_intersect}, \ref{def_except} and \ref{def_union}) +in propositional logic into {\it disjunctive normal form} (DNF). The +transformation of any formula in propositional logic into DNF is +always possible in a finite number of steps. + +\noindent The advantage of this {\it transformation technique} is the little +impact on the whole system and the implicit invocation of the {\it +planner/optimizer}. The only changes necessary are made to the {\it +parser stage} and the {\it rewrite system}. \\ +\\ +Here are some changes that had to be applied to the source code before +the {\it parser stage} and the {\it rewrite system} could be adapted: +% +\begin{itemize} +<step> Add the additional field {\tt intersectClause} to the data +structures {\tt Query} and {\tt InsertStmt} defined in the file {\tt +$\ldots$/src/include/nodes/parsenodes.h}: +% +\begin{verbatim} + typedef struct Query + { + NodeTag type; + CmdType commandType; + . + . + . + Node *havingQual; ++ List *intersectClause; + List *unionClause; + List *base_relation_list_; + List *join_relation_list_; + } Query; + + typedef struct InsertStmt + { + NodeTag type; + . + . + . + bool unionall; ++ List *intersectClause; + } InsertStmt; +\end{verbatim} +% +<step> Add the new keywords {\tt EXCEPT} and {\tt INTERSECT} to the +file {\tt $\ldots$/src/backend/parser/keywords.c}: +% +\begin{verbatim} + static ScanKeyword ScanKeywords[] = { + {"abort", ABORT_TRANS}, + {"action", ACTION}, + . + . + . + {"end", END_TRANS}, ++ {"except", EXCEPT}, + . + . + . + {"instead", INSTEAD}, ++ {"intersect", INTERSECT}, + . + . + . + }; +\end{verbatim} +% +<step> <productname>Postgres</productname> contains functions to convert the internal +representation of a {\it parsetree} or {\it plantree} into an ASCII +representation (that can easily be printed to the screen (for +debugging purposes) or be stored in a file) and vice versa. These +functions have to be adapted to be able to deal with {\it intersects} +and {\it excepts}. These functions can be found in the files {\tt +$\ldots$/src/backend/nodes/outfuncs.c} and {\tt +$\ldots$/src/backend/nodes/readfuncs.c}: +% +\begin{verbatim} + static void + _outQuery(StringInfo str, Query *node) + { + . + . + . + appendStringInfo(str, " :unionClause "); + _outNode(str, node->unionClause); ++ appendStringInfo(str, " :intersectClause "); ++ _outNode(str, node->intersectClause); + } + + static Query * + _readQuery() + { + . + . + . + token = lsptok(NULL, &length); + local_node->unionClause = nodeRead(true); ++ token = lsptok(NULL, &length); ++ local_node->intersectClause = nodeRead(true); + + return (local_node); + } +\end{verbatim} +% +<step> The function {\tt ExecReScan()} is called whenever a new +execution of a given {\it plan} has to be started (i.e. whenever we +have to start from the beginning with the first tuple again). The call +to this function happens implicitly. For the special kind of +subqueries we are using for the rewritten queries (see example +\ref{transform_intersect}) we have to take that also +{\tt Group} nodes are processed. The function can be found in the file +{\tt $ldots$/backend/executor/execAmi.c}. +% +\begin{verbatim} + void + ExecReScan(Plan *node, ExprContext *exprCtxt, + Plan *parent) + { + . + . + . + switch (nodeTag(node)) + { + . + . + . +\end{verbatim} +\pagebreak +\begin{verbatim} ++ case T_Group: ++ ExecReScanGroup((Group *) node, ++ exprCtxt, parent); ++ break; + . + . + . + } + } +\end{verbatim} +% +<step> The function {\tt ExecReScanGroup()} is called by {\tt +ExecReScan()} described above whenever a {\tt Group} node is detected +and can be found in the file {\tt +$\ldots$/src/backend/executor/nodeGroup.c} . It has been created for +the {\it intersect} and {\it except} logic although it is actually +needed by the special kind of subselect (see above). +% +\begin{verbatim} + void + ExecReScanGroup(Group *node, ExprContext *exprCtxt, + Plan *parent) + { + GroupState *grpstate = node->grpstate; + + grpstate->grp_useFirstTuple = FALSE; + grpstate->grp_done = FALSE; + grpstate->grp_firstTuple = (HeapTupleData *)NIL; + + /* + * if chgParam of subnode is not null then plan + * will be re-scanned by first ExecProcNode. + */ + if (((Plan *) node)->lefttree->chgParam == NULL) + ExecReScan(((Plan *) node)->lefttree, + exprCtxt, (Plan *) node); + } +\end{verbatim} +% +\end{itemize} + +\subsubsection{Parser} + +The {\it parser} defined in the file {\tt +$\ldots$/src/backend/parser/gram.y} had to be modified in two ways: +% +\begin{itemize} +% +<step> The grammar had to be adapted to support the usage of +parenthesis (to be able to specify the order of execution of the set +operators). +% +<step> The code building up the data structures handed back by the +{\it parser} had to be inserted. +% +\end{itemize} +% +Here is a part of the grammar which is responsible for {\tt select} +statements having the code building up the data structures inserted: +% +\pagebreak +% +\begin{verbatim} + SelectStmt : select_w_o_sort sort_clause + { + . + . + . + /* $1 holds the tree built up by the + * rule 'select_w_o_sort' + */ + Node *op = (Node *) $1; + + if IsA($1, SelectStmt) + { + SelectStmt *n = (SelectStmt *)$1; + n->sortClause = $2; + $$ = (Node *)n; + } + else + { + /* Create a "flat list" of the operator + * tree built up by 'select_w_o_sort' and + * let select_list point to it + */ + create_select_list((Node *)op, + &select_list, + &unionall_present); + /* Replace all the A_Expr nodes in the + * operator tree by Expr nodes. + */ + op = A_Expr_to_Expr(op, &intersect_present); + . + . + . + /* Get the leftmost SelectStmt node (which + * automatically represents the first Select + * Statement of the query!) */ + first_select = + (SelectStmt *)lfirst(select_list); + /* Attach the list of all SelectStmt nodes + * to unionClause + */ + first_select->unionClause = select_list; + + /* Attach the whole operator tree to + * intersectClause */ + first_select->intersectClause = + (List *) op; + /* finally attach the sort clause */ + first_select->sortClause = $2; + + /* Now hand it back! */ + $$ = (Node *)first_select; + } + } + ; +\end{verbatim} +\pagebreak +\begin{verbatim} + select_w_o_sort : '(' select_w_o_sort ')' + { + $$ = $2; + } + | SubSelect + { + $$ = $1; + } + | select_w_o_sort EXCEPT select_w_o_sort + { + $$ = (Node *)makeA_Expr(AND,NULL,$1, + makeA_Expr(NOT,NULL,NULL,$3)); + } + | select_w_o_sort UNION opt_union select_w_o_sort + { + if (IsA($4, SelectStmt)) + { + SelectStmt *n = (SelectStmt *)$4; + n->unionall = $3; + } + $$ = (Node *)makeA_Expr(OR,NULL,$1,$4); + } + | select_w_o_sort INTERSECT select_w_o_sort + { + $$ = (Node *)makeA_Expr(AND,NULL,$1,$3); + } + ; +\end{verbatim} + +% \pagebreak + +\begin{verbatim} + SubSelect : SELECT opt_unique res_target_list2 + result from_clause where_clause + group_clause having_clause + { + SelectStmt *n = makeNode(SelectStmt); + n->unique = $2; + . + . + . + n->havingClause = $8; + $$ = (Node *)n; + } + ; +\end{verbatim} +% +The keywords {\tt SELECT}, {\tt EXCEPT}, {\tt UNION}, {\tt INTERSECT}, {\tt +'('} and {\tt ')'} are {\it terminal symbols} and {\tt SelectStmt}, +{\tt select\_w\_o\_sort}, {\tt sort\_clause}, {\tt opt\_union}, {\tt +SubSelect}, {\tt opt\_unique}, {\tt res\_target\_list2}, {\tt result}, +{\tt from\_clause}, {\tt where\_clause}, {\tt group\_clause}, {\tt +having\_clause} are {\it nonterminal symbols}. The symbols {\tt +EXCEPT}, {\tt UNION} and {\tt INTERSECT} are {\it left +associative} meaning that a statement like: +% +\begin{verbatim} + select * from A + union + select * from B + union + select * from C; +\end{verbatim} +% +\pagebreak +will be treated as: +% +\begin{verbatim} + ((select * from A + union + select * from B) + union + select * from C) +\end{verbatim} +% +The {\tt select\_w\_o\_sort} rule builds up an {\it operator tree} +using nodes of type {\tt A\_Expr}. For every {\it union} an {\tt OR} +node is created, for every {\it intersect} an {\tt AND} node and for +every {\it except} and {\tt AND NOT} node building up a representation +of a formula in propositional logic. If the query parsed did not +contain any set operations the rule hands back a {\tt SelectStmt} node +representing the query otherwise the top node of the {\it operator +tree} is returned. Figure +\ref{union_intersect_select_w_o_sort} shows a typical {\it operator tree} +returned by the {\tt select\_w\_o\_sort} rule. + +\begin{figure}[ht] +\begin{flushright} +\epsfig{figure=figures/union_intersect_select_w_o_sort.ps} +\caption{{\it Operator tree} for $(A \cup B) \backslash (C \cap D)$} +\label{union_intersect_select_w_o_sort} +\end{flushright} +\end{figure} + +\noindent The rule {\tt SelectStmt} transforms the {\it operator tree} built of +{\tt A\_Expr} nodes into an {\it operator tree} using {\tt Expr} nodes +by a call to the function {\tt A\_Expr\_to\_Expr()} which additionally +replaces every {\tt OR} node by an {\tt AND} node and vice +versa. This is performed in order to be able to use the function {\tt +cnfify()} later on. \\ +\\ +The {\it transformations} following the {\it parser} expect a {\tt +SelectStmt} node to be returned by the rule {\tt SelectStmt} and not +an {\it operator tree}. So if the rule {\tt select\_w\_o\_sort} hands +back such a node (meaning that the query did not contain any set +operations) we just have to attach the data structure built up by the +{\tt sort\_clause} rule and are finished, but when we get an {\it +operator tree} we have to perform the following steps: +% +\begin{itemize} +% +<step> Create a flat list of all {\tt SelectStmt} nodes of the {\it +operator tree} (by a call to the function {\tt create\_select\_list()}) +and attach the list to the field +\mbox{{\tt unionClause}} of the leftmost {\tt SelectStmt} (see next step). +% +<step> Find the leftmost leaf ({\tt SelectStmt} node) of the {\it operator +tree} (this is automatically the first member of the above created +list because of the technique {\tt create\_select\_list()} uses to +create the list). +% +<step> Attach the whole {\it operator tree} (including the leftmost node +itself) to the field \mbox{{\tt intersectClause}} of the leftmost {\tt +SelectStmt} node. +% +<step> Attach the data structure built up by the {\tt sort\_clause} +rule to the field {\tt sortClause} of the leftmost {\tt SelectStmt} node. +% +<step> Hand back the leftmost {\tt SelectStmt} node (with +the {\it operator tree}, the list of all {\tt SelectStmt} nodes and the {\tt +sortClause} attached to it). +% +\end{itemize} +% +\noindent Figure \ref{union_intersect_select} shows the final data structure +derived from the {\it operator tree} shown in figure +\ref{union_intersect_select_w_o_sort} handed back by the {\tt SelectStmt} rule: +% +\begin{figure}[ht] +\begin{flushright} +\epsfig{figure=figures/union_intersect_select.ps} +\caption{Data structure handed back by {\tt SelectStmt} rule} +\label{union_intersect_select} +\end{flushright} +\end{figure} + +\pagebreak + +\noindent Here is a description of the above used functions. They can +be found in the file {\tt $\ldots$/src/backend/parser/anlayze.c}. +% +\begin{itemize} +% +<step> {\tt create\_select\_list()}: \\ +This function steps through the {\it tree} handed to it by the +parameter {\tt ptr} and creates a list of all {\tt SelectStmt} nodes +found. The list is handed back by the parameter {\tt +select\_list}. The function uses a recursive {\it depth first search} +algorithm to examine the {\it tree} leading to the fact that the +leftmost {\tt SelectStmt} node will appear first in the created list. +% +\begin{verbatim} + void + create_select_list(Node *ptr, List **select_list, + bool *unionall_present) + { + if(IsA(ptr, SelectStmt)) + { + *select_list = lappend(*select_list, ptr); + if(((SelectStmt *)ptr)->unionall == TRUE) + *unionall_present = TRUE; + return; + } + + /* Recursively call for all arguments. + * A NOT expr has no lexpr! + */ + if (((A_Expr *)ptr)->lexpr != NULL) + create_select_list(((A_Expr *)ptr)->lexpr, + select_list, unionall_present); + create_select_list(((A_Expr *)ptr)->rexpr, + select_list, unionall_present); +} +\end{verbatim} +% +<step> {\tt A\_Expr\_to\_Expr()}: \\ +This function recursively steps through the {\it operator tree} handed +to it by the parameter {\tt ptr} and replaces {\tt A\_Expr} nodes by +{\tt Expr} nodes. Additionally it exchanges {\tt AND} nodes with {\tt +OR} nodes and vice versa. The reason for this exchange is easy to +understand. We implement {\it intersect} and {\it except} clauses by +rewriting these queries to semantically equivalent queries that use +{\tt IN} and {\tt NOT IN} subselects. To be able to use all three +operations ({\it unions}, {\it intersects} and {\it excepts}) in one +complex query, we have to translate the queries into {\it Disjunctive +Normal Form} (DNF). Unfortunately there is no function {\tt dnfify()}, +but there is a function {\tt cnfify()} which produces DNF when we +exchange {\tt AND} nodes with {\tt OR} nodes and vice versa before +calling {\tt cnfify()} and exchange them again in the result. +% +\begin{verbatim} + Node * + A_Expr_to_Expr(Node *ptr, + bool *intersect_present) + { + Node *result; + + switch(nodeTag(ptr)) + { + case T_A_Expr: + { + A_Expr *a = (A_Expr *)ptr; + + switch (a->oper) + { + case AND: + { + Expr *expr = makeNode(Expr); + Node *lexpr = + A_Expr_to_Expr(((A_Expr *)ptr)->lexpr, + intersect_present); + Node *rexpr = + A_Expr_to_Expr(((A_Expr *)ptr)->rexpr, + intersect_present); + + *intersect_present = TRUE; + + expr->typeOid = BOOLOID; + expr->opType = OR_EXPR; + expr->args = makeList(lexpr, rexpr, -1); + result = (Node *) expr; + break; + } + case OR: + { + Expr *expr = makeNode(Expr); + Node *lexpr = + A_Expr_to_Expr(((A_Expr *)ptr)->lexpr, + intersect_present); + Node *rexpr = + A_Expr_to_Expr(((A_Expr *)ptr)->rexpr, + intersect_present); + + expr->typeOid = BOOLOID; + expr->opType = AND_EXPR; + expr->args = makeList(lexpr, rexpr, -1); + result = (Node *) expr; + break; + } + case NOT: + { + Expr *expr = makeNode(Expr); + Node *rexpr = + A_Expr_to_Expr(((A_Expr *)ptr)->rexpr, + intersect_present); + + expr->typeOid = BOOLOID; + expr->opType = NOT_EXPR; + expr->args = makeList(rexpr, -1); + result = (Node *) expr; + break; + } + } + break; + } + default: + { + result = ptr; + } + } + } + return result; +} +\end{verbatim} +% +\end{itemize} + +\noindent Note that the {\tt stmtmulti} and {\tt OptStmtMulti} rules had to be +changed in order to avoid {\it shift/reduce} conflicts. The old rules +allowed an invalid syntax (e.g. the concatenation of two statements +without a ';' inbetween) which will be prevented now. The new rules +have the second line commented out as shown below: +% +\begin{verbatim} + stmtmulti : stmtmulti stmt ';' + /* | stmtmulti stmt */ + | stmt ';' + ; + + OptStmtMulti : OptStmtMulti OptimizableStmt ';' + /* | OptStmtMulti OptimizableStmt */ + | OptimizableStmt ';' + ; +\end{verbatim} +% + + +\subsubsection{Transformations} + +This step normally transforms every {\tt SelectStmt} node found into a +{\tt Query} node and does a lot of transformations to the {\it targetlist}, +the {\tt where} qualification etc. As mentioned above this stage +expects a {\tt SelectStmt} node and cannot handle an {\tt A\_Expr} +node. That's why we did the changes to the {\it operator tree} shown in +figure \ref{union_intersect_select}. \\ +\\ +In this stage only very few changes have been necessary: +% +\begin{itemize} +<step> The transformation of the list attached to {\tt unionClause} is +prevented. The raw list is now passed through instead and the necessary +transformations are performed at a later point in time. +<step> The additionally introduced field {\tt intersectClause} is also +passed untouched through this stage. +\end{itemize} + +\noindent The changes described in the above paragraph have been +applied to the functions {\tt transformInsertStmt()} and {\tt +transformSelectStmt()} which are contained in the file {\tt +$\ldots$/src/backend/parser/analyze.c}: +% +\begin{itemize} +% +<step> {\tt transformInsertStmt()}: +% +\begin{verbatim} + static Query * + transformInsertStmt(ParseState *pstate, + InsertStmt *stmt) + { + . + . + . +\end{verbatim} +\pagebreak +\begin{verbatim} + /* Just pass through the unionClause and + * intersectClause. We will process it in + * the function Except_Intersect_Rewrite() + */ + qry->unionClause = stmt->unionClause; + qry->intersectClause = stmt->intersectClause; + . + . + . + + return (Query *) qry; + } +\end{verbatim} +% +<step> {\tt transformSelectStmt()}: +% +\begin{verbatim} + static Query * + transformSelectStmt(ParseState *pstate, + SelectStmt *stmt) + { + . + . + . + /* Just pass through the unionClause and + * intersectClause. We will process it in + * the function Except_Intersect_Rewrite() + */ + qry->unionClause = stmt->unionClause; + qry->intersectClause = stmt->intersectClause; + . + . + . + return (Query *) qry; + } +\end{verbatim} +% +\end{itemize} + +\subsubsection{The Rewrite System} + +In this stage the information contained in the {\it operator tree} attached +to the topmost {\tt SelectStmt} node is used to form a tree of {\tt +Query} nodes representing the rewritten query (i.e. the semantically +equivalent query that contains only {\it union} but no {\it intersect} +or {\it except} operations). \\ +\\ +The following steps have to be performed: +% +\begin{itemize} +<step> Save the {\tt sortClause}, {\tt uniqueFlag}, {\tt targetList} +fields etc. of the topmost {\tt Query} node because the topmost node +may change during the rewrite process (remember (only) the topmost +{\tt SelectStmt} node has already been transformed to a {\tt Query} +node). +% +<step> Recursively step through the {\it operator tree} and transform +every {\tt SelectStmt} node to a {\tt Query} node using the function +{\tt intersect\_tree\_analyze()} described below. The one node already +transformed (the topmost node) is still contained in the {\it operator +tree} and must not be transformed again because this would cause +troubles in the {\it transforming logic}. +% +\begin{figure}[ht] +\begin{center} +\epsfig{figure=figures/union_intersect_dnf.ps} +\caption{Data structure of $(A \cup B) \backslash C$ after transformation to DNF} +\label{union_intersect_dnf} +\end{center} +\end{figure} +% +<step> Transform the new {\it operator tree} into DNF (disjunctive normal +form). <productname>Postgres</productname> does not provide any function for the transformation +into DNF but it provides a function {\tt cnfify()} that performs a +transformation into CNF (conjunctive normal form). So we can easily +make use of this function when we exchange every {\tt OR} with an {\tt +AND} and vice versa before calling {\tt cnfify()} as we did already in +the {\it parser} (compare figure \ref{union_intersect_select_w_o_sort} +to figure \ref{union_intersect_select}). When {\tt cnfify()} is called +with a special flag, the {\tt removeAndFlag} set to {\tt true} it +returns a list where the entries can be thought of being connected +together by {\tt ANDs}, so the explicit {\tt AND} nodes are removed. + +After {\tt cnfify()} has been called we normally would have to +exchange {\tt OR} and {\tt AND} nodes again. We skip this step by +simply treating every {\tt OR} node as an {\tt AND} node throughout +the following steps (remember, that there are no {\tt AND} nodes left +that have to be treated as {\tt OR} nodes because of the {\tt +removeAndFlag}). + +\noindent Figure \ref{union_intersect_dnf} shows what the data +structure looks like after the transformation to DNF has taken place +for the following query: +% +\begin{verbatim} + (select * from A + union + select * from B) + except + select * from C; +\end{verbatim} +% +<step> For every entry of the list returned by {\tt cnfify()} (i.e. for every +{\it operator tree} which may only contain {\tt OR} and {\tt NOT} +operator nodes and {\tt Query} nodes as leaves) contained in the {\tt +union\_list} perform the following steps: +% +\begin{itemize} +% +<step> Check if the {\it targetlists} of all {\tt Query} nodes appearing are +compatible (i.e. all {\it targetlists} have the same number of attributes +and the corresponding attributes are of the same type) +% +<step> There must be at least one positive {\tt +OR} node (i.e. at least one {\tt OR} node which is not preceded by a +{\tt NOT} node). Create a list of all {\tt Query} nodes (or of {\tt +Query} nodes preceded by {\tt NOT} nodes if {\tt OR NOT} is found) +with the non negated node first using the function {\tt +create\_list()} described below. +% +<step> The first (non negated) node of the list will be the topmost +{\tt Query} node of the current {\it union} operand. For all other +nodes found in the list add an {\tt IN} subselect ({\tt NOT IN} +subselect if the {\tt Query} node is preceded by a {\tt NOT}) to the +{\tt where} qualification of the topmost node. Adding a +subselect to the {\tt where} qualification is done by logically +{\tt AND}ing it to the original qualification. +% +<step> Append the {\tt Query} node setup in the last steps to a list which +is hold by the pointer {\tt union\_list}. +\end{itemize} +% +<step> Take the first node of {\tt union\_list} as the new topmost node +of the whole query and attach the rest of the list to the +field {\tt unionClause} of this topmost node. Since the new topmost +node might differ from the original one (i.e. from the node which was +topmost when we entered the {\it rewrite stage}) we have to attach the +fields saved in the first step to the new topmost node (i.e. the {\tt +sortClause}, {\tt targetList}, {\tt unionFlag}, etc.). +% +<step> Hand the new topmost {\tt Query} node back. Now the normal +{\it query rewriting} takes place (in order to handle views if +present) and then the {\it planner/optimizer} and {\it executor} +functions are called to get a result. There have no changes been made +to the code of these stages. +% +\end{itemize} +Figure \ref{union_operand} shows the rewritten data structure of the +query: +\begin{verbatim} + select C1, C2 from A + intersect + select C1, C3 from C; +\end{verbatim} +% +% \clearpage +% +\noindent against the tables defined in example \ref{simple_set_ops}. +The rewritten data structure represents the query: +\begin{verbatim} + select C1, C2 form A + where (C1, C2) in + (select C1,C3 from C); +\end{verbatim} +% +\noindent The field {\tt lefttree} of the {\tt Sublink} node points to a list +where every entry points to a {\tt VAR} node of the {\it targetlist} of the +topmost node (node A). The field {\tt oper} of the {\tt Sublink} node +points to a list holding a pointer to an {\tt Expr} node for every +attribute of the topmost {\it targetlist}. Every {\tt Expr} node is used to +compare a {\tt VAR} node of the topmost {\it targetlist} with the +corresponding {\tt VAR} node of the subselect's {\it targetlist}. So the +first argument of every {\tt Expr} node points to a {\tt VAR} node of +the topmost {\it targetlist} and the second argument points to the +corresponding {\tt VAR} node of the subselect's {\it targetlist}. +% +\begin{figure}[ht] +\begin{flushright} +\epsfig{figure=figures/union_operand.ps} +\caption{Data structure of $A \cap C$ after query rewriting} +\label{union_operand} +\end{flushright} +\end{figure} +% +\clearpage +% +\noindent If the user's query involves {\it union} as well as {\it +intersect} or {\it except} there will be more {\tt Query} nodes of the +form shown in figure \ref{union_operand}. One will be the topmost node +(as described above) and the others will be collected in a list which +is attached to the field {\tt unionClause} of the topmost node. (The +{\it intersectClause} fields of all {\tt Query} nodes will be set to +{\tt NULL} because they are no longer needed.) \\ +\\ +% +The function {\tt pg\_parse\_and\_plan()} is responsible for invoking the +rewrite procedure. It can be found in the file {\tt +$\ldots$/src/backend/tcop/postgres.c}. +% +\begin{verbatim} + List * + pg_parse_and_plan(char *query_string, Oid *typev, + int nargs, + QueryTreeList **queryListP, + CommandDest dest) + { + . + . + . + /* Rewrite Union, Intersect and Except Queries + * to normal Union Queries using IN and NOT + * IN subselects */ + if(querytree->intersectClause != NIL) + { + querytree = Except_Intersect_Rewrite(querytree); + } + . + . + . + } +\end{verbatim} +% +Here are the functions that have been added to perform the +functionality described above. They can be found in the file {\tt +$\ldots$/src/backend/rewrite/rewriteHandler.c}. +% +\begin{itemize} +<step> {\tt Except\_Intersect\_Rewrite()} \\ +Rewrites queries involving {\it union clauses}, {\it intersect +clauses} and {\it except clauses} to semantiacally equivalent queries +that use {\tt IN} and {\tt NOT IN} subselects instead. + +The {\it operator tree} is attached to {\tt intersectClause} (see rule +{\tt SelectStmt} above) of the {\it parsetree} given as an +argument. First we save some clauses (the {\tt sortClause}, the {\tt +unique flag} etc.). Then we translate the {\it operator tree} to DNF +({\it Disjunctive Normal Form}) by {\tt cnfify()}. Note that {\tt +cnfify()} produces CNF but as we exchanged {\tt AND} nodes with {\tt +OR} nodes within function {\tt A\_Expr\_to\_Expr()} earlier we get DNF +when we exchange {\tt AND} nodes and {\tt OR} nodes again in the +result. Now we create a new (rewritten) query by examining the new +{\it operator tree} which is in DNF now. For every {\tt AND} node we +create an entry in the {\it union list} and for every {\tt OR} node we +create an {\tt IN} subselect. ({\tt NOT IN} subselects are created for +{\tt OR NOT} nodes). The first entry of the {\it union list} is handed +back but before that the saved clauses ({\tt sortClause} etc.) are +restored to the new top node. Note that the new top node can differ +from the one of the {\it parsetree} given as argument because of the +translation into DNF. That's why we had to save the {\tt sortClause} +etc. +% +\pagebreak +% +\begin{verbatim} + Query * + Except_Intersect_Rewrite (Query *parsetree) + { + . + . + . + /* Save some fields, to be able to restore them + * to the resulting top node at the end of the + * function + */ + sortClause = parsetree->sortClause; + uniqueFlag = parsetree->uniqueFlag; + into = parsetree->into; + isBinary = parsetree->isBinary; + isPortal = parsetree->isPortal; + + /* Transform the SelectStmt nodes into Query nodes + * as usually done by transformSelectStmt() earlier. + * / + intersectClause = + (List *)intersect_tree_analyze( + (Node *)parsetree->intersectClause, + (Node *)lfirst(parsetree->unionClause), + (Node *)parsetree); + . + . + . + /* Transform the operator tree to DNF */ + intersectClause = + cnfify((Expr *)intersectClause, true); + /* For every entry of the intersectClause list we + * generate one entry in the union_list + */ + foreach(intersect, intersectClause) + { + /* For every OR we create an IN subselect and + * for every OR NOT we create a NOT IN subselect, + */ + intersect_list = NIL; + create_list((Node *)lfirst(intersect), + &intersect_list); + /* The first node will become the Select Query + * node, all other nodes are transformed into + * subselects under this node! + */ + intersect_node = (Query *)lfirst(intersect_list); + intersect_list = lnext(intersect_list); + . + . + . + /* Transform all remaining nodes into subselects + * and add them to the qualifications of the + * Select Query node + */ + while(intersect_list != NIL) + { + n = makeNode(SubLink); + + /* Here we got an OR so transform it to an + * IN subselect + */ + if(IsA(lfirst(intersect_list), Query)) + { + . + . + . + n->subselect = lfirst(intersect_list); + op = "="; + n->subLinkType = ANY_SUBLINK; + n->useor = false; + } + + /* Here we got an OR NOT node so transform + * it to a NOT IN subselect + */ + else + { + . + . + . + n->subselect = + (Node *)lfirst(((Expr *) + lfirst(intersect_list))->args); + op = "<>"; + n->subLinkType = ALL_SUBLINK; + n->useor = true; + } + + /* Prepare the lefthand side of the Sublinks: + * All the entries of the targetlist must be + * (IN) or must not be (NOT IN) the subselect + */ + foreach(elist, intersect_node->targetList) + { + Node *expr = lfirst(elist); + TargetEntry *tent = (TargetEntry *)expr; + + n->lefthand = + lappend(n->lefthand, tent->expr); + } + + /* The first arguments of oper also have to be + * created for the sublink (they are the same + * as the lefthand side!) + */ + left_expr = n->lefthand; + right_expr = + ((Query *)(n->subselect))->targetList; + foreach(elist, left_expr) + { + Node *lexpr = lfirst(elist); + Node *rexpr = lfirst(right_expr); + TargetEntry *tent = (TargetEntry *) rexpr; + Expr *op_expr; + + op_expr = make_op(op, lexpr, tent->expr); + n->oper = lappend(n->oper, op_expr); + right_expr = lnext(right_expr); + } + + /* If the Select Query node has aggregates + * in use add all the subselects to the + * HAVING qual else to the WHERE qual + */ + if(intersect_node->hasAggs == false) + { + AddQual(intersect_node, (Node *)n); + } + else + { + AddHavingQual(intersect_node, (Node *)n); + } + + /* Now we got sublinks */ + intersect_node->hasSubLinks = true; + intersect_list = lnext(intersect_list); + } + intersect_node->intersectClause = NIL; + union_list = lappend(union_list, intersect_node); + } + + /* The first entry to union_list is our + * new top node + */ + result = (Query *)lfirst(union_list); + + /* attach the rest to unionClause */ + result->unionClause = lnext(union_list); + + /* Attach all the items saved in the + * beginning of the function */ + result->sortClause = sortClause; + result->uniqueFlag = uniqueFlag; + result->into = into; + result->isPortal = isPortal; + result->isBinary = isBinary; + . + . + . + return result; + } +\end{verbatim} +% +<step> {\tt create\_list()} \\ +Create a list of nodes that are either {\tt Query} nodes or {\tt NOT} +nodes followed by a {\tt Query} node. The {\it tree} given in {\tt +ptr} contains at least one {\it non negated} {\tt Query} node. This +node is put to the beginning of the list. +% +\begin{verbatim} + void create_list(Node *ptr, + List **intersect_list) + { + List *arg; + + if(IsA(ptr,Query)) + { + /* The non negated node is attached at the + * beginning (lcons) */ + *intersect_list = lcons(ptr, *intersect_list); + return; + } + if(IsA(ptr,Expr)) + { + if(((Expr *)ptr)->opType == NOT_EXPR) + { + /* negated nodes are appended to the + * end (lappend) + */ + *intersect_list = + lappend(*intersect_list, ptr); + return; + } + else + { + foreach(arg, ((Expr *)ptr)->args) + { + create_list(lfirst(arg), intersect_list); + } + return; + } + return; + } + } +\end{verbatim} +% +<step> {\tt intersect\_tree\_analyze()} \\ +% +The nodes given in {\tt tree} are not transformed yet so process them +using {\tt parse\_analyze()}. The node given in {\tt first\_select} +has already been processed, so don't transform it again but return a +pointer to the already processed version given in {\tt parsetree} +instead. +% +\begin{verbatim} + Node *intersect_tree_analyze(Node *tree, + Node *first_select, Node *parsetree) + { + Node *result; + List *arg; + + if(IsA(tree, SelectStmt)) + { + QueryTreeList *qtree; + + /* If we get to the tree given in first_select + * return parsetree instead of performing + * parse_analyze() */ + if(tree == first_select) + { + result = parsetree; + } + else + { + /* transform the unprocessed Query nodes */ + qtree = + parse_analyze(lcons(tree, NIL), NULL); + result = (Node *)qtree->qtrees[0]; + } + } + if(IsA(tree,Expr)) + { + /* Call recursively for every argument */ + foreach(arg, ((Expr *)tree)->args) + { + lfirst(arg) = + intersect_tree_analyze(lfirst(arg), + first_select, parsetree); + } + result = tree; + } + return result; + } +\end{verbatim} +% + +\end{itemize} + +********************************************************** +********************************************************** +********************************************************** +********************************************************** +********************************************************** +--> + + </chapter> + +<!-- Keep this comment at the end of the file +Local variables: +mode: sgml +sgml-omittag:nil +sgml-shorttag:t +sgml-minimize-attributes:nil +sgml-always-quote-attributes:t +sgml-indent-step:1 +sgml-indent-data:t +sgml-parent-document:nil +sgml-default-dtd-file:"./reference.ced" +sgml-exposed-tags:nil +sgml-local-catalogs:"/usr/lib/sgml/catalog" +sgml-local-ecat-files:nil +End: +--> |