diff options
-rw-r--r-- | doc/src/sgml/filelist.sgml | 3 | ||||
-rw-r--r-- | doc/src/sgml/sql.sgml | 2148 |
2 files changed, 0 insertions, 2151 deletions
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml index 2ec83af58e9..b914086009f 100644 --- a/doc/src/sgml/filelist.sgml +++ b/doc/src/sgml/filelist.sgml @@ -12,9 +12,6 @@ <!ENTITY query SYSTEM "query.sgml"> <!ENTITY start SYSTEM "start.sgml"> -<!-- currently unused, but contains some interesting information --> -<!ENTITY sql SYSTEM "sql.sgml"> - <!-- user's guide --> <!ENTITY array SYSTEM "array.sgml"> <!ENTITY datatype SYSTEM "datatype.sgml"> diff --git a/doc/src/sgml/sql.sgml b/doc/src/sgml/sql.sgml deleted file mode 100644 index 57396d7c245..00000000000 --- a/doc/src/sgml/sql.sgml +++ /dev/null @@ -1,2148 +0,0 @@ -<!-- doc/src/sgml/sql.sgml --> - - <chapter id="sql-intro"> - <title>SQL</title> - - <abstract> - <para> - This chapter introduces the mathematical concepts behind - relational databases. It is not required reading, so if you bog - down or want to get straight to some simple examples feel free to - jump ahead to the next chapter and come back when you have more - time and patience. This stuff is supposed to be fun! - </para> - - <para> - This material originally appeared as a part of - Stefan Simkovics' Master's Thesis - (<xref linkend="SIM98" endterm="SIM98">). - </para> - </abstract> - - <para> - <acronym>SQL</acronym> has become the most popular relational query - language. - The name <quote><acronym>SQL</acronym></quote> is an abbreviation for - <firstterm>Structured Query Language</firstterm>. - In 1974 Donald Chamberlin and others defined the - language SEQUEL (<firstterm>Structured English Query - Language</firstterm>) at IBM - Research. This language was first implemented in an IBM - prototype called SEQUEL-XRM in 1974-75. In 1976-77 a revised version - of SEQUEL called SEQUEL/2 was defined and the name was changed to - <acronym>SQL</acronym> - subsequently. - </para> - - <para> - A new prototype called System R was developed by IBM in 1977. System R - implemented a large subset of SEQUEL/2 (now <acronym>SQL</acronym>) - and a number of - changes were made to <acronym>SQL</acronym> during the project. - System R was installed in - a number of user sites, both internal IBM sites and also some selected - customer sites. Thanks to the success and acceptance of System R at - those user sites IBM started to develop commercial products that - implemented the <acronym>SQL</acronym> language based on the System - R technology. - </para> - - <para> - Over the next years IBM and also a number of other vendors announced - <acronym>SQL</acronym> products such as - <productname>SQL/DS</productname> (IBM), - <productname>DB2</productname> (IBM), - <productname>ORACLE</productname> (Oracle Corp.), - <productname>DG/SQL</productname> (Data General Corp.), - and <productname>SYBASE</productname> (Sybase Inc.). - </para> - - <para> - <acronym>SQL</acronym> is also an official standard now. In 1982 - the American National - Standards Institute (<acronym>ANSI</acronym>) chartered its - Database Committee X3H2 to - develop a proposal for a standard relational language. This proposal - was ratified in 1986 and consisted essentially of the IBM dialect of - <acronym>SQL</acronym>. In 1987 this <acronym>ANSI</acronym> - standard was also accepted as an international - standard by the International Organization for Standardization - (<acronym>ISO</acronym>). - This original standard version of <acronym>SQL</acronym> is often - referred to, - informally, as <quote><abbrev>SQL/86</abbrev></quote>. In 1989 the original - standard was extended - and this new standard is often, again informally, referred to as - <quote><abbrev>SQL/89</abbrev></quote>. Also in 1989, a related standard called - <firstterm>Database Language Embedded <acronym>SQL</acronym></firstterm> - (<acronym>ESQL</acronym>) was developed. - </para> - - <para> - The <acronym>ISO</acronym> and <acronym>ANSI</acronym> committees - have been working for many years on the - definition of a greatly expanded version of the original standard, - referred to informally as <firstterm><acronym>SQL2</acronym></firstterm> - or <firstterm><acronym>SQL/92</acronym></firstterm>. This version became a - ratified standard - <quote>International Standard ISO/IEC 9075:1992, - Database Language <acronym>SQL</acronym></quote> - in late 1992. - <acronym>SQL/92</acronym> is the version - normally meant when people refer to <quote>the <acronym>SQL</acronym> - standard</quote>. A detailed - description of <acronym>SQL/92</acronym> is given in - <xref linkend="DATE97" endterm="DATE97">. At the time of - writing this document a new standard informally referred to - as <firstterm><acronym>SQL3</acronym></firstterm> - is under development. It is planned to make <acronym>SQL</acronym> - a Turing-complete - language, i.e., all computable queries (e.g., recursive queries) will be - possible. This has now been completed as SQL:2003. - </para> - - <sect1 id="rel-model"> - <title>The Relational Data Model</title> - - <para> - As mentioned before, <acronym>SQL</acronym> is a relational - language. That means it is - based on the <firstterm>relational data model</firstterm> - first published by E.F. Codd in - 1970. We will give a formal description of the relational model - later (in - <xref linkend="formal-notion" endterm="formal-notion">) - but first we want to have a look at it from a more intuitive - point of view. - </para> - - <para> - A <firstterm>relational database</firstterm> is a database that is - perceived by its - users as a <firstterm>collection of tables</firstterm> (and - nothing else but tables). - A table consists of rows and columns where each row represents a - record and each column represents an attribute of the records - contained in the table. - <xref linkend="supplier-fig" endterm="supplier-fig"> - shows an example of a database consisting of three tables: - - <itemizedlist> - <listitem> - <para> - SUPPLIER is a table storing the number - (SNO), the name (SNAME) and the city (CITY) of a supplier. - </para> - </listitem> - - <listitem> - <para> - PART is a table storing the number (PNO) the name (PNAME) and - the price (PRICE) of a part. - </para> - </listitem> - - <listitem> - <para> - SELLS stores information about which part (PNO) is sold by which - supplier (SNO). - It serves in a sense to connect the other two tables together. - </para> - </listitem> - </itemizedlist> - - <example> - <title id="supplier-fig">The Suppliers and Parts Database</title> -<screen> -SUPPLIER: SELLS: - SNO | SNAME | CITY SNO | PNO -----+---------+-------- -----+----- - 1 | Smith | London 1 | 1 - 2 | Jones | Paris 1 | 2 - 3 | Adams | Vienna 2 | 4 - 4 | Blake | Rome 3 | 1 - 3 | 3 - 4 | 2 -PART: 4 | 3 - PNO | PNAME | PRICE 4 | 4 -----+---------+--------- - 1 | Screw | 10 - 2 | Nut | 8 - 3 | Bolt | 15 - 4 | Cam | 25 -</screen> - </example> - </para> - - <para> - The tables PART and SUPPLIER can be regarded as - <firstterm>entities</firstterm> and - SELLS can be regarded as a <firstterm>relationship</firstterm> - between a particular - part and a particular supplier. - </para> - - <para> - As we will see later, <acronym>SQL</acronym> operates on tables - like the ones just - defined but before that we will study the theory of the relational - model. - </para> - </sect1> - - <sect1 id="relmodel-formal"> - <title id="formal-notion">Relational Data Model Formalities</title> - - <para> - The mathematical concept underlying the relational model is the - set-theoretic <firstterm>relation</firstterm> which is a subset of - the Cartesian - product of a list of domains. This set-theoretic relation gives - the model its name (do not confuse it with the relationship from the - <firstterm>Entity-Relationship model</firstterm>). - Formally a domain is simply a set of - values. For example the set of integers is a domain. Also the set of - character strings of length 20 and the real numbers are examples of - domains. - </para> - - <para> -<!-- -\begin{definition} -The <firstterm>Cartesian product</firstterm> of domains $D_{1}, - D_{2},\ldots, D_{k}$ written -\mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$} is the set of -all $k$-tuples $(v_{1},v_{2},\ldots,v_{k})$ such that \mbox{$v_{1} \in -D_{1}, v_{2} \in D_{2}, \ldots, v_{k} \in D_{k}$}. -\end{definition} ---> - The <firstterm>Cartesian product</firstterm> of domains - <parameter>D<subscript>1</subscript></parameter>, - <parameter>D<subscript>2</subscript></parameter>, - ... - <parameter>D<subscript>k</subscript></parameter>, - written - <parameter>D<subscript>1</subscript></parameter> × - <parameter>D<subscript>2</subscript></parameter> × - ... × - <parameter>D<subscript>k</subscript></parameter> - is the set of all k-tuples - <parameter>v<subscript>1</subscript></parameter>, - <parameter>v<subscript>2</subscript></parameter>, - ... - <parameter>v<subscript>k</subscript></parameter>, - such that - <parameter>v<subscript>1</subscript></parameter> ∈ - <parameter>D<subscript>1</subscript></parameter>, - <parameter>v<subscript>2</subscript></parameter> ∈ - <parameter>D<subscript>2</subscript></parameter>, - ... - <parameter>v<subscript>k</subscript></parameter> ∈ - <parameter>D<subscript>k</subscript></parameter>. - </para> - - <para> - For example, when we have -<!-- - $k=2$, $D_{1}=\{0,1\}$ and -$D_{2}=\{a,b,c\}$, then $D_{1} \times D_{2}$ is -$\{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)\}$. ---> - <parameter>k</parameter>=2, - <parameter>D<subscript>1</subscript></parameter>=<literal>{0,1}</literal> and - <parameter>D<subscript>2</subscript></parameter>=<literal>{a,b,c}</literal> then - <parameter>D<subscript>1</subscript></parameter> × - <parameter>D<subscript>2</subscript></parameter> is - <literal>{(0,a),(0,b),(0,c),(1,a),(1,b),(1,c)}</literal>. - </para> - - <para> -<!-- -\begin{definition} -A Relation is any subset of the Cartesian product of one or more -domains: $R \subseteq$ \mbox{$D_{1} \times D_{2} \times \ldots \times D_{k}$} -\end{definition} ---> - A Relation is any subset of the Cartesian product of one or more - domains: <parameter>R</parameter> ⊆ - <parameter>D<subscript>1</subscript></parameter> × - <parameter>D<subscript>2</subscript></parameter> × - ... × - <parameter>D<subscript>k</subscript></parameter>. - </para> - - <para> - For example <literal>{(0,a),(0,b),(1,a)}</literal> is a relation; - it is in fact a subset of - <parameter>D<subscript>1</subscript></parameter> × - <parameter>D<subscript>2</subscript></parameter> - mentioned above. - </para> - - <para> - The members of a relation are called tuples. Each relation of some - Cartesian product - <parameter>D<subscript>1</subscript></parameter> × - <parameter>D<subscript>2</subscript></parameter> × - ... × - <parameter>D<subscript>k</subscript></parameter> - is said to have arity <literal>k</literal> and is therefore a set - of <literal>k</literal>-tuples. - </para> - - <para> - A relation can be viewed as a table (as we already did, remember - <xref linkend="supplier-fig" endterm="supplier-fig"> where - every tuple is represented by a row and every column corresponds to - one component of a tuple. Giving names (called attributes) to the - columns leads to the definition of a - <firstterm>relation scheme</firstterm>. - </para> - - <para> -<!-- -\begin{definition} -A {\it relation scheme} $R$ is a finite set of attributes -\mbox{$\{A_{1},A_{2},\ldots,A_{k}\}$}. There is a domain $D_{i}$ for -each attribute $A_{i}, 1 \le i \le k$ where the values of the -attributes are taken from. We often write a relation scheme as -\mbox{$R(A_{1},A_{2},\ldots,A_{k})$}. -\end{definition} ---> - A <firstterm>relation scheme</firstterm> <literal>R</literal> is a - finite set of attributes - <parameter>A<subscript>1</subscript></parameter>, - <parameter>A<subscript>2</subscript></parameter>, - ... - <parameter>A<subscript>k</subscript></parameter>. - There is a domain - <parameter>D<subscript>i</subscript></parameter>, - for each attribute - <parameter>A<subscript>i</subscript></parameter>, - 1 <= <literal>i</literal> <= <literal>k</literal>, - where the values of the attributes are taken from. We often write - a relation scheme as - <literal>R(<parameter>A<subscript>1</subscript></parameter>, - <parameter>A<subscript>2</subscript></parameter>, - ... - <parameter>A<subscript>k</subscript></parameter>)</literal>. - - <note> - <para> - A <firstterm>relation scheme</firstterm> is just a kind of template - whereas a <firstterm>relation</firstterm> is an instance of a - <firstterm>relation - scheme</firstterm>. The relation consists of tuples (and can - therefore be - viewed as a table); not so the relation scheme. - </para> - </note> - </para> - - <sect2> - <title id="domains">Domains vs. Data Types</title> - - <para> - We often talked about <firstterm>domains</firstterm> - in the last section. Recall that a - domain is, formally, just a set of values (e.g., the set of integers or - the real numbers). In terms of database systems we often talk of - <firstterm>data types</firstterm> instead of domains. - When we define a table we have to make - a decision about which attributes to include. Additionally we - have to decide which kind of data is going to be stored as - attribute values. For example the values of - <classname>SNAME</classname> from the table - <classname>SUPPLIER</classname> will be character strings, - whereas <classname>SNO</classname> will store - integers. We define this by assigning a data type to each - attribute. The type of <classname>SNAME</classname> will be - <type>VARCHAR(20)</type> (this is the <acronym>SQL</acronym> type - for character strings of length <= 20), - the type of <classname>SNO</classname> will be - <type>INTEGER</type>. With the assignment of a data type we also - have selected - a domain for an attribute. The domain of - <classname>SNAME</classname> is the set of all - character strings of length <= 20, - the domain of <classname>SNO</classname> is the set of - all integer numbers. - </para> - </sect2> - </sect1> - - <sect1 id="relmodel-oper"> - <title id="operations">Operations in the Relational Data Model</title> - - <para> - In the previous section - (<xref linkend="formal-notion" endterm="formal-notion">) - we defined the mathematical notion of - the relational model. Now we know how the data can be stored using a - relational data model but we do not know what to do with all these - tables to retrieve something from the database yet. For example somebody - could ask for the names of all suppliers that sell the part - 'Screw'. Therefore two rather different kinds of notations for - expressing operations on relations have been defined: - - <itemizedlist> - <listitem> - <para> - The <firstterm>Relational Algebra</firstterm> which is an - algebraic notation, - where queries are expressed by applying specialized operators to the - relations. - </para> - </listitem> - - <listitem> - <para> - The <firstterm>Relational Calculus</firstterm> which is a - logical notation, - where queries are expressed by formulating some logical restrictions - that the tuples in the answer must satisfy. - </para> - </listitem> - </itemizedlist> - </para> - - <sect2> - <title id="rel-alg">Relational Algebra</title> - - <para> - The <firstterm>Relational Algebra</firstterm> was introduced by - E. F. Codd in 1972. It consists of a set of operations on relations: - - <itemizedlist> - <listitem> - <para> - SELECT (σ): extracts <firstterm>tuples</firstterm> from - a relation that - satisfy a given restriction. Let <parameter>R</parameter> be a - table that contains an attribute - <parameter>A</parameter>. -σ<subscript>A=a</subscript>(R) = {t ∈ R ∣ t(A) = a} - where <literal>t</literal> denotes a - tuple of <parameter>R</parameter> and <literal>t(A)</literal> - denotes the value of attribute <parameter>A</parameter> of - tuple <literal>t</literal>. - </para> - </listitem> - - <listitem> - <para> - PROJECT (π): extracts specified - <firstterm>attributes</firstterm> (columns) from a - relation. Let <classname>R</classname> be a relation - that contains an attribute <classname>X</classname>. - π<subscript>X</subscript>(<classname>R</classname>) = {t(X) ∣ t ∈ <classname>R</classname>}, - where <literal>t</literal>(<classname>X</classname>) denotes the value of - attribute <classname>X</classname> of tuple <literal>t</literal>. - </para> - </listitem> - - <listitem> - <para> - PRODUCT (×): builds the Cartesian product of two - relations. Let <classname>R</classname> be a table with arity - <literal>k</literal><subscript>1</subscript> and let - <classname>S</classname> be a table with - arity <literal>k</literal><subscript>2</subscript>. - <classname>R</classname> × <classname>S</classname> - is the set of all - <literal>k</literal><subscript>1</subscript> - + <literal>k</literal><subscript>2</subscript>-tuples - whose first <literal>k</literal><subscript>1</subscript> - components form a tuple in <classname>R</classname> and whose last - <literal>k</literal><subscript>2</subscript> components form a - tuple in <classname>S</classname>. - </para> - </listitem> - - <listitem> - <para> - UNION (∪): builds the set-theoretic union of two - tables. Given the tables <classname>R</classname> and - <classname>S</classname> (both must have the same arity), - the union <classname>R</classname> ∪ <classname>S</classname> - is the set of tuples that are in <classname>R</classname> - or <classname>S</classname> or both. - </para> - </listitem> - - <listitem> - <para> - INTERSECT (∩): builds the set-theoretic intersection of two - tables. Given the tables <classname>R</classname> and - <classname>S</classname>, - <classname>R</classname> ∩ <classname>S</classname> is the - set of tuples - that are in <classname>R</classname> and in - <classname>S</classname>. - We again require that <classname>R</classname> and - <classname>S</classname> have the - same arity. - </para> - </listitem> - - <listitem> - <para> - DIFFERENCE (− or ∖): builds the set difference of - two tables. Let <classname>R</classname> and <classname>S</classname> - again be two tables with the same - arity. <classname>R</classname> - <classname>S</classname> - is the set of tuples in <classname>R</classname> but not in - <classname>S</classname>. - </para> - </listitem> - - <listitem> - <para> - JOIN (∏): connects two tables by their common - attributes. Let <classname>R</classname> be a table with the - attributes <classname>A</classname>,<classname>B</classname> - and <classname>C</classname> and - let <classname>S</classname> be a table with the attributes - <classname>C</classname>,<classname>D</classname> - and <classname>E</classname>. There is one - attribute common to both relations, - the attribute <classname>C</classname>. -<!-- - <classname>R</classname> ∏ <classname>S</classname> = - π<subscript><classname>R</classname>.<classname>A</classname>,<classname>R</classname>.<classname>B</classname>,<classname>R</classname>.<classname>C</classname>,<classname>S</classname>.<classname>D</classname>,<classname>S</classname>.<classname>E</classname></subscript>(σ<subscript><classname>R</classname>.<classname>C</classname>=<classname>S</classname>.<classname>C</classname></subscript>(<classname>R</classname> × <classname>S</classname>)). ---> - R ∏ S = π<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(σ<subscript>R.C=S.C</subscript>(R × S)). - What are we doing here? We first calculate the Cartesian - product - <classname>R</classname> × <classname>S</classname>. - Then we select those tuples whose values for the common - attribute <classname>C</classname> are equal - (σ<subscript>R.C = S.C</subscript>). - Now we have a table - that contains the attribute <classname>C</classname> - two times and we correct this by - projecting out the duplicate column. - </para> - - <example> - <title id="join-example">An Inner Join</title> - - <para> - Let's have a look at the tables that are produced by evaluating the steps - necessary for a join. - Let the following two tables be given: - -<screen> -R: S: - A | B | C C | D | E ----+---+--- ---+---+--- - 1 | 2 | 3 3 | a | b - 4 | 5 | 6 6 | c | d - 7 | 8 | 9 -</screen> - </para> - </example> - - <para> - First we calculate the Cartesian product - <classname>R</classname> × <classname>S</classname> and - get: - -<screen> -R x S: - A | B | R.C | S.C | D | E ----+---+-----+-----+---+--- - 1 | 2 | 3 | 3 | a | b - 1 | 2 | 3 | 6 | c | d - 4 | 5 | 6 | 3 | a | b - 4 | 5 | 6 | 6 | c | d - 7 | 8 | 9 | 3 | a | b - 7 | 8 | 9 | 6 | c | d -</screen> - </para> - - <para> - After the selection - σ<subscript>R.C=S.C</subscript>(R × S) - we get: - -<screen> - A | B | R.C | S.C | D | E ----+---+-----+-----+---+--- - 1 | 2 | 3 | 3 | a | b - 4 | 5 | 6 | 6 | c | d -</screen> - </para> - - <para> - To remove the duplicate column - <classname>S</classname>.<classname>C</classname> - we project it out by the following operation: - π<subscript>R.A,R.B,R.C,S.D,S.E</subscript>(σ<subscript>R.C=S.C</subscript>(R × S)) - and get: - -<screen> - A | B | C | D | E ----+---+---+---+--- - 1 | 2 | 3 | a | b - 4 | 5 | 6 | c | d -</screen> - </para> - </listitem> - - <listitem> - <para> - DIVIDE (÷): Let <classname>R</classname> be a table - with the attributes A, B, C, and D and let - <classname>S</classname> be a table with the attributes - C and D. - Then we define the division as: - -<programlisting> -R ÷ S = {t ∣ ∀ t<subscript>s</subscript> ∈ S ∃ t<subscript>r</subscript> ∈ R -</programlisting> - - such that -t<subscript>r</subscript>(A,B)=t∧t<subscript>r</subscript>(C,D)=t<subscript>s</subscript>} - where - t<subscript>r</subscript>(x,y) - denotes a - tuple of table <classname>R</classname> that consists only of - the components <literal>x</literal> and <literal>y</literal>. - Note that the tuple <literal>t</literal> only consists of the - components <classname>A</classname> and - <classname>B</classname> of relation <classname>R</classname>. - </para> - - <para id="divide-example"> - Given the following tables - -<screen> -R: S: - A | B | C | D C | D ----+---+---+--- ---+--- - a | b | c | d c | d - a | b | e | f e | f - b | c | e | f - e | d | c | d - e | d | e | f - a | b | d | e -</screen> - - R ÷ S - is derived as - -<screen> - A | B ----+--- - a | b - e | d -</screen> - </para> - </listitem> - </itemizedlist> - </para> - - <para> - For a more detailed description and definition of the relational - algebra refer to [<xref linkend="ULL88" endterm="ULL88">] or - [<xref linkend="DATE04" endterm="DATE04">]. - </para> - - <example> - <title id="suppl-rel-alg">A Query Using Relational Algebra</title> - <para> - Recall that we formulated all those relational operators to be able to - retrieve data from the database. Let's return to our example from - the previous - section (<xref linkend="operations" endterm="operations">) - where someone wanted to know the names of all - suppliers that sell the part <literal>Screw</literal>. - This question can be answered - using relational algebra by the following operation: - -<programlisting> -π<subscript>SUPPLIER.SNAME</subscript>(σ<subscript>PART.PNAME='Screw'</subscript>(SUPPLIER ∏ SELLS ∏ PART)) -</programlisting> - </para> - - <para> - We call such an operation a query. If we evaluate the above query - against the our example tables - (<xref linkend="supplier-fig" endterm="supplier-fig">) - we will obtain the following result: - -<screen> - SNAME -------- - Smith - Adams -</screen> - </para> - </example> - </sect2> - - <sect2 id="rel-calc"> - <title>Relational Calculus</title> - - <para> - The relational calculus is based on the - <firstterm>first order logic</firstterm>. There are - two variants of the relational calculus: - - <itemizedlist> - <listitem> - <para> - The <firstterm>Domain Relational Calculus</firstterm> - (<acronym>DRC</acronym>), where variables - stand for components (attributes) of the tuples. - </para> - </listitem> - - <listitem> - <para> - The <firstterm>Tuple Relational Calculus</firstterm> - (<acronym>TRC</acronym>), where variables stand for tuples. - </para> - </listitem> - </itemizedlist> - </para> - - <para> - We want to discuss the tuple relational calculus only because it is - the one underlying the most relational languages. For a detailed - discussion on <acronym>DRC</acronym> (and also - <acronym>TRC</acronym>) see - <xref linkend="DATE04" endterm="DATE04"> - or - <xref linkend="ULL88" endterm="ULL88">. - </para> - </sect2> - - <sect2> - <title>Tuple Relational Calculus</title> - - <para> - The queries used in <acronym>TRC</acronym> are of the following - form: - -<programlisting> -x(A) ∣ F(x) -</programlisting> - - where <literal>x</literal> is a tuple variable - <classname>A</classname> is a set of attributes and <literal>F</literal> is a - formula. The resulting relation consists of all tuples - <literal>t(A)</literal> that satisfy <literal>F(t)</literal>. - </para> - - <para> - If we want to answer the question from example - <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg"> - using <acronym>TRC</acronym> we formulate the following query: - -<programlisting> -{x(SNAME) ∣ x ∈ SUPPLIER ∧ - ∃ y ∈ SELLS ∃ z ∈ PART (y(SNO)=x(SNO) ∧ - z(PNO)=y(PNO) ∧ - z(PNAME)='Screw')} -</programlisting> - </para> - - <para> - Evaluating the query against the tables from - <xref linkend="supplier-fig" endterm="supplier-fig"> - again leads to the same result - as in - <xref linkend="suppl-rel-alg" endterm="suppl-rel-alg">. - </para> - </sect2> - - <sect2 id="alg-vs-calc"> - <title>Relational Algebra vs. Relational Calculus</title> - - <para> - The relational algebra and the relational calculus have the same - <firstterm>expressive power</firstterm>; i.e., all queries that - can be formulated using relational algebra can also be formulated - using the relational calculus and vice versa. - This was first proved by E. F. Codd in - 1972. This proof is based on an algorithm (<quote>Codd's reduction - algorithm</quote>) by which an arbitrary expression of the relational - calculus can be reduced to a semantically equivalent expression of - relational algebra. For a more detailed discussion on that refer to - <xref linkend="DATE04" endterm="DATE04"> - and - <xref linkend="ULL88" endterm="ULL88">. - </para> - - <para> - It is sometimes said that languages based on the relational - calculus are <quote>higher level</quote> or <quote>more - declarative</quote> than languages based on relational algebra - because the algebra (partially) specifies the order of operations - while the calculus leaves it to a compiler or interpreter to - determine the most efficient order of evaluation. - </para> - </sect2> - </sect1> - - <sect1 id="sql-language"> - <title>The <acronym>SQL</acronym> Language</title> - - <para> - As is the case with most modern relational languages, - <acronym>SQL</acronym> is based on the tuple - relational calculus. As a result every query that can be formulated - using the tuple relational calculus (or equivalently, relational - algebra) can also be formulated using - <acronym>SQL</acronym>. There are, however, - capabilities beyond the scope of relational algebra or calculus. Here - is a list of some additional features provided by - <acronym>SQL</acronym> that are not - part of relational algebra or calculus: - - <itemizedlist> - <listitem> - <para> - Commands for insertion, deletion or modification of data. - </para> - </listitem> - - <listitem> - <para> - Arithmetic capability: In <acronym>SQL</acronym> it is possible - to involve - arithmetic operations as well as comparisons, e.g.: - -<programlisting> -A < B + 3. -</programlisting> - - Note - that + or other arithmetic operators appear neither in relational - algebra nor in relational calculus. - </para> - </listitem> - - <listitem> - <para> - Assignment and Print Commands: It is possible to print a - relation constructed by a query and to assign a computed relation to a - relation name. - </para> - </listitem> - - <listitem> - <para> - Aggregate Functions: Operations such as - <firstterm>average</firstterm>, <firstterm>sum</firstterm>, - <firstterm>max</firstterm>, etc. can be applied to columns of a - relation to - obtain a single quantity. - </para> - </listitem> - </itemizedlist> - </para> - - <sect2 id="select"> - <title id="select-title">Select</title> - - <para> - The most often used command in <acronym>SQL</acronym> is the - <command>SELECT</command> statement, - used to retrieve data. The syntax is: - -<synopsis> -SELECT [ ALL | DISTINCT [ ON ( <replaceable class="PARAMETER">expression</replaceable> [, ...] ) ] ] - * | <replaceable class="PARAMETER">expression</replaceable> [ [ AS ] <replaceable class="PARAMETER">output_name</replaceable> ] [, ...] - [ INTO [ TEMPORARY | TEMP ] [ TABLE ] <replaceable class="PARAMETER">new_table</replaceable> ] - [ FROM <replaceable class="PARAMETER">from_item</replaceable> [, ...] ] - [ WHERE <replaceable class="PARAMETER">condition</replaceable> ] - [ GROUP BY <replaceable class="PARAMETER">expression</replaceable> [, ...] ] - [ HAVING <replaceable class="PARAMETER">condition</replaceable> [, ...] ] - [ { UNION | INTERSECT | EXCEPT } [ ALL ] <replaceable class="PARAMETER">select</replaceable> ] - [ ORDER BY <replaceable class="parameter">expression</replaceable> [ ASC | DESC | USING <replaceable class="parameter">operator</replaceable> ] [ NULLS { FIRST | LAST } ] [, ...] ] - [ LIMIT { <replaceable class="PARAMETER">count</replaceable> | ALL } ] - [ OFFSET <replaceable class="PARAMETER">start</replaceable> ] - [ FOR { UPDATE | SHARE } [ OF <replaceable class="parameter">table_name</replaceable> [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ] -</synopsis> - </para> - - <para> - Now we will illustrate the complex syntax of the - <command>SELECT</command> statement with various examples. The - tables used for the examples are defined in <xref - linkend="supplier-fig" endterm="supplier-fig">. - </para> - - <sect3> - <title>Simple Selects</title> - - <para> - Here are some simple examples using a <command>SELECT</command> statement: - - <example> - <title id="simple-query">Simple Query with Qualification</title> - <para> - To retrieve all tuples from table PART where the attribute PRICE is - greater than 10 we formulate the following query: - -<programlisting> -SELECT * FROM PART - WHERE PRICE > 10; -</programlisting> - - and get the table: - -<screen> - PNO | PNAME | PRICE ------+---------+-------- - 3 | Bolt | 15 - 4 | Cam | 25 -</screen> - </para> - - <para> - Using <quote>*</quote> in the <command>SELECT</command> statement - will deliver all attributes from the table. If we want to retrieve - only the attributes PNAME and PRICE from table PART we use the - statement: - -<programlisting> -SELECT PNAME, PRICE - FROM PART - WHERE PRICE > 10; -</programlisting> - - In this case the result is: - -<screen> - PNAME | PRICE - --------+-------- - Bolt | 15 - Cam | 25 -</screen> - - Note that the <acronym>SQL</acronym> <command>SELECT</command> - corresponds to the <quote>projection</quote> in relational algebra - not to the <quote>selection</quote> (see <xref linkend="rel-alg" - endterm="rel-alg"> for more details). - </para> - - <para> - The qualifications in the WHERE clause can also be logically connected - using the keywords OR, AND, and NOT: - -<programlisting> -SELECT PNAME, PRICE - FROM PART - WHERE PNAME = 'Bolt' AND - (PRICE = 0 OR PRICE <= 15); -</programlisting> - - will lead to the result: - -<screen> - PNAME | PRICE ---------+-------- - Bolt | 15 -</screen> - </para> - - <para> - Arithmetic operations can be used in the target list and in the WHERE - clause. For example if we want to know how much it would cost if we - take two pieces of a part we could use the following query: - -<programlisting> -SELECT PNAME, PRICE * 2 AS DOUBLE - FROM PART - WHERE PRICE * 2 < 50; -</programlisting> - - and we get: - -<screen> - PNAME | DOUBLE ---------+--------- - Screw | 20 - Nut | 16 - Bolt | 30 -</screen> - - Note that the word DOUBLE after the keyword AS is the new title of the - second column. This technique can be used for every element of the - target list to assign a new title to the resulting - column. This new title - is often referred to as alias. The alias cannot be used throughout the - rest of the query. - </para> - </example> - </para> - </sect3> - - <sect3> - <title>Joins</title> - - <para id="simple-join"> - The following example shows how <firstterm>joins</firstterm> are - realized in <acronym>SQL</acronym>. - </para> - - <para> - To join the three tables SUPPLIER, PART and SELLS over their common - attributes we formulate the following statement: - -<programlisting> -SELECT S.SNAME, P.PNAME - FROM SUPPLIER S, PART P, SELLS SE - WHERE S.SNO = SE.SNO AND - P.PNO = SE.PNO; -</programlisting> - - and get the following table as a result: - -<screen> - SNAME | PNAME --------+------- - Smith | Screw - Smith | Nut - Jones | Cam - Adams | Screw - Adams | Bolt - Blake | Nut - Blake | Bolt - Blake | Cam -</screen> - </para> - - <para> - In the FROM clause we introduced an alias name for every relation - because there are common named attributes (SNO and PNO) among the - relations. Now we can distinguish between the common named attributes - by simply prefixing the attribute name with the alias name followed by - a dot. The join is calculated in the same way as shown in - <xref linkend="join-example" endterm="join-example">. - First the Cartesian product - - SUPPLIER × PART × SELLS - - is derived. Now only those tuples satisfying the - conditions given in the WHERE clause are selected (i.e., the common - named attributes have to be equal). Finally we project out all - columns but S.SNAME and P.PNAME. - </para> - - <para> - Another way to perform joins is to use the SQL JOIN syntax as follows: -<programlisting> -SELECT sname, pname from supplier - JOIN sells USING (sno) - JOIN part USING (pno); -</programlisting> - giving again: -<screen> - sname | pname --------+------- - Smith | Screw - Adams | Screw - Smith | Nut - Blake | Nut - Adams | Bolt - Blake | Bolt - Jones | Cam - Blake | Cam -(8 rows) -</screen> - </para> - - <para> - A joined table, created using JOIN syntax, is a table reference list - item that occurs in a FROM clause and before any WHERE, GROUP BY, - or HAVING clause. Other table references, including table names or - other JOIN clauses, can be included in the FROM clause if separated - by commas. JOINed tables are logically like any other - table listed in the FROM clause. - </para> - - <para> - SQL JOINs come in two main types, CROSS JOINs (unqualified joins) - and <firstterm>qualified JOINs</>. Qualified joins can be further - subdivided based on the way in which the <firstterm>join condition</> - is specified (ON, USING, or NATURAL) and the way in which it is - applied (INNER or OUTER join). - </para> - - <variablelist> - <title>Join Types</title> - <varlistentry> - <term>CROSS JOIN</term> - <listitem> - <cmdsynopsis> - <arg choice="req"> <replaceable class="parameter">T1</replaceable> </arg> - <command> CROSS JOIN </command> - <arg choice="req"> <replaceable class="parameter">T2</replaceable> </arg> - </cmdsynopsis> - - <para> - A cross join takes two tables T1 and T2 having N and M rows - respectively, and returns a joined table containing all - N*M possible joined rows. For each row R1 of T1, each row - R2 of T2 is joined with R1 to yield a joined table row JR - consisting of all fields in R1 and R2. A CROSS JOIN is - equivalent to an INNER JOIN ON TRUE. - </para> - </listitem> - </varlistentry> - - <varlistentry> - <term>Qualified JOINs</term> - <listitem> - - <cmdsynopsis> - <arg choice="req"> <replaceable class="parameter">T1</replaceable> </arg> - <arg choice="opt"> NATURAL </arg> - <group choice="opt"> - <arg choice="opt"> INNER </arg> - <arg choice="plain"> - <group choice="req"> - <arg choice="plain"> LEFT </arg> - <arg choice="plain"> RIGHT </arg> - <arg choice="plain"> FULL </arg> - </group> - <arg choice="opt"> OUTER </arg> - </arg> - </group> - <command> JOIN </command> - <arg choice="req"> <replaceable class="parameter">T2</replaceable> </arg> - <group choice="req"> - <arg choice="plain"> ON <replaceable>search condition</replaceable></arg> - <arg choice="plain"> USING ( <replaceable>join column list</replaceable> ) </arg> - </group> - </cmdsynopsis> - - <para> - A qualified JOIN must specify its join condition - by providing one (and only one) of NATURAL, ON, or - USING. The ON clause - takes a <replaceable>search condition</replaceable>, - which is the same as in a WHERE clause. The USING - clause takes a comma-separated list of column names, - which the joined tables must have in common, and joins - the tables on equality of those columns. NATURAL is - shorthand for a USING clause that lists all the common - column names of the two tables. A side-effect of both - USING and NATURAL is that only one copy of each joined - column is emitted into the result table (compare the - relational-algebra definition of JOIN, shown earlier). - </para> - - <!-- begin join semantics --> - <variablelist> - <varlistentry> - <term> - <cmdsynopsis> - <arg choice="opt"> INNER </arg> - <command> JOIN </command> - </cmdsynopsis> - </term> - <listitem> - <para> - For each row R1 of T1, the joined table has a row for each row - in T2 that satisfies the join condition with R1. - </para> - <tip> - <para> - The words INNER and OUTER are optional for all JOINs. - INNER is the default. LEFT, RIGHT, and FULL imply an - OUTER JOIN. - </para> - </tip> - </listitem> - </varlistentry> - <varlistentry> - <term> - <cmdsynopsis> - <arg choice="plain"> LEFT </arg> - <arg choice="opt"> OUTER </arg> - <command> JOIN </command> - </cmdsynopsis> - </term> - <listitem> - <para> - First, an INNER JOIN is performed. - Then, for each row in T1 that does not satisfy the join - condition with any row in T2, an additional joined row is - returned with null fields in the columns from T2. - </para> - <tip> - <para> - The joined table unconditionally has a row for each row in T1. - </para> - </tip> - </listitem> - </varlistentry> - <varlistentry> - <term> - <cmdsynopsis> - <arg choice="plain"> RIGHT </arg> - <arg choice="opt"> OUTER </arg> - <command> JOIN </command> - </cmdsynopsis> - </term> - <listitem> - <para> - First, an INNER JOIN is performed. - Then, for each row in T2 that does not satisfy the join - condition with any row in T1, an additional joined row is - returned with null fields in the columns from T1. - </para> - <tip> - <para> - The joined table unconditionally has a row for each row in T2. - </para> - </tip> - </listitem> - </varlistentry> - <varlistentry> - <term> - <cmdsynopsis> - <arg choice="plain"> FULL </arg> - <arg choice="opt"> OUTER </arg> - <command> JOIN </command> - </cmdsynopsis> - </term> - <listitem> - <para> - First, an INNER JOIN is performed. - Then, for each row in T1 that does not satisfy the join - condition with any row in T2, an additional joined row is - returned with null fields in the columns from T2. - Also, for each row in T2 that does not satisfy the join - condition with any row in T1, an additional joined row is - returned with null fields in the columns from T1. - </para> - <tip> - <para> - The joined table unconditionally has a row for every row of T1 - and a row for every row of T2. - </para> - </tip> - </listitem> - </varlistentry> - </variablelist> - <!-- end join semantics --> - - </listitem> - </varlistentry> - </variablelist> - - <para> - JOINs of all types can be chained together or nested where either or both of - <replaceable class="parameter">T1</replaceable> and - <replaceable class="parameter">T2</replaceable> can be JOINed tables. - Parenthesis can be used around JOIN clauses to control the order - of JOINs which are otherwise processed left to right. - </para> - - </sect3> - - <sect3> - <title id="aggregates-tutorial">Aggregate Functions</title> - - <para> - <acronym>SQL</acronym> provides aggregate functions such as AVG, - COUNT, SUM, MIN, and MAX. The argument(s) of an aggregate function - are evaluated at each row that satisfies the WHERE - clause, and the aggregate function is calculated over this set - of input values. Normally, an aggregate delivers a single - result for a whole <command>SELECT</command> statement. But if - grouping is specified in the query, then a separate calculation - is done over the rows of each group, and an aggregate result is - delivered per group (see next section). - - <example> - <title id="aggregates-example">Aggregates</title> - - <para> - If we want to know the average cost of all parts in table PART we use - the following query: - -<programlisting> -SELECT AVG(PRICE) AS AVG_PRICE - FROM PART; -</programlisting> - </para> - - <para> - The result is: - -<screen> - AVG_PRICE ------------ - 14.5 -</screen> - </para> - - <para> - If we want to know how many parts are defined in table PART we use - the statement: - -<programlisting> -SELECT COUNT(PNO) - FROM PART; -</programlisting> - - and get: - -<screen> - COUNT -------- - 4 -</screen> - - </para> - </example> - </para> - </sect3> - - <sect3> - <title>Aggregation by Groups</title> - - <para> - <acronym>SQL</acronym> allows one to partition the tuples of a table - into groups. Then the - aggregate functions described above can be applied to the groups — - i.e., the value of the aggregate function is no longer calculated over - all the values of the specified column but over all values of a - group. Thus the aggregate function is evaluated separately for every - group. - </para> - - <para> - The partitioning of the tuples into groups is done by using the - keywords <command>GROUP BY</command> followed by a list of - attributes that define the - groups. If we have - <command>GROUP BY A<subscript>1</subscript>, ⃛, A<subscript>k</subscript></command> - we partition - the relation into groups, such that two tuples are in the same group - if and only if they agree on all the attributes - A<subscript>1</subscript>, ⃛, A<subscript>k</subscript>. - - <example> - <title id="aggregates-groupby">Aggregates</title> - <para> - If we want to know how many parts are sold by every supplier we - formulate the query: - -<programlisting> -SELECT S.SNO, S.SNAME, COUNT(SE.PNO) - FROM SUPPLIER S, SELLS SE - WHERE S.SNO = SE.SNO - GROUP BY S.SNO, S.SNAME; -</programlisting> - - and get: - -<screen> - SNO | SNAME | COUNT ------+-------+------- - 1 | Smith | 2 - 2 | Jones | 1 - 3 | Adams | 2 - 4 | Blake | 3 -</screen> - </para> - - <para> - Now let's have a look of what is happening here. - First the join of the - tables SUPPLIER and SELLS is derived: - -<screen> - S.SNO | S.SNAME | SE.PNO --------+---------+-------- - 1 | Smith | 1 - 1 | Smith | 2 - 2 | Jones | 4 - 3 | Adams | 1 - 3 | Adams | 3 - 4 | Blake | 2 - 4 | Blake | 3 - 4 | Blake | 4 -</screen> - </para> - - <para> - Next we partition the tuples into groups by putting all tuples - together that agree on both attributes S.SNO and S.SNAME: - -<screen> - S.SNO | S.SNAME | SE.PNO --------+---------+-------- - 1 | Smith | 1 - | 2 --------------------------- - 2 | Jones | 4 --------------------------- - 3 | Adams | 1 - | 3 --------------------------- - 4 | Blake | 2 - | 3 - | 4 -</screen> - </para> - - <para> - In our example we got four groups and now we can apply the aggregate - function COUNT to every group leading to the final result of the query - given above. - </para> - </example> - </para> - - <para> - Note that for a query using GROUP BY and aggregate - functions to make sense, the target list can only refer directly to - the attributes being grouped by. Other attributes can only be used - inside the arguments of aggregate functions. Otherwise there would - not be a unique value to associate with the other attributes. - </para> - - <para> - Also observe that it makes no sense to ask for an aggregate of - an aggregate, e.g., AVG(MAX(sno)), because a - <command>SELECT</command> only does one pass of grouping and - aggregation. You can get a result of this kind by using a - temporary table or a sub-SELECT in the FROM clause to do the - first level of aggregation. - </para> - </sect3> - - <sect3> - <title>Having</title> - - <para> - The HAVING clause works much like the WHERE clause and is used to - consider only those groups satisfying the qualification given in the - HAVING clause. Essentially, WHERE filters out unwanted input rows - before grouping and aggregation are done, whereas HAVING filters out - unwanted group rows post-GROUP. Therefore, WHERE cannot refer to the - results of aggregate functions. On the other hand, there's no point - in writing a HAVING condition that doesn't involve an aggregate - function! If your condition doesn't involve aggregates, you might - as well write it in WHERE, and thereby avoid the computation of - aggregates for groups that you're just going to throw away anyway. - - <example> - <title id="having-example">Having</title> - - <para> - If we want only those suppliers selling more than one part we use the - query: - -<programlisting> -SELECT S.SNO, S.SNAME, COUNT(SE.PNO) - FROM SUPPLIER S, SELLS SE - WHERE S.SNO = SE.SNO - GROUP BY S.SNO, S.SNAME - HAVING COUNT(SE.PNO) > 1; -</programlisting> - - and get: - -<screen> - SNO | SNAME | COUNT ------+-------+------- - 1 | Smith | 2 - 3 | Adams | 2 - 4 | Blake | 3 -</screen> - </para> - </example> - </para> - </sect3> - - <sect3> - <title>Subqueries</title> - - <para> - In the WHERE and HAVING clauses the use of subqueries (subselects) is - allowed in every place where a value is expected. In this case the - value must be derived by evaluating the subquery first. The usage of - subqueries extends the expressive power of - <acronym>SQL</acronym>. - - <example> - <title id="subselect-example">Subselect</title> - - <para> - If we want to know all parts having a greater price than the part - named 'Screw' we use the query: - -<programlisting> -SELECT * - FROM PART - WHERE PRICE > (SELECT PRICE FROM PART - WHERE PNAME='Screw'); -</programlisting> - </para> - - <para> - The result is: - -<screen> - PNO | PNAME | PRICE ------+---------+-------- - 3 | Bolt | 15 - 4 | Cam | 25 -</screen> - </para> - - <para> - When we look at the above query we can see the keyword - <command>SELECT</command> two times. The first one at the - beginning of the query - we will refer to it as outer - <command>SELECT</command> - and the one in the WHERE clause which - begins a nested query - we will refer to it as inner - <command>SELECT</command>. For every tuple of the outer - <command>SELECT</command> the inner <command>SELECT</command> has - to be evaluated. After every evaluation we know the price of the - tuple named 'Screw' and we can check if the price of the actual - tuple is greater. (Actually, in this example the inner query need - only be evaluated once, since it does not depend on the state of - the outer query.) - </para> - - <para> - If we want to know all suppliers that do not sell any part - (e.g., to be able to remove these suppliers from the database) we use: - -<programlisting> -SELECT * - FROM SUPPLIER S - WHERE NOT EXISTS - (SELECT * FROM SELLS SE - WHERE SE.SNO = S.SNO); -</programlisting> - </para> - - <para> - In our example the result will be empty because every supplier - sells at least one part. Note that we use S.SNO from the outer - <command>SELECT</command> within the WHERE clause of the inner - <command>SELECT</command>. Here the subquery must be evaluated - afresh for each tuple from the outer query, i.e., the value for - S.SNO is always taken from the current tuple of the outer - <command>SELECT</command>. - </para> - </example> - </para> - </sect3> - - <sect3> - <title>Subqueries in FROM</title> - - <para> - A somewhat different way of using subqueries is to put them in the - FROM clause. This is a useful feature because a subquery of this - kind can output multiple columns and rows, whereas a subquery used - in an expression must deliver just a single result. It also lets - us get more than one round of grouping/aggregation without resorting - to a temporary table. - - <example> - <title id="subselect-in-from-example">Subselect in FROM</title> - - <para> - If we want to know the highest average part price among all our - suppliers, we cannot write MAX(AVG(PRICE)), but we can write: - -<programlisting> -SELECT MAX(subtable.avgprice) - FROM (SELECT AVG(P.PRICE) AS avgprice - FROM SUPPLIER S, PART P, SELLS SE - WHERE S.SNO = SE.SNO AND - P.PNO = SE.PNO - GROUP BY S.SNO) subtable; -</programlisting> - - The subquery returns one row per supplier (because of its GROUP BY) - and then we aggregate over those rows in the outer query. - </para> - </example> - </para> - </sect3> - - <sect3> - <title>Union, Intersect, Except</title> - - <para> - These operations calculate the union, intersection and set theoretic - difference of the tuples derived by two subqueries. - - <example> - <title id="union-example">Union, Intersect, Except</title> - - <para> - The following query is an example for UNION: - -<programlisting> -SELECT S.SNO, S.SNAME, S.CITY - FROM SUPPLIER S - WHERE S.SNAME = 'Jones' -UNION - SELECT S.SNO, S.SNAME, S.CITY - FROM SUPPLIER S - WHERE S.SNAME = 'Adams'; -</programlisting> - -gives the result: - -<screen> - SNO | SNAME | CITY ------+-------+-------- - 2 | Jones | Paris - 3 | Adams | Vienna -</screen> - </para> - - <para> - Here is an example for INTERSECT: - -<programlisting> -SELECT S.SNO, S.SNAME, S.CITY - FROM SUPPLIER S - WHERE S.SNO > 1 -INTERSECT - SELECT S.SNO, S.SNAME, S.CITY - FROM SUPPLIER S - WHERE S.SNO < 3; -</programlisting> - - gives the result: - -<screen> - SNO | SNAME | CITY ------+-------+-------- - 2 | Jones | Paris -</screen> - - The only tuple returned by both parts of the query is the one having SNO=2. - </para> - - <para> - Finally an example for EXCEPT: - -<programlisting> -SELECT S.SNO, S.SNAME, S.CITY - FROM SUPPLIER S - WHERE S.SNO > 1 -EXCEPT - SELECT S.SNO, S.SNAME, S.CITY - FROM SUPPLIER S - WHERE S.SNO > 3; -</programlisting> - - gives the result: - -<screen> - SNO | SNAME | CITY ------+-------+-------- - 2 | Jones | Paris - 3 | Adams | Vienna -</screen> - </para> - </example> - </para> - </sect3> - </sect2> - - <sect2 id="datadef"> - <title>Data Definition</title> - - <para> - There is a set of commands used for data definition included in the - <acronym>SQL</acronym> language. - </para> - - <sect3 id="create"> - <title id="create-title">Create Table</title> - - <para> - The most fundamental command for data definition is the - one that creates a new relation (a new table). The syntax of the - <command>CREATE TABLE</command> command is: - -<synopsis> -CREATE TABLE <replaceable class="parameter">table_name</replaceable> - (<replaceable class="parameter">name_of_attr_1</replaceable> <replaceable class="parameter">type_of_attr_1</replaceable> - [, <replaceable class="parameter">name_of_attr_2</replaceable> <replaceable class="parameter">type_of_attr_2</replaceable> - [, ...]]); -</synopsis> - - <example> - <title id="table-create">Table Creation</title> - - <para> - To create the tables defined in - <xref linkend="supplier-fig" endterm="supplier-fig"> the - following <acronym>SQL</acronym> statements are used: - -<programlisting> -CREATE TABLE SUPPLIER - (SNO INTEGER, - SNAME VARCHAR(20), - CITY VARCHAR(20)); -</programlisting> - -<programlisting> -CREATE TABLE PART - (PNO INTEGER, - PNAME VARCHAR(20), - PRICE DECIMAL(4 , 2)); -</programlisting> - -<programlisting> -CREATE TABLE SELLS - (SNO INTEGER, - PNO INTEGER); -</programlisting> - </para> - </example> - </para> - </sect3> - - <sect3> - <title>Data Types in <acronym>SQL</acronym></title> - - <para> - The following is a list of some data types that are supported by - <acronym>SQL</acronym>: - - <itemizedlist> - <listitem> - <para> - INTEGER: signed fullword binary integer (31 bits precision). - </para> - </listitem> - - <listitem> - <para> - SMALLINT: signed halfword binary integer (15 bits precision). - </para> - </listitem> - - <listitem> - <para> - DECIMAL (<replaceable class="parameter">p</replaceable>[,<replaceable class="parameter">q</replaceable>]): - signed packed decimal number of up to - <replaceable class="parameter">p</replaceable> - digits, with - <replaceable class="parameter">q</replaceable> - digits to the right of the decimal point. - If <replaceable class="parameter">q</replaceable> - is omitted it is assumed to be 0. - </para> - </listitem> - - <listitem> - <para> - FLOAT: signed doubleword floating point number. - </para> - </listitem> - - <listitem> - <para> - VARCHAR(<replaceable class="parameter">n</replaceable>): - varying length character string of maximum length - <replaceable class="parameter">n</replaceable>. - </para> - </listitem> - - <listitem> - <para> - CHAR(<replaceable class="parameter">n</replaceable>): - fixed length character string of length - <replaceable class="parameter">n</replaceable>. - </para> - </listitem> - - </itemizedlist> - </para> - </sect3> - - <sect3> - <title>Create Index</title> - - <para> - Indexes are used to speed up access to a relation. If a relation <classname>R</classname> - has an index on attribute <classname>A</classname> then we can - retrieve all tuples <replaceable>t</replaceable> - having - <replaceable>t</replaceable>(<classname>A</classname>) = <replaceable>a</replaceable> - in time roughly proportional to the number of such - tuples <replaceable>t</replaceable> - rather than in time proportional to the size of <classname>R</classname>. - </para> - - <para> - To create an index in <acronym>SQL</acronym> - the <command>CREATE INDEX</command> command is used. The syntax is: - -<programlisting> -CREATE INDEX <replaceable class="parameter">index_name</replaceable> - ON <replaceable class="parameter">table_name</replaceable> ( <replaceable class="parameter">name_of_attribute</replaceable> ); -</programlisting> - </para> - - <para> - <example> - <title id="index-create">Create Index</title> - - <para> - To create an index named I on attribute SNAME of relation SUPPLIER - we use the following statement: - -<programlisting> -CREATE INDEX I ON SUPPLIER (SNAME); -</programlisting> - </para> - - <para> - The created index is maintained automatically, i.e., whenever a new - tuple is inserted into the relation SUPPLIER the index I is - adapted. Note that the only changes a user can perceive when an - index is present are increased speed for <command>SELECT</command> - and decreases in speed of updates. - </para> - </example> - </para> - </sect3> - - <sect3> - <title>Create View</title> - - <para> - A view can be regarded as a <firstterm>virtual table</firstterm>, - i.e., a table that - does not <emphasis>physically</emphasis> exist in the database - but looks to the user - as if it does. By contrast, when we talk of a - <firstterm>base table</firstterm> there is - really a physically stored counterpart of each row of the table - somewhere in the physical storage. - </para> - - <para> - Views do not have their own, physically separate, distinguishable - stored data. Instead, the system stores the definition of the - view (i.e., the rules about how to access physically stored base - tables in order to materialize the view) somewhere in the system - catalogs (see - <xref linkend="tutorial-catalogs-title" endterm="tutorial-catalogs-title">). For a - discussion on different techniques to implement views refer to -<!-- - section - <xref linkend="view-impl" endterm="view-impl">. ---> - <citetitle>SIM98</citetitle>. - </para> - - <para> - In <acronym>SQL</acronym> the <command>CREATE VIEW</command> - command is used to define a view. The syntax - is: - -<programlisting> -CREATE VIEW <replaceable class="parameter">view_name</replaceable> - AS <replaceable class="parameter">select_stmt</replaceable> -</programlisting> - - where <replaceable class="parameter">select_stmt</replaceable> - is a valid select statement as defined - in <xref linkend="select-title" endterm="select-title">. - Note that <replaceable class="parameter">select_stmt</replaceable> is - not executed when the view is created. It is just stored in the - <firstterm>system catalogs</firstterm> - and is executed whenever a query against the view is made. - </para> - - <para> - Let the following view definition be given (we use - the tables from - <xref linkend="supplier-fig" endterm="supplier-fig"> again): - -<programlisting> -CREATE VIEW London_Suppliers - AS SELECT S.SNAME, P.PNAME - FROM SUPPLIER S, PART P, SELLS SE - WHERE S.SNO = SE.SNO AND - P.PNO = SE.PNO AND - S.CITY = 'London'; -</programlisting> - </para> - - <para> - Now we can use this <firstterm>virtual relation</firstterm> - <classname>London_Suppliers</classname> as - if it were another base table: - -<programlisting> -SELECT * FROM London_Suppliers - WHERE PNAME = 'Screw'; -</programlisting> - - which will return the following table: - -<screen> - SNAME | PNAME --------+------- - Smith | Screw -</screen> - </para> - - <para> - To calculate this result the database system has to do a - <emphasis>hidden</emphasis> - access to the base tables SUPPLIER, SELLS and PART first. It - does so by executing the query given in the view definition against - those base tables. After that the additional qualifications - (given in the - query against the view) can be applied to obtain the resulting - table. - </para> - </sect3> - - <sect3> - <title>Drop Table, Drop Index, Drop View</title> - - <para> - To destroy a table (including all tuples stored in that table) the - <command>DROP TABLE</command> command is used: - -<programlisting> -DROP TABLE <replaceable class="parameter">table_name</replaceable>; -</programlisting> - </para> - - <para> - To destroy the SUPPLIER table use the following statement: - -<programlisting> -DROP TABLE SUPPLIER; -</programlisting> - </para> - - <para> - The <command>DROP INDEX</command> command is used to destroy an index: - -<programlisting> -DROP INDEX <replaceable class="parameter">index_name</replaceable>; -</programlisting> - </para> - - <para> - Finally to destroy a given view use the command <command>DROP - VIEW</command>: - -<programlisting> -DROP VIEW <replaceable class="parameter">view_name</replaceable>; -</programlisting> - </para> - </sect3> - </sect2> - - <sect2> - <title>Data Manipulation</title> - - <sect3> - <title>Insert Into</title> - - <para> - Once a table is created (see - <xref linkend="create-title" endterm="create-title">), it can be filled - with tuples using the command <command>INSERT INTO</command>. - The syntax is: - -<programlisting> -INSERT INTO <replaceable class="parameter">table_name</replaceable> (<replaceable class="parameter">name_of_attr_1</replaceable> - [, <replaceable class="parameter">name_of_attr_2</replaceable> [, ...]]) - VALUES (<replaceable class="parameter">val_attr_1</replaceable> [, <replaceable class="parameter">val_attr_2</replaceable> [, ...]]); -</programlisting> - </para> - - <para> - To insert the first tuple into the relation SUPPLIER (from - <xref linkend="supplier-fig" endterm="supplier-fig">) we use the - following statement: - -<programlisting> -INSERT INTO SUPPLIER (SNO, SNAME, CITY) - VALUES (1, 'Smith', 'London'); -</programlisting> - </para> - - <para> - To insert the first tuple into the relation SELLS we use: - -<programlisting> -INSERT INTO SELLS (SNO, PNO) - VALUES (1, 1); -</programlisting> - </para> - </sect3> - - <sect3> - <title>Update</title> - - <para> - To change one or more attribute values of tuples in a relation the - <command>UPDATE</command> command is used. The syntax is: - -<programlisting> -UPDATE <replaceable class="parameter">table_name</replaceable> - SET <replaceable class="parameter">name_of_attr_1</replaceable> = <replaceable class="parameter">value_1</replaceable> - [, ... [, <replaceable class="parameter">name_of_attr_k</replaceable> = <replaceable class="parameter">value_k</replaceable>]] - WHERE <replaceable class="parameter">condition</replaceable>; -</programlisting> - </para> - - <para> - To change the value of attribute PRICE of the part 'Screw' in the - relation PART we use: - -<programlisting> -UPDATE PART - SET PRICE = 15 - WHERE PNAME = 'Screw'; -</programlisting> - </para> - - <para> - The new value of attribute PRICE of the tuple whose name is 'Screw' is - now 15. - </para> - </sect3> - - <sect3> - <title>Delete</title> - - <para> - To delete a tuple from a particular table use the command DELETE - FROM. The syntax is: - -<programlisting> -DELETE FROM <replaceable class="parameter">table_name</replaceable> - WHERE <replaceable class="parameter">condition</replaceable>; -</programlisting> - </para> - - <para> - To delete the supplier called 'Smith' of the table SUPPLIER the - following statement is used: - -<programlisting> -DELETE FROM SUPPLIER - WHERE SNAME = 'Smith'; -</programlisting> - </para> - </sect3> - </sect2> - - <sect2 id="tutorial-catalogs"> - <title id="tutorial-catalogs-title">System Catalogs</title> - - <para> - In every <acronym>SQL</acronym> database system - <firstterm>system catalogs</firstterm> are used to keep - track of which tables, views indexes etc. are defined in the - database. These system catalogs can be queried as if they were normal - relations. For example there is one catalog used for the definition of - views. This catalog stores the query from the view definition. Whenever - a query against a view is made, the system first gets the - <firstterm>view definition query</firstterm> out of the catalog - and materializes the view - before proceeding with the user query (see -<!-- - section - <xref linkend="view-impl" endterm="view-impl">. - <citetitle>SIM98</citetitle> ---> - <xref linkend="SIM98" endterm="SIM98"> - for a more detailed - description). For more information about system catalogs refer to - <xref linkend="DATE04" endterm="DATE04">. - </para> - </sect2> - - <sect2> - <title>Embedded <acronym>SQL</acronym></title> - - <para> - In this section we will sketch how <acronym>SQL</acronym> can be - embedded into a host language (e.g., <literal>C</literal>). - There are two main reasons why we want to use <acronym>SQL</acronym> - from a host language: - - <itemizedlist> - <listitem> - <para> - There are queries that cannot be formulated using pure <acronym>SQL</acronym> - (i.e., recursive queries). To be able to perform such queries we need a - host language with a greater expressive power than - <acronym>SQL</acronym>. - </para> - </listitem> - - <listitem> - <para> - We simply want to access a database from some application that - is written in the host language (e.g., a ticket reservation system - with a graphical user interface is written in C and the information - about which tickets are still left is stored in a database that can be - accessed using embedded <acronym>SQL</acronym>). - </para> - </listitem> - </itemizedlist> - </para> - - <para> - A program using embedded <acronym>SQL</acronym> - in a host language consists of statements - of the host language and of - <firstterm>embedded <acronym>SQL</acronym></firstterm> - (<acronym>ESQL</acronym>) statements. Every <acronym>ESQL</acronym> - statement begins with the keywords <command>EXEC SQL</command>. - The <acronym>ESQL</acronym> statements are - transformed to statements of the host language - by a <firstterm>precompiler</firstterm> - (which usually inserts - calls to library routines that perform the various <acronym>SQL</acronym> - commands). - </para> - - <para> - When we look at the examples throughout - <xref linkend="select-title" endterm="select-title"> we - realize that the result of the queries is very often a set of - tuples. Most host languages are not designed to operate on sets so we - need a mechanism to access every single tuple of the set of tuples - returned by a SELECT statement. This mechanism can be provided by - declaring a <firstterm>cursor</firstterm>. - After that we can use the <command>FETCH</command> command to - retrieve a tuple and set the cursor to the next tuple. - </para> - - <para> - For a detailed discussion on embedded <acronym>SQL</acronym> - refer to - <xref linkend="DATE97" endterm="DATE97">, - <xref linkend="DATE04" endterm="DATE04">, - or - <xref linkend="ULL88" endterm="ULL88">. - </para> - </sect2> - </sect1> - </chapter> |