Diffstat (limited to 'doc/FAQ_DEV')
-rw-r--r-- doc/FAQ_DEV 391
1 files changed, 261 insertions, 130 deletions
diff --git a/doc/FAQ_DEV b/doc/FAQ_DEV
index a8d8eee0d41..8190358b8e4 100644
--- a/doc/FAQ_DEV
+++ b/doc/FAQ_DEV
@@ -1,35 +1,40 @@
-Developer's Frequently Asked Questions (FAQ) for PostgreSQL
-
-Last updated: Wed Feb 11 20:23:01 EST 1998
-
-Current maintainer: Bruce Momjian (maillist@candle.pha.pa.us)
-
-The most recent version of this document can be viewed at the postgreSQL Web
-site, http://postgreSQL.org.
-
- ------------------------------------------------------------------------
-
-Questions answered:
-
-1) What tools are available for developers?
-2) What books are good for developers?
-3) Why do we use palloc() and pfree() to allocate memory?
-4) Why do we use Node and List to make data structures?
-5) How do I add a feature or fix a bug?
-6) How do I download/update the current source tree?
-7) How do I test my changes?
-
- ------------------------------------------------------------------------
-
-1) What tools are available for developers?
-
-Aside from the User documentation mentioned in the regular FAQ, there
-are several development tools available. First, all the files in the
-pgsql/src/tools directory are designed for developers.
+ Developer's Frequently Asked Questions (FAQ) for PostgreSQL
+
+ Last updated: Fri Oct 2 15:21:32 EDT 1998
+
+ Current maintainer: Bruce Momjian (maillist@candle.pha.pa.us)
+
+ The most recent version of this document can be viewed at the
+ postgreSQL Web site, http://postgreSQL.org.
+ _________________________________________________________________
+
+ Questions
+
+ 1) What tools are available for developers?
+ 2) What books are good for developers?
+ 3) Why do we use palloc() and pfree() to allocate memory?
+ 4) Why do we use Node and List to make data structures?
+ 5) How do I add a feature or fix a bug?
+ 6) How do I download/update the current source tree?
+ 7) How do I test my changes?
+ 7) I just added a field to a structure. What else should I do?
+ 8) Why are table, column, type, function, view names sometimes
+ referenced as Name or NameData, and sometimes as char *?
+ 9) How do I efficiently access information in tables from the backend
+ code?
+ 10) What is elog()?
+ _________________________________________________________________
+
+ 1) What tools are available for developers?
+
+ Aside from the User documentation mentioned in the regular FAQ, there
+ are several development tools available. First, all the files in the
+ /tools directory are designed for developers.
RELEASE_CHANGES changes we have to make for each release
SQL_keywords standard SQL'92 keywords
- backend web flowchart of the backend directories
+ backend description/flowchart of the backend directories
ccsym find standard defines made by your compiler
entab converts tabs to spaces, used by pgindent
find_static finds functions that could be made static
@@ -42,104 +47,230 @@ pgsql/src/tools directory are designed for developers.
mkldexport create AIX exports file
pgindent indents C source files
-Let me note some of these. If you point your browser at the
-pgsql/src/tools/backend directory, you will see all the backend
-components in a flow chart. You can click on any one to see a
-description. If you then click on the directory name, you will be taken
-to the source directory, to browse the actual source code behind it. We
-also have several README files in some source directories to describe
-the function of the module. The browser will display these when you
-enter the directory also. The pgsql/src/tools/backend directory is also
-contained on our web page under the title Backend Flowchart.
-
-Second, you really should have an editor that can handle tags, so you can
-tag a function call to see the function definition, and then tag inside that
-function to see an even lower-level function, and then back out twice to
-return to the original function. Most editors support this via tags or etags
-files.
-
-Third, you need to get mkid from ftp.postgresql.org. By running
-tools/make_mkid, an archive of source symbols can be created that can be
-rapidly queried like grep or edited.
-
-make_diff has tools to create patch diff files that can be applied to the
-distribution.
-
-pgindent will format source files to match our standard format, which has
-four-space tabs, and an indenting format specified by flags to the your
-operating system's utility indent.
-
-2) What books are good for developers?
-
-I have three good books, An Introduction to Database Systems, by C.J. Date,
-Addison, Wesley, A Guide to the SQL Standard, by C.J. Date, et. al,
-Addison, Wesley, and Transaction Processing: Concepts and Techniques,
-by Jim Gray and Andreas Reuter, Morgan, Kaufmann.
-
-3) Why do we use palloc() and pfree() to allocate memory?
-
-palloc() and pfree() are used in place of malloc() and free() because we
-automatically free all memory allocated when a transaction completes. This
-makes it easier to make sure we free memory that gets allocated in one
-place, but only freed much later. There are several contexts that memory can
-be allocated in, and this controls when the allocated memory is
-automatically freed by the backend.
-
-4) Why do we use Node and List to make data structures?
-
-We do this because this allows a consistent way to pass data inside the
-backend in a flexible way. Every node has a NodeTag which specifies what
-type of data is inside the Node. Lists are lists of Nodes. lfirst(),
-lnext(), and foreach() are used to get, skip, and traverse through Lists.
-
-5) How do I add a feature or fix a bug?
-
-The source code is over 250,000 lines. Many problems/features are isolated
-to one specific area of the code. Others require knowledge of much of the
-source. If you are confused about where to start, ask the hackers list, and
-they will be glad to assess the complexity and give pointers on where to
-start.
-
-Another thing to keep in mind is that many fixes and features can be added
-with surprisingly little code. I often start by adding code, then looking at
-other areas in the code where similar things are done, and by the time I am
-finished, the patch is quite small and compact.
-
-When adding code, keep in mind that it should use the existing facilities in
-the source, for performance reasons and for simplicity. Often a review of
-existing code doing similar things is helpful.
-
-6) How do I download/update the current source tree?
-
-There are several ways to obtain the source tree. Occasional developers can
-just get the most recent source tree snapshot from ftp.postgresql.org. For
-regular developers, you can use CVSup, which is available from
-ftp.postgresql.org too. CVSup allows you to download the source tree, then
-occasionally update your copy of the source tree with any new changes. Using
-CVSup, you don't have to download the entire source each time, only the
-changed files. CVSup does not allow developers to update the source tree.
-
-Anonymous CVS is available too. See the doc/FAQ_CVS file for more
-information.
-
-To update the source tree, there are two ways. You can generate a patch
-against your current source tree, perhaps using the make_diff tools
-mentioned above, and send them to the patches list. They will be reviewed,
-and applied in a timely manner. If the patch is major, and we are in beta
-testing, the developers may wait for the final release before applying your
-patches.
-
-For hard-core developers, Marc(scrappy@postgresql.org) will give you a Unix
-shell account on postgresql.org, and you can ftp your files into your
-account, patch, and cvs install the changes directly into the source tree.
-
-6) How do I test my changes?
-
-First, use psql to make sure it is working as you expect. Then run
-src/test/regress and get the output of src/test/regress/checkresults with
-and without your changes, to see that your patch does not change the
-regression test in unexpected ways. This practice has saved me many times.
-The regression tests test the code in ways I would never do, and has caught
-many bugs in my patches. By finding the problems now, you save yourself a
-lot of debugging later when things are broken, and you can't figure out when
-it happened.
+ Let me note some of these. If you point your browser at
+ file:/usr/local/src/pgsql/src/tools/backend/index.html, you will
+ see a few paragraphs describing the data flow, the backend
+ components in a flow chart, and a description of the shared memory
+ area. You can click on any flowchart box to see a description. If you
+ then click on the directory name, you will be taken to the source
+ directory, to browse the actual source code behind it. We also have
+ several README files in some source directories to describe the
+ function of the module. The browser will display these when you enter
+ the directory also. The tools/backend directory is also contained on
+ our web page under the title How PostgreSQL Processes a Query.
+
+ Second, you really should have an editor that can handle tags, so you
+ can tag a function call to see the function definition, and then tag
+ inside that function to see an even lower-level function, and then
+ back out twice to return to the original function. Most editors
+ support this via tags or etags files.
+
+ Third, you need to get mkid from ftp.postgresql.org. Running
+ tools/make_mkid creates an archive of source symbols that can be
+ rapidly queried like grep or edited.
+
+ make_diff has tools to create patch diff files that can be applied to
+ the distribution.
+
+ pgindent will format source files to match our standard format, which
+ has four-space tabs, and an indenting format specified by flags to
+ your operating system's indent utility.
+
+ pgindent is run on all source files just before each beta test period.
+ It auto-formats all source files to make them consistent. Comment
+ blocks that need specific line breaks should be formatted as block
+ comments, where the comment starts as /*------. These comments will
+ not be reformatted in any way.
+
+ 2) What books are good for developers?
+
+ I have four good books: An Introduction to Database Systems, by C.J.
+ Date, Addison-Wesley; A Guide to the SQL Standard, by C.J. Date, et
+ al., Addison-Wesley; Fundamentals of Database Systems, by Elmasri and
+ Navathe; and Transaction Processing, by Jim Gray, Morgan Kaufmann.
+
+ There is also a database performance site, with a handbook on-line
+ written by Jim Gray at http://www.benchmarkresources.com.
+
+ 3) Why do we use palloc() and pfree() to allocate memory?
+
+ palloc() and pfree() are used in place of malloc() and free() because
+ we automatically free all memory allocated when a transaction
+ completes. This makes it easier to make sure we free memory that gets
+ allocated in one place, but only freed much later. There are several
+ contexts that memory can be allocated in, and this controls when the
+ allocated memory is automatically freed by the backend.
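The idea of freeing everything allocated in a context at once can be
sketched in standalone C. This is only a toy under stated assumptions:
ToyContext, toy_alloc(), and toy_reset() are hypothetical names, and the
code is not the backend's palloc() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical header placed in front of every allocation.  The
 * max_align_t member keeps the caller's memory suitably aligned.
 */
typedef union ToyChunk
{
    union ToyChunk *next;       /* next chunk owned by the same context */
    max_align_t align;
} ToyChunk;

/* Hypothetical context: it remembers every chunk handed out. */
typedef struct ToyContext
{
    ToyChunk   *chunks;         /* linked list of live allocations */
    size_t      nchunks;        /* number of live allocations */
} ToyContext;

/* Allocate size bytes and record the chunk in ctx. */
void *
toy_alloc(ToyContext *ctx, size_t size)
{
    ToyChunk   *chunk;

    assert(ctx != NULL);
    chunk = malloc(sizeof(ToyChunk) + size);
    if (chunk == NULL)
        abort();
    chunk->next = ctx->chunks;
    ctx->chunks = chunk;
    ctx->nchunks++;
    return (void *) (chunk + 1);    /* caller's memory follows the header */
}

/* Free everything allocated in ctx; returns how many chunks were freed. */
size_t
toy_reset(ToyContext *ctx)
{
    size_t      freed = 0;

    assert(ctx != NULL);
    while (ctx->chunks != NULL)
    {
        ToyChunk   *next = ctx->chunks->next;

        free(ctx->chunks);
        ctx->chunks = next;
        freed++;
    }
    ctx->nchunks = 0;
    return freed;
}
```

A caller would allocate freely with toy_alloc() while a unit of work
runs, then call toy_reset() once at the end instead of tracking each
pointer individually.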
+
+ 4) Why do we use Node and List to make data structures?
+
+ We do this because this allows a consistent way to pass data inside
+ the backend in a flexible way. Every node has a NodeTag which
+ specifies what type of data is inside the Node. Lists are lists of
+ Nodes. lfirst(), lnext(), and foreach() are used to get, skip, and
+ traverse through Lists.
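The tagged-node pattern described above can be illustrated with a small
standalone sketch. All names here (ToyTag, ToyNode, ToyList,
toy_foreach, and so on) are hypothetical stand-ins chosen for
illustration; the backend's real definitions differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags, one per node type. */
typedef enum ToyTag
{
    T_ToyInt,
    T_ToyStr
} ToyTag;

/* Common header: every node starts with its tag. */
typedef struct ToyNode
{
    ToyTag      type;
} ToyNode;

typedef struct ToyInt
{
    ToyNode     node;           /* must be first, so casts are safe */
    int         value;
} ToyInt;

typedef struct ToyStr
{
    ToyNode     node;
    const char *value;
} ToyStr;

/* A list cell holds one node pointer and a link to the next cell. */
typedef struct ToyList
{
    ToyNode    *head;
    struct ToyList *tail;
} ToyList;

#define toy_first(l)    ((l)->head)
#define toy_next(l)     ((l)->tail)
#define toy_foreach(cell, list) \
    for ((cell) = (list); (cell) != NULL; (cell) = toy_next(cell))

/* Walk a mixed list and add up only the integer nodes. */
int
toy_sum_ints(ToyList *list)
{
    ToyList    *cell;
    int         sum = 0;

    toy_foreach(cell, list)
    {
        ToyNode    *node = toy_first(cell);

        /* The tag says which concrete struct the pointer refers to. */
        if (node->type == T_ToyInt)
            sum += ((ToyInt *) node)->value;
    }
    return sum;
}
```

Because every node carries its own tag, one list type can hold any mix
of node types, and code that receives a node can check the tag before
casting.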
+
+ You can print nodes easily inside gdb. First, to disable output
+ truncation:
+
+ (gdb) set print elements 0
+
+ You may then use either of the next two commands to print out List,
+ Node, and structure contents. The first prints in a short format, and
+ the second in a long format:
+
+ (gdb) call print(any_pointer)
+ (gdb) call pprint(any_pointer)
+
+ 5) How do I add a feature or fix a bug?
+
+ The source code is over 250,000 lines. Many problems/features are
+ isolated to one specific area of the code. Others require knowledge of
+ much of the source. If you are confused about where to start, ask the
+ hackers list, and they will be glad to assess the complexity and give
+ pointers on where to start.
+
+ Another thing to keep in mind is that many fixes and features can be
+ added with surprisingly little code. I often start by adding code,
+ then looking at other areas in the code where similar things are done,
+ and by the time I am finished, the patch is quite small and compact.
+
+ When adding code, keep in mind that it should use the existing
+ facilities in the source, for performance reasons and for simplicity.
+ Often a review of existing code doing similar things is helpful.
+
+ 6) How do I download/update the current source tree?
+
+ There are several ways to obtain the source tree. Occasional
+ developers can just get the most recent source tree snapshot from
+ ftp.postgresql.org. For regular developers, you can use CVS. CVS
+ allows you to download the source tree, then occasionally update your
+ copy of the source tree with any new changes. Using CVS, you don't
+ have to download the entire source each time, only the changed files.
+ Anonymous CVS does not allow developers to update the remote source
+ tree, though privileged developers can do this. There is a CVS FAQ on
+ our web site that describes how to use remote CVS. You can also use
+ CVSup, which has similar functionality, and is available from
+ ftp.postgresql.org.
+
+ To update the source tree, there are two ways. You can generate a
+ patch against your current source tree, perhaps using the make_diff
+ tools mentioned above, and send them to the patches list. They will be
+ reviewed, and applied in a timely manner. If the patch is major, and
+ we are in beta testing, the developers may wait for the final release
+ before applying your patches.
+
+ For hard-core developers, Marc (scrappy@postgresql.org) will give you a
+ Unix shell account on postgresql.org, so you can use CVS to update the
+ main source tree, or you can ftp your files into your account, patch,
+ and cvs install the changes directly into the source tree.
+
+ 6) How do I test my changes?
+
+ First, use psql to make sure it is working as you expect. Then run
+ src/test/regress and get the output of src/test/regress/checkresults
+ with and without your changes, to see that your patch does not change
+ the regression test in unexpected ways. This practice has saved me
+ many times. The regression tests test the code in ways I would never
+ do, and have caught many bugs in my patches. By finding the problems
+ now, you save yourself a lot of debugging later when things are
+ broken, and you can't figure out when it happened.
+
+ 7) I just added a field to a structure. What else should I do?
+
+ The structures passed between the parser, rewriter, optimizer, and
+ executor require quite a bit of support. Most structures have support
+ routines in src/backend/nodes used to create, copy, read, and output
+ those structures. Make sure you add support for your new field to
+ these files. Find any other places the structure may need code for
+ your new field. mkid is helpful with this (see above).
+
+ 8) Why are table, column, type, function, view names sometimes referenced as
+ Name or NameData, and sometimes as char *?
+
+ Table, column, type, function, and view names are stored in system
+ tables in columns of type Name. Name is a fixed-length,
+ null-terminated type of NAMEDATALEN bytes. (The default value for
+ NAMEDATALEN is 32 bytes.)
+ typedef struct nameData
+ {
+ char data[NAMEDATALEN];
+ } NameData;
+ typedef NameData *Name;
+
+ Table, column, type, function, and view names that come in to the
+ backend via user queries are stored as variable-length,
+ null-terminated character strings.
+
+ Many functions are called with both types of names, e.g. heap_open().
+ Because the Name type is null-terminated, it is safe to pass it to a
+ function expecting a char *. Because on-disk names (Name) are often
+ compared to user-supplied names (char *), there are many cases where
+ Name and char * are used interchangeably.
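A standalone sketch of why the two representations interoperate is given
here. NameData, Name, and NAMEDATALEN follow the definition quoted
above; toy_name_set() and toy_name_equals() are hypothetical helpers
written for this example, not backend functions.

```c
#include <assert.h>
#include <string.h>

#define NAMEDATALEN 32          /* default quoted in the text above */

typedef struct nameData
{
    char        data[NAMEDATALEN];
} NameData;
typedef NameData *Name;

/*
 * Hypothetical helper: store src in a fixed-length name, truncating to
 * NAMEDATALEN - 1 bytes so the result is always null-terminated.
 */
void
toy_name_set(Name name, const char *src)
{
    assert(name != NULL && src != NULL);
    memset(name->data, 0, NAMEDATALEN);
    strncpy(name->data, src, NAMEDATALEN - 1);
}

/*
 * Hypothetical helper: compare an on-disk style name with a
 * user-supplied string.  The fixed-length buffer is null-terminated,
 * so it can be handed to any function expecting a char *.
 */
int
toy_name_equals(const NameData *stored, const char *user_supplied)
{
    return strcmp(stored->data, user_supplied) == 0;
}
```

The comparison works unchanged on both kinds of name because
stored->data decays to a char * that ends in a null byte.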
+
+ 9) How do I efficiently access information in tables from the backend code?
+
+ You first need to find the tuples (rows) you are interested in. There
+ are two ways. First, SearchSysCacheTuple() and related functions allow
+ you to query the system catalogs. This is the preferred way to access
+ system tables, because the first call to the cache loads the needed
+ rows, and future requests can return the results without accessing the
+ base table. Some of the caches use system table indexes to look up
+ tuples. A list of available caches is located in
+ src/backend/utils/cache/syscache.c.
+ src/backend/utils/cache/lsyscache.c contains many column-specific
+ cache lookup functions.
+
+ The rows returned are cache-owned versions of the heap rows. They are
+ invalidated when the base table changes. Because the cache is local to
+ each backend, you may use the pointer returned from the cache for
+ short periods without making a copy of the tuple. If you send the
+ pointer into a large function that will be doing its own cache
+ lookups, it is possible the cache entry may be flushed, so you should
+ use SearchSysCacheTupleCopy() in these cases, and pfree() the tuple
+ when you are done.
+
+ If you can't use the system cache, you will need to retrieve the data
+ directly from the heap table, using the buffer cache that is shared by
+ all backends. The backend automatically takes care of loading the rows
+ into the buffer cache.
+
+ Open the table with heap_open(). You can then start a table scan with
+ heap_beginscan(), then use heap_getnext() and continue as long as
+ HeapTupleIsValid() returns true. Then do a heap_endscan(). Keys can be
+ assigned to the scan. No indexes are used, so all rows are going to be
+ compared to the keys, and only the valid rows returned.
+
+ You can also use heap_fetch() to fetch rows by block number/offset.
+ While scans automatically lock/unlock rows from the buffer cache, with
+ heap_fetch(), you must pass a Buffer pointer, and ReleaseBuffer() it
+ when completed. Once you have the row, you can get data that is common
+ to all tuples, like t_ctid and t_oid, by merely accessing the
+ HeapTuple structure entries. If you need a table-specific column, you
+ should take the HeapTuple pointer, and use the GETSTRUCT() macro to
+ access the table-specific start of the tuple. You then cast the
+ pointer as a Form_pg_proc pointer if you are accessing the pg_proc
+ table, or TypeTupleForm if you are accessing pg_type. You can then
+ access the columns by using a structure pointer:
+
+ ((Form_pg_class) GETSTRUCT(tuple))->relnatts
+
+ You should not directly change live tuples in this way. The best way
+ is to use heap_tuplemodify() and pass it your palloc'ed tuple, and the
+ values you want changed. It returns another palloc'ed tuple, which you
+ pass to heap_replace(). You can delete tuples by passing the tuple's
+ t_ctid to heap_destroy(). Remember, a tuple can be a system cache
+ version, which may go away soon after you get it; a buffer cache
+ version, which will go away when you call heap_getnext(),
+ heap_endscan(), or, in the heap_fetch() case, ReleaseBuffer(); or a
+ palloc'ed tuple, which you must pfree() when finished.
+
+ 10) What is elog()?
+
+ elog() is used to send messages to the front-end, and optionally
+ terminate the current query being processed. The first parameter is an
+ elog level of NOTICE, DEBUG, ERROR, or FATAL. NOTICE prints on the
+ user's terminal and the postmaster logs. DEBUG prints only in the
+ postmaster logs. ERROR prints in both places, and terminates the
+ current query, never returning from the call. FATAL terminates the
+ backend process. The remaining parameters of elog are a printf-style
+ set of parameters to print.
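The behaviour described above, a level followed by printf-style
arguments, with the error level never returning to its caller, can be
modelled in a few lines of standalone C. This is a toy: toy_elog(),
ToyLevel, and toy_run_query() are hypothetical names, and using
setjmp()/longjmp() is just one way to model a call that does not
return, not a statement about how the backend does it. A FATAL-like
level is left out because it would end the process.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical levels modelled on the ones named in the text. */
typedef enum ToyLevel
{
    TOY_NOTICE,
    TOY_DEBUG,
    TOY_ERROR
} ToyLevel;

static jmp_buf toy_query_restart;       /* where an aborted query resumes */
static char toy_last_message[256];      /* most recent formatted message */

/* Format and log a message; TOY_ERROR abandons the current query. */
void
toy_elog(ToyLevel level, const char *fmt, ...)
{
    va_list     args;

    va_start(args, fmt);
    vsnprintf(toy_last_message, sizeof(toy_last_message), fmt, args);
    va_end(args);

    fprintf(stderr, "%s\n", toy_last_message);  /* stand-in for the log */

    if (level == TOY_ERROR)
        longjmp(toy_query_restart, 1);  /* does not return to the caller */
}

/* Pretend query: returns 1 if it completed, 0 if TOY_ERROR aborted it. */
int
toy_run_query(int divisor)
{
    if (setjmp(toy_query_restart) != 0)
        return 0;               /* we arrive here after a TOY_ERROR */

    toy_elog(TOY_NOTICE, "dividing by %d", divisor);
    if (divisor == 0)
        toy_elog(TOY_ERROR, "division by zero requested");
    return 1;
}
```

Code after a TOY_ERROR call in toy_run_query() is never reached, which
is why callers of an error-level report do not need to check a return
value.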