author | Bruce Momjian <bruce@momjian.us> | 1998-10-24 04:43:39 +0000 |
---|---|---|
committer | Bruce Momjian <bruce@momjian.us> | 1998-10-24 04:43:39 +0000 |
commit | 30b2d287fb544774162e543bff59d7f9ed1be97f (patch) | |
tree | 191957483d060e0137e28cd457f5f0675c959e9d /doc/FAQ_DEV | |
parent | ba63dcd6a63f3299b58292768c3069a8745a8e05 (diff) | |
HISTORY file update.
Diffstat (limited to 'doc/FAQ_DEV')
-rw-r--r-- | doc/FAQ_DEV | 391 |
1 files changed, 261 insertions, 130 deletions
diff --git a/doc/FAQ_DEV b/doc/FAQ_DEV
index a8d8eee0d41..8190358b8e4 100644
--- a/doc/FAQ_DEV
+++ b/doc/FAQ_DEV
@@ -1,35 +1,40 @@
-Developer's Frequently Asked Questions (FAQ) for PostgreSQL
-
-Last updated: Wed Feb 11 20:23:01 EST 1998
-
-Current maintainer: Bruce Momjian (maillist@candle.pha.pa.us)
-
-The most recent version of this document can be viewed at the postgreSQL Web
-site, http://postgreSQL.org.
-
- ------------------------------------------------------------------------
-
-Questions answered:
-
-1) What tools are available for developers?
-2) What books are good for developers?
-3) Why do we use palloc() and pfree() to allocate memory?
-4) Why do we use Node and List to make data structures?
-5) How do I add a feature or fix a bug?
-6) How do I download/update the current source tree?
-7) How do I test my changes?
-
- ------------------------------------------------------------------------
-
-1) What tools are available for developers?
-
-Aside from the User documentation mentioned in the regular FAQ, there
-are several development tools available. First, all the files in the
-pgsql/src/tools directory are designed for developers.
+ Developer's Frequently Asked Questions (FAQ) for PostgreSQL
+
+ Last updated: Fri Oct 2 15:21:32 EDT 1998
+
+ Current maintainer: Bruce Momjian (maillist@candle.pha.pa.us)
+
+ The most recent version of this document can be viewed at the
+ postgreSQL Web site, http://postgreSQL.org.
+ _________________________________________________________________
+
+ Questions
+
+ 1) What tools are available for developers?
+ 2) What books are good for developers?
+ 3) Why do we use palloc() and pfree() to allocate memory?
+ 4) Why do we use Node and List to make data structures?
+ 5) How do I add a feature or fix a bug?
+ 6) How do I download/update the current source tree?
+ 7) How do I test my changes?
+ 7) I just added a field to a structure. What else should I do?
+ 8) Why are table, column, type, function, view names sometimes
+ referenced as Name or NameData, and sometimes as char *?
+ 9) How do I efficiently access information in tables from the backend
+ code?
+ 10) What is elog()?
+ _________________________________________________________________
+
+ 1) What tools are available for developers?
+
+ Aside from the User documentation mentioned in the regular FAQ, there
+ are several development tools available. First, all the files in the
+ /tools directory are designed for developers.
 
 RELEASE_CHANGES changes we have to make for each release
 SQL_keywords standard SQL'92 keywords
- backend web flowchart of the backend directories
+ backend description/flowchart of the backend directorie
+s
 ccsym find standard defines made by your compiler
 entab converts tabs to spaces, used by pgindent
 find_static finds functions that could be made static
@@ -42,104 +47,230 @@ pgsql/src/tools directory are designed for developers.
 mkldexport create AIX exports file
 pgindent indents C source files
 
-Let me note some of these. If you point your browser at the
-pgsql/src/tools/backend directory, you will see all the backend
-components in a flow chart. You can click on any one to see a
-description. If you then click on the directory name, you will be taken
-to the source directory, to browse the actual source code behind it. We
-also have several README files in some source directories to describe
-the function of the module. The browser will display these when you
-enter the directory also. The pgsql/src/tools/backend directory is also
-contained on our web page under the title Backend Flowchart.
-
-Second, you really should have an editor that can handle tags, so you can
-tag a function call to see the function definition, and then tag inside that
-function to see an even lower-level function, and then back out twice to
-return to the original function. Most editors support this via tags or etags
-files.
-
-Third, you need to get mkid from ftp.postgresql.org. By running
-tools/make_mkid, an archive of source symbols can be created that can be
-rapidly queried like grep or edited.
-
-make_diff has tools to create patch diff files that can be applied to the
-distribution.
-
-pgindent will format source files to match our standard format, which has
-four-space tabs, and an indenting format specified by flags to the your
-operating system's utility indent.
-
-2) What books are good for developers?
-
-I have three good books, An Introduction to Database Systems, by C.J. Date,
-Addison, Wesley, A Guide to the SQL Standard, by C.J. Date, et. al,
-Addison, Wesley, and Transaction Processing: Concepts and Techniques,
-by Jim Gray and Andreas Reuter, Morgan, Kaufmann.
-
-3) Why do we use palloc() and pfree() to allocate memory?
-
-palloc() and pfree() are used in place of malloc() and free() because we
-automatically free all memory allocated when a transaction completes. This
-makes it easier to make sure we free memory that gets allocated in one
-place, but only freed much later. There are several contexts that memory can
-be allocated in, and this controls when the allocated memory is
-automatically freed by the backend.
-
-4) Why do we use Node and List to make data structures?
-
-We do this because this allows a consistent way to pass data inside the
-backend in a flexible way. Every node has a NodeTag which specifies what
-type of data is inside the Node. Lists are lists of Nodes. lfirst(),
-lnext(), and foreach() are used to get, skip, and traverse through Lists.
-
-5) How do I add a feature or fix a bug?
-
-The source code is over 250,000 lines. Many problems/features are isolated
-to one specific area of the code. Others require knowledge of much of the
-source. If you are confused about where to start, ask the hackers list, and
-they will be glad to assess the complexity and give pointers on where to
-start.
-
-Another thing to keep in mind is that many fixes and features can be added
-with surprisingly little code. I often start by adding code, then looking at
-other areas in the code where similar things are done, and by the time I am
-finished, the patch is quite small and compact.
-
-When adding code, keep in mind that it should use the existing facilities in
-the source, for performance reasons and for simplicity. Often a review of
-existing code doing similar things is helpful.
-
-6) How do I download/update the current source tree?
-
-There are several ways to obtain the source tree. Occasional developers can
-just get the most recent source tree snapshot from ftp.postgresql.org. For
-regular developers, you can use CVSup, which is available from
-ftp.postgresql.org too. CVSup allows you to download the source tree, then
-occasionally update your copy of the source tree with any new changes. Using
-CVSup, you don't have to download the entire source each time, only the
-changed files. CVSup does not allow developers to update the source tree.
-
-Anonymous CVS is available too. See the doc/FAQ_CVS file for more
-information.
-
-To update the source tree, there are two ways. You can generate a patch
-against your current source tree, perhaps using the make_diff tools
-mentioned above, and send them to the patches list. They will be reviewed,
-and applied in a timely manner. If the patch is major, and we are in beta
-testing, the developers may wait for the final release before applying your
-patches.
-
-For hard-core developers, Marc(scrappy@postgresql.org) will give you a Unix
-shell account on postgresql.org, and you can ftp your files into your
-account, patch, and cvs install the changes directly into the source tree.
-
-6) How do I test my changes?
-
-First, use psql to make sure it is working as you expect. Then run
-src/test/regress and get the output of src/test/regress/checkresults with
-and without your changes, to see that your patch does not change the
-regression test in unexpected ways. This practice has saved me many times.
-The regression tests test the code in ways I would never do, and has caught
-many bugs in my patches. By finding the problems now, you save yourself a
-lot of debugging later when things are broken, and you can't figure out when
-it happened.
+ Let me note some of these. If you point your browser at the
+ file:/usr/local/src/pgsql/src/tools/backend/index.html directory, you
+ will see few paragraphs describing the data flow, the backend
+ components in a flow chart, and a description of the shared memory
+ area. You can click on any flowchart box to see a description. If you
+ then click on the directory name, you will be taken to the source
+ directory, to browse the actual source code behind it. We also have
+ several README files in some source directories to describe the
+ function of the module. The browser will display these when you enter
+ the directory also. The tools/backend directory is also contained on
+ our web page under the title How PostgreSQL Processes a Query.
+
+ Second, you really should have an editor that can handle tags, so you
+ can tag a function call to see the function definition, and then tag
+ inside that function to see an even lower-level function, and then
+ back out twice to return to the original function. Most editors
+ support this via tags or etags files.
+
+ Third, you need to get mkid from ftp.postgresql.org. By running
+ tools/make_mkid, an archive of source symbols can be created that can
+ be rapidly queried like grep or edited.
+
+ make_diff has tools to create patch diff files that can be applied to
+ the distribution.
+
+ pgindent will format source files to match our standard format, which
+ has four-space tabs, and an indenting format specified by flags to the
+ your operating system's utility indent.
+
+ pgindent is run on all source files just before each beta test period.
+ It auto-formats all source files to make them consistent. Comment
+ blocks that need specific line breaks should be formatted as block
+ comments, where the comment starts as /*------. These comments will
+ not be reformatted in any way.
+
+ 2) What books are good for developers?
+
+ I have four good books, An Introduction to Database Systems, by C.J.
+ Date, Addison, Wesley, A Guide to the SQL Standard, by C.J. Date, et.
+ al, Addison, Wesley, Fundamentals of Database Systems, by Elmasri and
+ Navathe, and Transaction Processing, by Jim Gray, Morgan, Kaufmann
+
+ There is also a database performance site, with a handbook on-line
+ written by Jim Gray at http://www.benchmarkresources.com.
+
+ 3) Why do we use palloc() and pfree() to allocate memory?
+
+ palloc() and pfree() are used in place of malloc() and free() because
+ we automatically free all memory allocated when a transaction
+ completes. This makes it easier to make sure we free memory that gets
+ allocated in one place, but only freed much later. There are several
+ contexts that memory can be allocated in, and this controls when the
+ allocated memory is automatically freed by the backend.
+
+ 4) Why do we use Node and List to make data structures?
+
+ We do this because this allows a consistent way to pass data inside
+ the backend in a flexible way. Every node has a NodeTag which
+ specifies what type of data is inside the Node. Lists are lists of
+ Nodes. lfirst(), lnext(), and foreach() are used to get, skip, and
+ traverse through Lists.
+
+ You can print nodes easily inside gdb. First, to disable output
+ truncation:
+
+ (gdb) set print elements 0
+
+ You may then use either of the next two commands to print out List,
+ Node, and structure contents. The first prints in a short format, and
+ the second in a long format:
+
+ (gdb) call print(any_pointer)
+ (gdb) call pprint(any_pointer)
+
+ 5) How do I add a feature or fix a bug?
+
+ The source code is over 250,000 lines. Many problems/features are
+ isolated to one specific area of the code. Others require knowledge of
+ much of the source. If you are confused about where to start, ask the
+ hackers list, and they will be glad to assess the complexity and give
+ pointers on where to start.
+
+ Another thing to keep in mind is that many fixes and features can be
+ added with surprisingly little code. I often start by adding code,
+ then looking at other areas in the code where similar things are done,
+ and by the time I am finished, the patch is quite small and compact.
+
+ When adding code, keep in mind that it should use the existing
+ facilities in the source, for performance reasons and for simplicity.
+ Often a review of existing code doing similar things is helpful.
+
+ 6) How do I download/update the current source tree?
+
+ There are several ways to obtain the source tree. Occasional
+ developers can just get the most recent source tree snapshot from
+ ftp.postgresql.org. For regular developers, you can use CVS. CVS
+ allows you to download the source tree, then occasionally update your
+ copy of the source tree with any new changes. Using CVS, you don't
+ have to download the entire source each time, only the changed files.
+ Anonymous CVS does not allows developers to update the remote source
+ tree, though privileged developers can do this. There is a CVS FAQ on
+ our web site that describes how to use remote CVS. You can also use
+ CVSup, which has similarly functionality, and is available from
+ ftp.postgresql.org.
+
+ To update the source tree, there are two ways. You can generate a
+ patch against your current source tree, perhaps using the make_diff
+ tools mentioned above, and send them to the patches list. They will be
+ reviewed, and applied in a timely manner. If the patch is major, and
+ we are in beta testing, the developers may wait for the final release
+ before applying your patches.
+
+ For hard-core developers, Marc(scrappy@postgresql.org) will give you a
+ Unix shell account on postgresql.org, so you can use CVS to update the
+ main source tree, or you can ftp your files into your account, patch,
+ and cvs install the changes directly into the source tree.
+
+ 6) How do I test my changes?
+
+ First, use psql to make sure it is working as you expect. Then run
+ src/test/regress and get the output of src/test/regress/checkresults
+ with and without your changes, to see that your patch does not change
+ the regression test in unexpected ways. This practice has saved me
+ many times. The regression tests test the code in ways I would never
+ do, and has caught many bugs in my patches. By finding the problems
+ now, you save yourself a lot of debugging later when things are
+ broken, and you can't figure out when it happened.
+
+ 7) I just added a field to a structure. What else should I do?
+
+ The structures passing around from the parser, rewrite, optimizer, and
+ executor require quite a bit of support. Most structures have support
+ routines in src/backend/nodes used to create, copy, read, and output
+ those structures. Make sure you add support for your new field to
+ these files. Find any other places the structure may need code for
+ your new field. mkid is helpful with this (see above).
+
+ 8) Why are table, column, type, function, view names sometimes referenced as
+ Name or NameData, and sometimes as char *?
+
+ Table, column, type, function, and view names are stored in system
+ tables in columns of type Name. Name is a fixed-length,
+ null-terminated type of NAMEDATALEN bytes. (The default value for
+ NAMEDATALEN is 32 bytes.)
+ typedef struct nameData
+ {
+ char data[NAMEDATALEN];
+ } NameData;
+ typedef NameData *Name;
+
+ Table, column, type, function, and view names that come in to the
+ backend via user queries are stored as variable-length,
+ null-terminated character strings.
+
+ Many functions are called with both types of names, ie. heap_open().
+ Because the Name type is null-terminated, it is safe to pass it to a
+ function expecting a char *. Because there are many cases where
+ on-disk names(Name) are compared to user-supplied names(char *), there
+ are many cases where Name and char * are used interchangeably.
+
+ 9) How do I efficiently access information in tables from the backend code?
+
+ You first need to find the tuples(rows) you are interested in. There
+ are two ways. First, SearchSysCacheTuple() and related functions allow
+ you to query the system catalogs. This is the preferred way to access
+ system tables, because the first call to the cache loads the needed
+ rows, and future requests can return the results without accessing the
+ base table. Some of the caches use system table indexes to look up
+ tuples. A list of available caches is located in
+ src/backend/utils/cache/syscache.c.
+ src/backend/utils/cache/lsyscache.c contains many column-specific
+ cache lookup functions.
+
+ The rows returned are cached-owned versions of the heap rows. They are
+ invalidated when the base table changes. Because the cache is local to
+ each backend, you may use the pointer returned from the cache for
+ short periods without making a copy of the tuple. If you send the
+ pointer into a large function that will be doing its own cache
+ lookups, it is possible the cache entry may be flushed, so you should
+ use SearchSysCacheTupleCopy() in these cases, and pfree() the tuple
+ when you are done.
+
+ If you can't use the system cache, you will need to retrieve the data
+ directly from the heap table, using the buffer cache that is shared by
+ all backends. The backend automatically takes care of loading the rows
+ into the buffer cache.
+
+ Open the table with heap_open(). You can then start a table scan with
+ heap_beginscan(), then use heap_getnext() and continue as long as
+ HeapTupleIsValid() returns true. Then do a heap_endscan(). Keys can be
+ assigned to the scan. No indexes are used, so all rows are going to be
+ compared to the keys, and only the valid rows returned.
+
+ You can also use heap_fetch() to fetch rows by block number/offset.
+ While scans automatically lock/unlock rows from the buffer cache, with
+ heap_fetch(), you must pass a Buffer pointer, and ReleaseBuffer() it
+ when completed. Once you have the row, you can get data that is common
+ to all tuples, like t_ctid and t_oid, by mererly accessing the
+ HeapTuple structure entries. If you need a table-specific column, you
+ should take the HeapTuple pointer, and use the GETSTRUCT() macro to
+ access the table-specific start of the tuple. You then cast the
+ pointer as a Form_pg_proc pointer if you are accessing the pg_proc
+ table, or TypeTupleForm if you are accessing pg_type. You can then
+ access the columns by using a structure pointer:
+
+ ((Form_pg_class) GETSTRUCT(tuple))->relnatts
+
+ You should not directly change live tuples in this way. The best way
+ is to use heap_tuplemodify() and pass it your palloc'ed tuple, and the
+ values you want changed. It returns another palloc'ed tuple, which you
+ pass to heap_replace(). You can delete tuples by passing the tuple's
+ t_ctid to heap_destroy(). Remember, tuples can be either system cache
+ versions, which may go away soon after you get them, buffer cache
+ version, which will go away when you heap_getnext(), heap_endscan, or
+ ReleaseBuffer(), in the heap_fetch() case. Or it may be a palloc'ed
+ tuple, that you must pfree() when finished.
+
+ 10) What is elog()?
+
+ elog() is used to send messages to the front-end, and optionally
+ terminate the current query being processed. The first parameter is an
+ elog level of NOTICE, DEBUG, ERROR, or FATAL. NOTICE prints on the
+ user's terminal and the postmaster logs. DEBUG prints only in the
+ postmaster logs. ERROR prints in both places, and terminates the
+ current query, never returning from the call. FATAL terminates the
+ backend process. The remaining parameters of elog are a printf-style
+ set of parameters to print.
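As an illustration of question 8 in the new FAQ text, here is a minimal standalone sketch of why a Name can be handed to code expecting a char *. It is not backend code. The typedefs and the NAMEDATALEN value of 32 are taken from the FAQ text in the diff; name_set() and name_matches() are hypothetical helpers invented for this example and do not exist in the PostgreSQL source.

```c
#include <assert.h>
#include <string.h>

#define NAMEDATALEN 32          /* default value quoted in the FAQ */

/* Type definitions as quoted in the FAQ text. */
typedef struct nameData
{
    char        data[NAMEDATALEN];
} NameData;
typedef NameData *Name;

/*
 * Hypothetical helper (not a PostgreSQL function): copy a
 * variable-length C string into a fixed-length name, truncating to
 * NAMEDATALEN - 1 characters so the result is always null-terminated.
 */
static void
name_set(Name name, const char *str)
{
    assert(name != NULL && str != NULL);
    /* strncpy zero-fills whatever space remains after the copy */
    strncpy(name->data, str, NAMEDATALEN - 1);
    name->data[NAMEDATALEN - 1] = '\0';
}

/*
 * Hypothetical helper: stands in for any routine that only knows about
 * plain char * strings, such as a comparison against a user-supplied name.
 */
static int
name_matches(const char *stored, const char *supplied)
{
    return strcmp(stored, supplied) == 0;
}
```

Because data is the first and only member of the struct, a pointer to a NameData also points at the first character of the string. Assuming a variable declared as NameData n and filled with name_set(&n, "pg_class"), both name_matches(n.data, "pg_class") and name_matches((const char *) &n, "pg_class") would compare the same bytes. The sketch truncates over-long input; the FAQ text does not say what the backend itself does in that case.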