postgresql - postgresql mirror

	Commit message	Author	Age
...
*	Run pgindent on 9.2 source tree in preparation for first 9.3	Bruce Momjian	2012-06-10
	commit-fest.
*	Avoid repeated creation/freeing of per-subre DFAs during regex search.	Tom Lane	2012-02-24
	In nested sub-regex trees, lower-level nodes created DFAs and then destroyed them again before exiting, which is a bit dumb considering that the recursive search is likely to call those nodes again later. Instead cache each created DFA until the end of pg_regexec(). This is basically a space for time tradeoff, in that it might increase the maximum memory usage. However, in most regex patterns there are not all that many subre nodes, so not that many DFAs --- and in any case, the peak usage occurs when reaching the bottom recursion level, and except for alternation cases that's going to be the same anyway.
*	Remove useless "retry memory" logic within regex engine.	Tom Lane	2012-02-24
	Apparently some primordial version of Spencer's engine needed cdissect() and child functions to be able to continue matching from a previous position when re-called. That is dead code, though, since trivial inspection shows that cdissect can never be entered without having previously done zapmem which resets the relevant retry counter. I have also verified experimentally that no case in the Tcl regression tests reaches cdissect with a nonzero retry value. Accordingly, remove that logic. This doesn't really save any noticeable number of cycles in itself, but it is one step towards making dissect() and cdissect() equivalent, which will allow removing hundreds of lines of near-duplicated code. Since struct subre's "retry" field is no longer particularly related to any kind of retry, rename it to "id". As of this commit it's only used for identifying a subre node in debug printouts, so you might think we should get rid of the field entirely; but I have a plan for another use.
*	Fix the general case of quantified regex back-references.	Tom Lane	2012-02-24
	Cases where a back-reference is part of a larger subexpression that is quantified have never worked in Spencer's regex engine, because he used a compile-time transformation that neglected the need to check the back-reference match in iterations before the last one. (That was okay for capturing parens, and we still do it if the regex has only capturing parens ... but it's not okay for backrefs.) To make this work properly, we have to add an "iteration" node type to the regex engine's vocabulary of sub-regex nodes. Since this is a moderately large change with a fair risk of introducing new bugs of its own, apply to HEAD only, even though it's a fix for a longstanding bug.
*	Create the beginnings of internals documentation for the regex code.	Tom Lane	2012-02-19
	Create src/backend/regex/README to hold an implementation overview of the regex package, and fill it in with some preliminary notes about the code's DFA/NFA processing and colormap management. Much more to do there of course. Also, improve some code comments around the colormap and cvec code. No functional changes except to add one missing assert.
*	Teach regular expression operators to honor collations.	Tom Lane	2011-04-10
	This involves getting the character classification and case-folding functions in the regex library to use the collations infrastructure. Most of this work had been done already in connection with the upper/lower and LIKE logic, so it was a simple matter of transposition. While at it, split out these functions into a separate source file regc_pg_locale.c, so that they can be correctly labeled with the Postgres project's license rather than the Scriptics license. These functions are 100% Postgres-written code whereas what remains in regc_locale.c is still mostly not ours, so lumping them both under the same copyright notice was getting more and more misleading.
*	Remove cvs keywords from all files.	Magnus Hagander	2010-09-20
*	Teach the regular expression functions to do case-insensitive matching and	Tom Lane	2009-12-01
	locale-dependent character classification properly when the database encoding is UTF8. The previous coding worked okay in single-byte encodings, or in any case for ASCII characters, but failed entirely on multibyte characters. The fix assumes that the <wctype.h> functions use Unicode code points as the wchar representation for Unicode, ie, wchar matches pg_wchar. This is only a partial solution, since we're still stupid about non-ASCII characters in multibyte encodings other than UTF8. The practical effect of that is limited, however, since those cases are generally Far Eastern glyphs for which concepts like case-folding don't apply anyway. Certainly all or nearly all of the field reports of problems have been about UTF8. A more general solution would require switching to the platform's wchar representation for all regex operations; which is possible but would have substantial disadvantages. Let's try this and see if it's sufficient in practice.
*	Remove regex_flavor GUC, so that regular expressions are always "advanced"	Tom Lane	2009-10-21
	style by default. Per discussion, there seems to be hardly anything that really relies on being able to change the regex flavor, so the ability to select it via embedded options ought to be enough for any stragglers. Also, if we didn't remove the GUC, we'd really be morally obligated to mark the regex functions non-immutable, which'd possibly create performance issues.
*	8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list	Bruce Momjian	2009-06-11
	provided by Andrew.
*	Convert three more guc settings to enum type:	Magnus Hagander	2008-04-02
	default_transaction_isolation, session_replication_role and regex_flavor.
*	Sync our regex code with upstream changes since last time we did this, which	Tom Lane	2008-02-14
	was Tcl 8.4.8. The main changes are to remove the never-fully-implemented code for multi-character collating elements, and to const-ify some stuff a bit more fully. In combination with the recent security patch, this commit brings us into line with Tcl 8.5.0. Note that I didn't make any effort to duplicate a lot of cosmetic changes that they made to bring their copy into line with their own style guidelines, such as adding braces around single-line IF bodies. Most of those we either had done already (such as ANSI-fication of function headers) or there is no point because pgindent would undo the change anyway.
*	Fix assorted security-grade bugs in the regex engine. All of these problems	Tom Lane	2008-01-03
	are shared with Tcl, since it's their code to begin with, and the patches have been copied from Tcl 8.5.0. Problems:
	CVE-2007-4769: Inadequate check on the range of backref numbers allows crash due to out-of-bounds read.
	CVE-2007-4772: Infinite loop in regex optimizer for pattern '($|^)*'.
	CVE-2007-6067: Very slow optimizer cleanup for regex with a large NFA representation, as well as crash if we encounter an out-of-memory condition during NFA construction.
	Part of the response to CVE-2007-6067 is to put a limit on the number of states in the NFA representation of a regex. This seems needed even though the within-the-code problems have been corrected, since otherwise the code could try to use very large amounts of memory for a suitably-crafted regex, leading to potential DOS by driving the system into swap, activating a kernel OOM killer, etc.
	Although there are certainly plenty of ways to drive the system into effective DOS with poorly-written SQL queries, these problems seem worth treating as security issues because many applications might accept regex search patterns from untrustworthy sources.
	Thanks to Will Drewry of Google for reporting these problems. Patches by Will Drewry and Tom Lane.
	Security: CVE-2007-4769, CVE-2007-4772, CVE-2007-6067
*	Adjust regcustom.h so that all those assert() calls in the regex package	Tom Lane	2007-10-06
	are converted to Postgres Assert() macros, instead of using <assert.h> as formerly. No difference in production builds, but --enable-cassert debug builds will get better coverage for regex testing.
*	Wording cleanup for error messages. Also change can't -> cannot.	Bruce Momjian	2007-02-01
	Standard English uses "may", "can", and "might" in different ways:
	may - permission, "You may borrow my rake."
	can - ability, "I can lift that log."
	might - possibility, "It might rain today."
	Unfortunately, in conversational English, their use is often mixed, as in, "You may use this variable to do X", when in fact, "can" is a better choice. Similarly, "It may crash" is better stated, "It might crash".
*	Standard pgindent run for 8.1.	Bruce Momjian	2005-10-15
*	I made the patch that implements regexp_replace again.	Bruce Momjian	2005-07-10
	The specification of this function is as follows.
	regexp_replace(source text, pattern text, replacement text, [flags text]) returns text
	Replaces the string in the source text that matches the regular expression with the replacement text.
	- pattern is the regular expression pattern.
	- replacement is the replacement string, which can use '\1'-'\9' and '\&'.
	  '\1'-'\9': back reference to the n'th subexpression.
	  '\&': entire matched string.
	- flags can use the following values:
	  g: global (replace all)
	  i: ignore case
	  When flags is not specified, matching is case sensitive and only the first instance is replaced.
	Atsushi Ogawa
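The replacement-string rules described above ('\1'-'\9' and '\&') can be sketched in C as follows. This is a minimal illustration, not the code from the patch: the function names `append_sketch` and `expand_replacement_sketch` are hypothetical, the matched substrings are assumed to have been produced already by the regex engine, and copying an unrecognized backslash sequence literally is an assumption the commit message does not address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append len bytes of src to the NUL-terminated
 * buffer dst of capacity cap.  Returns 0, or -1 if it would not fit. */
static int
append_sketch(char *dst, size_t cap, const char *src, size_t len)
{
	size_t		used = strlen(dst);

	if (len >= cap - used)
		return -1;
	memcpy(dst + used, src, len);
	dst[used + len] = '\0';
	return 0;
}

/* Hypothetical sketch: expand a replacement template into out.
 * groups[0] is the entire match (for "\&"); groups[1..9] are the
 * subexpressions (for "\1".."\9"); a NULL entry expands to nothing.
 * Assumption: any other character, including a backslash not followed
 * by 1-9 or &, is copied literally.  Returns 0, or -1 on overflow. */
int
expand_replacement_sketch(const char *repl, const char *const groups[10],
						  char *out, size_t cap)
{
	assert(repl != NULL && groups != NULL && out != NULL && cap > 0);
	out[0] = '\0';
	while (*repl != '\0')
	{
		const char *piece = repl;	/* default: copy one literal byte */
		size_t		len = 1;

		if (repl[0] == '\\' &&
			(repl[1] == '&' || (repl[1] >= '1' && repl[1] <= '9')))
		{
			int			idx = (repl[1] == '&') ? 0 : repl[1] - '0';

			piece = groups[idx] ? groups[idx] : "";
			len = strlen(piece);
			repl += 2;
		}
		else
			repl += 1;
		if (append_sketch(out, cap, piece, len) < 0)
			return -1;
	}
	return 0;
}
```

With groups holding "bar", "b" and "ar" (whole match, first and second subexpression), the template `[\&|\1-\2]` expands to `[bar|b-ar]`. The 'g' flag would simply repeat this expansion for each match instead of stopping after the first.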
*	Add parentheses to macros when args are used in computations. Without	Bruce Momjian	2005-05-25
	them, the execution behavior could be unexpected.
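The precedence hazard this commit guards against can be shown with a small sketch. The macros `SCALE_UNSAFE` and `SCALE_SAFE` are hypothetical and are not taken from the PostgreSQL tree; they only illustrate why an argument used in a computation gets its own parentheses.

```c
#include <assert.h>

/* Hypothetical macros for illustration only. */

/* Unparenthesized: the argument is substituted textually, so operator
 * precedence can regroup the caller's expression. */
#define SCALE_UNSAFE(x) x * 2

/* Argument and result both parenthesized: expands as expected. */
#define SCALE_SAFE(x) ((x) * 2)

/* Self-check showing how each macro expands for the argument 1 + 2. */
void
check_scale_macros(void)
{
	assert(SCALE_UNSAFE(1 + 2) == 5);	/* expands to 1 + 2 * 2 */
	assert(SCALE_SAFE(1 + 2) == 6);		/* expands to ((1 + 2) * 2) */
}
```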
*	Solve the 'Turkish problem' with undesirable locale behavior for case	Tom Lane	2004-05-07
	conversion of basic ASCII letters. Remove all uses of strcasecmp and strncasecmp in favor of new functions pg_strcasecmp and pg_strncasecmp; remove most but not all direct uses of toupper and tolower in favor of pg_toupper and pg_tolower. These functions use the same notions of case folding already developed for identifier case conversion. I left the straight locale-based folding in place for situations where we are just manipulating user data and not trying to match it to built-in strings --- for example, the SQL upper() function is still locale dependent. Perhaps this will prove not to be what's wanted, but at the moment we can initdb and pass regression tests in Turkish locale.
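The idea of folding basic ASCII letters without consulting the locale can be sketched as below. This is a simplified illustration under stated assumptions, not the pg_strcasecmp or pg_tolower source: the names `ascii_tolower_sketch` and `ascii_strcasecmp_sketch` are hypothetical, and the sketch leaves every non-ASCII byte untouched, which may differ from what the real functions do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: fold only the 26 basic ASCII uppercase letters.
 * Every other byte is returned unchanged, so the result never depends
 * on the current locale.  (In a Turkish locale, the C library's
 * tolower('I') may not yield 'i'.) */
unsigned char
ascii_tolower_sketch(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/* Case-insensitive comparison built on the ASCII-only fold above.
 * Returns <0, 0 or >0, like strcmp(). */
int
ascii_strcasecmp_sketch(const char *s1, const char *s2)
{
	assert(s1 != NULL && s2 != NULL);
	for (;;)
	{
		unsigned char c1 = ascii_tolower_sketch((unsigned char) *s1++);
		unsigned char c2 = ascii_tolower_sketch((unsigned char) *s2++);

		if (c1 != c2)
			return (int) c1 - (int) c2;
		if (c1 == '\0')
			return 0;
	}
}
```

Matching a keyword such as "TITLE" against "title" with this comparison gives the same answer in every locale, which is the property needed when comparing against built-in strings.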
*	make sure the $Id tags are converted to $PostgreSQL as well ...	PostgreSQL Daemon	2003-11-29
*	Another pgindent run with updated typedefs.	Bruce Momjian	2003-08-08
*	pgindent run.	Bruce Momjian	2003-08-04
*	Replace regular expression package with Henry Spencer's latest version	Tom Lane	2003-02-05
	(extracted from Tcl 8.4.1 release, as Henry still hasn't got round to making it a separate library). This solves a performance problem for multibyte, as well as upgrading our regexp support to match recent Tcl and nearly match recent Perl.
*	pgindent run.	Bruce Momjian	2002-09-04
*	Remove #ifdef MULTIBYTE per hackers list discussion.	Tatsuo Ishii	2002-08-29
*	Implement SQL99 OVERLAY(). Allows substitution of a substring in a string.	Thomas G. Lockhart	2002-06-11
	Implement SQL99 SIMILAR TO as a synonym for our existing operator "~". Implement SQL99 regular expression SUBSTRING(string FROM pat FOR escape). Extend the definition to make the FOR clause optional. Define textregexsubstr() to actually implement this feature. Update the regression test to include these new string features. All tests pass. Rename the regular expression support routines from "pg95_xxx" to "pg_xxx". Define CREATE CHARACTER SET in the parser per SQL99. No implementation yet.
*	New pgindent run with fixes suggested by Tom. Patch manually reviewed,	Bruce Momjian	2001-11-05
	initdb/regression tests pass.
*	Another pgindent run. Fixes enum indenting, and improves #endif	Bruce Momjian	2001-10-28
	spacing. Also adds space for one-line comments.
*	pgindent run on all C files. Java run to follow. initdb/regression	Bruce Momjian	2001-10-25
	tests pass.
*	pgindent run. Make it all clean.	Bruce Momjian	2001-03-22
*	Add _REGEX_UTILS_H to avoid duplication.	Tatsuo Ishii	2001-02-22
*	Clean up portability problems in regexp package: change all routine	Tom Lane	2001-02-13
	definitions from K&R to ANSI C style, and fix broken assumption that int and long are the same datatype. This repairs problems observed on Alpha with regexps having between 32 and 63 states.
*	Hmm, this isn't used either.	Tom Lane	2001-02-12
*	Remove unused and largely-broken-anyway compatibility defs.	Tom Lane	2001-02-12
*	Restructure the key include files per recent pghackers discussion: there	Tom Lane	2001-02-10
	are now separate files "postgres.h" and "postgres_fe.h", which are meant to be the primary include files for backend .c files and frontend .c files respectively. By default, only include files meant for frontend use are installed into the installation include directory. There is a new make target 'make install-all-headers' that adds the whole content of the src/include tree to the installed fileset, for use by people who want to develop server-side code without keeping the complete source tree on hand. Cleaned up a whole lot of crufty and inconsistent header inclusions.
*	Ensure that all uses of <ctype.h> functions are applied to unsigned-char	Tom Lane	2000-12-03
	values, whether the local char type is signed or not. This is necessary for portability. Per discussion on pghackers around 9/16/00.
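The portability rule in this commit can be illustrated with a short sketch. The function `count_alpha_sketch` is hypothetical and not part of the PostgreSQL source; it only shows the cast to unsigned char before calling a <ctype.h> function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical sketch: count alphabetic bytes in a NUL-terminated string.
 * The cast to unsigned char matters: where plain char is signed, a byte
 * such as 0xE9 is negative, and passing a negative value other than EOF
 * to a <ctype.h> function is undefined behavior. */
size_t
count_alpha_sketch(const char *s)
{
	size_t		n = 0;

	assert(s != NULL);
	for (; *s != '\0'; s++)
	{
		if (isalpha((unsigned char) *s))
			n++;
	}
	return n;
}
```

Whether a high-bit byte counts as alphabetic still depends on the active locale; the cast only guarantees that the call itself is well defined on every platform.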
*	Clean up #include's.	Bruce Momjian	2000-06-15
*	Removed MBFLAGS from makefiles since it's now done in include/config.h.	Peter Eisentraut	2000-01-19
*	Move some system includes into c.h, and remove duplicates.	Bruce Momjian	1999-07-17
*	Change #include's to use <> and "" as appropriate.	Bruce Momjian	1999-07-15
*	Change my-function-name-- to my_function_name, and optimizer renames.	Bruce Momjian	1999-02-13
*	Portability fixes found needed for SunOS 4.1.x:	Tom Lane	1998-11-30
	SunOS has tas(), but not memmove or strerror, and its sprintf() doesn't return int. Also, older versions of GNU Make don't like rules with empty left-hand sides...
*	OK, folks, here is the pgindent output.	Bruce Momjian	1998-09-01
*	From: t-ishii@sra.co.jp	Marc G. Fournier	1998-07-26
	As Bruce mentioned, this is due to the conflict among changes we made. Included patches should fix the problem (I changed all MB to MULTIBYTE). Please let me know if you have further problems. P.S. I did not include patches to configure and gram.c to save the file size (configure.in and gram.y modified).
*	I really hope that I haven't missed anything in this one...	Marc G. Fournier	1998-07-24
	From: t-ishii@sra.co.jp
	Attached are patches to enhance the multi-byte support. (patches are against 7/18 snapshot)
	* determine encoding at initdb/createdb rather than compile time
	  Now initdb/createdb has an option to specify the encoding. Also, I modified the syntax of CREATE DATABASE to accept encoding option. See README.mb for more details. For this purpose I have added new column "encoding" to pg_database. Also pg_attribute and pg_class are changed to catch up the modification to pg_database. Actually I have added pg_database_mb.h, pg_attribute_mb.h and pg_class_mb.h. These are used only when MB is enabled. The reason for having separate files is that I couldn't find a way to use ifdef or whatever in those files. I have to admit it looks ugly. No way.
	* support for PGCLIENTENCODING when issuing COPY command
	  commands/copy.c modified.
	* support for SQL92 syntax "SET NAMES"
	  See gram.y.
	* support for LATIN2-5
	* add UNICODE regression test case
	* new test suite for MB
	  New directory test/mb added.
	* clean up source files
	  Basic idea is to have MB's own subdirectory for easier maintenance. These are include/mb and backend/utils/mb.
*	Add auto-size to screen to \d? commands. Use UNION to show all	Bruce Momjian	1998-07-18
	\d? results in one query. Add \d? field search feature. Rename MB to MULTIBYTE.
*	Hi, here are the patches to enhance existing MB handling. This time	Bruce Momjian	1998-06-16
	I have implemented a framework of encoding translation between the backend and the frontend. Also I have added a new variable setting command:
	SET CLIENT_ENCODING TO 'encoding';
	Other features include:
	Latin1 support
	more 8-bit cleanness
	See doc/README.mb for more details. Note that the patches are against May 30 snapshot.
	Tatsuo Ishii
*	From: t-ishii@sra.co.jp	Marc G. Fournier	1998-04-27
	Hi, here are patches I promised (against 6.3.2):
	* character_length(), position(), substring() are now aware of multi-byte characters
	* add octet_length()
	* add --with-mb option to configure
	* new regression tests for EUC_KR (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
	* add some test cases to the EUC_JP regression test
	* fix problem in regress/regress.sh in case of System V
	* fix toupper(), tolower() to handle 8bit chars
	note that:
	o patches for both configure.in and configure are included. maybe the one for configure is not necessary.
	o pg_proc.h was modified to add octet_length(). I used OIDs (1374-1379) for that. Please let me know if these numbers are not appropriate.
*	From: t-ishii@sra.co.jp	Marc G. Fournier	1998-03-15
	Included are patches intended for allowing PostgreSQL to handle multi-byte character sets such as EUC (Extended Unix Code), Unicode and Mule internal code. With the MB patch you can use multi-byte character sets in regexp and LIKE. The encoding system chosen is determined at the compile time. To enable the MB extension, you need to define a variable "MB" in Makefile.global or in Makefile.custom. For further information please take a look at README.mb under doc directory. (Note that unlike "jp patch" I do not use modified GNU regexp any more. I changed Henry Spencer's regexp coming with PostgreSQL.)
*	Used modified version of indent that understands over 100 typedefs.	Bruce Momjian	1997-09-08