postgresql - postgresql mirror

	Commit message (Collapse)	Author	Age
*	Remove replacement code for getaddrinfo.	Thomas Munro	2022-08-14
\| \| \| \| \| \| \| \| \| \|	SUSv3, all targeted Unixes and modern Windows have getaddrinfo() and related interfaces. Drop the replacement implementation, and adjust some headers slightly to make sure that the APIs are visible everywhere using standard POSIX headers and names. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKG%2BL_3brvh%3D8e0BW_VfX9h7MtwgN%3DnFHP5o7X2oZucY9dg%40mail.gmail.com
*	Fix mismatched file identifications	John Naylor	2022-08-09
\| \| \| \| \|	Masahiko Sawada Discussion: https://www.postgresql.org/message-id/CAD21AoASq93KPiNxipPaTCzEdsnxT9665UesOnWcKhmX9Qfx6A@mail.gmail.com
*	Change internal RelFileNode references to RelFileNumber or RelFileLocator.	Robert Haas	2022-07-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
*	Remove PGDLLIMPORT marker from __pg_log_level	Michael Paquier	2022-05-13
\| \| \| \| \| \| \|	Per discussion with Tom Lane and Andres Freund. I have misunderstood the intention behind the choice done in 9a374b7. Discussion: https://postgr.es/m/20220512153737.6kbbcf4qyvwgq4s2@alap3.anarazel.de
*	Add some missing PGDLLIMPORT markings	Michael Paquier	2022-05-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Three variables in pqsignal.h (UnBlockSig, BlockSig and StartupBlockSig) were not marked with PGDLLIMPORT, as they are declared in a way that prevents mark_pgdllimport.pl to detect them. These variables are redefined in a style more consistent with the other headers, allowing the script to find and mark them. PGDLLIMPORT was missing for __pg_log_level in logging.h, so add it back. The marking got accidentally removed in 9a374b77, just after its addition in 8ec5694. While on it, add a comment in mark_pgdllimport.pl explaining what are the arguments needed by the script (aka a list of header paths). Reported-by: Andres Freund Discussion: https://postgr.es/m/20220506234924.6mxxotl3xl63db3l@alap3.anarazel.de
*	Remove not-very-useful early checks of __pg_log_level in logging.h.	Tom Lane	2022-04-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enforce __pg_log_level message filtering centrally in logging.c, instead of relying on the calling macros to do it. This is more reliable (e.g. it works correctly for direct calls to pg_log_generic) and it saves a percent or so of total code size because we get rid of so many duplicate checks of __pg_log_level. This does mean that argument expressions in a logging macro will be evaluated even if we end up not printing anything. That seems of little concern for INFO and higher levels as those messages are printed by default, and most of our frontend programs don't even offer a way to turn them off. I left the unlikely() checks in place for DEBUG messages, though. Discussion: https://postgr.es/m/3993549.1649449609@sss.pgh.pa.us
*	Rename backup_compression.{c,h} to compression.{c,h}	Michael Paquier	2022-04-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Compression option handling (level, algorithm or even workers) can be used across several parts of the system and not only base backups. Structures, objects and routines are renamed in consequence, to remove the concept of base backups from this part of the code making this change straight-forward. pg_receivewal, that has gained support for LZ4 since babbbb5, will make use of this infrastructure for its set of compression options, bringing more consistency with pg_basebackup. This cleanup needs to be done before releasing a beta of 15. pg_dump is a potential future target, as well, and adding more compression options to it may happen in 16~. Author: Michael Paquier Reviewed-by: Robert Haas, Georgios Kokolatos Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
*	Improve frontend error logging style.	Tom Lane	2022-04-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Get rid of the separate "FATAL" log level, as it was applied so inconsistently as to be meaningless. This mostly involves s/pg_log_fatal/pg_log_error/g. Create a macro pg_fatal() to handle the common use-case of pg_log_error() immediately followed by exit(1). Various modules had already invented either this or equivalent macros; standardize on pg_fatal() and apply it where possible. Invent the ability to add "detail" and "hint" messages to a frontend message, much as we have long had in the backend. Except where rewording was needed to convert existing coding to detail/hint style, I have (mostly) resisted the temptation to change existing message wording. Patch by me. Design and patch reviewed at various stages by Robert Haas, Kyotaro Horiguchi, Peter Eisentraut and Daniel Gustafsson. Discussion: https://postgr.es/m/1363732.1636496441@sss.pgh.pa.us
*	Apply PGDLLIMPORT markings broadly.	Robert Haas	2022-04-08
\| \| \| \| \| \| \| \| \| \| \|	Up until now, we've had a policy of only marking certain variables in the PostgreSQL header files with PGDLLIMPORT, but now we've decided to mark them all. This means that extensions running on Windows should no longer operate at a disadvantage as compared to extensions running on Linux: if the variable is present in a header file, it should be accessible. Discussion: http://postgr.es/m/CA+TgmoYanc1_FSfimhgiWSqVyP5KKmh5NP2BWNwDhO8Pg2vGYQ@mail.gmail.com
*	Allow parallel zstd compression when taking a base backup.	Robert Haas	2022-03-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	libzstd allows transparent parallel compression just by setting an option when creating the compression context, so permit that for both client and server-side backup compression. To use this, use something like pg_basebackup --compress WHERE-zstd:workers=N where WHERE is "client" or "server" and N is an integer. When compression is performed on the server side, this will spawn threads inside the PostgreSQL backend. While there is almost no PostgreSQL server code which is thread-safe, the threads here are used internally by libzstd and touch only data structures controlled by libzstd. Patch by me, based in part on earlier work by Dipesh Pandit and Jeevan Ladhe. Reviewed by Justin Pryzby. Discussion: http://postgr.es/m/CA+Tgmobj6u-nWF-j=FemygUhobhryLxf9h-wJN7W-2rSsseHNA@mail.gmail.com
*	Replace BASE_BACKUP COMPRESSION_LEVEL option with COMPRESSION_DETAIL.	Robert Haas	2022-03-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are more compression parameters that can be specified than just an integer compression level, so rename the new COMPRESSION_LEVEL option to COMPRESSION_DETAIL before it gets released. Introduce a flexible syntax for that option to allow arbitrary options to be specified without needing to adjust the main replication grammar, and common code to parse it that is shared between the client and the server. This commit doesn't actually add any new compression parameters, so the only user-visible change is that you can now type something like pg_basebackup --compress gzip:level=5 instead of writing just pg_basebackup --compress gzip:5. However, it should make it easy to add new options. If for example gzip starts offering fries, we can support pg_basebackup --compress gzip:level=5,fries=true for the benefit of users who want fries with that. Along the way, this fixes a few things in pg_basebackup so that the pg_basebackup can be used with a server-side compression algorithm that pg_basebackup itself does not understand. For example, pg_basebackup --compress server-lz4 could still succeed even if only the server and not the client has LZ4 support, provided that the other options to pg_basebackup don't require the client to decompress the archive. Patch by me. Reviewed by Justin Pryzby and Dagfinn Ilmari Mannsåker. Discussion: http://postgr.es/m/CA+TgmoYvpetyRAbbg1M8b3-iHsaN4nsgmWPjOENu5-doHuJ7fA@mail.gmail.com
*	Remove IS_AF_UNIX macro	Peter Eisentraut	2022-02-15
\| \| \| \| \| \| \| \| \| \| \| \|	The AF_UNIX macro was being used unprotected by HAVE_UNIX_SOCKETS, apparently since 2008. So the redirection through IS_AF_UNIX() is apparently no longer necessary. (More generally, all supported platforms are now HAVE_UNIX_SOCKETS, but even if there were a new platform in the future, it seems plausible that it would define the AF_UNIX symbol even without kernel support.) So remove the IS_AF_UNIX() macro and make the code a bit more consistent. Discussion: https://www.postgresql.org/message-id/flat/f2d26815-9832-e333-d52d-72fbc0ade896%40enterprisedb.com
*	Improve error handling of HMAC computations	Michael Paquier	2022-01-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is similar to b69aba7, except that this completes the work for HMAC with a new routine called pg_hmac_error() that would provide more context about the type of error that happened during a HMAC computation: - The fallback HMAC implementation in hmac.c relies on cryptohashes, so in some code paths it is necessary to return back the error generated by cryptohashes. - For the OpenSSL implementation (hmac_openssl.c), the logic is very similar to cryptohash_openssl.c, where the error context comes from OpenSSL if one of its internal routines failed, with different error codes if something internal to hmac_openssl.c failed or was incorrect. Any in-core code paths that use the centralized HMAC interface are related to SCRAM, for errors that are unlikely going to happen, with only SHA-256. It would be possible to see errors when computing some HMACs with MD5 for example and OpenSSL FIPS enabled, and this commit would help in reporting the correct errors but nothing in core uses that. So, at the end, no backpatch to v14 is done, at least for now. Errors in SCRAM related to the computation of the server key, stored key, etc. need to pass down the potential error context string across more layers of their respective call stacks for the frontend and the backend, so each surrounding routine is adapted for this purpose. Reviewed-by: Sergey Shinderuk Discussion: https://postgr.es/m/Yd0N9tSAIIkFd+qi@paquier.xyz
*	Improve error handling of cryptohash computations	Michael Paquier	2022-01-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The existing cryptohash facility was causing problems in some code paths related to MD5 (frontend and backend) that relied on the fact that the only type of error that could happen would be an OOM, as the MD5 implementation used in PostgreSQL ~13 (the in-core implementation is used when compiling with or without OpenSSL in those older versions), could fail only under this circumstance. The new cryptohash facilities can fail for reasons other than OOMs, like attempting MD5 when FIPS is enabled (upstream OpenSSL allows that up to 1.0.2, Fedora and Photon patch OpenSSL 1.1.1 to allow that), so this would cause incorrect reports to show up. This commit extends the cryptohash APIs so as callers of those routines can fetch more context when an error happens, by using a new routine called pg_cryptohash_error(). The error states are stored within each implementation's internal context data, so as it is possible to extend the logic depending on what's suited for an implementation. The default implementation requires few error states, but OpenSSL could report various issues depending on its internal state so more is needed in cryptohash_openssl.c, and the code is shaped so as we are always able to grab the necessary information. The core code is changed to adapt to the new error routine, painting more "const" across the call stack where the static errors are stored, particularly in authentication code paths on variables that provide log details. This way, any future changes would warn if attempting to free these strings. The MD5 authentication code was also a bit blurry about the handling of "logdetail" (LOG sent to the postmaster), so improve the comments related that, while on it. The origin of the problem is 87ae969, that introduced the centralized cryptohash facility. Extra changes are done for pgcrypto in v14 for the non-OpenSSL code path to cope with the improvements done by this commit. Reported-by: Michael Mühlbeyer Author: Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/89B7F072-5BBE-4C92-903E-D83E865D9367@trivadis.com Backpatch-through: 14
*	Update copyright for 2022	Bruce Momjian	2022-01-07
\| \| \| \|	Backpatch-through: 10
*	Simplify declaring variables exported from libpgcommon and libpgport.	Tom Lane	2021-11-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commits c2d1eea9e and 11b500072, as well as similar hacks elsewhere, in favor of setting up the PGDLLIMPORT macro so that it can just be used unconditionally. That can work because in frontend code, we need no marking in either the defining or consuming files for a variable exported from these libraries; and frontend code has no need to access variables exported from the core backend, either. While at it, write some actual documentation about the PGDLLIMPORT and PGDLLEXPORT macros. Patch by me, based on a suggestion from Robert Haas. Discussion: https://postgr.es/m/1160385.1638165449@sss.pgh.pa.us
*	Portability hack for pg_global_prng_state.	Tom Lane	2021-11-29
\| \| \| \| \| \| \| \| \|	PGDLLIMPORT is only appropriate for variables declared in the backend, not when the variable is coming from a library included in frontend code. (This isn't a particularly nice fix, but for now, use the same method employed elsewhere.) Discussion: https://postgr.es/m/E1mrWUD-000235-Hq@gemulon.postgresql.org
*	Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.	Tom Lane	2021-11-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Standardize on xoroshiro128 as our basic PRNG algorithm, eliminating a bunch of platform dependencies as well as fundamentally-obsolete PRNG code. In addition, this API replacement will ease replacing the algorithm again in future, should that become necessary. xoroshiro128 is a few percent slower than the drand48 family, but it can produce full-width 64-bit random values not only 48-bit, and it should be much more trustworthy. It's likely to be noticeably faster than the platform's random(), depending on which platform you are thinking about; and we can have non-global state vectors easily, unlike with random(). It is not cryptographically strong, but neither are the functions it replaces. Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
*	Provide a variant of simple_prompt() that can be interrupted by ^C.	Tom Lane	2021-11-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Up to now, you couldn't escape out of psql's \password command by typing control-C (or other local spelling of SIGINT). This is pretty user-unfriendly, so improve it. To do so, we have to modify the functions provided by pg_get_line.c; but we don't want to mess with psql's SIGINT handler setup, so provide an API that lets that handler cause the cancel to occur. This relies on the assumption that we won't do any major harm by longjmp'ing out of fgets(). While that's obviously a little shaky, we've long had the same assumption in the main input loop, and few issues have been reported. psql has some other simple_prompt() calls that could usefully be improved the same way; for now, just deal with \password. Nathan Bossart, minor tweaks by me Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
*	Update Unicode data to Unicode 14.0.0	Peter Eisentraut	2021-09-15
\|
*	Extend collection of Unicode combining characters to beyond the BMP	John Naylor	2021-08-26
\| \| \| \| \| \| \| \|	The former limit was perhaps a carryover from an older hand-coded table. Since commit bab982161 we have enough space in mbinterval to store larger codepoints, so collect all combining characters. Discussion: https://www.postgresql.org/message-id/49ad1fa0-174e-c901-b14c-c484b60907f1%40enterprisedb.com
*	Update display widths as part of updating Unicode	John Naylor	2021-08-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The hardcoded "wide character" set in ucs_wcwidth() was last updated around the Unicode 5.0 era. This led to misalignment when printing emojis and other codepoints that have since been designated wide or full-width. To fix and keep up to date, extend update-unicode to download the list of wide and full-width codepoints from the offical sources. In passing, remove some comments about non-spacing characters that haven't been accurate since we removed the former hardcoded logic. Jacob Champion Reported and reviewed by Pavel Stehule Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRCeX21O69YHxmykYySYyprZAqrKWWg0KoGKdjgqcGyygg@mail.gmail.com
*	Revert "Rename unicode_combining_table to unicode_width_table"	John Naylor	2021-08-26
\| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit eb0d0d2c7300c9c5c22b35975c11265aa4becc84. After I had committed eb0d0d2c7 and 78ab944cd, I decided to add a sanity check for a "can't happen" scenario just to be cautious. It turned out that it already happened in the official Unicode source data, namely that a character can be both wide and a combining character. This fact renders the aforementioned commits unnecessary, so revert both of them. Discussion: https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com
*	Revert "Change mbbisearch to return the character range"	John Naylor	2021-08-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 78ab944cd4b9977732becd9d0bc83223b88af9a2. After I had committed eb0d0d2c7 and 78ab944cd, I decided to add a sanity check for a "can't happen" scenario just to be cautious. It turned out that it already happened in the official Unicode source data, namely that a character can be both wide and a combining character. This fact renders the aforementioned commits unnecessary, so revert both of them. Discussion: https://www.postgresql.org/message-id/CAFBsxsH5ejH4-1xaTLpSK8vWoK1m6fA1JBtTM6jmBsLfmDki1g%40mail.gmail.com
*	Change mbbisearch to return the character range	John Naylor	2021-08-25
\| \| \| \| \| \| \| \| \| \|	Add a width field to mbinterval and have mbbisearch return a pointer to the found range rather than just bool for success. A future commit will add another width besides zero, and this will allow that to use the same search. Reviewed by Jacob Champion Discussion: https://www.postgresql.org/message-id/CAFBsxsGOCpzV7c-f3a8ADsA1n4uZ%3D8puCctQp%2Bx7W0vgkv%3Dw%2Bg%40mail.gmail.com
*	Rename unicode_combining_table to unicode_width_table	John Naylor	2021-08-25
\| \| \| \| \|	No functional changes. A future commit will use this table for other purposes besides combining characters.
*	Revert refactoring of hex code to src/common/	Michael Paquier	2021-08-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a combined revert of the following commits: - c3826f8, a refactoring piece that moved the hex decoding code to src/common/. This code was cleaned up by aef8948, as it originally included no overflow checks in the same way as the base64 routines in src/common/ used by SCRAM, making it unsafe for its purpose. - aef8948, a more advanced refactoring of the hex encoding/decoding code to src/common/ that added sanity checks on the result buffer for hex decoding and encoding. As reported by Hans Buschmann, those overflow checks are expensive, and it is possible to see a performance drop in the decoding/encoding of bytea or LOs the longer they are. Simple SQLs working on large bytea values show a clear difference in perf profile. - ccf4e27, a cleanup made possible by aef8948. The reverts of all those commits bring back the performance of hex decoding and encoding back to what it was in ~13. Fow now and post-beta3, this is the simplest option. Reported-by: Hans Buschmann Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net Backpatch-through: 14
*	Adjust locations which have an incorrect copyright year	David Rowley	2021-06-04
\| \| \| \| \| \| \|	A few patches committed after ca3b37487 mistakenly forgot to make the copyright year 2021. Fix these. Discussion: https://postgr.es/m/CAApHDvqyLmd9P2oBQYJ=DbrV8QwyPRdmXtCTFYPE08h+ip0UJw@mail.gmail.com
*	Refactor HMAC implementations	Michael Paquier	2021-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Similarly to the cryptohash implementations, this refactors the existing HMAC code into a single set of APIs that can be plugged with any crypto libraries PostgreSQL is built with (only OpenSSL currently). If there is no such libraries, a fallback implementation is available. Those new APIs are designed similarly to the existing cryptohash layer, so there is no real new design here, with the same logic around buffer bound checks and memory handling. HMAC has a dependency on cryptohashes, so all the cryptohash types supported by cryptohash{_openssl}.c can be used with HMAC. This refactoring is an advantage mainly for SCRAM, that included its own implementation of HMAC with SHA256 without relying on the existing crypto libraries even if PostgreSQL was built with their support. This code has been tested on Windows and Linux, with and without OpenSSL, across all the versions supported on HEAD from 1.1.1 down to 1.0.1. I have also checked that the implementations are working fine using some sample results, a custom extension of my own, and doing cross-checks across different major versions with SCRAM with the client and the backend. Author: Michael Paquier Reviewed-by: Bruce Momjian Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz
*	Improve reporting for syntax errors in multi-line JSON data.	Tom Lane	2021-03-01
\| \| \| \| \| \| \| \| \| \| \|	Point to the specific line where the error was detected; the previous code tended to include several preceding lines as well. Avoid re-scanning the entire input to recompute which line that was. Simplify the logic a bit. Add test cases. Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
*	Add result size as argument of pg_cryptohash_final() for overflow checks	Michael Paquier	2021-02-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With its current design, a careless use of pg_cryptohash_final() could would result in an out-of-bound write in memory as the size of the destination buffer to store the result digest is not known to the cryptohash internals, without the caller knowing about that. This commit adds a new argument to pg_cryptohash_final() to allow such sanity checks, and implements such defenses. The internals of SCRAM for HMAC could be tightened a bit more, but as everything is based on SCRAM_KEY_LEN with uses particular to this code there is no need to complicate its interface more than necessary, and this comes back to the refactoring of HMAC in core. Except that, this minimizes the uses of the existing DIGEST_LENGTH variables, relying instead on sizeof() for the result sizes. In ossp-uuid, this also makes the code more defensive, as it already relied on dce_uuid_t being at least the size of a MD5 digest. This is in philosophy similar to cfc40d3 for base64.c and aef8948 for hex.c. Reported-by: Ranier Vilela Author: Michael Paquier, Ranier Vilela Reviewed-by: Kyotaro Horiguchi Discussion: https://postgr.es/m/CAEudQAoqEGmcff3J4sTSV-R_16Monuz-UpJFbf_dnVH=APr02Q@mail.gmail.com
*	Introduce SHA1 implementations in the cryptohash infrastructure	Michael Paquier	2021-01-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With this commit, SHA1 goes through the implementation provided by OpenSSL via EVP when building the backend with it, and uses as fallback implementation KAME which was located in pgcrypto and already shaped for an integration with a set of init, update and final routines. Structures and routines have been renamed to make things consistent with the fallback implementations of MD5 and SHA2. uuid-ossp has used for ages a shortcut with pgcrypto to fetch a copy of SHA1 if needed. This was built depending on the build options within ./configure, so this cleans up some code and removes the build dependency between pgcrypto and uuid-ossp. Note that this will help with the refactoring of HMAC, as pgcrypto offers the option to use MD5, SHA1 or SHA2, so only the second option was missing to make that possible. Author: Michael Paquier Reviewed-by: Heikki Linnakangas Discussion: https://postgr.es/m/X9HXKTgrvJvYO7Oh@paquier.xyz
*	Remove PG_SHA*_DIGEST_STRING_LENGTH from sha2.h	Michael Paquier	2021-01-15
\| \| \| \| \| \| \|	The last reference to those variables has been removed in aef8948, so this cleans up a bit the code. Discussion: https://postgr.es/m/X//ggAqmTtt+3t7X@paquier.xyz
*	Rework refactoring of hex and encoding routines	Michael Paquier	2021-01-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit addresses some issues with c3826f83 that moved the hex decoding routine to src/common/: - The decoding function lacked overflow checks, so when used for security-related features it was an open door to out-of-bound writes if not carefully used that could remain undetected. Like the base64 routines already in src/common/ used by SCRAM, this routine is reworked to check for overflows by having the size of the destination buffer passed as argument, with overflows checked before doing any writes. - The encoding routine was missing. This is moved to src/common/ and it gains the same overflow checks as the decoding part. On failure, the hex routines of src/common/ issue an error as per the discussion done to make them usable by frontend tools, but not by shared libraries. Note that this is why ECPG is left out of this commit, and it still includes a duplicated logic doing hex encoding and decoding. While on it, this commit uses better variable names for the source and destination buffers in the existing escape and base64 routines in encode.c and it makes them more robust to overflow detection. The previous core code issued a FATAL after doing out-of-bound writes if going through the SQL functions, which would be enough to detect problems when working on changes that impacted this area of the code. Instead, an error is issued before doing an out-of-bound write. The hex routines were being directly called for bytea conversions and backup manifests without such sanity checks. The current calls happen to not have any problems, but careless uses of such APIs could easily lead to CVE-class bugs. Author: Bruce Momjian, Michael Paquier Reviewed-by: Sehrope Sarkuni Discussion: https://postgr.es/m/20201231003557.GB22199@momjian.us
*	Fix and simplify some code related to cryptohashes	Michael Paquier	2021-01-08
\| \| \| \| \| \| \| \| \| \| \|	This commit addresses two issues: - In pgcrypto, MD5 computation called pg_cryptohash_{init,update,final} without checking for the result status. - Simplify pg_checksum_raw_context to use only one variable for all the SHA2 options available in checksum manifests. Reported-by: Heikki Linnakangas Discussion: https://postgr.es/m/f62f26bb-47a5-8411-46e5-4350823e06a5@iki.fi
*	Fix allocation logic of cryptohash context data with OpenSSL	Michael Paquier	2021-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The allocation of the cryptohash context data when building with OpenSSL was happening in the memory context of the caller of pg_cryptohash_create(), which could lead to issues with resowner cleanup if cascading resources are cleaned up on an error. Like other facilities using resowners, move the base allocation to TopMemoryContext to ensure a correct cleanup on failure. The resulting code gets simpler with this commit as the context data is now hold by a unique opaque pointer, so as there is only one single allocation done in TopMemoryContext. After discussion, also change the cryptohash subroutines to return an error if the caller provides NULL for the context data to ease error detection on OOM. Author: Heikki Linnakangas Discussion: https://postgr.es/m/X9xbuEoiU3dlImfa@paquier.xyz
*	Update copyright for 2021	Bruce Momjian	2021-01-02
\| \| \| \|	Backpatch-through: 9.5
*	Revert "Add key management system" (978f869b99) & later commits	Bruce Momjian	2020-12-27
\| \| \| \| \| \| \| \| \| \|	The patch needs test cases, reorganization, and cfbot testing. Technically reverts commits 5c31afc49d..e35b2bad1a (exclusive/inclusive) and 08db7c63f3..ccbe34139b. Reported-by: Tom Lane, Michael Paquier Discussion: https://postgr.es/m/E1ktAAG-0002V2-VB@gemulon.postgresql.org
*	remove uint128 requirement from patch 978f869b99 (CFE)	Bruce Momjian	2020-12-25
\| \| \| \| \| \| \| \|	Used char[16] instead. Reported-by: buildfarm member florican Backpatch-through: master
*	Add key management system	Bruce Momjian	2020-12-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a key management system that stores (currently) two data encryption keys of length 128, 192, or 256 bits. The data keys are AES256 encrypted using a key encryption key, and validated via GCM cipher mode. A command to obtain the key encryption key must be specified at initdb time, and will be run at every database server start. New parameters allow a file descriptor open to the terminal to be passed. pg_upgrade support has also been added. Discussion: https://postgr.es/m/CA+fd4k7q5o6Nc_AaX6BcYM9yqTbC6_pnH-6nSD=54Zp6NBQTCQ@mail.gmail.com Discussion: https://postgr.es/m/20201202213814.GG20285@momjian.us Author: Masahiko Sawada, me, Stephen Frost
*	move hex_decode() to /common so it can be called from frontend	Bruce Momjian	2020-12-24
\| \| \| \| \| \| \|	This allows removal of a copy of hex_decode() from ecpg, and will be used by the soon-to-be added pg_alterckey command. Backpatch-through: master
*	Refactor logic to check for ASCII-only characters in string	Michael Paquier	2020-12-21
\| \| \| \| \| \| \| \| \|	The same logic was present for collation commands, SASLprep and pgcrypto, so this removes some code. Author: Michael Paquier Reviewed-by: Stephen Frost, Heikki Linnakangas Discussion: https://postgr.es/m/X9womIn6rne6Gud2@paquier.xyz
*	Improve some code around cryptohash functions	Michael Paquier	2020-12-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adjusts some code related to recent changes for cryptohash functions: - Add a variable in md5.h to track down the size of a computed result, moved from pgcrypto. Note that pg_md5_hash() assumed a result of this size already. - Call explicit_bzero() on the hashed data when freeing the context for fallback implementations. For MD5, particularly, it would be annoying to leave some non-zeroed data around. - Clean up some code related to recent changes of uuid-ossp. .gitignore still included md5.c and a comment was incorrect. Discussion: https://postgr.es/m/X9HXKTgrvJvYO7Oh@paquier.xyz
*	Refactor MD5 implementations according to new cryptohash infrastructure	Michael Paquier	2020-12-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit heavily reorganizes the MD5 implementations that exist in the tree in various aspects. First, MD5 is added to the list of options available in cryptohash.c and cryptohash_openssl.c. This means that if building with OpenSSL, EVP is used for MD5 instead of the fallback implementation that Postgres had for ages. With the recent refactoring work for cryptohash functions, this change is straight-forward. If not building with OpenSSL, a fallback implementation internal to src/common/ is used. Second, this reduces the number of MD5 implementations present in the tree from two to one, by moving the KAME implementation from pgcrypto to src/common/, and by removing the implementation that existed in src/common/. KAME was already structured with an init/update/final set of routines by pgcrypto (see original pgcrypto/md5.h) for compatibility with OpenSSL, so moving it to src/common/ has proved to be a straight-forward move, requiring no actual manipulation of the internals of each routine. Some benchmarking has not shown any performance gap between both implementations. Similarly to the fallback implementation used for SHA2, the fallback implementation of MD5 is moved to src/common/md5.c with an internal header called md5_int.h for the init, update and final routines. This gets then consumed by cryptohash.c. The original routines used for MD5-hashed passwords are moved to a separate file called md5_common.c, also in src/common/, aimed at being shared between all MD5 implementations as utility routines to keep compatibility with any code relying on them. Like the SHA2 changes, this commit had its round of tests on both Linux and Windows, across all versions of OpenSSL supported on HEAD, with and even without OpenSSL. Author: Michael Paquier Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20201106073434.GA4961@paquier.xyz
*	Move SHA2 routines to a new generic API layer for crypto hashes	Michael Paquier	2020-12-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Two new routines to allocate a hash context and to free it are created, as these become necessary for the goal behind this refactoring: switch the all cryptohash implementations for OpenSSL to use EVP (for FIPS and also because upstream does not recommend the use of low-level cryptohash functions for 20 years). Note that OpenSSL hides the internals of cryptohash contexts since 1.1.0, so it is necessary to leave the allocation to OpenSSL itself, explaining the need for those two new routines. This part is going to require more work to properly track hash contexts with resource owners, but this not introduced here. Still, this refactoring makes the move possible. This reduces the number of routines for all SHA2 implementations from twelve (SHA{224,256,386,512} with init, update and final calls) to five (create, free, init, update and final calls) by incorporating the hash type directly into the hash context data. The new cryptohash routines are moved to a new file, called cryptohash.c for the fallback implementations, with SHA2 specifics becoming a part internal to src/common/. OpenSSL specifics are part of cryptohash_openssl.c. This infrastructure is usable for more hash types, like MD5 or HMAC. Any code paths using the internal SHA2 routines are adapted to report correctly errors, which are most of the changes of this commit. The zones mostly impacted are checksum manifests, libpq and SCRAM. Note that e21cbb4 was a first attempt to switch SHA2 to EVP, but it lacked the refactoring needed for libpq, as done here. This patch has been tested on Linux and Windows, with and without OpenSSL, and down to 1.0.1, the oldest version supported on HEAD. Author: Michael Paquier Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz
*	Improve performance of Unicode {de,re}composition in the backend	Michael Paquier	2020-10-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This replaces the existing binary search with two perfect hash functions for the composition and the decomposition in the backend code, at the cost of slightly-larger binaries there (35kB in libpgcommon_srv.a). Per the measurements done, this improves the speed of the recomposition and decomposition by up to 30~40 times for the NFC and NFKC conversions, while all other operations get at least 40% faster. This is not as "good" as what libicu has, but it closes the gap a lot as per the feedback from Daniel Verite. The decomposition table remains the same, getting used for the binary search in the frontend code, where we care more about the size of the libraries like libpq over performance as this gets involved only in code paths related to the SCRAM authentication. In consequence, note that the perfect hash function for the recomposition needs to use a new inverse lookup array back to to the existing decomposition table. The size of all frontend deliverables remains unchanged, even with --enable-debug, including libpq. Author: John Naylor Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/CAFBsxsHUuMFCt6-pU+oG-F1==CmEp8wR+O+bRouXWu6i8kXuqA@mail.gmail.com
*	Review format of code generated by PerfectHash.pm	Michael Paquier	2020-10-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	80f8eb7 has added to the normalization quick check headers some code generated by PerfectHash.pm that is incompatible with the settings of gitattributes for this repository, as whitespaces followed a set of tabs for the first element of a line in the table. Instead of adding a new exception to gitattributes, rework the format generated so as a right padding with spaces is used instead of a left padding. This keeps the table generated in a readable shape with its set of columns, making unnecessary an update of gitattributes. Reported-by: Peter Eisentraut Author: John Naylor Discussion: https://postgr.es/m/d601b3b5-a3c7-5457-2f84-3d6513d690fc@2ndquadrant.com
*	Use perfect hash for NFC and NFKC Unicode Normalization quick check	Michael Paquier	2020-10-11
\| \| \| \| \| \| \| \| \| \| \| \|	This makes the normalization quick check about 30% faster for NFC and 50% faster for NFKC than the binary search used previously. The hash lookup reuses the existing array of bit fields used for the binary search to get the quick check property and is generated as part of "make update-unicode" in src/common/unicode/. Author: John Naylor Reviewed-by: Mark Dilger, Michael Paquier Discussion: https://postgr.es/m/CACPNZCt4fbJ0_bGrN5QPt34N4whv=mszM0LMVQdoa2rC9UMRXA@mail.gmail.com
*	Revert "Change SHA2 implementation based on OpenSSL to use EVP digest routines"	Michael Paquier	2020-09-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit e21cbb4, as the switch to EVP routines requires a more careful design where we would need to have at least our wrapper routines return a status instead of issuing an error by themselves to let the caller do the error handling. The memory handling was also incorrect and could cause leaks in the backend if a failure happened, requiring most likely a callback to do the necessary cleanup as the only clean way to be able to allocate an EVP context requires the use of an allocation within OpenSSL. The potential rework of the wrappers also impacts the fallback implementation when not building with OpenSSL. Originally, prairiedog has reported a compilation failure, but after discussion with Tom Lane this needs a better design. Discussion: https://postgr.es/m/20200928073330.GC2316@paquier.xyz
*	Change SHA2 implementation based on OpenSSL to use EVP digest routines	Michael Paquier	2020-09-28
\| \| \| \| \| \| \| \| \| \| \| \| \|	The use of low-level hash routines is not recommended by upstream OpenSSL since 2000, and pgcrypto already switched to EVP as of 5ff4a67. Note that this also fixes a failure with SCRAM authentication when using FIPS in OpenSSL, but as there have been few complaints about this problem and as this causes an ABI breakage, no backpatch is done. Author: Michael Paquier, Alessandro Gherardi Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/20200924025314.GE7405@paquier.xyz Discussion: https://postgr.es/m/20180911030250.GA27115@paquier.xyz