aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/mb/mbutils.c
Commit message (Collapse)AuthorAge
* Revert function to get memory context stats for processesDaniel Gustafsson2025-05-23
| | | | | | | | | Due to concerns raised about the approach, and memory leaks found in sensitive contexts the functionality is reverted. This reverts commits 45e7e8ca9, f8c115a6c, d2a1ed172, 55ef7abf8 and 042a66291 for v18 with an intent to revisit this patch for v19. Discussion: https://postgr.es/m/594293.1747708165@sss.pgh.pa.us
* With GB18030, prevent SIGSEGV from reading past end of allocation.Noah Misch2025-05-05
| | | | | | | | | | | | | | | | | | | | | With GB18030 as source encoding, applications could crash the server via SQL functions convert() or convert_from(). Applications themselves could crash after passing unterminated GB18030 input to libpq functions PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or PQescapeString(). Extension code could crash by passing unterminated GB18030 input to jsonapi.h functions. All those functions have been intended to handle untrusted, unterminated input safely. A crash required allocating the input such that the last byte of the allocation was the last byte of a virtual memory page. Some malloc() implementations take measures against that, making the SIGSEGV hard to reach. Back-patch to v13 (all supported versions). Author: Noah Misch <noah@leadboat.com> Author: Andres Freund <andres@anarazel.de> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Backpatch-through: 13 Security: CVE-2025-4207
* Assert lack of hazardous buffer locks before possible catalog read.Noah Misch2025-04-17
| | | | | | | | | | | | | | | | | | | | | Commit 0bada39c83a150079567a6e97b1a25a198f30ea3 fixed a bug of this kind, which existed in all branches for six days before detection. While the probability of reaching the trouble was low, the disruption was extreme. No new backends could start, and service restoration needed an immediate shutdown. Hence, add this to catch the next bug like it. The new check in RelationIdGetRelation() suffices to make autovacuum detect the bug in commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 that led to commit 0bada39. This also checks in a number of similar places. It replaces each Assert(IsTransactionState()) that pertained to a conditional catalog read. No back-patch for now, but a back-patch of commit 243e9b4 should back-patch this, too. A back-patch could omit the src/test/regress changes, since back branches won't gain new index columns. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Remove unused #include's from backend .c filesPeter Eisentraut2024-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
* Simplify pg_enc2gettext_tbl[] with C99-designated initializer syntaxMichael Paquier2024-03-01
| | | | | | | | | | | | | | | This commit switches pg_enc2gettext_tbl[] in encnames.c to use a C99-designated initializer syntax. pg_bind_textdomain_codeset() is simplified so as it is possible to do a direct lookup at the gettext() array with a value of the enum pg_enc rather than doing a loop through all its elements, as long as the encoding value provided by GetDatabaseEncoding() is in the correct range of supported encoding values. Note that PG_MULE_INTERNAL gains a value in the array, pointing to NULL. Author: Jelte Fennema-Nio Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Teach convert() and friends to avoid copying when possible.Nathan Bossart2023-12-04
| | | | | | | | | | | | | Presently, pg_convert() allocates a new bytea and copies the result regardless of whether any conversion actually happened. This commit adjusts this function to return the source pointer as-is if no conversion occurred. This optimization isn't expected to make a tremendous difference, but it still seems worthwhile to avoid unnecessary memory allocations. Author: Yurii Rashkovskii Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/CA%2BRLCQyknBPSWXRBQGOi6aYEcdQ9RpH9Kch4GjoeY8dQ3D%2Bvhw%40mail.gmail.com
* Fix typos in commentsMichael Paquier2023-05-02
| | | | | | | | | The changes done in this commit impact comments with no direct user-visible changes, with fixes for incorrect function, variable or structure names. Author: Alexander Lakhin Discussion: https://postgr.es/m/e8c38840-596a-83d6-bd8d-cebc51111572@gmail.com
* New header varatt.h split off from postgres.hPeter Eisentraut2023-01-10
| | | | | | | | | This new header contains all the variable-length data types support (TOAST support) from postgres.h, which isn't needed by large parts of the backend code. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/ddcce239-0f29-6e62-4b47-1f8ca742addf%40enterprisedb.com
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Convert json_in and jsonb_in to report errors softly.Tom Lane2022-12-11
| | | | | | | | | | | | | | | | This requires a bit of further infrastructure-extension to allow trapping errors reported by numeric_in and pg_unicode_to_server, but otherwise it's pretty straightforward. In the case of jsonb_in, we are only capturing errors reported during the initial "parse" phase. The value-construction phase (JsonbValueToJsonb) can also throw errors if assorted implementation limits are exceeded. We should improve that, but it seems like a separable project. Andrew Dunstan and Tom Lane Discussion: https://postgr.es/m/3bac9841-fe07-713d-fa42-606c225567d6@dunslane.net
* Convert *GetDatum() and DatumGet*() macros to inline functionsPeter Eisentraut2022-09-27
| | | | | | | | | | | | | | The previous macro implementations just cast the argument to a target type but did not check whether the input type was appropriate. The function implementation can do better type checking of the input type. For the *GetDatumFast() macros, converting to an inline function doesn't work in the !USE_FLOAT8_BYVAL case, but we can use AssertVariableIsOfTypeMacro() to get a similar level of type checking. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
* Revert "Convert *GetDatum() and DatumGet*() macros to inline functions"Peter Eisentraut2022-09-12
| | | | | | This reverts commit 595836e99bf1ee6d43405b885fb69bb8c6d3ee23. It has problems when USE_FLOAT8_BYVAL is off.
* Convert *GetDatum() and DatumGet*() macros to inline functionsPeter Eisentraut2022-09-12
| | | | | | | | | The previous macro implementations just cast the argument to a target type but did not check whether the input type was appropriate. The function implementation can do better type checking of the input type. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/flat/8528fb7e-0aa2-6b54-85fb-0c0886dbd6ed%40enterprisedb.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Add 'noError' argument to encoding conversion functions.Heikki Linnakangas2021-04-01
| | | | | | | | | | | | | | | | | | | | | | With the 'noError' argument, you can try to convert a buffer without knowing the character boundaries beforehand. The functions now need to return the number of input bytes successfully converted. This is is a backwards-incompatible change, if you have created a custom encoding conversion with CREATE CONVERSION. This adds a check to pg_upgrade for that, refusing the upgrade if there are any user-defined encoding conversions. Custom conversions are very rare, there are no commonly used extensions that I know of that uses that feature. No other objects can depend on conversions, so if you do have one, you can fairly easily drop it before upgrading, and recreate it after the upgrade with an updated version. Add regression tests for built-in encoding conversions. This doesn't cover every conversion, but it covers all the internal functions in conv.c that are used to implement the conversions. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01%40iki.fi
* Add mbverifystr() functions specific to each encoding.Heikki Linnakangas2021-01-28
| | | | | | | | | | | This makes pg_verify_mbstr() function faster, by allowing more efficient encoding-specific implementations. All the implementations included in this commit are pretty naive, they just call the same encoding-specific verifychar functions that were used previously, but that already gives a performance boost because the tight character-at-a-time loop is simpler. Reviewed-by: John Naylor Discussion: https://www.postgresql.org/message-id/e7861509-3960-538a-9025-b75a61188e01@iki.fi
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Dial back -Wimplicit-fallthrough to level 3Alvaro Herrera2020-05-13
| | | | | | | | | The additional pain from level 4 is excessive for the gain. Also revert all the source annotation changes to their original wordings, to avoid back-patching pain. Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
* Add -Wimplicit-fallthrough to CFLAGS and CXXFLAGSAlvaro Herrera2020-05-12
| | | | | | | | | | | | | | | Use it at level 4, a bit more restrictive than the default level, and tweak our commanding comments to FALLTHROUGH. (However, leave zic.c alone, since it's external code; to avoid the warnings that would appear there, change CFLAGS for that file in the Makefile.) Author: Julien Rouhaud <rjuju123@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org
* Allow Unicode escapes in any server encoding, not only UTF-8.Tom Lane2020-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | SQL includes provisions for numeric Unicode escapes in string literals and identifiers. Previously we only accepted those if they represented ASCII characters or the server encoding was UTF-8, making the conversion to internal form trivial. This patch adjusts things so that we'll call the appropriate encoding conversion function in less-trivial cases, allowing the escape sequence to be accepted so long as it corresponds to some character available in the server encoding. This also applies to processing of Unicode escapes in JSONB. However, the old restriction still applies to client-side JSON processing, since that hasn't got access to the server's encoding conversion infrastructure. This patch includes some lexer infrastructure that simplifies throwing errors with error cursors pointing into the middle of a string (or other complex token). For the moment I only used it for errors relating to Unicode escapes, but we might later expand the usage to some other cases. Patch by me, reviewed by John Naylor. Discussion: https://postgr.es/m/2393.1578958316@sss.pgh.pa.us
* Rationalize code placement between wchar.c, encnames.c, and mbutils.c.Tom Lane2020-01-16
| | | | | | | | | | | | | | | | | | | | | | | | | | Move all the backend-only code that'd crept into wchar.c and encnames.c into mbutils.c. To remove the last few #ifdef dependencies from wchar.c and encnames.c, also make the following changes: * Adjust get_encoding_name_for_icu to return NULL, not throw an error, for unsupported encodings. Its sole caller can perfectly well throw an error instead. (While at it, I also made this function and its sibling is_encoding_supported_by_icu proof against out-of-range encoding IDs.) * Remove the overlength-name error condition from pg_char_to_encoding. It's completely silly not to treat that just like any other the-name-is-not-in-the-table case. Also, get rid of pg_mic_mblen --- there's no obvious reason why conv.c shouldn't call pg_mule_mblen instead. Other than that, this is just code movement and comment-polishing with no functional changes. Notably, I reordered declarations in pg_wchar.h to show which functions are frontend-accessible and which are not. Discussion: https://postgr.es/m/CA+TgmoYO8oq-iy8E02rD8eX25T-9SmyxKWqqks5OMHxKvGXpXQ@mail.gmail.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Avoid unnecessary out-of-memory errors during encoding conversion.Tom Lane2019-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Encoding conversion uses the very simplistic rule that the output can't be more than 4X longer than the input, and palloc's a buffer of that size. This results in failure to convert any string longer than 1/4 GB, which is becoming an annoying limitation. As a band-aid to improve matters, allow the allocated output buffer size to exceed 1GB. We still insist that the final result fit into MaxAllocSize (1GB), though. Perhaps it'd be safe to relax that restriction, but it'd require close analysis of all callers, which is daunting (not least because external modules might call these functions). For the moment, this should allow a 2X to 4X improvement in the longest string we can convert, which is a useful gain in return for quite a simple patch. Also, once we have successfully converted a long string, repalloc the output down to the actual string length, returning the excess to the malloc pool. This seems worth doing since we can usually expect to give back several MB if we take this path at all. This still leaves much to be desired, most notably that the assumption that MAX_CONVERSION_GROWTH == 4 is very fragile, and yet we have no guard code verifying that the output buffer isn't overrun. Fixing that would require significant changes in the encoding conversion APIs, so it'll have to wait for some other day. The present patch seems safely back-patchable, so patch all supported branches. Alvaro Herrera and Tom Lane Discussion: https://postgr.es/m/20190816181418.GA898@alvherre.pgsql Discussion: https://postgr.es/m/3614.1569359690@sss.pgh.pa.us
* Represent Lists as expansible arrays, not chains of cons-cells.Tom Lane2019-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally, Postgres Lists were a more or less exact reimplementation of Lisp lists, which consist of chains of separately-allocated cons cells, each having a value and a next-cell link. We'd hacked that once before (commit d0b4399d8) to add a separate List header, but the data was still in cons cells. That makes some operations -- notably list_nth() -- O(N), and it's bulky because of the next-cell pointers and per-cell palloc overhead, and it's very cache-unfriendly if the cons cells end up scattered around rather than being adjacent. In this rewrite, we still have List headers, but the data is in a resizable array of values, with no next-cell links. Now we need at most two palloc's per List, and often only one, since we can allocate some values in the same palloc call as the List header. (Of course, extending an existing List may require repalloc's to enlarge the array. But this involves just O(log N) allocations not O(N).) Of course this is not without downsides. The key difficulty is that addition or deletion of a list entry may now cause other entries to move, which it did not before. For example, that breaks foreach() and sister macros, which historically used a pointer to the current cons-cell as loop state. We can repair those macros transparently by making their actual loop state be an integer list index; the exposed "ListCell *" pointer is no longer state carried across loop iterations, but is just a derived value. (In practice, modern compilers can optimize things back to having just one loop state value, at least for simple cases with inline loop bodies.) In principle, this is a semantics change for cases where the loop body inserts or deletes list entries ahead of the current loop index; but I found no such cases in the Postgres code. The change is not at all transparent for code that doesn't use foreach() but chases lists "by hand" using lnext(). The largest share of such code in the backend is in loops that were maintaining "prev" and "next" variables in addition to the current-cell pointer, in order to delete list cells efficiently using list_delete_cell(). However, we no longer need a previous-cell pointer to delete a list cell efficiently. Keeping a next-cell pointer doesn't work, as explained above, but we can improve matters by changing such code to use a regular foreach() loop and then using the new macro foreach_delete_current() to delete the current cell. (This macro knows how to update the associated foreach loop's state so that no cells will be missed in the traversal.) There remains a nontrivial risk of code assuming that a ListCell * pointer will remain good over an operation that could now move the list contents. To help catch such errors, list.c can be compiled with a new define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents whenever that could possibly happen. This makes list operations significantly more expensive so it's not normally turned on (though it is on by default if USE_VALGRIND is on). There are two notable API differences from the previous code: * lnext() now requires the List's header pointer in addition to the current cell's address. * list_delete_cell() no longer requires a previous-cell argument. These changes are somewhat unfortunate, but on the other hand code using either function needs inspection to see if it is assuming anything it shouldn't, so it's not all bad. Programmers should be aware of these significant performance changes: * list_nth() and related functions are now O(1); so there's no major access-speed difference between a list and an array. * Inserting or deleting a list element now takes time proportional to the distance to the end of the list, due to moving the array elements. (However, it typically *doesn't* require palloc or pfree, so except in long lists it's probably still faster than before.) Notably, lcons() used to be about the same cost as lappend(), but that's no longer true if the list is long. Code that uses lcons() and list_delete_first() to maintain a stack might usefully be rewritten to push and pop at the end of the list rather than the beginning. * There are now list_insert_nth...() and list_delete_nth...() functions that add or remove a list cell identified by index. These have the data-movement penalty explained above, but there's no search penalty. * list_concat() and variants now copy the second list's data into storage belonging to the first list, so there is no longer any sharing of cells between the input lists. The second argument is now declared "const List *" to reflect that it isn't changed. This patch just does the minimum needed to get the new implementation in place and fix bugs exposed by the regression tests. As suggested by the foregoing, there's a fair amount of followup work remaining to do. Also, the ENABLE_LIST_COMPAT macros are finally removed in this commit. Code using those should have been gone a dozen years ago. Patch by me; thanks to David Rowley, Jesper Pedersen, and others for review. Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
* Phase 2 pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | | Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Initial pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
* Fail pgwin32_message_to_UTF16() for SQL_ASCII messages.Noah Misch2019-05-12
| | | | | | | | | | | | The function had been interpreting SQL_ASCII messages as UTF8, throwing an error when they were invalid UTF8. The new behavior is consistent with pg_do_encoding_conversion(). This affects LOG_DESTINATION_STDERR and LOG_DESTINATION_EVENTLOG, which will send untranslated bytes to write() and ReportEventA(). On buildfarm member bowerbird, enabling log_connections caused an error whenever the role name was not valid UTF8. Back-patch to 9.4 (all supported versions). Discussion: https://postgr.es/m/20190512015615.GD1124997@rfd.leadboat.com
* More unconstify usePeter Eisentraut2019-02-13
| | | | | | | Replace casts whose only purpose is to cast away const with the unconstify() macro. Discussion: https://www.postgresql.org/message-id/flat/53a28052-f9f3-1808-fed9-460fd43035ab%402ndquadrant.com
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Don't call pgwin32_message_to_UTF16() without CurrentMemoryContext.Noah Misch2017-11-12
| | | | | | | | | | | | | PostgreSQL running as a Windows service crashed upon calling write_stderr() before MemoryContextInit(). This fix completes work started in 5735efee15540765315aa8c1a230575e756037f7. Messages this early contain only ASCII bytes; if we removed the CurrentMemoryContext requirement, the ensuing conversions would have no effect. Back-patch to 9.3 (all supported versions). Takayuki Tsunakawa, reviewed by Michael Paquier. Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F80CC73@G01JPEXMBYT05
* Add more efficient functions to pqformat API.Andres Freund2017-10-11
| | | | | | | | | | | | | | | | | | | | | | | | | | There's three prongs to achieve greater efficiency here: 1) Allow reusing a stringbuffer across pq_beginmessage/endmessage, with the new pq_beginmessage_reuse/endmessage_reuse. This can be beneficial both because it avoids allocating the initial buffer, and because it's more likely to already have an correctly sized buffer. 2) Replacing pq_sendint() with pq_sendint$width() inline functions. Previously unnecessary and unpredictable branches in pq_sendint() were needed. Additionally the replacement functions are implemented more efficiently. pq_sendint is now deprecated, a separate commit will convert all in-tree callers. 3) Add pq_writeint$width(), pq_writestring(). These rely on sufficient space in the StringInfo's buffer, avoiding individual space checks & potential individual resizing. To allow this to be used for strings, expose mbutil.c's MAX_CONVERSION_GROWTH. Followup commits will make use of these facilities. Author: Andres Freund Discussion: https://postgr.es/m/20170914063418.sckdzgjfrsbekae4@alap3.anarazel.de
* Reduce excessive dereferencing of function pointersPeter Eisentraut2017-09-07
| | | | | | | | | | | | It is equivalent in ANSI C to write (*funcptr) () and funcptr(). These two styles have been applied inconsistently. After discussion, we'll use the more verbose style for plain function pointer variables, to make it clear that it's a variable, and the shorter style when the function pointer is in a struct (s.func() or s->func()), because then it's clear that it's not a plain function name, and otherwise the excessive punctuation makes some of those invocations hard to read. Discussion: https://www.postgresql.org/message-id/f52c16db-14ed-757d-4b48-7ef360b1631d@2ndquadrant.com
* Phase 3 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Restore old pgwin32_message_to_UTF16() behavior outside transactions.Noah Misch2015-08-14
| | | | | | | | | | | Commit 49c817eab78c6f0ce8c3bf46766b73d6cf3190b7 replaced with a hard error the dubious pg_do_encoding_conversion() behavior when outside a transaction. Reintroduce the historic soft failure locally within pgwin32_message_to_UTF16(). This fixes errors when writing messages in less-common encodings to the Windows event log or console. Back-patch to 9.4, where the aforementioned commit first appeared. Per bug #13427 from Dmitri Bourlatchkov.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* C comments: remove odd blank lines after #ifdef WIN32 linesBruce Momjian2014-03-13
| | | | A few more
* Prefer pg_any_to_server/pg_server_to_any over pg_do_encoding_conversion.Tom Lane2014-02-23
| | | | | | | | | | | | | | | | | | | | A large majority of the callers of pg_do_encoding_conversion were specifying the database encoding as either source or target of the conversion, meaning that we can use the less general functions pg_any_to_server/pg_server_to_any instead. The main advantage of using the latter functions is that they can make use of a cached conversion-function lookup in the common case that the other encoding is the current client_encoding. It's notationally cleaner too in most cases, not least because of the historical artifact that the latter functions use "char *" rather than "unsigned char *" in their APIs. Note that pg_any_to_server will apply an encoding verification step in some cases where pg_do_encoding_conversion would have just done nothing. This seems to me to be a good idea at most of these call sites, though it partially negates the performance benefit. Per discussion of bug #9210.
* Plug some more holes in encoding conversion.Tom Lane2014-02-23
| | | | | | | | | | | | | | | | | | | | | | | | Various places assume that pg_do_encoding_conversion() and pg_server_to_any() will ensure encoding validity of their results; but they failed to do so in the case that the source encoding is SQL_ASCII while the destination is not. We cannot perform any actual "conversion" in that scenario, but we should still validate the string according to the destination encoding. Per bug #9210 from Digoal Zhou. Arguably this is a back-patchable bug fix, but on the other hand adding more enforcing of encoding checks might break existing applications that were being sloppy. On balance there doesn't seem to be much enthusiasm for a back-patch, so fix in HEAD only. While at it, remove some apparently-no-longer-needed provisions for letting pg_do_encoding_conversion() "work" outside a transaction --- if you consider it "working" to silently fail to do the requested conversion. Also, make a few cosmetic improvements in mbutils.c, notably removing some Asserts that are certainly dead code since the variables they assert aren't null are never null, even at process start. (I think this wasn't true at one time, but it is now.)
* Make various variables const (read-only).Tom Lane2014-01-18
| | | | | | | | | | | | | | | | | | These changes should generally improve correctness/maintainability. A nice side benefit is that several kilobytes move from initialized data to text segment, allowing them to be shared across processes and probably reducing copy-on-write overhead while forking a new backend. Unfortunately this doesn't seem to help libpq in the same way (at least not when it's compiled with -fpic on x86_64), but we can hope the linker at least collects all nominally-const data together even if it's not actually part of the text segment. Also, make pg_encname_tbl[] static in encnames.c, since there seems no very good reason for any other code to use it; per a suggestion from Wim Lewis, who independently submitted a patch that was mostly a subset of this one. Oskari Saarenmaa, with some editorialization by me
* Renovate display of non-ASCII messages on Windows.Noah Misch2013-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GNU gettext selects a default encoding for the messages it emits in a platform-specific manner; it uses the Windows ANSI code page on Windows and follows LC_CTYPE on other platforms. This is inconvenient for PostgreSQL server processes, so realize consistent cross-platform behavior by calling bind_textdomain_codeset() on Windows each time we permanently change LC_CTYPE. This primarily affects SQL_ASCII databases and processes like the postmaster that do not attach to a database, making their behavior consistent with PostgreSQL on non-Windows platforms. Messages from SQL_ASCII databases use the encoding implied by the database LC_CTYPE, and messages from non-database processes use LC_CTYPE from the postmaster system environment. PlatformEncoding becomes unused, so remove it. Make write_console() prefer WriteConsoleW() to write() regardless of the encodings in use. In this situation, write() will invariably mishandle non-ASCII characters. elog.c has assumed that messages conform to the database encoding. While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL. Introduce MessageEncoding to track the actual encoding of message text. The present consumers are Windows-specific code for converting messages to UTF16 for use in system interfaces. This fixes the appearance in Windows event logs and consoles of translated messages from SQL_ASCII processes like the postmaster. Note that SQL_ASCII inherently disclaims a strong notion of encoding, so non-ASCII byte sequences interpolated into messages by %s may yet yield a nonsensical message. MULE_INTERNAL has similar problems at present, albeit for a different reason: its lack of libiconv support or a conversion to UTF8. Consequently, one need no longer restart Windows with a different Windows ANSI code page to broadly test backend logging under a given language. Changing the user's locale ("Format") is enough. Several accounts can simultaneously run postmasters under different locales, all correctly logging localized messages to Windows event logs and consoles. Alexander Law and Noah Misch
* pgindent run for release 9.3Bruce Momjian2013-05-29
| | | | | This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
* Add wchar -> mb conversion routines.Robert Haas2012-07-04
| | | | | | | | This is infrastructure for Alexander Korotkov's work on indexing regular expression searches. Alexander Korotkov, with a bit of further hackery on the MULE conversion by me
* Fix char2wchar/wchar2char to support collations properly.Tom Lane2011-04-23
| | | | | | | | | | | | | | | | | These functions should take a pg_locale_t, not a collation OID, and should call mbstowcs_l/wcstombs_l where available. Where those functions are not available, temporarily select the correct locale with uselocale(). This change removes the bogus assumption that all locales selectable in a given database have the same wide-character conversion method; in particular, the collate.linux.utf8 regression test now passes with LC_CTYPE=C, so long as the database encoding is UTF8. I decided to move the char2wchar/wchar2char functions out of mbutils.c and into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus don't really belong with the mbutils.c functions. Keeping them where they were would have required importing pg_locale_t into pg_wchar.h somehow, which did not seem like a good plan.
* foreach() and list_delete() don't mix.Tom Lane2011-04-17
| | | | | | | | | | | Fix crash when releasing duplicate entries in the encoding conversion cache list, caused by releasing the current entry of the list being chased by foreach(). We have a standard idiom for handling such cases, but this loop wasn't using it. This got broken in my recent rewrite of GUC assign hooks. Not sure how I missed this when testing the modified code, but I did. Per report from Peter.