aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/mb/mbutils.c
Commit message (Collapse)AuthorAge
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* C comments: remove odd blank lines after #ifdef WIN32 linesBruce Momjian2014-03-13
| | | | A few more
* Prefer pg_any_to_server/pg_server_to_any over pg_do_encoding_conversion.Tom Lane2014-02-23
| | | | | | | | | | | | | | | | | | | | A large majority of the callers of pg_do_encoding_conversion were specifying the database encoding as either source or target of the conversion, meaning that we can use the less general functions pg_any_to_server/pg_server_to_any instead. The main advantage of using the latter functions is that they can make use of a cached conversion-function lookup in the common case that the other encoding is the current client_encoding. It's notationally cleaner too in most cases, not least because of the historical artifact that the latter functions use "char *" rather than "unsigned char *" in their APIs. Note that pg_any_to_server will apply an encoding verification step in some cases where pg_do_encoding_conversion would have just done nothing. This seems to me to be a good idea at most of these call sites, though it partially negates the performance benefit. Per discussion of bug #9210.
* Plug some more holes in encoding conversion.Tom Lane2014-02-23
| | | | | | | | | | | | | | | | | | | | | | | | Various places assume that pg_do_encoding_conversion() and pg_server_to_any() will ensure encoding validity of their results; but they failed to do so in the case that the source encoding is SQL_ASCII while the destination is not. We cannot perform any actual "conversion" in that scenario, but we should still validate the string according to the destination encoding. Per bug #9210 from Digoal Zhou. Arguably this is a back-patchable bug fix, but on the other hand adding more enforcing of encoding checks might break existing applications that were being sloppy. On balance there doesn't seem to be much enthusiasm for a back-patch, so fix in HEAD only. While at it, remove some apparently-no-longer-needed provisions for letting pg_do_encoding_conversion() "work" outside a transaction --- if you consider it "working" to silently fail to do the requested conversion. Also, make a few cosmetic improvements in mbutils.c, notably removing some Asserts that are certainly dead code since the variables they assert aren't null are never null, even at process start. (I think this wasn't true at one time, but it is now.)
* Make various variables const (read-only).Tom Lane2014-01-18
| | | | | | | | | | | | | | | | | | These changes should generally improve correctness/maintainability. A nice side benefit is that several kilobytes move from initialized data to text segment, allowing them to be shared across processes and probably reducing copy-on-write overhead while forking a new backend. Unfortunately this doesn't seem to help libpq in the same way (at least not when it's compiled with -fpic on x86_64), but we can hope the linker at least collects all nominally-const data together even if it's not actually part of the text segment. Also, make pg_encname_tbl[] static in encnames.c, since there seems no very good reason for any other code to use it; per a suggestion from Wim Lewis, who independently submitted a patch that was mostly a subset of this one. Oskari Saarenmaa, with some editorialization by me
* Renovate display of non-ASCII messages on Windows.Noah Misch2013-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GNU gettext selects a default encoding for the messages it emits in a platform-specific manner; it uses the Windows ANSI code page on Windows and follows LC_CTYPE on other platforms. This is inconvenient for PostgreSQL server processes, so realize consistent cross-platform behavior by calling bind_textdomain_codeset() on Windows each time we permanently change LC_CTYPE. This primarily affects SQL_ASCII databases and processes like the postmaster that do not attach to a database, making their behavior consistent with PostgreSQL on non-Windows platforms. Messages from SQL_ASCII databases use the encoding implied by the database LC_CTYPE, and messages from non-database processes use LC_CTYPE from the postmaster system environment. PlatformEncoding becomes unused, so remove it. Make write_console() prefer WriteConsoleW() to write() regardless of the encodings in use. In this situation, write() will invariably mishandle non-ASCII characters. elog.c has assumed that messages conform to the database encoding. While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL. Introduce MessageEncoding to track the actual encoding of message text. The present consumers are Windows-specific code for converting messages to UTF16 for use in system interfaces. This fixes the appearance in Windows event logs and consoles of translated messages from SQL_ASCII processes like the postmaster. Note that SQL_ASCII inherently disclaims a strong notion of encoding, so non-ASCII byte sequences interpolated into messages by %s may yet yield a nonsensical message. MULE_INTERNAL has similar problems at present, albeit for a different reason: its lack of libiconv support or a conversion to UTF8. Consequently, one need no longer restart Windows with a different Windows ANSI code page to broadly test backend logging under a given language. Changing the user's locale ("Format") is enough. Several accounts can simultaneously run postmasters under different locales, all correctly logging localized messages to Windows event logs and consoles. Alexander Law and Noah Misch
* pgindent run for release 9.3Bruce Momjian2013-05-29
| | | | | This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
* Add wchar -> mb conversion routines.Robert Haas2012-07-04
| | | | | | | | This is infrastructure for Alexander Korotkov's work on indexing regular expression searches. Alexander Korotkov, with a bit of further hackery on the MULE conversion by me
* Fix char2wchar/wchar2char to support collations properly.Tom Lane2011-04-23
| | | | | | | | | | | | | | | | | These functions should take a pg_locale_t, not a collation OID, and should call mbstowcs_l/wcstombs_l where available. Where those functions are not available, temporarily select the correct locale with uselocale(). This change removes the bogus assumption that all locales selectable in a given database have the same wide-character conversion method; in particular, the collate.linux.utf8 regression test now passes with LC_CTYPE=C, so long as the database encoding is UTF8. I decided to move the char2wchar/wchar2char functions out of mbutils.c and into pg_locale.c, because they work on wchar_t not pg_wchar_t and thus don't really belong with the mbutils.c functions. Keeping them where they were would have required importing pg_locale_t into pg_wchar.h somehow, which did not seem like a good plan.
* foreach() and list_delete() don't mix.Tom Lane2011-04-17
| | | | | | | | | | | Fix crash when releasing duplicate entries in the encoding conversion cache list, caused by releasing the current entry of the list being chased by foreach(). We have a standard idiom for handling such cases, but this loop wasn't using it. This got broken in my recent rewrite of GUC assign hooks. Not sure how I missed this when testing the modified code, but I did. Per report from Peter.
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-10
|
* Revise the API for GUC variable assign hooks.Tom Lane2011-04-07
| | | | | | | | | | | | | | | | | The previous functions of assign hooks are now split between check hooks and assign hooks, where the former can fail but the latter shouldn't. Aside from being conceptually clearer, this approach exposes the "canonicalized" form of the variable value to guc.c without having to do an actual assignment. And that lets us fix the problem recently noted by Bernd Helmle that the auto-tune patch for wal_buffers resulted in bogus log messages about "parameter "wal_buffers" cannot be changed without restarting the server". There may be some speed advantage too, because this design lets hook functions avoid re-parsing variable values when restoring a previous state after a rollback (they can store a pre-parsed representation of the value instead). This patch also resolves a longstanding annoyance about custom error messages from variable assign hooks: they should modify, not appear separately from, guc.c's own message about "invalid parameter value".
* Fix pg_server_to_client, that was broken in the previous commit.Itagaki Takahiro2011-02-21
|
* Add ENCODING option to COPY TO/FROM and file_fdw.Itagaki Takahiro2011-02-21
| | | | | | | | | | | File encodings can be specified separately from client encoding. If not specified, client encoding is used for backward compatibility. Cases when the encoding doesn't match client encoding are slower than matched cases because we don't have conversion procs for other encodings. Performance improvement would be be a future work. Original patch by Hitoshi Harada, and modified by me.
* Per-column collation supportPeter Eisentraut2011-02-08
| | | | | | | | This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
* Remove unnecessary string null-termination in pg_convert.Itagaki Takahiro2010-12-03
| | | | We can directly verify the unterminated input with pg_verify_mbstr_len.
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Adjust mbutils.c so it won't get broken by future pgindent runs.Tom Lane2010-07-07
| | | | | | To do that, replace L'\0' by (WCHAR) 0. Perhaps someday we should teach pgindent about wide-character literals, but so long as this is the only use-case in the entire Postgres sources, a workaround seems easier.
* Undo pgindent breakage (again). Per buildfarm.Tom Lane2010-07-06
|
* pgindent run for 9.0, second runBruce Momjian2010-07-06
|
* Undo some more pgindent breakage. Per buildfarm.Tom Lane2010-02-27
|
* pgindent run for 9.0Bruce Momjian2010-02-26
|
* Wrap calls to SearchSysCache and related functions using macros.Robert Haas2010-02-14
| | | | | | | | | | | | The purpose of this change is to eliminate the need for every caller of SearchSysCache, SearchSysCacheCopy, SearchSysCacheExists, GetSysCacheOid, and SearchSysCacheList to know the maximum number of allowable keys for a syscache entry (currently 4). This will make it far easier to increase the maximum number of keys in a future release should we choose to do so, and it makes the code shorter, too. Design and review by Tom Lane.
* Make initdb behave sanely when the selected locale has codeset "US-ASCII".Tom Lane2009-11-12
| | | | | | | | | | | | | | Per discussion, this should result in defaulting to SQL_ASCII encoding. The original coding could not support that because it conflated selection of SQL_ASCII encoding with not being able to determine the encoding. Adjust pg_get_encoding_from_locale()'s API to distinguish these cases, and fix callers appropriately. Only initdb actually changes behavior, since the other callers were perfectly content to consider these cases equivalent. Per bug #5178 from Boh Yap. Not going to bother back-patching, since no one has complained before and there's an easy workaround (namely, specify the encoding you want).
* Fix typo in previous release as reported by Itagaki Takahiro, but missedMagnus Hagander2009-10-17
| | | | by me.
* Write to the Windows eventlog in UTF16, converting the message encodingMagnus Hagander2009-10-17
| | | | | | as necessary. Itagaki Takahiro with some changes from me
* Don't use 'return' where you should use 'PG_RETURN_xxx'.Tom Lane2009-07-07
|
* More sensible character_octet_lengthPeter Eisentraut2009-07-07
| | | | | | | For character types with typmod, character_octet_length columns in the information schema now show the maximum character length times the maximum length of a character in the server encoding, instead of some huge value as before.
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Move gettext encoding names into encnames.c, so we only have one place to ↵Magnus Hagander2009-04-24
| | | | | | update. Per discussion.
* Tell gettext which codeset to use by calling bind_textdomain_codeset(). WeHeikki Linnakangas2009-04-08
| | | | | | | | | already did that on Windows, but it's needed on other platforms too when LC_CTYPE=C. With other locales, we enforce (or trust) that the codeset of the locale matches the server encoding so we don't need to bind it explicitly. It should do no harm in that case either, but I don't have full faith in the PG encoding -> OS codeset mapping table yet. Per recent discussion on pgsql-hackers.
* Add entry in the encoding number to OS name table for KOI8-U.Peter Eisentraut2009-04-06
|
* Fix SetClientEncoding() to maintain a cache of previously selected encodingTom Lane2009-04-02
| | | | | | | | | | | | conversion functions. This allows transaction rollback to revert to a previous client_encoding setting without doing fresh catalog lookups. I believe that this explains and fixes the recent report of "failed to commit client_encoding" failures. This bug is present in 8.3.x, but it doesn't seem prudent to back-patch the fix, at least not till it's had some time for field testing in HEAD. In passing, remove SetDefaultClientEncoding(), which was used nowhere.
* Revert pg_bind_textdomain_codeset to a existant-but-empty function whenAlvaro Herrera2009-03-09
| | | | | | | ENABLE_NLS is not defined, for better compatibility of the backend with modules compiled the other way. Per note from Tom after my previous commit.
* pg_bind_textdomain_codeset must exist only on ENABLE_NLS.Alvaro Herrera2009-03-08
|
* On Windows, call bind_textdomain_codeset on domains other than the default one,Alvaro Herrera2009-03-08
| | | | too, so that the codeset is properly mapped on the newly added PL domains.
* Fix usage of char2wchar/wchar2char. Changes:Teodor Sigaev2009-03-02
| | | | | | | | | | | | | | | | - pg_wchar and wchar_t could have different size, so char2wchar doesn't call pg_mb2wchar_with_len to prevent out-of-bound memory bug - make char2wchar/wchar2char symmetric, now they should not be called with C-locale because mbstowcs/wcstombs oftenly doesn't work correct with C-locale. - Text parser uses pg_mb2wchar_with_len directly in case of C-locale and multibyte encoding Per bug report by Hiroshi Inoue <inoue@tpf.co.jp> and following discussion. Backpatch up to 8.2 when multybyte support was implemented in tsearch.
* Explicitly bind gettext to the correct encoding on Windows.Magnus Hagander2009-01-22
| | | | Original patch from Hiroshi Inoue.
* Use the new text domain names ("postgres-8.4" instead of "postgres")Magnus Hagander2009-01-19
| | | | Hiroshi Inoue
* Add a pg_encoding_mbcliplen() function that is just like pg_mbcliplen()Tom Lane2009-01-04
| | | | | | | | except the caller can specify the encoding to work in; this will be needed for pg_stat_statements. In passing, do some marginal efficiency hacking and clean up some comments. Also, prevent the single-byte-encoding code path from fetching one byte past the stated length of the string (this last is a bug that might need to be back-patched at some point).
* Add an explicit caution about how to use pg_do_encoding_conversion withTom Lane2008-11-11
| | | | non-null-terminated input. Per discussion with ITAGAKI Takahiro.
* pg_do_encoding_conversion cannot return NULL (at least not unless the inputTom Lane2008-11-10
| | | | is NULL), so remove some useless tests for the case.
* Fix compiler warning introduced by recent patch. Tsk tsk.Tom Lane2008-06-18
|
* Move wchar2char() and char2wchar() from tsearch into /mb to be easier toBruce Momjian2008-06-18
| | | | | | use for other modules; also move pnstrdup(). Clean up code slightly.
* Explicitly bind gettext() to the UTF8 locale when in use.Magnus Hagander2008-05-27
| | | | | | | | | | This is required on Windows due to the special locale handling for UTF8 that doesn't change the full environment. Fixes crash with translated error messages per bugs 4180 and 4196. Tom Lane
* Clean up a few places where Datums were being treated as pointers withoutTom Lane2008-04-12
| | | | | | | | going through DatumGetPointer or some other "official" conversion macro. Not actually a bug, since Datum the same size as pointer is the only supported case at the moment, but good cleanup for the future. Gavin Sherry
* Remove incorrect (and ill-advised anyway) pfree's in pg_convert_from andTom Lane2008-01-09
| | | | pg_convert_to. Per bug #3866 from Andrew Gilligan.
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Fix the inadvertent libpq ABI breakage discovered by Martin Pitt: theTom Lane2007-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | renumbering of encoding IDs done between 8.2 and 8.3 turns out to break 8.2 initdb and psql if they are run with an 8.3beta1 libpq.so. For the moment we can rearrange the order of enum pg_enc to keep the same number for everything except PG_JOHAB, which isn't a problem since there are no direct references to it in the 8.2 programs anyway. (This does force initdb unfortunately.) Going forward, we want to fix things so that encoding IDs can be changed without an ABI break, and this commit includes the changes needed to allow libpq's encoding IDs to be treated as fully independent of the backend's. The main issue is that libpq clients should not include pg_wchar.h or otherwise assume they know the specific values of libpq's encoding IDs, since they might encounter version skew between pg_wchar.h and the libpq.so they are using. To fix, have libpq officially export functions needed for encoding name<=>ID conversion and validity checking; it was doing this anyway unofficially. It's still the case that we can't renumber backend encoding IDs until the next bump in libpq's major version number, since doing so will break the 8.2-era client programs. However the code is now prepared to avoid this type of problem in future. Note that initdb is no longer a libpq client: we just pull in the two source files we need directly. The patch also fixes a few places that were being sloppy about checking for an unrecognized encoding name.
* Add comments re text <-> bytea internal equivalence in convert routines.Andrew Dunstan2007-09-24
|