Fix backslash-escaping multibyte chars in COPY FROM.

If a multi-byte character is escaped with a backslash in TEXT mode input, and the encoding is one of the client-only encodings where the bytes after the first one can have an ASCII byte "embedded" in the char, we didn't skip the character correctly. After a backslash, we only skipped the first byte of the next character, so if it was a multi-byte character, we would try to process its second byte as if it was a separate character. If it was one of the characters with special meaning, like '\n', '\r', or another '\\', that would cause trouble.

One such example is the byte sequence '\x5ca45c2e666f6f' in Big5 encoding. That's supposed to be [backslash][two-byte character][.][f][o][o], but because the second byte of the two-byte character is 0x5c, we incorrectly treat it as another backslash. And because the next character is a dot, we parse it as end-of-copy marker, and throw an "end-of-copy marker corrupt" error.

Backpatch to all supported versions.

Reviewed-by: John Naylor, Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/a897f84f-8dca-8798-3139-07da5bb38728%40iki.fi
author: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2021-02-05 11:14:56 +0200
committer: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2021-02-05 11:17:07 +0200
commit: c06632e48b3537f9a062f6c4686a730625a429fd (patch)
tree: 24c6bef58b50498c2bccf07b7484b3c9b030c86b
parent: 2671125c75c04f7abc4a87998959c197e99e34c6 (diff)
1 files changed, 9 insertions, 1 deletions
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a5528f8307f..b52081f1e96 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3994,7 +3994,7 @@ CopyReadLineText(CopyState cstate)
 				break;
 			}
 			else if (!cstate->csv_mode)
-
+			{
 				/*
 				 * If we are here, it means we found a backslash followed by
 				 * something other than a period.  In non-CSV mode, anything
@@ -4005,8 +4005,16 @@ CopyReadLineText(CopyState cstate)
 				 * backslashes are not special, so we want to process the
 				 * character after the backslash just like a normal character,
 				 * so we don't increment in those cases.
+				 *
+				 * Set 'c' to skip whole character correctly in multi-byte
+				 * encodings.  If we don't have the whole character in the
+				 * buffer yet, we might loop back to process it, after all,
+				 * but that's OK because multi-byte characters cannot have any
+				 * special meaning.
 				 */
 				raw_buf_ptr++;
+				c = c2;
+			}
 		}
 
 		/*
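
The effect of the added `c = c2;` can be illustrated outside the server with a small standalone sketch. It is not the code in copy.c: `toy_mblen` and `sees_end_of_copy` are hypothetical names, and the length rule (any byte with the high bit set starts a two-byte character) is a simplifying assumption standing in for the real encoding tables. The loop only mirrors the general shape of the scan: consume a byte, handle a backslash, then skip the trailing bytes of whatever character `c` names.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for an encoding length function.  Assumption:
 * a lead byte with the high bit set starts a two-byte character, as in
 * the Big5 example from the commit message.
 */
int
toy_mblen(unsigned char lead)
{
	return (lead & 0x80) ? 2 : 1;
}

/*
 * Scan buf and report whether a backslash immediately followed by '.'
 * is seen, i.e. a candidate end-of-copy marker.  With apply_fix = false
 * only the first byte after a backslash is skipped (the old behaviour
 * described above); with apply_fix = true, 'c' is set to the escaped
 * byte so that the trailing-byte skip below covers the whole character.
 */
bool
sees_end_of_copy(const unsigned char *buf, size_t len, bool apply_fix)
{
	size_t		i = 0;

	while (i < len)
	{
		unsigned char c = buf[i++];

		if (c == '\\' && i < len)
		{
			unsigned char c2 = buf[i];

			if (c2 == '.')
				return true;

			i++;				/* skip first byte of the escaped char */
			if (apply_fix)
				c = c2;			/* let the skip below see its lead byte */
		}

		/* Skip the remaining bytes of a multi-byte character. */
		if (c & 0x80)
			i += toy_mblen(c) - 1;
	}
	return false;
}
```

Feeding it the bytes 5c a4 5c 2e 66 6f 6f from the commit message, the call with `apply_fix` false reports a marker, because the 0x5c trailing byte is read as a second backslash in front of the dot. With `apply_fix` true the two-byte character is skipped as a unit and no marker is reported.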