author | Tom Lane <tgl@sss.pgh.pa.us> | 2021-11-06 13:28:53 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2021-11-06 13:28:53 -0400 |
commit | cbe25dcff73a297adbada9dc1d6cad3df18014e9 (patch) | |
tree | 0693c9ffdbf8bf69c75eef41abd6dce2f7d7c55f /src/backend/utils/adt/tsvector_op.c | |
parent | 1241fcbd7e649414f09f9858ba73e63975dcff64 (diff) | |
download | postgresql-cbe25dcff73a297adbada9dc1d6cad3df18014e9.tar.gz postgresql-cbe25dcff73a297adbada9dc1d6cad3df18014e9.zip |
Disallow making an empty lexeme via array_to_tsvector().

The tsvector data type has always forbidden lexemes to be empty.
However, array_to_tsvector() didn't get that memo, and would
allow an empty-string array element to become an empty lexeme.
This could result in dump/restore failures later, not to mention
whatever semantic issues might be behind the original prohibition.
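
The check described above can be illustrated outside the backend. The following is a minimal standalone sketch, not the PostgreSQL code: it assumes NULL pointers stand in for SQL NULL array elements, and `LexemeArrayStatus` and `check_lexeme_array` are hypothetical names invented for this example. The real function reports these conditions with ereport(ERROR, ...) rather than returning a status.

```c
#include <stddef.h>

/* Hypothetical status codes; PostgreSQL raises an error via ereport() instead. */
typedef enum
{
	LEXEME_ARRAY_OK,
	LEXEME_ARRAY_HAS_NULL,
	LEXEME_ARRAY_HAS_EMPTY
} LexemeArrayStatus;

/*
 * Validate a candidate lexeme array: any NULL element or any zero-length
 * string makes the whole array unacceptable. A NULL pointer models an SQL
 * NULL element (an assumption of this sketch).
 */
static LexemeArrayStatus
check_lexeme_array(const char *const *items, size_t nitems)
{
	for (size_t i = 0; i < nitems; i++)
	{
		if (items[i] == NULL)
			return LEXEME_ARRAY_HAS_NULL;
		if (items[i][0] == '\0')
			return LEXEME_ARRAY_HAS_EMPTY;
	}
	return LEXEME_ARRAY_OK;
}
```

Elements are checked in order, so an array containing both a NULL and an empty string reports whichever comes first.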

However, other functions that take a plain text input directly as
a lexeme value do not need a similar restriction, because they only
match the string against existing tsvector entries. In particular
it'd be a bad idea to make ts_delete() reject empty strings, since
that is the most convenient way to clean up any bad data that might
have gotten into a tsvector column via this bug.

Reflecting on that, let's also remove the prohibition against NULL
array elements in tsvector_delete_arr and tsvector_setweight_by_filter.
It seems more consistent to ignore them, as an empty-string element
would be ignored.
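
To make the "ignore rather than reject" behavior concrete, here is a rough standalone sketch of a delete-by-array operation. It is not the backend implementation: `delete_lexemes` is a hypothetical helper, NULL pointers model SQL NULL elements, and a linear scan replaces the lookup that tsvector_delete_arr performs against the sorted tsvector entries.

```c
#include <stddef.h>
#include <string.h>

/*
 * Remove from lexemes[] every entry equal to some element of targets[],
 * compacting the array in place, and return the number of entries kept.
 *
 * NULL targets are skipped because they cannot match anything. An empty
 * string target needs no special case: it matches only an empty lexeme,
 * which is what makes it usable for cleaning up bad data.
 */
static size_t
delete_lexemes(const char **lexemes, size_t nlexemes,
			   const char *const *targets, size_t ntargets)
{
	size_t		kept = 0;

	for (size_t i = 0; i < nlexemes; i++)
	{
		int			matched = 0;

		for (size_t j = 0; j < ntargets; j++)
		{
			/* Ignore null array elements, they surely don't match */
			if (targets[j] == NULL)
				continue;
			if (strcmp(lexemes[i], targets[j]) == 0)
			{
				matched = 1;
				break;
			}
		}
		if (!matched)
			lexemes[kept++] = lexemes[i];
	}
	return kept;
}
```

With lexemes {"", "cat", "dog"} and targets {NULL, ""}, the NULL is skipped and only the empty lexeme is removed, leaving "cat" and "dog".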

There's a case for back-patching this, since it's clearly a bug fix.
On balance though, it doesn't seem like something to change in a
minor release.

Jean-Christophe Arnu

Discussion: https://postgr.es/m/CAHZmTm1YVndPgUVRoag2WL0w900XcoiivDDj-gTTYBsG25c65A@mail.gmail.com
Diffstat (limited to 'src/backend/utils/adt/tsvector_op.c')
-rw-r--r-- | src/backend/utils/adt/tsvector_op.c | 20 |
1 files changed, 13 insertions, 7 deletions
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 9236ebcc8fe..11ccb5297c9 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -322,10 +322,9 @@ tsvector_setweight_by_filter(PG_FUNCTION_ARGS)
 		int			lex_len,
 					lex_pos;
 
+		/* Ignore null array elements, they surely don't match */
 		if (nulls[i])
-			ereport(ERROR,
-					(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
-					 errmsg("lexeme array may not contain nulls")));
+			continue;
 
 		lex = VARDATA(dlexemes[i]);
 		lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;
@@ -602,10 +601,9 @@ tsvector_delete_arr(PG_FUNCTION_ARGS)
 		int			lex_len,
 					lex_pos;
 
+		/* Ignore null array elements, they surely don't match */
 		if (nulls[i])
-			ereport(ERROR,
-					(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
-					 errmsg("lexeme array may not contain nulls")));
+			continue;
 
 		lex = VARDATA(dlexemes[i]);
 		lex_len = VARSIZE(dlexemes[i]) - VARHDRSZ;
@@ -761,13 +759,21 @@ array_to_tsvector(PG_FUNCTION_ARGS)
 
 	deconstruct_array(v, TEXTOID, -1, false, TYPALIGN_INT, &dlexemes, &nulls, &nitems);
 
-	/* Reject nulls (maybe we should just ignore them, instead?) */
+	/*
+	 * Reject nulls and zero length strings (maybe we should just ignore them,
+	 * instead?)
+	 */
 	for (i = 0; i < nitems; i++)
 	{
 		if (nulls[i])
 			ereport(ERROR,
 					(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
 					 errmsg("lexeme array may not contain nulls")));
+
+		if (VARSIZE(dlexemes[i]) - VARHDRSZ == 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_ZERO_LENGTH_CHARACTER_STRING),
+					 errmsg("lexeme array may not contain empty strings")));
 	}
 
 	/* Sort and de-dup, because this is required for a valid tsvector. */