author Dean Rasheed <dean.a.rasheed@gmail.com> 2025-05-01 11:09:24 +0100
committer Dean Rasheed <dean.a.rasheed@gmail.com> 2025-05-01 11:09:24 +0100
commit 1ba9ffa56eb510b8d9ae57431ad61a9e1a396674
tree c56fdeefdac291949ee7e592485667a54bd0c8c4
parent 7be51eb4e169672d2029d955cb776e2252e6b7d3
doc: Warn that ts_headline() output is not HTML-safe.

Add a documentation warning to ts_headline() pointing out that, when working with untrusted input documents, the output is not guaranteed to be safe for direct inclusion in web pages. This is because, while it does remove some XML tags from the input, it doesn't remove all HTML markup, and so the result may be unsafe (e.g., it might permit XSS attacks). To guard against that, all HTML markup should be removed from the input, making it plain text, or the output should be passed through an HTML sanitizer.

In addition, document precisely what the default text search parser recognises as valid XML tags, since that's what determines which XML tags ts_headline() will remove.

Reported-by: Richard Neill <richard.neill@telos.digital>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
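
One way to act on the output-side advice is shown in the minimal Python sketch below. It uses HTML escaping rather than a full HTML sanitizer. Every character of the ts_headline() result is escaped, and only afterwards are highlight markers turned back into <b> tags. The sketch assumes the query asked for custom delimiters, for example `ts_headline('english', body, query, 'StartSel={{hl}}, StopSel={{/hl}}')`, and that those marker strings never occur in the documents themselves. The names START_MARK, STOP_MARK and render_headline are hypothetical and not part of PostgreSQL.

```python
import html

# Hypothetical marker strings. They are assumed to be passed to ts_headline()
# through the StartSel/StopSel options and never to appear in document text.
START_MARK = "{{hl}}"
STOP_MARK = "{{/hl}}"


def render_headline(raw_headline: str) -> str:
    """Make a ts_headline() result safe to embed in HTML, then restore highlighting.

    All markup that survived ts_headline() is escaped, so the only live HTML
    in the result is the <b>...</b> pairs inserted here.
    """
    # html.escape only rewrites &, <, >, " and ', so the markers pass through intact.
    escaped = html.escape(raw_headline, quote=True)
    return escaped.replace(START_MARK, "<b>").replace(STOP_MARK, "</b>")
```

For example, `render_headline('<script>x</script> {{hl}}cat{{/hl}}')` returns `'&lt;script&gt;x&lt;/script&gt; <b>cat</b>'`. Escaping removes the need to decide which tags are dangerous. The trade-off is that legitimate markup in the source documents is shown as literal text instead of being rendered.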
-rw-r--r-- doc/src/sgml/textsearch.sgml 29
1 files changed, 28 insertions, 1 deletions
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index febcda96634..c0025827e78 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1339,7 +1339,7 @@ ts_headline(<optional> <replaceable class="parameter">config</replaceable> <type
document, to distinguish them from other excerpted words. The
default values are <quote><literal>&lt;b&gt;</literal></quote> and
<quote><literal>&lt;/b&gt;</literal></quote>, which can be suitable
- for HTML output.
+ for HTML output (but see the warning below).
</para>
</listitem>
<listitem>
@@ -1351,6 +1351,21 @@ ts_headline(<optional> <replaceable class="parameter">config</replaceable> <type
</listitem>
</itemizedlist>
+ <warning>
+ <title>Warning: Cross-site scripting (XSS) safety</title>
+ <para>
+ The output from <function>ts_headline</function> is not guaranteed to
+ be safe for direct inclusion in web pages. When
+ <literal>HighlightAll</literal> is <literal>false</literal> (the
+ default), some simple XML tags are removed from the document, but this
+ is not guaranteed to remove all HTML markup. Therefore, this does not
+ provide an effective defense against attacks such as cross-site
+ scripting (XSS) attacks, when working with untrusted input. To guard
+ against such attacks, all HTML markup should be removed from the input
+ document, or an HTML sanitizer should be used on the output.
+ </para>
+ </warning>
+
These option names are recognized case-insensitively.
You must double-quote string values if they contain spaces or commas.
</para>
@@ -2222,6 +2237,18 @@ LIMIT 10;
Specifically, the only non-alphanumeric characters supported for
email user names are period, dash, and underscore.
</para>
+
+ <para>
+ <literal>tag</literal> does not support all valid tag names as defined by
+ <ulink url="https://www.w3.org/TR/xml/">W3C Recommendation, XML</ulink>.
+ Specifically, the only tag names supported are those starting with an
+ ASCII letter, underscore, or colon, and containing only letters, digits,
+ hyphens, underscores, periods, and colons. <literal>tag</literal> also
+ includes XML comments starting with <literal>&lt;!--</literal> and ending
+ with <literal>--&gt;</literal>, and XML declarations (but note that this
+ includes anything starting with <literal>&lt;?x</literal> and ending with
+ <literal>&gt;</literal>).
+ </para>
</note>
<para>
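
The tag rules documented in the last hunk can be modelled roughly in a few lines of Python. This is a hypothetical sketch of the documented rules only, not a port of the default parser's C code. It assumes that "letters" after the first character also means ASCII letters. It checks comments and XML declarations only by their documented opening and closing strings. The function names are illustrative.

```python
import re

# First character: ASCII letter, underscore, or colon.
# Remaining characters: letters, digits, hyphens, underscores, periods, colons
# (assumed here to be ASCII letters only).
TAG_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:-]*")


def is_supported_tag_name(name: str) -> bool:
    """Return True if `name` matches the documented tag-name rule."""
    return TAG_NAME.fullmatch(name) is not None


def is_comment_or_declaration(text: str) -> bool:
    """Return True for '<!-- ... -->' comments or anything of the form '<?x...>'."""
    # The length check stops '<!-->' from counting as both opener and closer.
    is_comment = len(text) >= 7 and text.startswith("<!--") and text.endswith("-->")
    is_declaration = text.startswith("<?x") and text.endswith(">")
    return is_comment or is_declaration
```

Under this model, `is_supported_tag_name("_x:y-1.2")` is true and `is_supported_tag_name("1b")` is false. Any `<?x...>` construct, not only a well-formed `<?xml ...?>` declaration, is treated as a tag, which matches the caveat in the documentation text.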