author | Tom Lane <tgl@sss.pgh.pa.us> | 2020-04-09 15:11:08 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2020-04-09 15:11:08 -0400 |
commit | 7627f64ba21a2734192c6832d01a9c38948872bc (patch) | |
tree | 69a8015c28e346b544698dd93cc4538c3926c472 /doc/src | |
parent | 91be1d1906a740725147c4e2a81ccda10543b714 (diff) | |
Doc: improve documentation about ts_headline() function.
Now that I've had my nose in that code, I thought the docs about
it left something to be desired.
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/textsearch.sgml | 104 |
1 files changed, 57 insertions, 47 deletions
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 3b54dd575dd..765186544b6 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1301,64 +1301,75 @@ ts_headline(<optional> <replaceable class="parameter">config</replaceable> <type
     <itemizedlist spacing="compact" mark="bullet">
      <listitem>
       <para>
-       <literal>StartSel</literal>, <literal>StopSel</literal>: the strings with
-       which to delimit query words appearing in the document, to distinguish
-       them from other excerpted words. You must double-quote these strings
-       if they contain spaces or commas.
+       <literal>MaxWords</literal>, <literal>MinWords</literal> (integers):
+       these numbers determine the longest and shortest headlines to output.
+       The default values are 35 and 15.
       </para>
      </listitem>
      <listitem>
       <para>
-       <literal>MaxWords</literal>, <literal>MinWords</literal>: these numbers
-       determine the longest and shortest headlines to output.
+       <literal>ShortWord</literal> (integer): words of this length or less
+       will be dropped at the start and end of a headline, unless they are
+       query terms. The default value of three eliminates common English
+       articles.
       </para>
      </listitem>
      <listitem>
       <para>
-       <literal>ShortWord</literal>: words of this length or less will be
-       dropped at the start and end of a headline. The default
-       value of three eliminates common English articles.
+       <literal>HighlightAll</literal> (boolean): if
+       <literal>true</literal> the whole document will be used as the
+       headline, ignoring the preceding three parameters. The default
+       is <literal>false</literal>.
       </para>
      </listitem>
      <listitem>
       <para>
-       <literal>HighlightAll</literal>: Boolean flag; if
-       <literal>true</literal> the whole document will be used as the
-       headline, ignoring the preceding three parameters.
+       <literal>MaxFragments</literal> (integer): maximum number of text
+       fragments to display. The default value of zero selects a
+       non-fragment-based headline generation method. A value greater
+       than zero selects fragment-based headline generation (see below).
       </para>
      </listitem>
      <listitem>
       <para>
-       <literal>MaxFragments</literal>: maximum number of text excerpts
-       or fragments to display. The default value of zero selects a
-       non-fragment-oriented headline generation method. A value greater than
-       zero selects fragment-based headline generation. This method
-       finds text fragments with as many query words as possible and
-       stretches those fragments around the query words. As a result
-       query words are close to the middle of each fragment and have words on
-       each side. Each fragment will be of at most <literal>MaxWords</literal> and
-       words of length <literal>ShortWord</literal> or less are dropped at the start
-       and end of each fragment. If not all query words are found in the
-       document, then a single fragment of the first <literal>MinWords</literal>
-       in the document will be displayed.
+       <literal>StartSel</literal>, <literal>StopSel</literal> (strings):
+       the strings with which to delimit query words appearing in the
+       document, to distinguish them from other excerpted words. The
+       default values are <quote><literal><b></literal></quote> and
+       <quote><literal></b></literal></quote>, which can be suitable
+       for HTML output.
       </para>
      </listitem>
      <listitem>
       <para>
-       <literal>FragmentDelimiter</literal>: When more than one fragment is
-       displayed, the fragments will be separated by this string.
+       <literal>FragmentDelimiter</literal> (string): When more than one
+       fragment is displayed, the fragments will be separated by this string.
+       The default is <quote><literal> ... </literal></quote>.
       </para>
      </listitem>
     </itemizedlist>

     These option names are recognized case-insensitively.
-    Any unspecified options receive these defaults:
+    You must double-quote string values if they contain spaces or commas.
+   </para>

-<programlisting>
-StartSel=<b>, StopSel=</b>,
-MaxWords=35, MinWords=15, ShortWord=3, HighlightAll=FALSE,
-MaxFragments=0, FragmentDelimiter=" ... "
-</programlisting>
+   <para>
+    In non-fragment-based headline
+    generation, <function>ts_headline</function> locates matches for the
+    given <replaceable class="parameter">query</replaceable> and chooses a
+    single one to display, preferring matches that have more query words
+    within the allowed headline length.
+    In fragment-based headline generation, <function>ts_headline</function>
+    locates the query matches and splits each match
+    into <quote>fragments</quote> of no more than <literal>MaxWords</literal>
+    words each, preferring fragments with more query words, and when
+    possible <quote>stretching</quote> fragments to include surrounding
+    words. The fragment-based mode is thus more useful when the query
+    matches span large sections of the document, or when it's desirable to
+    display multiple matches.
+    In either mode, if no query matches can be identified, then a single
+    fragment of the first <literal>MinWords</literal> words in the document
+    will be displayed.
    </para>

    <para>
@@ -1370,25 +1381,24 @@ SELECT ts_headline('english',
 is to find all documents containing given query terms
 and return them in order of their similarity to the
 query.',
-                   to_tsquery('query & similarity'));
-                        ts_headline
+                   to_tsquery('english', 'query & similarity'));
+                         ts_headline
 ------------------------------------------------------------
- containing given <b>query</b> terms
- and return them in order of their <b>similarity</b> to the
+ containing given <b>query</b> terms                       +
+ and return them in order of their <b>similarity</b> to the+
  <b>query</b>.

 SELECT ts_headline('english',
-  'The most common type of search
-is to find all documents containing given query terms
-and return them in order of their similarity to the
-query.',
-                   to_tsquery('query & similarity'),
-                   'StartSel = <, StopSel = >');
-                      ts_headline
--------------------------------------------------------
- containing given <query> terms
- and return them in order of their <similarity> to the
- <query>.
+  'Search terms may occur
+many times in a document,
+requiring ranking of the search matches to decide which
+occurrences to display in the result.',
+                   to_tsquery('english', 'search & term'),
+                   'MaxFragments=10, MaxWords=7, MinWords=3, StartSel=<<, StopSel=>>');
+                         ts_headline
+------------------------------------------------------------
+ <<Search>> <<terms>> may occur                            +
+ many times ... ranking of the <<search>> matches to decide
 </screen>
   </para>
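The revised text states that string option values must be double-quoted if they contain spaces or commas. A client that assembles the options argument for ts_headline() could apply that rule roughly as sketched below. This is an illustration only, not part of the commit: `headline_options` is a hypothetical helper name, and because the documentation in this diff does not say how a double quote inside a value is escaped, the sketch simply rejects such values.

```python
def headline_options(**options):
    """Build an options string for ts_headline(), e.g. 'MaxWords=7, MinWords=3'.

    Hypothetical helper for illustration. Values containing a space or a
    comma are wrapped in double quotes, following the rule stated in the
    revised documentation text.
    """
    parts = []
    for name, value in options.items():
        # Booleans are written as true/false, matching the HighlightAll wording.
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        # Assumption: escaping of embedded double quotes is not covered by
        # the documentation shown here, so refuse rather than guess.
        if '"' in text:
            raise ValueError(f"value for {name} contains a double quote")
        if " " in text or "," in text:
            text = f'"{text}"'
        parts.append(f"{name}={text}")
    return ", ".join(parts)


# Reproduces the options string used in the second example of the diff.
fragment_opts = headline_options(
    MaxFragments=10, MaxWords=7, MinWords=3, StartSel="<<", StopSel=">>"
)

# A delimiter with spaces gets quoted, as in the old defaults listing.
delimiter_opts = headline_options(FragmentDelimiter=" ... ")
```

The resulting string would be passed as the final argument of ts_headline(), preferably as a bound query parameter rather than by pasting it into the SQL text.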