path: root/src/include/utils/pg_lzcompress.h
author    Tom Lane <tgl@sss.pgh.pa.us>  2008-03-07 23:20:21 +0000
committer Tom Lane <tgl@sss.pgh.pa.us>  2008-03-07 23:20:21 +0000
commit    ad434473ebd2d24dcf400896ac1539676009af08 (patch)
tree      1ff13eb2c2c17f8608b86bf258e663025d9befa2 /src/include/utils/pg_lzcompress.h
parent    1cc52905f07a2b80299b3502d664184873408fdf (diff)
This patch addresses some issues in TOAST compression strategy that were
discussed last year, but we felt it was too late in the 8.3 cycle to change
the code immediately.  Specifically, the patch:

* Reduces the minimum datum size to be considered for compression from
  256 to 32 bytes, as suggested by Greg Stark.

* Increases the required compression rate for compressed storage from
  20% to 25%, again per Greg's suggestion.

* Replaces force_input_size (size above which compression is forced) with
  a maximum size to be considered for compression.  It was agreed that
  allowing large inputs to escape the minimum-compression-rate requirement
  was not bright, and that indeed we'd rather have a knob that acted in the
  other direction.  I set this value to 1MB for the moment, but it could
  use some performance studies to tune it.

* Adds an early-failure path to the compressor as suggested by Jan: if it's
  been unable to find even one compressible substring in the first 1KB
  (parameterizable), assume we're looking at incompressible input and give
  up.  (Possibly this logic can be improved, but I'll commit it as-is for
  now.)

* Improves the toasting heuristics so that when we have very large fields
  with attstorage 'x' or 'e', we will push those out to toast storage before
  considering inline compression of shorter fields.  This also responds to
  a suggestion of Greg's, though my original proposal for a solution was a
  bit off base because it didn't fix the problem for large 'e' fields.

There was some discussion in the earlier threads of exposing some of the
compression knobs to users, perhaps even on a per-column basis.  I have not
done anything about that here.  It seems to me that if we are changing
around the parameters, we'd better get some experience and be sure we are
happy with the design before we set things in stone by providing
user-visible knobs.
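
[Editor's illustration, not part of the commit] The parameters described
above map onto the revised PGLZ_Strategy struct shown in the diff below.
The following backend-style C sketch restates the numbers from this
message as a strategy initializer; the name example_strategy is made up,
the first four values come from the message (32 bytes, 1MB, 25%, 1KB),
and the two match-size fields are placeholders rather than values taken
from pg_lzcompress.c:

#include "postgres.h"
#include "utils/pg_lzcompress.h"

/*
 * Hypothetical strategy restating the knobs described in the commit
 * message.  The real default is defined in
 * src/backend/utils/adt/pg_lzcompress.c and may use different values;
 * the last two fields here are placeholders.
 */
static const PGLZ_Strategy example_strategy = {
	32,				/* min_input_size: skip datums smaller than 32 bytes */
	1024 * 1024,	/* max_input_size: skip datums larger than 1MB */
	25,				/* min_comp_rate: require at least 25% savings */
	1024,			/* first_success_by: give up if no match in first 1KB */
	128,			/* match_size_good: placeholder value */
	10				/* match_size_drop: placeholder value */
};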
Diffstat (limited to 'src/include/utils/pg_lzcompress.h')
-rw-r--r--  src/include/utils/pg_lzcompress.h | 45
1 file changed, 17 insertions, 28 deletions
diff --git a/src/include/utils/pg_lzcompress.h b/src/include/utils/pg_lzcompress.h
index a3c49ae7a72..e81ae0d5ca7 100644
--- a/src/include/utils/pg_lzcompress.h
+++ b/src/include/utils/pg_lzcompress.h
@@ -3,7 +3,7 @@
*
* Definitions for the builtin LZ compressor
*
- * $PostgreSQL: pgsql/src/include/utils/pg_lzcompress.h,v 1.16 2007/11/15 21:14:45 momjian Exp $
+ * $PostgreSQL: pgsql/src/include/utils/pg_lzcompress.h,v 1.17 2008/03/07 23:20:21 tgl Exp $
* ----------
*/
@@ -14,7 +14,7 @@
/* ----------
* PGLZ_Header -
*
- * The information at the top of the compressed data.
+ * The information at the start of the compressed data.
* ----------
*/
typedef struct PGLZ_Header
@@ -48,19 +48,17 @@ typedef struct PGLZ_Header
*
* Some values that control the compression algorithm.
*
- * min_input_size Minimum input data size to start compression.
+ * min_input_size Minimum input data size to consider compression.
*
- * force_input_size Minimum input data size to force compression
- * even if the compression rate drops below
- * min_comp_rate. But in any case the output
- * must be smaller than the input. If that isn't
- * the case, the compressor will throw away its
- * output and copy the original, uncompressed data
- * to the output buffer.
+ * max_input_size Maximum input data size to consider compression.
*
- * min_comp_rate Minimum compression rate (0-99%) to require for
- * inputs smaller than force_input_size. If not
- * achieved, the output will be uncompressed.
+ * min_comp_rate Minimum compression rate (0-99%) to require.
+ * Regardless of min_comp_rate, the output must be
+ * smaller than the input, else we don't store
+ * compressed.
+ *
+ * first_success_by Abandon compression if we find no compressible
+ * data within the first this-many bytes.
*
* match_size_good The initial GOOD match size when starting history
* lookup. When looking up the history to find a
@@ -81,8 +79,9 @@ typedef struct PGLZ_Header
typedef struct PGLZ_Strategy
{
int32 min_input_size;
- int32 force_input_size;
+ int32 max_input_size;
int32 min_comp_rate;
+ int32 first_success_by;
int32 match_size_good;
int32 match_size_drop;
} PGLZ_Strategy;
@@ -91,21 +90,11 @@ typedef struct PGLZ_Strategy
/* ----------
* The standard strategies
*
- * PGLZ_strategy_default Starts compression only if input is
- * at least 256 bytes large. Stores output
- * uncompressed if compression does not
- * gain at least 20% size reducture but
- * input does not exceed 6K. Stops history
- * lookup if at least a 128 byte long
- * match has been found.
- *
- * This is the default strategy if none
- * is given to pglz_compress().
+ * PGLZ_strategy_default Recommended default strategy for TOAST.
*
- * PGLZ_strategy_always Starts compression on any infinitely
- * small input and does fallback to
- * uncompressed storage only if output
- * would be larger than input.
+ * PGLZ_strategy_always Try to compress inputs of any length.
+ * Fallback to uncompressed storage only if
+ * output would be larger than input.
* ----------
*/
extern const PGLZ_Strategy *const PGLZ_strategy_default;
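
[Editor's illustration, not part of the commit] The min_comp_rate rule
documented above can be read as: the compressed image must save at least
that percentage of the source bytes, and must in any case be smaller than
the source.  A hedged C sketch of that check, with a made-up helper name
and ignoring the header overhead and rounding details inside the actual
compressor:

#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative only: does a compression result satisfy a minimum
 * compression rate (0-99%)?  This restates the rule documented above;
 * it does not reproduce the exact bookkeeping of pg_lzcompress.c.
 */
static bool
meets_min_comp_rate(int32_t source_len, int32_t compressed_len,
					int min_comp_rate)
{
	/* e.g. with min_comp_rate = 25, a 1000-byte input must shed >= 250 bytes */
	int64_t		required_savings = ((int64_t) source_len * min_comp_rate) / 100;

	return compressed_len < source_len &&
		(int64_t) (source_len - compressed_len) >= required_savings;
}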