author    Robert Haas <rhaas@postgresql.org>  2012-06-07 12:25:41 -0400
committer Robert Haas <rhaas@postgresql.org>  2012-06-07 12:48:13 -0400
commit    b50991eedb458a81d875d049f48fb62ab1685c0d (patch)
tree      f9b24ee8a713ecabe13ec13db1461732d2a5df2c /src/backend/executor/nodeIndexonlyscan.c
parent    92135ea0ed8f75daa86cd94301cedc7f5b714e3c (diff)
download  postgresql-b50991eedb458a81d875d049f48fb62ab1685c0d.tar.gz
          postgresql-b50991eedb458a81d875d049f48fb62ab1685c0d.zip
Fix more crash-safe visibility map bugs, and improve comments.
In lazy_scan_heap, we could issue bogus warnings about incorrect information in the visibility map, because we checked the visibility map bit before locking the heap page, creating a race condition. Fix by rechecking the visibility map bit before we complain. Rejigger some related logic so that we rely on the possibly-outdated all_visible_according_to_vm value as little as possible.

In heap_multi_insert, it's not safe to clear the visibility map bit before beginning the critical section. The visibility map is not crash-safe unless we treat clearing the bit as a critical operation. Specifically, if the transaction were to error out after we set the bit and before entering the critical section, we could end up writing the heap page to disk (with the bit cleared) and crashing before the visibility map page made it to disk. That would be bad. heap_insert has this correct, but somehow the order of operations got rearranged when heap_multi_insert was added.

Also, add some more comments to visibilitymap_test, lazy_scan_heap, and IndexOnlyNext, expounding on concurrency issues.

Per extensive code review by Andres Freund, and further review by Tom Lane, who also made the original report about the bogus warnings.
Diffstat (limited to 'src/backend/executor/nodeIndexonlyscan.c')
-rw-r--r--  src/backend/executor/nodeIndexonlyscan.c | 13 +++++++++++++
1 file changed, 13 insertions(+), 0 deletions(-)
diff --git a/src/backend/executor/nodeIndexonlyscan.c b/src/backend/executor/nodeIndexonlyscan.c
index 4abd805aa31..af31671b3eb 100644
--- a/src/backend/executor/nodeIndexonlyscan.c
+++ b/src/backend/executor/nodeIndexonlyscan.c
@@ -82,6 +82,19 @@ IndexOnlyNext(IndexOnlyScanState *node)
* We can skip the heap fetch if the TID references a heap page on
* which all tuples are known visible to everybody. In any case,
* we'll use the index tuple not the heap tuple as the data source.
+ *
+ * Note on Memory Ordering Effects: visibilitymap_test does not lock
+ * the visibility map buffer, and therefore the result we read here
+ * could be slightly stale. However, it can't be stale enough to
+ * matter. It suffices to show that (1) there is a read barrier
+ * between the time we read the index TID and the time we test the
+ * visibility map; and (2) there is a write barrier between the time
+ * some other concurrent process clears the visibility map bit and the
+ * time it inserts the index TID. Since acquiring or releasing a
+ * LWLock interposes a full barrier, this is easy to show: (1) is
+ * satisfied by the release of the index buffer content lock after
+ * reading the TID; and (2) is satisfied by the acquisition of the
+ * buffer content lock in order to insert the TID.
*/
if (!visibilitymap_test(scandesc->heapRelation,
ItemPointerGetBlockNumber(tid),