Fix use-after-free issue in regexp engine.

Commit cebc1d34e taught parseqatom() to optimize cases where a branch contains only one, "messy", atom by getting rid of excess subRE nodes. The way we really should do that is to keep the subRE built for the "messy" child atom; but to avoid changing parseqatom's nominal API, I made it delete that node after copying its fields to the outer subRE made by parsebranch(). It seems that that actually worked at the time; but it became dangerous after ea1268f63, because that later commit allowed the lower invocation of parse() to return a subRE that was also pointed to by some v->subs[] entry. This meant we could wind up with a dangling pointer in v->subs[], allowing a later backref to misbehave, but only if that subRE struct had been reused in between. So the damage seems confined to cases like '((...))...(...\2'.

To fix, do what I should have done before and modify parseqatom's API to make it possible for it to remove the caller's subRE instead of the callee's. That's safer because we know that subRE isn't complete yet, so noplace else will have a pointer to it.

Per report from Mark Dilger. Back-patch to v14 where the problematic patches came in.

Discussion: https://postgr.es/m/0203588E-E609-43AF-9F4F-902854231EE7@enterprisedb.com
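
The hazard described in the message can be sketched with a toy model. This is not the regex engine's code: struct node, pool[], subs[], node_alloc(), node_free(), merge_keep_outer(), merge_keep_inner() and scenario() are all hypothetical names invented for illustration, and a fixed pool with an in_use flag stands in for node reuse so that the stale pointer can be inspected without undefined behavior. It assumes only what the message states: a capture's node is registered in a side table, two nodes are collapsed into one, and a freed node may be reused later.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for a subRE node. */
struct node {
    char op;      /* '(' capture, 'b' backref, '|' placeholder */
    int  capno;   /* capture number, or 0 if none */
    int  in_use;  /* 1 while allocated, 0 once released */
};

#define POOL_SIZE 8
#define MAX_SUBS  4

static struct node  pool[POOL_SIZE];  /* released slots get reused */
static struct node *subs[MAX_SUBS];   /* toy analogue of v->subs[] */

/* Hand out the first free slot in the pool. */
static struct node *node_alloc(char op, int capno)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].op = op;
            pool[i].capno = capno;
            pool[i].in_use = 1;
            return &pool[i];
        }
    }
    assert(!"node pool exhausted");
    return NULL;
}

/* Release a slot so that a later node_alloc() may reuse it. */
static void node_free(struct node *n)
{
    n->in_use = 0;
}

/* Old approach: copy the inner node into the outer one and release the
 * inner node. Any table entry pointing at the inner node goes stale. */
static struct node *merge_keep_outer(struct node *outer, struct node *inner)
{
    *outer = *inner;
    node_free(inner);
    return outer;
}

/* New approach: release the outer node, which nothing else points to
 * yet, and keep the inner node that the table already references. */
static struct node *merge_keep_inner(struct node *outer, struct node *inner)
{
    node_free(outer);
    return inner;
}

/* Build an outer and an inner node, collapse them, allocate one more
 * node, and report what subs[2] points at afterwards. */
char scenario(int keep_inner)
{
    memset(pool, 0, sizeof pool);
    memset(subs, 0, sizeof subs);

    struct node *outer = node_alloc('|', 0);  /* caller's incomplete node */
    struct node *inner = node_alloc('(', 2);  /* callee's capture node */
    subs[2] = inner;                          /* registered for backrefs */

    if (keep_inner)
        merge_keep_inner(outer, inner);
    else
        merge_keep_outer(outer, inner);

    node_alloc('b', 0);   /* a later node takes the first released slot */
    return subs[2]->op;   /* what a backref to capture 2 would find */
}
```

In this model scenario(0) returns 'b', because the table entry for capture 2 now points at a slot that has been reused for an unrelated node, while scenario(1) returns '(' because the registered node was never released. The same reasoning is why the message calls removing the caller's node safer: that node is not complete yet, so nothing else can hold a pointer to it.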
author: Tom Lane <tgl@sss.pgh.pa.us> 2021-08-07 22:05:27 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2021-08-07 22:27:13 -0400
commit: f42ea8350db22725a251e98a5dafb4d2539c800f (patch)
tree: f1b2dd17306cac29f8a51d2de98ee4579cde3160 /src/test/modules/test_regex
parent: 51b95fb257a24aa4186960be8abc277774466218 (diff)
download: postgresql-f42ea8350db22725a251e98a5dafb4d2539c800f.tar.gz
postgresql-f42ea8350db22725a251e98a5dafb4d2539c800f.zip
2 files changed, 10 insertions, 0 deletions
diff --git a/src/test/modules/test_regex/expected/test_regex.out b/src/test/modules/test_regex/expected/test_regex.out
index 01d50ec1e3f..44da7d20190 100644
--- a/src/test/modules/test_regex/expected/test_regex.out
+++ b/src/test/modules/test_regex/expected/test_regex.out
@@ -3468,6 +3468,14 @@ select * from test_regex(' TO (([a-z0-9._]+|"([^"]+|"")+")+)', 'asd TO foo', 'M'
  {" TO foo",foo,o,NULL}
 (2 rows)
 
+-- expectMatch	21.36 RPQ	((.))(\2){0}	xy	x	x	x	{}
+select * from test_regex('((.))(\2){0}', 'xy', 'RPQ');
+                 test_regex                 
+--------------------------------------------
+ {3,REG_UBACKREF,REG_UBOUNDS,REG_UNONPOSIX}
+ {x,x,x,NULL}
+(2 rows)
+
 -- doing 22 "multicharacter collating elements"
 -- # again ugh
 -- MCCEs are not implemented in Postgres, so we skip all these tests
diff --git a/src/test/modules/test_regex/sql/test_regex.sql b/src/test/modules/test_regex/sql/test_regex.sql
index 7f5bc6e418f..9224fdfdd3a 100644
--- a/src/test/modules/test_regex/sql/test_regex.sql
+++ b/src/test/modules/test_regex/sql/test_regex.sql
@@ -1009,6 +1009,8 @@ select * from test_regex('(.*).*', 'abc', 'N');
 select * from test_regex('(a*)*', 'bc', 'N');
 -- expectMatch	21.35 M		{ TO (([a-z0-9._]+|"([^"]+|"")+")+)}	{asd TO foo}	{ TO foo} foo o {}
 select * from test_regex(' TO (([a-z0-9._]+|"([^"]+|"")+")+)', 'asd TO foo', 'M');
+-- expectMatch	21.36 RPQ	((.))(\2){0}	xy	x	x	x	{}
+select * from test_regex('((.))(\2){0}', 'xy', 'RPQ');
 
 -- doing 22 "multicharacter collating elements"
 -- # again ugh