[OSSTEST PATCH 1/1] PostgreSQL db: Retry transactions on constraint failures

From: Ian Jackson @ 2016-12-09 18:26 UTC (permalink / raw)
  To: xen-devel; +Cc: Ian Jackson, pgsql-hackers

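Retry on integrity constraint violations (SQLSTATE class 23 and 40002) as
well as on deadlocks and serialisation failures.  This is unfortunate but
appears to be necessary; see the comment in need_retry for the reasoning.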

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: pgsql-hackers@postgresql.org
---
 Osstest/JobDB/Executive.pm | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 tcl/JobDB-Executive.tcl    |  6 ++++--
 2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/Osstest/JobDB/Executive.pm b/Osstest/JobDB/Executive.pm
index 610549a..dc6d3c2 100644
--- a/Osstest/JobDB/Executive.pm
+++ b/Osstest/JobDB/Executive.pm
@@ -62,8 +62,51 @@ sub need_retry ($$$) {
     my ($jd, $dbh,$committing) = @_;
     return
 	($dbh_tests->err() // 0)==7 &&
-	($dbh_tests->state =~ m/^(?:40P01|40001)/);
+	($dbh_tests->state =~ m/^(?:40P01|40001|23|40002)/);
     # DEADLOCK DETECTED or SERIALIZATION FAILURE
+    # or any Integrity Constraint Violation including
+    # TRANSACTION_INTEGRITY_CONSTRAINT_VIOLATION.
+    #
+    # An Integrity Constraint Violation ought not to occur with
+    # serialisable transactions, so it is always a bug.  These bugs
+    # should not be retried.  However, there is a longstanding bug in
+    # PostgreSQL: SERIALIZABLE's guarantee of transaction
+    # serialisability only applies to successful transactions.
+    # Concurrent SERIALIZABLE transactions may generate "impossible"
+    # errors.  For example, doing a SELECT to ensure that a row does
+    # not exist, and then inserting it, may produce a unique
+    # constraint violation.
+    #
+    # I have not been able to find out clearly which error codes may
+    # be spuriously generated.  At the very least "23505
+    # UNIQUE_VIOLATION" is, but I'm not sure about others.  I am
+    # making the (hopefully not unwarranted) assumption that this is
+    # the only class of spurious errors.  (We don't have triggers.)
+    #
+    # The undesirable side effect is that a buggy transaction would be
+    # retried at intervals until the retry count is reached.  But
+    # there seems no way to avoid this.
+    #
+    # This bug may have been fixed in very recent PostgreSQL (although
+    # a better promise still seems absent from the documentation, at
+    # the time of writing in December 2016).  But we need to work with
+    # PostgreSQL back to at least 9.1.  Perhaps in the future we can
+    # make this behaviour conditional on the pgsql bug being fixed.
+    #
+    # References:
+    #
+    # "WIP: Detecting SSI conflicts before reporting constraint violations"
+    # January 2016 - April 2016 on pgsql-hackers
+    # https://www.postgresql.org/message-id/flat/CAEepm%3D2_9PxSqnjp%3D8uo1XthkDVyOU9SO3%2BOLAgo6LASpAd5Bw%40mail.gmail.com
+    # (includes patch for PostgreSQL and its documentation)
+    #
+    # BUG #9301: INSERT WHERE NOT EXISTS on table with UNIQUE constraint in concurrent SERIALIZABLE transactions
+    # 2014, pgsql-bugs
+    # https://www.postgresql.org/message-id/flat/3F697CF1-2BB7-40D4-9D20-919D1A5D6D93%40apple.com
+    #
+    # "Working around spurious unique constraint errors due to SERIALIZABLE bug"
+    # 2009, pgsql-general
+    # https://www.postgresql.org/message-id/flat/D960CB61B694CF459DCFB4B0128514C203937E44%40exadv11.host.magwien.gv.at
 }
 
 sub current_flight ($) { #method
diff --git a/tcl/JobDB-Executive.tcl b/tcl/JobDB-Executive.tcl
index 62c63af..6b9bcb0 100644
--- a/tcl/JobDB-Executive.tcl
+++ b/tcl/JobDB-Executive.tcl
@@ -365,8 +365,10 @@ proc transaction {tables script {autoreconnect 0}} {
 	if {$rc} {
 	    switch -glob $errorCode {
 		{OSSTEST-PSQL * 40P01} -
-		{OSSTEST-PSQL * 40001} {
-		    # DEADLOCK DETECTED or SERIALIZATION FAILURE
+		{OSSTEST-PSQL * 40001} -
+		{OSSTEST-PSQL * 23*}   -
+		{OSSTEST-PSQL * 40002} {
+		    # See Osstest/JobDB/Executive.pm:need_retry
 		    logputs stdout \
  "transaction serialisation failure ($errorCode) ($result) retrying ..."
 		    if {$dbopen} { db-execute ROLLBACK }
-- 
2.1.4
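
For readers who want to check what the widened pattern accepts, here is a
minimal stand-alone sketch of the classification and of a bounded retry
loop.  It is an illustration only, not the osstest implementation: the
names is_retryable and run_with_retry are hypothetical, no database handle
is involved, and the delay between attempts that the patch comment
mentions is left out.  Note that the unanchored "23" alternative matches
every SQLSTATE in class 23, not only 23505.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the test made in need_retry.
# $err is the driver error number (7 is the value the patch checks for);
# $state is the five-character SQLSTATE string.
sub is_retryable {
    my ($err, $state) = @_;
    return (($err // 0) == 7
            && ($state // '') =~ m/^(?:40P01|40001|23|40002)/) ? 1 : 0;
}

# Hypothetical bounded retry loop.  $attempt is a code ref called with the
# attempt number; it returns (1) on success or (0, $err, $state) on failure.
# Returns the number of the attempt that succeeded.
sub run_with_retry {
    my ($max_tries, $attempt) = @_;
    for my $try (1 .. $max_tries) {
        my ($ok, $err, $state) = $attempt->($try);
        return $try if $ok;
        # Errors outside the retryable set are reported immediately.
        die "fatal SQLSTATE " . ($state // 'unknown') . "\n"
            unless is_retryable($err, $state);
    }
    # A genuinely buggy transaction ends up here once the count runs out.
    die "retry limit reached\n";
}
```

The last die corresponds to the side effect described in the patch
comment: a transaction that really does violate a constraint is retried
until the count is exhausted, and only then reported.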

