osstest was taking far too long calculating what test failures were regressions, and generating the email and web reports. The slow part was analysing the test history, mostly because it ended up doing a lot of dumb scans of large tables. In this series I fix this problem for sg-report-flight: I add some indexes, and reorganise some of the queries so that they can make good use of them. I suspect there may still be problems with sg-report-host-history and cs-bisection-step. I haven't investigated those yet. George: you volunteered to review my SQL. I hope the information in the commit messages is useful for that. Thanks! Ian Jackson (14): sg-report-flight: Add a comment re same-flight search narrowing sg-report-flight: Sort failures by job name as last resort schema: Provide indices for sg-report-flight sg-report-flight: Ask the db for flights of interest sg-report-flight: Use WITH to use best index use for $flightsq sg-report-flight: Use WITH clause to use index for $anypassq sg-report-flight: Use the job row from the intitial query Executive: Use index for report__find_test duration_estimator: Ignore truncated jobs unless we know the step duration_estimator: Introduce some _qtxt variables duration_estimator: Explicitly provide null in general host q duration_estimator: Return job column in first query duration_estimator: Move $uptincl_testid to separate @x_params duration_estimator: Move duration query loop into database Osstest/Executive.pm | 70 ++++++++++------ schema/runvars-built-index.sql | 7 ++ schema/runvars-revision-index.sql | 7 ++ schema/steps-job-index.sql | 7 ++ sg-report-flight | 127 +++++++++++++++++++++++++----- 5 files changed, 174 insertions(+), 44 deletions(-) create mode 100644 schema/runvars-built-index.sql create mode 100644 schema/runvars-revision-index.sql create mode 100644 schema/steps-job-index.sql -- 2.20.1
In afe851ca1771e5da6395b596afa69e509dbbc278 sg-report-flight: When justifying, disregard out-of-flight build jobs we narrowed sg-report-flight's search algorith. An extensive justification is in the commit message. I think much of this information belongs in-tree, so c&p it (with slight edits) here. No code change. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- sg-report-flight | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/sg-report-flight b/sg-report-flight index 6c481f6f..927ea37d 100755 --- a/sg-report-flight +++ b/sg-report-flight @@ -242,9 +242,27 @@ END # jobs. We start with all jobs in $tflight, and for each job # we also process any other jobs it refers to in *buildjob runvars. # + # The real thing we want to check that the build jobs *in the + # same flight as the justifying job* used the right revisions. + # Build jobs from other flights were either (i) build jobs for + # components not being targed for testing by this branch, but + # which were necessary for the justifying job and for which we + # decided to reuse another build job (in which case we don't + # really care what versions they used, even if underlying it + # all there might be a different version of a tree we are + # actually interested in (ii) the kind of continuous update + # thing seen with freebsdbuildjob. + # + # (This is rather different to cs-bisection-step, which is + # less focused on changes in a particular set of trees.) + # + # So we limit the scope of our recursive descent into build + # jobs, to jobs in the same flight. + # # We don't actually use a recursive algorithm because that # would involve recursive use of the same sql query object; # hence the @binfos_todo queue. + my @binfos_todo; my $binfos_queue = sub { my ($inflight,$q,$why) = @_; -- 2.20.1
This removes some nondeterminism from the output. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- sg-report-flight | 1 + 1 file changed, 1 insertion(+) diff --git a/sg-report-flight b/sg-report-flight index 927ea37d..70def778 100755 --- a/sg-report-flight +++ b/sg-report-flight @@ -813,6 +813,7 @@ END # they finished in the same second, we pick the lower-numbered # step, which is the earlier one (if they are sequential at # all). + or $a->{Job} cmp $b->{Job} } @failures; -- 2.20.1
These indexes allow very fast lookup of "relevant" flights eg when trying to justify failures. In my ad-hoc test case, these indices (along with the subsequent changes to sg-report-flight and Executive.pm, reduce the runtime of sg-report-flight from 2-3ks (unacceptably long!) to as little as 5-7s seconds - a speedup of about 500x. (Getting the database snapshot may take a while first, but deploying this code should help with that too by reducing long-running transactions. Quoted perf timings are from snapshot acquisition.) Without these new indexes there may be a performance change from the query changes. I haven't benchmarked this so I am setting the schema updates to be Preparatory/Needed (ie, "Schema first" as schema/README.updates has it), to say that the index should be created before the new code is deployed. Testing: I have tested this series by creating experimental indices "trial_..." in the actual production instance. (Transactional DDL was very helpful with this.) I have verified with \d that schema update instructions in this commit generate indexes which are equivalent to the trial indices. Deployment: AFter these schema updates are applied, the trial indices are redundant duplicates and should be deleted. CC: George Dunlap <George.Dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- schema/runvars-built-index.sql | 7 +++++++ schema/runvars-revision-index.sql | 7 +++++++ schema/steps-job-index.sql | 7 +++++++ 3 files changed, 21 insertions(+) create mode 100644 schema/runvars-built-index.sql create mode 100644 schema/runvars-revision-index.sql create mode 100644 schema/steps-job-index.sql diff --git a/schema/runvars-built-index.sql b/schema/runvars-built-index.sql new file mode 100644 index 00000000..94f85ed8 --- /dev/null +++ b/schema/runvars-built-index.sql @@ -0,0 +1,7 @@ +-- ##OSSTEST## 007 Preparatory +-- +-- This index helps sg-report-flight find relevant flights. + +CREATE INDEX runvars_built_revision_idx + ON runvars (val) + WHERE name LIKE 'built_revision_%'; diff --git a/schema/runvars-revision-index.sql b/schema/runvars-revision-index.sql new file mode 100644 index 00000000..a2e3be13 --- /dev/null +++ b/schema/runvars-revision-index.sql @@ -0,0 +1,7 @@ +-- ##OSSTEST## 008 Preparatory +-- +-- This index helps Executive::report__find_test find relevant flights. + +CREATE INDEX runvars_revision_idx + ON runvars (val) + WHERE name LIKE 'revision_%'; diff --git a/schema/steps-job-index.sql b/schema/steps-job-index.sql new file mode 100644 index 00000000..07dc5a30 --- /dev/null +++ b/schema/steps-job-index.sql @@ -0,0 +1,7 @@ +-- ##OSSTEST## 006 Preparatory +-- +-- This index helps sg-report-flight find if a test ever passed. + +CREATE INDEX steps_job_testid_status_idx + ON steps (job, testid, status); + -- 2.20.1
Specifically, we narrow the initial query to flights which have at least some job with the built_revision_foo we are looking for. This condition is strictly broader than that implemented inside the flight search loop, so there is no functional change. Perf: runtime of my test case now ~300s-500s. Example query before (from the Perl DBI trace): SELECT * FROM ( SELECT flight, blessing FROM flights WHERE (branch='xen-unstable') AND EXISTS (SELECT 1 FROM jobs WHERE jobs.flight = flights.flight AND jobs.job = ?) AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) ORDER BY flight DESC LIMIT 1000 ) AS sub ORDER BY blessing ASC, flight DESC With these bind variables: "test-armhf-armhf-libvirt" After: SELECT * FROM ( SELECT DISTINCT flight, blessing FROM flights JOIN runvars r1 USING (flight) WHERE (branch='xen-unstable') AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) AND EXISTS (SELECT 1 FROM jobs WHERE jobs.flight = flights.flight AND jobs.job = ?) AND r1.name LIKE 'built_revision_%' AND r1.name = ? AND r1.val= ? ORDER BY flight DESC LIMIT 1000 ) AS sub ORDER BY blessing ASC, flight DESC With these bind variables: "test-armhf-armhf-libvirt" 'built_revision_xen' '165f3afbfc3db70fcfdccad07085cde0a03c858b' Diff to the query: SELECT * FROM ( - SELECT flight, blessing FROM flights + SELECT DISTINCT flight, blessing + FROM flights + JOIN runvars r1 USING (flight) + WHERE (branch='xen-unstable') + AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) AND EXISTS (SELECT 1 FROM jobs WHERE jobs.flight = flights.flight AND jobs.job = ?) - AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) + AND r1.name LIKE 'built_revision_%' + AND r1.name = ? + AND r1.val= ? + ORDER BY flight DESC LIMIT 1000 ) AS sub CC: George Dunlap <George.Dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- schema/runvars-built-index.sql | 2 +- sg-report-flight | 64 ++++++++++++++++++++++++++++++++-- 2 files changed, 62 insertions(+), 4 deletions(-) diff --git a/schema/runvars-built-index.sql b/schema/runvars-built-index.sql index 94f85ed8..8582227e 100644 --- a/schema/runvars-built-index.sql +++ b/schema/runvars-built-index.sql @@ -1,4 +1,4 @@ --- ##OSSTEST## 007 Preparatory +-- ##OSSTEST## 007 Needed -- -- This index helps sg-report-flight find relevant flights. diff --git a/sg-report-flight b/sg-report-flight index 70def778..61aec7a8 100755 --- a/sg-report-flight +++ b/sg-report-flight @@ -185,19 +185,77 @@ END if (defined $job) { push @flightsq_params, $job; $flightsq_jobcond = <<END; - EXISTS (SELECT 1 + AND EXISTS (SELECT 1 FROM jobs WHERE jobs.flight = flights.flight AND jobs.job = ?) END } + # We build a slightly complicated query to find possibly-relevant + # flights. A "possibly-relevant" flight is one which the main + # flight categorisation algorithm below (the loop over $tflight) + # *might* decide is of interest. + # + # That algorithm produces a table of which revision(s) of what + # %specver trees the build jobs for the relevant test job used. + # And then it insists (amongst other things) that for each such + # tree the revision in question appears. + # + # It only looks at build jobs within the flight. So any flight + # that the main algorithm finds interesting will have *some* job + # (in the same flight) mentioning that revision in a built + # revision runvar. So we can search the runvars table by its + # index on the revision. + # + # So we look for flights that have an appropriate entry in runvars + # for each %specver tree. We can do this by joining the runvar + # table once for each tree. + # + # The "osstest" tree is handled specially. as ever. (We use + # "r$ri" there too for orthogonality of the code, not because + # there could be multiple specifiations for the osstest revision.) + # + # This complex query is an optimisation: for correctness, we must + # still execute the full job-specific recursive examination, for + # each possibly-relevant flight - that's the $tflight loop body. + + my $runvars_joins = ''; + my $runvars_conds = ''; + my $ri=0; + foreach my $tree (sort keys %{ $specver{$thisthat} }) { + $ri++; + if ($tree ne 'osstest') { + $runvars_joins .= <<END; + JOIN runvars r$ri USING (flight) +END + $runvars_conds .= <<END; + AND r$ri.name LIKE 'built_revision_%' + AND r$ri.name = ? + AND r$ri.val= ? +END + push @flightsq_params, "built_revision_$tree", + $specver{$thisthat}{$tree}; + } else { + $runvars_joins .= <<END; + JOIN flights_harness_touched r$ri USING (flight) +END + $runvars_conds .= <<END; + AND r$ri.harness= ? +END + push @flightsq_params, $specver{$thisthat}{$tree}; + } + } + my $flightsq= <<END; SELECT * FROM ( - SELECT flight, blessing FROM flights + SELECT DISTINCT flight, blessing + FROM flights +$runvars_joins WHERE $branches_cond_q - AND $flightsq_jobcond AND $blessingscond +$flightsq_jobcond +$runvars_conds ORDER BY flight DESC LIMIT 1000 ) AS sub -- 2.20.1
While we're here, convert this EXISTS subquery to a JOIN. Perf: runtime of my test case now ~200-300s. Example query before (from the Perl DBI trace): SELECT * FROM ( SELECT DISTINCT flight, blessing FROM flights JOIN runvars r1 USING (flight) WHERE (branch='xen-unstable') AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) AND EXISTS (SELECT 1 FROM jobs WHERE jobs.flight = flights.flight AND jobs.job = ?) AND r1.name LIKE 'built_revision_%' AND r1.name = ? AND r1.val= ? ORDER BY flight DESC LIMIT 1000 ) AS sub ORDER BY blessing ASC, flight DESC With bind variables: "test-armhf-armhf-libvirt" 'built_revision_xen' '165f3afbfc3db70fcfdccad07085cde0a03c858b' After: WITH sub AS ( SELECT DISTINCT flight, blessing FROM flights JOIN runvars r1 USING (flight) WHERE (branch='xen-unstable') AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) AND r1.name LIKE 'built_revision_%' AND r1.name = ? AND r1.val= ? ORDER BY flight DESC LIMIT 1000 ) SELECT * FROM sub JOIN jobs USING (flight) WHERE (1=1) AND jobs.job = ? ORDER BY blessing ASC, flight DESC With bind variables: 'built_revision_xen' '165f3afbfc3db70fcfdccad07085cde0a03c858b' "test-armhf-armhf-libvirt" Diff to the query: - SELECT * FROM ( + WITH sub AS ( SELECT DISTINCT flight, blessing FROM flights JOIN runvars r1 USING (flight) WHERE (branch='xen-unstable') AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) - AND EXISTS (SELECT 1 - FROM jobs - WHERE jobs.flight = flights.flight - AND jobs.job = ?) - AND r1.name LIKE 'built_revision_%' AND r1.name = ? AND r1.val= ? ORDER BY flight DESC LIMIT 1000 - ) AS sub + ) + SELECT * + FROM sub + JOIN jobs USING (flight) + + WHERE (1=1) + AND jobs.job = ? + ORDER BY blessing ASC, flight DESC CC: George Dunlap <George.Dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- sg-report-flight | 39 ++++++++++++++++++++++++--------------- 1 file changed, 24 insertions(+), 15 deletions(-) diff --git a/sg-report-flight b/sg-report-flight index 61aec7a8..b5398573 100755 --- a/sg-report-flight +++ b/sg-report-flight @@ -180,18 +180,6 @@ END return undef; } - my @flightsq_params; - my $flightsq_jobcond='(1=1)'; - if (defined $job) { - push @flightsq_params, $job; - $flightsq_jobcond = <<END; - AND EXISTS (SELECT 1 - FROM jobs - WHERE jobs.flight = flights.flight - AND jobs.job = ?) -END - } - # We build a slightly complicated query to find possibly-relevant # flights. A "possibly-relevant" flight is one which the main # flight categorisation algorithm below (the loop over $tflight) @@ -220,6 +208,7 @@ END # still execute the full job-specific recursive examination, for # each possibly-relevant flight - that's the $tflight loop body. + my @flightsq_params; my $runvars_joins = ''; my $runvars_conds = ''; my $ri=0; @@ -247,18 +236,38 @@ END } } + my $flightsq_jobs_join = ''; + my $flightsq_jobcond = ''; + if (defined $job) { + push @flightsq_params, $job; + $flightsq_jobs_join = <<END; + JOIN jobs USING (flight) +END + $flightsq_jobcond = <<END; + AND jobs.job = ? +END + } + + # In psql 9.6 this WITH clause makes postgresql do the flights + # query first. This is good because our built revision index finds + # relevant flights very quickly. Without this, postgresql seems + # to like to scan the jobs table. my $flightsq= <<END; - SELECT * FROM ( + WITH sub AS ( SELECT DISTINCT flight, blessing FROM flights $runvars_joins WHERE $branches_cond_q AND $blessingscond -$flightsq_jobcond $runvars_conds ORDER BY flight DESC LIMIT 1000 - ) AS sub + ) + SELECT * + FROM sub +$flightsq_jobs_join + WHERE (1=1) +$flightsq_jobcond ORDER BY blessing ASC, flight DESC END $flightsq= db_prepare($flightsq); -- 2.20.1
Perf: runtime of my test case now ~11s Example query before (from the Perl DBI trace): SELECT * FROM flights JOIN steps USING (flight) WHERE (branch='xen-unstable') AND job=? and testid=? and status='pass' AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) LIMIT 1 After: WITH s AS ( SELECT * FROM steps WHERE job=? and testid=? and status='pass' ) SELECT * FROM flights JOIN s USING (flight) WHERE (branch='xen-unstable') AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) LIMIT 1 In both cases with bind vars: "test-amd64-i386-xl-pvshim" "guest-start" Diff to the query: - SELECT * FROM flights JOIN steps USING (flight) + WITH s AS + ( + SELECT * FROM steps + WHERE job=? and testid=? and status='pass' + ) + SELECT * FROM flights JOIN s USING (flight) WHERE (branch='xen-unstable') - AND job=? and testid=? and status='pass' AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) LIMIT 1 CC: George Dunlap <George.Dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- schema/steps-job-index.sql | 2 +- sg-report-flight | 14 ++++++++++++-- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/schema/steps-job-index.sql b/schema/steps-job-index.sql index 07dc5a30..2c33af72 100644 --- a/schema/steps-job-index.sql +++ b/schema/steps-job-index.sql @@ -1,4 +1,4 @@ --- ##OSSTEST## 006 Preparatory +-- ##OSSTEST## 006 Needed -- -- This index helps sg-report-flight find if a test ever passed. diff --git a/sg-report-flight b/sg-report-flight index b5398573..b8d948da 100755 --- a/sg-report-flight +++ b/sg-report-flight @@ -849,10 +849,20 @@ sub justifyfailures ($;$) { my @failures= values %{ $fi->{Failures} }; + # In psql 9.6 this WITH clause makes postgresql do the steps query + # first. This is good because if this test never passed we can + # determine that really quickly using the new index, without + # having to scan the flights table. (If the test passed we will + # probably not have to look at many flights to find one, so in + # that case this is not much worse.) my $anypassq= <<END; - SELECT * FROM flights JOIN steps USING (flight) + WITH s AS + ( + SELECT * FROM steps + WHERE job=? and testid=? and status='pass' + ) + SELECT * FROM flights JOIN s USING (flight) WHERE $branches_cond_q - AND job=? and testid=? and status='pass' AND $blessingscond LIMIT 1 END -- 2.20.1
$jcheckq is redundant: we looked this up right at the start. This is not expected to speed things up very much, but it makes things somewhat cleaner and clearer. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- sg-report-flight | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/sg-report-flight b/sg-report-flight index b8d948da..bcb0d427 100755 --- a/sg-report-flight +++ b/sg-report-flight @@ -160,10 +160,6 @@ sub findaflight ($$$$$) { return undef; } - my $jcheckq= db_prepare(<<END); - SELECT status FROM jobs WHERE flight=? AND job=? -END - my $checkq= db_prepare(<<END); SELECT status FROM steps WHERE flight=? AND job=? AND testid=? AND status!='skip' @@ -263,7 +259,7 @@ $runvars_conds ORDER BY flight DESC LIMIT 1000 ) - SELECT * + SELECT flight, jobs.status FROM sub $flightsq_jobs_join WHERE (1=1) @@ -304,7 +300,7 @@ END WHERE flight=? END - while (my ($tflight) = $flightsq->fetchrow_array) { + while (my ($tflight, $tjstatus) = $flightsq->fetchrow_array) { # Recurse from the starting flight looking for relevant build # jobs. We start with all jobs in $tflight, and for each job # we also process any other jobs it refers to in *buildjob runvars. @@ -407,8 +403,7 @@ END $checkq->execute($tflight, $job, $testid); ($chkst) = $checkq->fetchrow_array(); if (!defined $chkst) { - $jcheckq->execute($tflight, $job); - my ($jchkst) = $jcheckq->fetchrow_array(); + my $jchkst = $tflight->{status}; $chkst = $jchkst if $jchkst eq 'starved'; } } -- 2.20.1
After we refactor this query then we can enable the index use. (Both of these things together in this commit because I haven't perf tested the version with just the refactoring.) (We have provided an index that can answer this question really quickly if a version is specified. But the query planner couldn't see that because it works without seeing the bind variables, so doesn't know that the value of name is going to be suitable for this index.) * Convert the two EXISTS subqueries into JOIN/AND with a DISTINCT clause naming the fields on flights, so as to replicate the previous result rows. Then do $selection field last. The subquery is a convenient way to let this do the previous thing for all the values of $selection (including, notably, *). * Add the additional AND clause for r.name, which has no logical effect given the actual values of name, enabling the query planner to use this index. Perf: In my test case the sg-report-flight runtime is now ~8s. I am reasonably confident that this will not make other use cases of this code worse. Perf: runtime of my test case now ~11s Example query before (from the Perl DBI trace): SELECT * FROM flights f WHERE EXISTS ( SELECT 1 FROM runvars r WHERE name=? AND val=? AND r.flight=f.flight AND ( (CASE WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev' WHEN ((r.job) LIKE 'build-%-freebsd' AND 'x' = 'freebsdbuildjob') THEN 'DISCARD' ELSE '' END) = '') ) AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) AND (branch=?) ORDER BY flight DESC LIMIT 1 After: SELECT * FROM ( SELECT DISTINCT flight, started, blessing, branch, intended FROM flights f JOIN runvars r USING (flight) WHERE name=? AND name LIKE 'revision_%' AND val=? AND r.flight=f.flight AND ( (CASE WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev' WHEN ((r.job) LIKE 'build-%-freebsd' AND 'x' = 'freebsdbuildjob') THEN 'DISCARD' ELSE '' END) = '') AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) AND (branch=?) ) AS sub WHERE TRUE ORDER BY flight DESC LIMIT 1 In both cases with bind vars: 'revision_xen' '165f3afbfc3db70fcfdccad07085cde0a03c858b' "xen-unstable" Diff to the example query: @@ -1,10 +1,10 @@ SELECT * + FROM ( SELECT DISTINCT + flight, started, blessing, branch, intended FROM flights f - WHERE - EXISTS ( - SELECT 1 - FROM runvars r + JOIN runvars r USING (flight) WHERE name=? + AND name LIKE 'revision_%' AND val=? AND r.flight=f.flight AND ( (CASE @@ -14,8 +14,8 @@ ELSE '' END) = '') - ) AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) AND (branch=?) +) AS sub WHERE TRUE ORDER BY flight DESC LIMIT 1 CC: George Dunlap <George.Dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Osstest/Executive.pm | 20 ++++++++------------ schema/runvars-revision-index.sql | 2 +- 2 files changed, 9 insertions(+), 13 deletions(-) diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm index c3dc1261..c272e9f2 100644 --- a/Osstest/Executive.pm +++ b/Osstest/Executive.pm @@ -415,37 +415,32 @@ sub report__find_test ($$$$$$$) { my $querytext = <<END; SELECT $selection - FROM flights f - WHERE + FROM ( SELECT DISTINCT + flight, started, blessing, branch, intended + FROM flights f END if (defined $revision) { if ($tree eq 'osstest') { $querytext .= <<END; - EXISTS ( - SELECT 1 - FROM flights_harness_touched t + JOIN flights_harness_touched t USING (flight) WHERE t.harness=? - AND t.flight=f.flight - ) END push @params, $revision; } else { $querytext .= <<END; - EXISTS ( - SELECT 1 - FROM runvars r + JOIN runvars r USING (flight) WHERE name=? + AND name LIKE 'revision_%' AND val=? AND r.flight=f.flight AND ${\ main_revision_job_cond('r.job') } - ) END push @params, "revision_$tree", $revision; } } else { $querytext .= <<END; - TRUE + WHERE TRUE END } @@ -460,6 +455,7 @@ END END push @params, @$branches; + $querytext .= ") AS sub WHERE TRUE\n"; $querytext .= $extracond; $querytext .= $sortlimit; diff --git a/schema/runvars-revision-index.sql b/schema/runvars-revision-index.sql index a2e3be13..4c1aea66 100644 --- a/schema/runvars-revision-index.sql +++ b/schema/runvars-revision-index.sql @@ -1,4 +1,4 @@ --- ##OSSTEST## 008 Preparatory +-- ##OSSTEST## 008 Needed -- -- This index helps Executive::report__find_test find relevant flights. -- 2.20.1
If we are looking for a particular step then we will ignore jobs without that step, so any job which was truncated before it will be ignored. Otherwise we are looking for the whole job duration and a truncated job is not a good representative. This is a bugfix (to duration estimation), not a performance improvement like the preceding and subsequent changes. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Osstest/Executive.pm | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm index c272e9f2..3cd37c14 100644 --- a/Osstest/Executive.pm +++ b/Osstest/Executive.pm @@ -1142,6 +1142,10 @@ sub duration_estimator ($$;$$) { # estimated (and only jobs which contained that step will be # considered). + my $or_status_truncated = ''; + if ($will_uptoincl_testid) { + $or_status_truncated = "OR j.status='truncated'!"; + } my $recentflights_q= $dbh_tests->prepare(<<END); SELECT f.flight AS flight, f.started AS started, @@ -1156,8 +1160,8 @@ sub duration_estimator ($$;$$) { AND f.branch=? AND j.job=? AND r.val=? - AND (j.status='pass' OR j.status='fail' OR - j.status='truncated') + AND (j.status='pass' OR j.status='fail' + $or_status_truncated) AND f.started IS NOT NULL AND f.started >= ? ORDER BY f.started DESC -- 2.20.1
No functional change. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Osstest/Executive.pm | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm index 3cd37c14..c966a1be 100644 --- a/Osstest/Executive.pm +++ b/Osstest/Executive.pm @@ -1146,7 +1146,7 @@ sub duration_estimator ($$;$$) { if ($will_uptoincl_testid) { $or_status_truncated = "OR j.status='truncated'!"; } - my $recentflights_q= $dbh_tests->prepare(<<END); + my $recentflights_qtxt= <<END; SELECT f.flight AS flight, f.started AS started, j.status AS status @@ -1167,7 +1167,7 @@ sub duration_estimator ($$;$$) { ORDER BY f.started DESC END - my $duration_anyref_q= $dbh_tests->prepare(<<END); + my $duration_anyref_qtxt= <<END; SELECT f.flight AS flight, max(s.finished) AS max_finished FROM steps s JOIN flights f @@ -1212,6 +1212,8 @@ END_UPTOINCL AS duration END_ALWAYS + my $recentflights_q= $dbh_tests->prepare($recentflights_qtxt); + my $duration_anyref_q= $dbh_tests->prepare($duration_anyref_qtxt); my $duration_duration_q = $dbh_tests->prepare($duration_duration_qtxt); return sub { -- 2.20.1
Our spec. says we return nulls for started and status if we don't find a job matching the host spec. The way this works right now is that we look up the nonexistent entries in $refs->[0]. This is not really brilliant and is going to be troublesome as we continue to refactor. Provide these values explicitly. No functional change. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Osstest/Executive.pm | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm index c966a1be..ee1bf07e 100644 --- a/Osstest/Executive.pm +++ b/Osstest/Executive.pm @@ -1169,6 +1169,8 @@ END my $duration_anyref_qtxt= <<END; SELECT f.flight AS flight, + NULL as started, + NULL as status, max(s.finished) AS max_finished FROM steps s JOIN flights f ON s.flight=f.flight -- 2.20.1
Right now this is pointless since the Perl code doesn't need it. But this row is going to be part of a WITH clause soon. No functional change. Diffs to two example queries (from the Perl DBI trace): SELECT f.flight AS flight, + j.job AS job, f.started AS started, j.status AS status FROM flights f JOIN jobs j USING (flight) JOIN runvars r ON f.flight=r.flight AND r.name=? WHERE j.job=r.job AND f.blessing=? AND f.branch=? AND j.job=? AND r.val=? AND (j.status='pass' OR j.status='fail' OR j.status='truncated'!) AND f.started IS NOT NULL AND f.started >= ? ORDER BY f.started DESC SELECT f.flight AS flight, + s.job AS job, NULL as started, NULL as status, max(s.finished) AS max_finished FROM steps s JOIN flights f ON s.flight=f.flight WHERE s.job=? AND f.blessing=? AND f.branch=? AND s.finished IS NOT NULL AND f.started IS NOT NULL AND f.started >= ? - GROUP BY f.flight + GROUP BY f.flight, s.job ORDER BY max_finished DESC CC: George Dunlap <George.Dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Osstest/Executive.pm | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm index ee1bf07e..8e8b3d33 100644 --- a/Osstest/Executive.pm +++ b/Osstest/Executive.pm @@ -1148,6 +1148,7 @@ sub duration_estimator ($$;$$) { } my $recentflights_qtxt= <<END; SELECT f.flight AS flight, + j.job AS job, f.started AS started, j.status AS status FROM flights f @@ -1169,6 +1170,7 @@ END my $duration_anyref_qtxt= <<END; SELECT f.flight AS flight, + s.job AS job, NULL as started, NULL as status, max(s.finished) AS max_finished @@ -1178,7 +1180,7 @@ END AND s.finished IS NOT NULL AND f.started IS NOT NULL AND f.started >= ? - GROUP BY f.flight + GROUP BY f.flight, s.job ORDER BY max_finished DESC END # s J J J # fix perl-mode -- 2.20.1
This is going to be useful soon. No functional change. Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Osstest/Executive.pm | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm index 8e8b3d33..621153ee 100644 --- a/Osstest/Executive.pm +++ b/Osstest/Executive.pm @@ -1223,6 +1223,9 @@ END_ALWAYS return sub { my ($job, $hostidname, $onhost, $uptoincl_testid) = @_; + my @x_params; + push @x_params, $uptoincl_testid if $will_uptoincl_testid; + my $dbg= $debug ? sub { $debug->("DUR $branch $blessing $job $hostidname $onhost @_"); } : sub { }; @@ -1257,7 +1260,7 @@ END_ALWAYS my $duration_max= 0; foreach my $ref (@$refs) { my @d_d_args = ($ref->{flight}, $job); - push @d_d_args, $uptoincl_testid if $will_uptoincl_testid; + push @d_d_args, @x_params; $duration_duration_q->execute(@d_d_args); my ($duration) = $duration_duration_q->fetchrow_array(); $duration_duration_q->finish(); -- 2.20.1
Stuff the two queries together: we use the firsty query as a WITH clause. This is significantly faster, perhaps because the query optimiser does a better job but probably just because it saves on round trips. No functional change. Perf: subjectively this seemed to help when the cache was cold. Now I have a warm cache and it doesn't seem to make much difference. Perf: runtime of my test case now ~5-7s. Example queries before (from the debugging output): Query A part I: SELECT f.flight AS flight, j.job AS job, f.started AS started, j.status AS status FROM flights f JOIN jobs j USING (flight) JOIN runvars r ON f.flight=r.flight AND r.name=? WHERE j.job=r.job AND f.blessing=? AND f.branch=? AND j.job=? AND r.val=? AND (j.status='pass' OR j.status='fail' OR j.status='truncated'!) AND f.started IS NOT NULL AND f.started >= ? ORDER BY f.started DESC With bind variables: "test-amd64-i386-xl-pvshim" "guest-start" Query B part I: SELECT f.flight AS flight, s.job AS job, NULL as started, NULL as status, max(s.finished) AS max_finished FROM steps s JOIN flights f ON s.flight=f.flight WHERE s.job=? AND f.blessing=? AND f.branch=? AND s.finished IS NOT NULL AND f.started IS NOT NULL AND f.started >= ? GROUP BY f.flight, s.job ORDER BY max_finished DESC With bind variables: "test-armhf-armhf-libvirt" 'real' "xen-unstable" 1594144469 Query common part II: WITH tsteps AS ( SELECT * FROM steps WHERE flight=? AND job=? ) , tsteps2 AS ( SELECT * FROM tsteps WHERE finished <= (SELECT finished FROM tsteps WHERE tsteps.testid = ?) ) SELECT ( SELECT max(finished)-min(started) FROM tsteps2 ) - ( SELECT sum(finished-started) FROM tsteps2 WHERE step = 'ts-hosts-allocate' ) AS duration With bind variables from previous query, eg: 152045 "test-armhf-armhf-libvirt" "guest-start.2" After: Query A (combined): WITH f AS ( SELECT f.flight AS flight, j.job AS job, f.started AS started, j.status AS status FROM flights f JOIN jobs j USING (flight) JOIN runvars r ON f.flight=r.flight AND r.name=? WHERE j.job=r.job AND f.blessing=? AND f.branch=? AND j.job=? AND r.val=? AND (j.status='pass' OR j.status='fail' OR j.status='truncated'!) AND f.started IS NOT NULL AND f.started >= ? ORDER BY f.started DESC ) SELECT flight, max_finished, job, started, status, ( WITH tsteps AS ( SELECT * FROM steps WHERE flight=f.flight AND job=f.job ) , tsteps2 AS ( SELECT * FROM tsteps WHERE finished <= (SELECT finished FROM tsteps WHERE tsteps.testid = ?) ) SELECT ( SELECT max(finished)-min(started) FROM tsteps2 ) - ( SELECT sum(finished-started) FROM tsteps2 WHERE step = 'ts-hosts-allocate' ) AS duration ) FROM f Query B (combined): WITH f AS ( SELECT f.flight AS flight, s.job AS job, NULL as started, NULL as status, max(s.finished) AS max_finished FROM steps s JOIN flights f ON s.flight=f.flight WHERE s.job=? AND f.blessing=? AND f.branch=? AND s.finished IS NOT NULL AND f.started IS NOT NULL AND f.started >= ? GROUP BY f.flight, s.job ORDER BY max_finished DESC ) SELECT flight, max_finished, job, started, status, ( WITH tsteps AS ( SELECT * FROM steps WHERE flight=f.flight AND job=f.job ) , tsteps2 AS ( SELECT * FROM tsteps WHERE finished <= (SELECT finished FROM tsteps WHERE tsteps.testid = ?) ) SELECT ( SELECT max(finished)-min(started) FROM tsteps2 ) - ( SELECT sum(finished-started) FROM tsteps2 WHERE step = 'ts-hosts-allocate' ) AS duration ) FROM f Diff for query A: @@ -1,3 +1,4 @@ + WITH f AS ( SELECT f.flight AS flight, j.job AS job, f.started AS started, @@ -18,11 +19,14 @@ AND f.started >= ? ORDER BY f.started DESC + ) + SELECT flight, max_finished, job, started, status, + ( WITH tsteps AS ( SELECT * FROM steps - WHERE flight=? AND job=? + WHERE flight=f.flight AND job=f.job ) , tsteps2 AS ( @@ -42,3 +46,5 @@ WHERE step = 'ts-hosts-allocate' ) AS duration + + ) FROM f Diff for query B: @@ -1,3 +1,4 @@ + WITH f AS ( SELECT f.flight AS flight, s.job AS job, NULL as started, @@ -12,11 +13,14 @@ GROUP BY f.flight, s.job ORDER BY max_finished DESC + ) + SELECT flight, max_finished, job, started, status, + ( WITH tsteps AS ( SELECT * FROM steps - WHERE flight=? AND job=? + WHERE flight=f.flight AND job=f.job ) , tsteps2 AS ( @@ -36,3 +40,5 @@ WHERE step = 'ts-hosts-allocate' ) AS duration + + ) FROM f CC: George Dunlap <George.Dunlap@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> --- Osstest/Executive.pm | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/Osstest/Executive.pm b/Osstest/Executive.pm index 621153ee..66c93ab9 100644 --- a/Osstest/Executive.pm +++ b/Osstest/Executive.pm @@ -1192,7 +1192,7 @@ END ( SELECT * FROM steps - WHERE flight=? AND job=? + WHERE flight=f.flight AND job=f.job ) END_ALWAYS , tsteps2 AS @@ -1216,9 +1216,20 @@ END_UPTOINCL AS duration END_ALWAYS - my $recentflights_q= $dbh_tests->prepare($recentflights_qtxt); - my $duration_anyref_q= $dbh_tests->prepare($duration_anyref_qtxt); - my $duration_duration_q = $dbh_tests->prepare($duration_duration_qtxt); + my $prepare_combi = sub { + db_prepare(<<END); + WITH f AS ( +$_[0] + ) + SELECT flight, max_finished, job, started, status, + ( +$duration_duration_qtxt + ) FROM f +END + }; + + my $recentflights_q= $prepare_combi->($recentflights_qtxt); + my $duration_anyref_q= $prepare_combi->($duration_anyref_qtxt); return sub { my ($job, $hostidname, $onhost, $uptoincl_testid) = @_; @@ -1239,14 +1250,16 @@ END_ALWAYS $branch, $job, $onhost, - $limit); + $limit, + @x_params); $refs= $recentflights_q->fetchall_arrayref({}); $recentflights_q->finish(); $dbg->("SAME-HOST GOT ".scalar(@$refs)); } if (!@$refs) { - $duration_anyref_q->execute($job, $blessing, $branch, $limit); + $duration_anyref_q->execute($job, $blessing, $branch, $limit, + @x_params); $refs= $duration_anyref_q->fetchall_arrayref({}); $duration_anyref_q->finish(); $dbg->("ANY-HOST GOT ".scalar(@$refs)); @@ -1259,11 +1272,7 @@ END_ALWAYS my $duration_max= 0; foreach my $ref (@$refs) { - my @d_d_args = ($ref->{flight}, $job); - push @d_d_args, @x_params; - $duration_duration_q->execute(@d_d_args); - my ($duration) = $duration_duration_q->fetchrow_array(); - $duration_duration_q->finish(); + my ($duration) = $ref->{duration}; if ($duration) { $dbg->("REF $ref->{flight} DURATION $duration ". ($ref->{status} // '')); -- 2.20.1
> On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: > > After we refactor this query then we can enable the index use. > (Both of these things together in this commit because I haven't perf > tested the version with just the refactoring.) > > (We have provided an index that can answer this question really > quickly if a version is specified. But the query planner couldn't see > that because it works without seeing the bind variables, so doesn't > know that the value of name is going to be suitable for this index.) > > * Convert the two EXISTS subqueries into JOIN/AND with a DISTINCT > clause naming the fields on flights, so as to replicate the previous > result rows. Then do $selection field last. The subquery is a > convenient way to let this do the previous thing for all the values > of $selection (including, notably, *). > > * Add the additional AND clause for r.name, which has no logical > effect given the actual values of name, enabling the query planner > to use this index. > > Perf: In my test case the sg-report-flight runtime is now ~8s. I am > reasonably confident that this will not make other use cases of this > code worse. > > Perf: runtime of my test case now ~11s > > Example query before (from the Perl DBI trace): > > SELECT * > FROM flights f > WHERE > EXISTS ( > SELECT 1 > FROM runvars r > WHERE name=? > AND val=? > AND r.flight=f.flight > AND ( (CASE > WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev' > WHEN ((r.job) LIKE 'build-%-freebsd' > AND 'x' = 'freebsdbuildjob') THEN 'DISCARD' > ELSE '' > END) > = '') > ) > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) > AND (branch=?) > ORDER BY flight DESC > LIMIT 1 So this says: Get me all the columns for the highest-numbered flight Where: There is at least one runvar for that flight has the specified $name and $value And the job is *not* like build-%-prev or build-%-freebsd The flight number (?) is <= 151903, and blessing = real For the specified $branch What’s the “TRUE and flight <= 151903” for? > > After: > > SELECT * > FROM ( SELECT DISTINCT > flight, started, blessing, branch, intended > FROM flights f > JOIN runvars r USING (flight) > WHERE name=? > AND name LIKE 'revision_%' > AND val=? > AND r.flight=f.flight > AND ( (CASE > WHEN (r.job) LIKE 'build-%-prev' THEN 'xprev' > WHEN ((r.job) LIKE 'build-%-freebsd' > AND 'x' = 'freebsdbuildjob') THEN 'DISCARD' > ELSE '' > END) > = '') > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) > AND (branch=?) > ) AS sub WHERE TRUE > ORDER BY flight DESC > LIMIT 1 And this says (effectively) Get me <flight, started, blessing, branch, intended> From the highest-numbered flight Where That flight has a runvar with specified name and value The job *doesn’t* look like “build-%-prev” or “build-%-freebsd” flight & blessing as appropriate branch as specified. Isn’t the r.flight = f.flight redundant if we’re joining on flight? Also, in spite of the paragraph attempting to explain it, I’m afraid I don’t understand what the “AS sub WHERE TRUE” is for. But it looks like the new query should do the same thing as the old query, assuming that the columns from the subquery are all the columns that you need in the correct order. -George
> On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: > > Specifically, we narrow the initial query to flights which have at > least some job with the built_revision_foo we are looking for. > > This condition is strictly broader than that implemented inside the > flight search loop, so there is no functional change. > > Perf: runtime of my test case now ~300s-500s. > > Example query before (from the Perl DBI trace): > > SELECT * FROM ( > SELECT flight, blessing FROM flights > WHERE (branch='xen-unstable') > AND EXISTS (SELECT 1 > FROM jobs > WHERE jobs.flight = flights.flight > AND jobs.job = ?) > > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) > ORDER BY flight DESC > LIMIT 1000 > ) AS sub > ORDER BY blessing ASC, flight DESC This one says: Find the 1000 most recent flights Where branch is "xen-unstable” one of its jobs is $job And blessing is “real” But why are we selecting ‘blessing’ from these, if we’ve specified that blessing = “real”? Isn’t that redundant? > > With these bind variables: > > "test-armhf-armhf-libvirt" > > After: > > SELECT * FROM ( > SELECT DISTINCT flight, blessing > FROM flights > JOIN runvars r1 USING (flight) > > WHERE (branch='xen-unstable') > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) > AND EXISTS (SELECT 1 > FROM jobs > WHERE jobs.flight = flights.flight > AND jobs.job = ?) > > AND r1.name LIKE 'built_revision_%' > AND r1.name = ? > AND r1.val= ? > > ORDER BY flight DESC > LIMIT 1000 > ) AS sub > ORDER BY blessing ASC, flight DESC So this says: Find me the most 1000 recent flights Where: branch is “xen-unstable” flight <= 15903 blessing is “real” One of its jobs is $job It has a runvar matching given $name and $val And of course it uses the ’name LIKE ‘built_revision_%’ index. Still don’t understand the ’TRUE AND’ and ‘AS sub’ bits, but it looks to me like it’s substantially the same query, with additional $name = $val runvar restriction. And given that you say, "This condition is strictly broader than that implemented inside the flight search loop”, I take it that it’s again mainly to take advantage of the new index? -George
> On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
>
> While we're here, convert this EXISTS subquery to a JOIN.
>
> Perf: runtime of my test case now ~200-300s.
>
> Example query before (from the Perl DBI trace):
>
> SELECT * FROM (
> SELECT DISTINCT flight, blessing
> FROM flights
> JOIN runvars r1 USING (flight)
>
> WHERE (branch='xen-unstable')
> AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
> AND EXISTS (SELECT 1
> FROM jobs
> WHERE jobs.flight = flights.flight
> AND jobs.job = ?)
>
> AND r1.name LIKE 'built_revision_%'
> AND r1.name = ?
> AND r1.val= ?
>
> ORDER BY flight DESC
> LIMIT 1000
> ) AS sub
> ORDER BY blessing ASC, flight DESC
>
> With bind variables:
>
> "test-armhf-armhf-libvirt"
> 'built_revision_xen'
> '165f3afbfc3db70fcfdccad07085cde0a03c858b'
>
> After:
>
> WITH sub AS (
> SELECT DISTINCT flight, blessing
> FROM flights
> JOIN runvars r1 USING (flight)
>
> WHERE (branch='xen-unstable')
> AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
> AND r1.name LIKE 'built_revision_%'
> AND r1.name = ?
> AND r1.val= ?
>
> ORDER BY flight DESC
> LIMIT 1000
> )
> SELECT *
> FROM sub
> JOIN jobs USING (flight)
>
> WHERE (1=1)
> AND jobs.job = ?
>
> ORDER BY blessing ASC, flight DESC
I was wondering if it would be useful converting this to a join would be useful. :-)
Again, not sure what the “(1=1) AND” bit is for; something to poke the query planner somehow?
The main thing I see here is that there’s nothing *in the query* that guarantees you won’t get multiple flights if there are multiple jobs for that flight whose ‘job’ value; but given the naming scheme so far, I’m guessing job is unique…? As long as there’s something else preventing duplication I think it’s fine.
-George
George Dunlap writes ("Re: [OSSTEST PATCH 08/14] Executive: Use index for report__find_test"): > > On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: > > Example query before (from the Perl DBI trace): ... > So this says: > > Get me all the columns > for the highest-numbered flight > Where: > There is at least one runvar for that flight has the specified $name and $value > And the job is *not* like build-%-prev or build-%-freebsd > The flight number (?) is <= 151903, and blessing = real > For the specified $branch Yes. > What’s the “TRUE and flight <= 151903” for? These queries are programmetically constructed. In this case, the flight condition is not always there. My test case had a --max-flight=151903 on the command line: this is a debugging option. It avoids newly added stuff in the db confusing me and generally disturbing things. This is implemented with a condition variable which contains either "" or "and flight <= 151903". Doing it this way simplifies the generation code. > And this says (effectively) > > Get me <flight, started, blessing, branch, intended> > From the highest-numbered flight > Where > That flight has a runvar with specified name and value > The job *doesn’t* look like “build-%-prev” or “build-%-freebsd” > flight & blessing as appropriate > branch as specified. I think so, yes. > Isn’t the r.flight = f.flight redundant if we’re joining on flight? Indeed it is. I guess I can add a patch at theend to delete that. > Also, in spite of the paragraph attempting to explain it, I’m afraid > I don’t understand what the “AS sub WHERE TRUE” is for. The reason for the subquery is not evident in the SQL. It's because of the Perl code which generates this query. The same code is used to generate queries that start with things like SELECT * ... SELECT COUNT(*) AS count ... The perl code gets told "*" or "COUNT(*) AS count". The call sites that pass "*" expect to see fields from flights. It would be possible to change "*" to the explicit field list everywhere, but it was much easier to do it this way. (The WHERE TRUE is another one of these stubs where a condition might appear.) > But it looks like the new query should do the same thing as the old > query, assuming that the columns from the subquery are all the > columns that you need in the correct order. The subquery columns are precisely the columns currently existing in he flights table. Thanks, Ian.
George Dunlap writes ("Re: [OSSTEST PATCH 04/14] sg-report-flight: Ask the db for flights of interest"): > > On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: > > Example query before (from the Perl DBI trace): > > > > SELECT * FROM ( > > SELECT flight, blessing FROM flights ... > > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) ... > But why are we selecting ‘blessing’ from these, if we’ve specified that blessing = “real”? Isn’t that redundant? That condition is programmatically constructed. Sometimes it will ask for multiple different blessings and then it wants to know which. > > After: ... > So this says: > > Find me the most 1000 recent flights > Where: > branch is “xen-unstable” > flight <= 15903 > blessing is “real” > One of its jobs is $job > It has a runvar matching given $name and $val > > And of course it uses the ’name LIKE ‘built_revision_%’ index. Yes. > Still don’t understand the ’TRUE AND’ and ‘AS sub’ bits, but it > looks to me like it’s substantially the same query, with additional > $name = $val runvar restriction. That's my intent, ytes. > And given that you say, "This condition is strictly broader than > that implemented inside the flight search loop”, I take it that it’s > again mainly to take advantage of the new index? Right. The previous approach was "iterate over recent flights, figure out precisely what they built, and decide if they meet the (complex) requirements". Now we only iterate over a subset of recent flights: those which have at least one such runvar. The big commennt is meant to be a demonstration that the "(complex) requirements" are a narrower condition than the new condition on the initial flights query. So I think the result is that it will look deeper into history, and be faster, but not otherwise change its beaviour. Ian.
George Dunlap writes ("Re: [OSSTEST PATCH 05/14] sg-report-flight: Use WITH to use best index use for $flightsq"): > On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: > > After: > > WITH sub AS ( > > SELECT DISTINCT flight, blessing > > FROM flights > > JOIN runvars r1 USING (flight) > > > > WHERE (branch='xen-unstable') > > AND ( (TRUE AND flight <= 151903) AND (blessing='real') ) > > AND r1.name LIKE 'built_revision_%' > > AND r1.name = ? > > AND r1.val= ? > > > > ORDER BY flight DESC > > LIMIT 1000 > > ) > > SELECT * > > FROM sub > > JOIN jobs USING (flight) > > > > WHERE (1=1) > > AND jobs.job = ? > > > > ORDER BY blessing ASC, flight DESC > > I was wondering if it would be useful converting this to a join would be useful. :-) ... > The main thing I see here is that there’s nothing *in the query* > that guarantees you won’t get multiple flights if there are multiple > jobs for that flight whose ‘job’ value; but given the naming scheme > so far, I’m guessing job is unique…? As long as there’s something > else preventing duplication I think it’s fine. (flight,job) is the primary key for the jobs table. I can probably produce a schema dump if that would make reading this stuff easier. Ian.
> On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
>
> Perf: runtime of my test case now ~11s
>
> Example query before (from the Perl DBI trace):
>
> SELECT * FROM flights JOIN steps USING (flight)
> WHERE (branch='xen-unstable')
> AND job=? and testid=? and status='pass'
> AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
> LIMIT 1
>
> After:
>
> WITH s AS
> (
> SELECT * FROM steps
> WHERE job=? and testid=? and status='pass'
> )
> SELECT * FROM flights JOIN s USING (flight)
> WHERE (branch='xen-unstable')
> AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
> LIMIT 1
>
> In both cases with bind vars:
>
> "test-amd64-i386-xl-pvshim"
> "guest-start"
>
> Diff to the query:
>
> - SELECT * FROM flights JOIN steps USING (flight)
> + WITH s AS
> + (
> + SELECT * FROM steps
> + WHERE job=? and testid=? and status='pass'
> + )
> + SELECT * FROM flights JOIN s USING (flight)
> WHERE (branch='xen-unstable')
> - AND job=? and testid=? and status='pass'
> AND ( (TRUE AND flight <= 151903) AND (blessing='real') )
> LIMIT 1
>
> CC: George Dunlap <George.Dunlap@citrix.com>
> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
> ---
> schema/steps-job-index.sql | 2 +-
> sg-report-flight | 14 ++++++++++++--
> 2 files changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/schema/steps-job-index.sql b/schema/steps-job-index.sql
> index 07dc5a30..2c33af72 100644
> --- a/schema/steps-job-index.sql
> +++ b/schema/steps-job-index.sql
> @@ -1,4 +1,4 @@
> --- ##OSSTEST## 006 Preparatory
> +-- ##OSSTEST## 006 Needed
> --
> -- This index helps sg-report-flight find if a test ever passed.
>
> diff --git a/sg-report-flight b/sg-report-flight
> index b5398573..b8d948da 100755
> --- a/sg-report-flight
> +++ b/sg-report-flight
> @@ -849,10 +849,20 @@ sub justifyfailures ($;$) {
>
> my @failures= values %{ $fi->{Failures} };
>
> + # In psql 9.6 this WITH clause makes postgresql do the steps query
> + # first. This is good because if this test never passed we can
> + # determine that really quickly using the new index, without
> + # having to scan the flights table. (If the test passed we will
> + # probably not have to look at many flights to find one, so in
> + # that case this is not much worse.)
Seems a bit weird, but OK. The SQL looks the same, so:
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
> On Jul 21, 2020, at 7:42 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: > > Stuff the two queries together: we use the firsty query as a WITH > clause. This is significantly faster, perhaps because the query > optimiser does a better job but probably just because it saves on > round trips. > > No functional change. > > Perf: subjectively this seemed to help when the cache was cold. Now I > have a warm cache and it doesn't seem to make much difference. > > Perf: runtime of my test case now ~5-7s. > > Example queries before (from the debugging output): > > Query A part I: > > SELECT f.flight AS flight, > j.job AS job, > f.started AS started, > j.status AS status > FROM flights f > JOIN jobs j USING (flight) > JOIN runvars r > ON f.flight=r.flight > AND r.name=? > WHERE j.job=r.job Did these last two get mixed up? My limited experience w/ JOIN ON and WHERE would lead me to expect we’re joining on `f.flight=r.flight and r.job = j.job`, and having `r.name = ?` as part of the WHERE clause. I see it’s the same in the combined query as well. > AND f.blessing=? > AND f.branch=? > AND j.job=? > AND r.val=? > AND (j.status='pass' OR j.status='fail' > OR j.status='truncated'!) > AND f.started IS NOT NULL > AND f.started >= ? > ORDER BY f.started DESC > > With bind variables: > "test-amd64-i386-xl-pvshim" > "guest-start" > > Query B part I: > > SELECT f.flight AS flight, > s.job AS job, > NULL as started, > NULL as status, > max(s.finished) AS max_finished > FROM steps s JOIN flights f > ON s.flight=f.flight > WHERE s.job=? AND f.blessing=? AND f.branch=? > AND s.finished IS NOT NULL > AND f.started IS NOT NULL > AND f.started >= ? > GROUP BY f.flight, s.job > ORDER BY max_finished DESC > > With bind variables: > "test-armhf-armhf-libvirt" > 'real' > "xen-unstable" > 1594144469 > > Query common part II: > > WITH tsteps AS > ( > SELECT * > FROM steps > WHERE flight=? AND job=? > ) > , tsteps2 AS > ( > SELECT * > FROM tsteps > WHERE finished <= > (SELECT finished > FROM tsteps > WHERE tsteps.testid = ?) > ) > SELECT ( > SELECT max(finished)-min(started) > FROM tsteps2 > ) - ( > SELECT sum(finished-started) > FROM tsteps2 > WHERE step = 'ts-hosts-allocate' > ) > AS duration Er, wait — you were doing a separate `duration` query for each row of the previous query? Yeah, that sounds like it could be a lot of round trips. :-) > > With bind variables from previous query, eg: > 152045 > "test-armhf-armhf-libvirt" > "guest-start.2" > > After: > > Query A (combined): > > WITH f AS ( > SELECT f.flight AS flight, > j.job AS job, > f.started AS started, > j.status AS status > FROM flights f > JOIN jobs j USING (flight) > JOIN runvars r > ON f.flight=r.flight > AND r.name=? > WHERE j.job=r.job > AND f.blessing=? > AND f.branch=? > AND j.job=? > AND r.val=? > AND (j.status='pass' OR j.status='fail' > OR j.status='truncated'!) > AND f.started IS NOT NULL > AND f.started >= ? > ORDER BY f.started DESC > > ) > SELECT flight, max_finished, job, started, status, > ( > WITH tsteps AS > ( > SELECT * > FROM steps > WHERE flight=f.flight AND job=f.job > ) > , tsteps2 AS > ( > SELECT * > FROM tsteps > WHERE finished <= > (SELECT finished > FROM tsteps > WHERE tsteps.testid = ?) > ) > SELECT ( > SELECT max(finished)-min(started) > FROM tsteps2 > ) - ( > SELECT sum(finished-started) > FROM tsteps2 > WHERE step = 'ts-hosts-allocate' > ) > AS duration > > ) FROM f I mean, in both queries (A and B), the transform should basically result in the same thing happening, as far as I can tell. I can try to analyze the duration query and see if I can come up with any suggestions, but that would be a different patch anyway. -George
George Dunlap writes ("Re: [OSSTEST PATCH 14/14] duration_estimator: Move duration query loop into database"): > > On Jul 21, 2020, at 7:42 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote: ... > > Example queries before (from the debugging output): > > > > Query A part I: > > > > SELECT f.flight AS flight, > > j.job AS job, > > f.started AS started, > > j.status AS status > > FROM flights f > > JOIN jobs j USING (flight) > > JOIN runvars r > > ON f.flight=r.flight > > AND r.name=? > > WHERE j.job=r.job > > Did these last two get mixed up? My limited experience w/ JOIN ON > and WHERE would lead me to expect we’re joining on > `f.flight=r.flight and r.job = j.job`, and having `r.name = ?` as > part of the WHERE clause. I see it’s the same in the combined query > as well. Well spotted. However, actually, this makes no difference: with an inner join, ON clauses are the same as WHERE clauses. It does seem stylistically poor though, so I will add a commit to change it. > > Query common part II: > > > > WITH tsteps AS > > ( > > SELECT * > > FROM steps > > WHERE flight=? AND job=? > > ) > > , tsteps2 AS > > ( > > SELECT * > > FROM tsteps > > WHERE finished <= > > (SELECT finished > > FROM tsteps > > WHERE tsteps.testid = ?) > > ) > > SELECT ( > > SELECT max(finished)-min(started) > > FROM tsteps2 > > ) - ( > > SELECT sum(finished-started) > > FROM tsteps2 > > WHERE step = 'ts-hosts-allocate' > > ) > > AS duration > > Er, wait — you were doing a separate `duration` query for each row of the previous query? Yeah, that sounds like it could be a lot of round trips. :-) I was doing, yes. This code was not really very optimised. > I mean, in both queries (A and B), the transform should basically result in the same thing happening, as far as I can tell. Good, thanks. > I can try to analyze the duration query and see if I can come up with any suggestions, but that would be a different patch anyway. It's fast enough now :-). Thanks, Ian.
George Dunlap writes ("Re: [OSSTEST PATCH 06/14] sg-report-flight: Use WITH clause to use index for $anypassq"):
> > On Jul 21, 2020, at 7:41 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> > + # In psql 9.6 this WITH clause makes postgresql do the steps query
> > + # first. This is good because if this test never passed we can
> > + # determine that really quickly using the new index, without
> > + # having to scan the flights table. (If the test passed we will
> > + # probably not have to look at many flights to find one, so in
> > + # that case this is not much worse.)
>
> Seems a bit weird, but OK. The SQL looks the same, so:
>
> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Thanks. This business with the WITH clause as an optimisation fence
is well-known in Postgres circles, it seems.
Ian.
> On Jul 31, 2020, at 11:39 AM, Ian Jackson <ian.jackson@citrix.com> wrote:
>
> George Dunlap writes ("Re: [OSSTEST PATCH 14/14] duration_estimator: Move duration query loop into database"):
>>> On Jul 21, 2020, at 7:42 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> ...
>>> Example queries before (from the debugging output):
>>>
>>> Query A part I:
>>>
>>> SELECT f.flight AS flight,
>>> j.job AS job,
>>> f.started AS started,
>>> j.status AS status
>>> FROM flights f
>>> JOIN jobs j USING (flight)
>>> JOIN runvars r
>>> ON f.flight=r.flight
>>> AND r.name=?
>>> WHERE j.job=r.job
>>
>> Did these last two get mixed up? My limited experience w/ JOIN ON
>> and WHERE would lead me to expect we’re joining on
>> `f.flight=r.flight and r.job = j.job`, and having `r.name = ?` as
>> part of the WHERE clause. I see it’s the same in the combined query
>> as well.
>
> Well spotted. However, actually, this makes no difference: with an
> inner join, ON clauses are the same as WHERE clauses. It does seem
> stylistically poor though, so I will add a commit to change it.
Yeah, in my tiny amount of experience with SQLite, putting this sort of restriction in WHERE rather than ON didn’t seem to make a practical difference; no doubt the query planner is smart enough to DTRT. But switching them should make it slightly easier for humans to parse, so is probably worth doing while you’re here, if you have a few spare cycles.
Thanks,
-George