From: George Dunlap <George.Dunlap@citrix.com>
To: Ian Jackson <Ian.Jackson@citrix.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [OSSTEST PATCH 14/14] duration_estimator: Move duration query loop into database
Date: Mon, 27 Jul 2020 17:43:54 +0000
Message-ID: <7A4B6786-4456-44E4-A85D-9CC83B522FBB@citrix.com>
In-Reply-To: <20200721184205.15232-15-ian.jackson@eu.citrix.com>
> On Jul 21, 2020, at 7:42 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
>
> Stuff the two queries together: we use the first query as a WITH
> clause. This is significantly faster, perhaps because the query
> optimiser does a better job but probably just because it saves on
> round trips.
>
> No functional change.
>
> Perf: subjectively this seemed to help when the cache was cold. Now I
> have a warm cache and it doesn't seem to make much difference.
>
> Perf: runtime of my test case now ~5-7s.
>
> Example queries before (from the debugging output):
>
> Query A part I:
>
> SELECT f.flight AS flight,
> j.job AS job,
> f.started AS started,
> j.status AS status
> FROM flights f
> JOIN jobs j USING (flight)
> JOIN runvars r
> ON f.flight=r.flight
> AND r.name=?
> WHERE j.job=r.job
Did these last two get mixed up? My limited experience w/ JOIN ON and WHERE would lead me to expect we’re joining on `f.flight=r.flight and r.job = j.job`, and having `r.name = ?` as part of the WHERE clause. I see it’s the same in the combined query as well.
> AND f.blessing=?
> AND f.branch=?
> AND j.job=?
> AND r.val=?
> AND (j.status='pass' OR j.status='fail'
> OR j.status='truncated'!)
> AND f.started IS NOT NULL
> AND f.started >= ?
> ORDER BY f.started DESC
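On the ON/WHERE placement questioned above: with plain inner joins, a predicate selects the same rows whether it sits in the ON clause or in WHERE (the two only differ for outer joins), so the query as posted should be logically equivalent to the arrangement with `r.job = j.job` in ON and `r.name = ?` in WHERE. The following is a minimal sketch of that, using Python's sqlite3 module and hypothetical, heavily cut-down tables rather than the real osstest schema or database engine. It says nothing about which form a particular planner optimises better.

```python
import sqlite3

# Hypothetical, simplified tables; only the columns needed for the join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flights (flight INTEGER, branch TEXT);
    CREATE TABLE jobs    (flight INTEGER, job TEXT);
    CREATE TABLE runvars (flight INTEGER, job TEXT, name TEXT, val TEXT);
    INSERT INTO flights VALUES (1, 'xen-unstable'), (2, 'xen-unstable');
    INSERT INTO jobs VALUES (1, 'job-a'), (1, 'job-b'), (2, 'job-a');
    INSERT INTO runvars VALUES
        (1, 'job-a', 'host', 'h1'),
        (1, 'job-b', 'host', 'h2'),
        (2, 'job-a', 'host', 'h1'),
        (2, 'job-a', 'arch', 'amd64');
""")

# Placement as in the patch: r.name in ON, the job match in WHERE.
as_posted = """
    SELECT f.flight, j.job, r.val
      FROM flights f
      JOIN jobs j USING (flight)
      JOIN runvars r ON f.flight = r.flight AND r.name = ?
     WHERE j.job = r.job
     ORDER BY f.flight, j.job
"""

# Placement suggested in the reply: both join keys in ON, r.name in WHERE.
as_suggested = """
    SELECT f.flight, j.job, r.val
      FROM flights f
      JOIN jobs j USING (flight)
      JOIN runvars r ON f.flight = r.flight AND r.job = j.job
     WHERE r.name = ?
     ORDER BY f.flight, j.job
"""

rows_posted = conn.execute(as_posted, ("host",)).fetchall()
rows_suggested = conn.execute(as_suggested, ("host",)).fetchall()
print(rows_posted == rows_suggested)
```

Both statements return the same three rows here; the `arch` runvar is filtered out either way.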
>
> With bind variables:
> "test-amd64-i386-xl-pvshim"
> "guest-start"
>
> Query B part I:
>
> SELECT f.flight AS flight,
> s.job AS job,
> NULL as started,
> NULL as status,
> max(s.finished) AS max_finished
> FROM steps s JOIN flights f
> ON s.flight=f.flight
> WHERE s.job=? AND f.blessing=? AND f.branch=?
> AND s.finished IS NOT NULL
> AND f.started IS NOT NULL
> AND f.started >= ?
> GROUP BY f.flight, s.job
> ORDER BY max_finished DESC
>
> With bind variables:
> "test-armhf-armhf-libvirt"
> 'real'
> "xen-unstable"
> 1594144469
>
> Query common part II:
>
> WITH tsteps AS
> (
> SELECT *
> FROM steps
> WHERE flight=? AND job=?
> )
> , tsteps2 AS
> (
> SELECT *
> FROM tsteps
> WHERE finished <=
> (SELECT finished
> FROM tsteps
> WHERE tsteps.testid = ?)
> )
> SELECT (
> SELECT max(finished)-min(started)
> FROM tsteps2
> ) - (
> SELECT sum(finished-started)
> FROM tsteps2
> WHERE step = 'ts-hosts-allocate'
> )
> AS duration
Er, wait — you were doing a separate `duration` query for each row of the previous query? Yeah, that sounds like it could be a lot of round trips. :-)
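To make the round-trip point concrete, here is a rough sketch of the two shapes: one duration query per flight versus a single statement with the duration as a correlated subquery. It uses Python's sqlite3 module with hypothetical tables and data, and it simplifies the duration query by dropping the `tsteps2` cutoff on `testid`, so it is an illustration of the pattern and not the osstest code. An in-memory database has no network latency, so only the statement count is shown, not timings.

```python
import sqlite3

# Hypothetical, simplified schema and data (not the real osstest tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flights (flight INTEGER, started INTEGER);
    CREATE TABLE steps (flight INTEGER, job TEXT, step TEXT,
                        started INTEGER, finished INTEGER);
    INSERT INTO flights VALUES (100, 1000), (101, 2000);
    INSERT INTO steps VALUES
        (100, 'job-a', 'ts-hosts-allocate', 1000, 1300),
        (100, 'job-a', 'ts-host-install',   1300, 1900),
        (100, 'job-a', 'ts-guest-start',    1900, 2000),
        (101, 'job-a', 'ts-hosts-allocate', 2000, 2100),
        (101, 'job-a', 'ts-guest-start',    2100, 2500);
""")

def durations_with_loop(conn, job):
    """Before: one query for the flights, then one more per flight."""
    queries = 1
    flights = conn.execute(
        "SELECT flight FROM flights ORDER BY started DESC").fetchall()
    out = []
    for (flight,) in flights:
        (duration,) = conn.execute("""
            SELECT (SELECT max(finished) - min(started)
                      FROM steps WHERE flight = ? AND job = ?)
                 - (SELECT sum(finished - started)
                      FROM steps WHERE flight = ? AND job = ?
                       AND step = 'ts-hosts-allocate')
        """, (flight, job, flight, job)).fetchone()
        queries += 1
        out.append((flight, duration))
    return out, queries

def durations_combined(conn, job):
    """After: the first query becomes a WITH clause, one statement total."""
    rows = conn.execute("""
        WITH f AS (SELECT flight, started FROM flights)
        SELECT f.flight,
               (SELECT max(s.finished) - min(s.started)
                  FROM steps s
                 WHERE s.flight = f.flight AND s.job = ?)
             - (SELECT sum(s.finished - s.started)
                  FROM steps s
                 WHERE s.flight = f.flight AND s.job = ?
                   AND s.step = 'ts-hosts-allocate') AS duration
          FROM f
         ORDER BY f.started DESC
    """, (job, job)).fetchall()
    return rows, 1

loop_rows, loop_queries = durations_with_loop(conn, "job-a")
combined_rows, combined_queries = durations_combined(conn, "job-a")
print(loop_rows, loop_queries)
print(combined_rows, combined_queries)
```

With this data both variants give a duration of 400 for flight 101 and 700 for flight 100; the loop issues three statements where the combined form issues one, and the gap grows with the number of flights returned by the first query.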
>
> With bind variables from previous query, eg:
> 152045
> "test-armhf-armhf-libvirt"
> "guest-start.2"
>
> After:
>
> Query A (combined):
>
> WITH f AS (
> SELECT f.flight AS flight,
> j.job AS job,
> f.started AS started,
> j.status AS status
> FROM flights f
> JOIN jobs j USING (flight)
> JOIN runvars r
> ON f.flight=r.flight
> AND r.name=?
> WHERE j.job=r.job
> AND f.blessing=?
> AND f.branch=?
> AND j.job=?
> AND r.val=?
> AND (j.status='pass' OR j.status='fail'
> OR j.status='truncated'!)
> AND f.started IS NOT NULL
> AND f.started >= ?
> ORDER BY f.started DESC
>
> )
> SELECT flight, max_finished, job, started, status,
> (
> WITH tsteps AS
> (
> SELECT *
> FROM steps
> WHERE flight=f.flight AND job=f.job
> )
> , tsteps2 AS
> (
> SELECT *
> FROM tsteps
> WHERE finished <=
> (SELECT finished
> FROM tsteps
> WHERE tsteps.testid = ?)
> )
> SELECT (
> SELECT max(finished)-min(started)
> FROM tsteps2
> ) - (
> SELECT sum(finished-started)
> FROM tsteps2
> WHERE step = 'ts-hosts-allocate'
> )
> AS duration
>
> ) FROM f
I mean, in both queries (A and B), the transform should basically result in the same thing happening, as far as I can tell.
I can try to analyze the duration query and see if I can come up with any suggestions, but that would be a different patch anyway.
-George