From: George Dunlap <George.Dunlap@citrix.com>
To: Ian Jackson <Ian.Jackson@citrix.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: [OSSTEST PATCH 14/14] duration_estimator: Move duration query loop into database
Date: Mon, 27 Jul 2020 17:43:54 +0000
Message-ID: <7A4B6786-4456-44E4-A85D-9CC83B522FBB@citrix.com>
In-Reply-To: <20200721184205.15232-15-ian.jackson@eu.citrix.com>



> On Jul 21, 2020, at 7:42 PM, Ian Jackson <ian.jackson@eu.citrix.com> wrote:
> 
> Stuff the two queries together: we use the first query as a WITH
> clause.  This is significantly faster, perhaps because the query
> optimiser does a better job but probably just because it saves on
> round trips.
> 
> No functional change.
> 
> Perf: subjectively this seemed to help when the cache was cold.  Now I
> have a warm cache and it doesn't seem to make much difference.
> 
> Perf: runtime of my test case now ~5-7s.
> 
> Example queries before (from the debugging output):
> 
> Query A part I:
> 
>            SELECT f.flight AS flight,
>                   j.job AS job,
>                   f.started AS started,
>                   j.status AS status
>                     FROM flights f
>                     JOIN jobs j USING (flight)
>                     JOIN runvars r
>                             ON  f.flight=r.flight
>                            AND  r.name=?
>                    WHERE  j.job=r.job

Did these last two get mixed up?  My limited experience with JOIN ... ON and WHERE would lead me to expect the join condition to be `f.flight=r.flight AND r.job=j.job`, with `r.name=?` in the WHERE clause.  I see it’s the same in the combined query as well.
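
For what it's worth, a minimal sketch of why the placement probably does not change the result here: for inner joins, a predicate filters the same rows whether it sits in ON or in WHERE (this is not true for outer joins). The tables and rows below are made up for illustration and only mimic the column names in the quoted query; Python's sqlite3 module is used purely so the sketch is self-contained, and the real database may plan the two forms differently.

```python
import sqlite3

# Toy schema: an assumption for illustration, not the real osstest schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flights (flight INTEGER, blessing TEXT);
    CREATE TABLE jobs    (flight INTEGER, job TEXT, status TEXT);
    CREATE TABLE runvars (flight INTEGER, job TEXT, name TEXT, val TEXT);
    INSERT INTO flights VALUES (1, 'real'), (2, 'real');
    INSERT INTO jobs VALUES
        (1, 'job-a', 'pass'), (1, 'job-b', 'fail'), (2, 'job-a', 'pass');
    INSERT INTO runvars VALUES
        (1, 'job-a', 'host', 'h1'),
        (1, 'job-b', 'host', 'h2'),
        (1, 'job-a', 'arch', 'amd64'),
        (2, 'job-a', 'host', 'h1');
""")

# Predicate placement as in the patch: r.name in ON, the job match in WHERE.
as_posted = """
    SELECT f.flight, j.job
      FROM flights f
      JOIN jobs j USING (flight)
      JOIN runvars r ON f.flight = r.flight AND r.name = ?
     WHERE j.job = r.job
     ORDER BY f.flight, j.job
"""

# Placement as expected in the reply: both join keys in ON, r.name in WHERE.
as_expected = """
    SELECT f.flight, j.job
      FROM flights f
      JOIN jobs j USING (flight)
      JOIN runvars r ON f.flight = r.flight AND r.job = j.job
     WHERE r.name = ?
     ORDER BY f.flight, j.job
"""

rows_posted = conn.execute(as_posted, ("host",)).fetchall()
rows_expected = conn.execute(as_expected, ("host",)).fetchall()
print(rows_posted == rows_expected, rows_posted)
```

So the question is one of readability rather than correctness, at least as long as these stay inner joins.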

>                      AND  f.blessing=?
>                      AND  f.branch=?
>                      AND  j.job=?
>                      AND  r.val=?
>                      AND  (j.status='pass' OR j.status='fail'
>                           OR j.status='truncated'!)
>                      AND  f.started IS NOT NULL
>                      AND  f.started >= ?
>                 ORDER BY f.started DESC
> 
> With bind variables:
>     "test-amd64-i386-xl-pvshim"
>     "guest-start"
> 
> Query B part I:
> 
>            SELECT f.flight AS flight,
>                   s.job AS job,
>                   NULL as started,
>                   NULL as status,
>                   max(s.finished) AS max_finished
>                      FROM steps s JOIN flights f
>                        ON s.flight=f.flight
>                     WHERE s.job=? AND f.blessing=? AND f.branch=?
>                       AND s.finished IS NOT NULL
>                       AND f.started IS NOT NULL
>                       AND f.started >= ?
>                     GROUP BY f.flight, s.job
>                     ORDER BY max_finished DESC
> 
> With bind variables:
>    "test-armhf-armhf-libvirt"
>    'real'
>    "xen-unstable"
>    1594144469
> 
> Query common part II:
> 
>        WITH tsteps AS
>        (
>            SELECT *
>              FROM steps
>             WHERE flight=? AND job=?
>        )
>        , tsteps2 AS
>        (
>            SELECT *
>              FROM tsteps
>             WHERE finished <=
>                     (SELECT finished
>                        FROM tsteps
>                       WHERE tsteps.testid = ?)
>        )
>        SELECT (
>            SELECT max(finished)-min(started)
>              FROM tsteps2
>          ) - (
>            SELECT sum(finished-started)
>              FROM tsteps2
>             WHERE step = 'ts-hosts-allocate'
>          )
>                AS duration

Er, wait: you were running a separate `duration` query for each row of the previous query?  Yeah, that sounds like it could be a lot of round trips. :-)
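
To make the per-row query concrete, here is a small worked example of the duration query quoted above, run against a made-up `steps` table. The flight number, job name, step names other than `ts-hosts-allocate`, and timestamps are all invented for illustration; sqlite3 stands in for the real database only to keep the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical cut-down steps table; the real one has more columns.
    CREATE TABLE steps (flight INTEGER, job TEXT, step TEXT, testid TEXT,
                        started INTEGER, finished INTEGER);
    INSERT INTO steps VALUES
        (100, 'job-a', 'ts-hosts-allocate', 'hosts-allocate', 1000, 1300),
        (100, 'job-a', 'ts-host-install',   'host-install',   1300, 1900),
        (100, 'job-a', 'ts-guest-start',    'guest-start',    1900, 2000),
        (100, 'job-a', 'ts-guest-stop',     'guest-stop',     2000, 2100);
""")

# The "common part II" query from the patch, unchanged apart from layout.
DURATION_Q = """
    WITH tsteps AS (
        SELECT * FROM steps WHERE flight = ? AND job = ?
    ),
    tsteps2 AS (
        SELECT * FROM tsteps
         WHERE finished <= (SELECT finished FROM tsteps
                             WHERE tsteps.testid = ?)
    )
    SELECT (SELECT max(finished) - min(started) FROM tsteps2)
         - (SELECT sum(finished - started) FROM tsteps2
             WHERE step = 'ts-hosts-allocate')
           AS duration
"""

# Steps up to and including 'guest-start' span 1000..2000 (1000s),
# of which 300s was spent in ts-hosts-allocate.
(duration,) = conn.execute(DURATION_Q, (100, "job-a", "guest-start")).fetchone()
print(duration)  # 700
```

In other words, the query measures wall-clock time from the first step to the named test step and subtracts the time spent waiting for host allocation.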

> 
> With bind variables from previous query, eg:
>     152045
>     "test-armhf-armhf-libvirt"
>     "guest-start.2"
> 
> After:
> 
> Query A (combined):
> 
>            WITH f AS (
>            SELECT f.flight AS flight,
>                   j.job AS job,
>                   f.started AS started,
>                   j.status AS status
>                     FROM flights f
>                     JOIN jobs j USING (flight)
>                     JOIN runvars r
>                             ON  f.flight=r.flight
>                            AND  r.name=?
>                    WHERE  j.job=r.job
>                      AND  f.blessing=?
>                      AND  f.branch=?
>                      AND  j.job=?
>                      AND  r.val=?
>                      AND  (j.status='pass' OR j.status='fail'
>                           OR j.status='truncated'!)
>                      AND  f.started IS NOT NULL
>                      AND  f.started >= ?
>                 ORDER BY f.started DESC
> 
>            )
>            SELECT flight, max_finished, job, started, status,
>            (
>        WITH tsteps AS
>        (
>            SELECT *
>              FROM steps
>             WHERE flight=f.flight AND job=f.job
>        )
>        , tsteps2 AS
>        (
>            SELECT *
>              FROM tsteps
>             WHERE finished <=
>                     (SELECT finished
>                        FROM tsteps
>                       WHERE tsteps.testid = ?)
>        )
>        SELECT (
>            SELECT max(finished)-min(started)
>              FROM tsteps2
>          ) - (
>            SELECT sum(finished-started)
>              FROM tsteps2
>             WHERE step = 'ts-hosts-allocate'
>          )
>                AS duration
> 
>            ) FROM f

As far as I can tell, the transform should make no difference to the results for either query (A or B): the duration query is the same, it just runs as a subquery for each row of `f` rather than as a separate statement.
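
A rough sketch of that transform on toy data, comparing the loop of per-row statements with a single statement that uses the first query as a WITH clause. Everything here is an assumption for illustration: the tables are cut down, the duration expression is simplified to `max(finished) - min(started)`, and the helper name `durations_loop` is made up. The outer ORDER BY is there only to make the comparison deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flights (flight INTEGER, started INTEGER);
    CREATE TABLE jobs    (flight INTEGER, job TEXT, status TEXT);
    CREATE TABLE steps   (flight INTEGER, job TEXT,
                          started INTEGER, finished INTEGER);
    INSERT INTO flights VALUES (1, 1000), (2, 5000), (3, 9000);
    INSERT INTO jobs VALUES
        (1, 'job-a', 'pass'), (2, 'job-a', 'fail'), (3, 'job-a', 'broken');
    INSERT INTO steps VALUES
        (1, 'job-a', 1000, 1400), (1, 'job-a', 1400, 1900),
        (2, 'job-a', 5000, 5200), (2, 'job-a', 5200, 5600),
        (3, 'job-a', 9000, 9100);
""")

FIRST_Q = """
    SELECT fl.flight, j.job
      FROM flights fl JOIN jobs j USING (flight)
     WHERE j.job = ? AND j.status IN ('pass', 'fail')
     ORDER BY fl.started DESC
"""
PER_ROW_Q = """
    SELECT max(finished) - min(started)
      FROM steps WHERE flight = ? AND job = ?
"""

def durations_loop(job):
    """Before: one statement for the flights, then one more per row."""
    statements = 1
    out = []
    for flight, jobname in conn.execute(FIRST_Q, (job,)).fetchall():
        (d,) = conn.execute(PER_ROW_Q, (flight, jobname)).fetchone()
        statements += 1
        out.append((flight, jobname, d))
    return out, statements

# After: the first query becomes the CTE f, and the per-row query becomes
# a correlated scalar subquery evaluated for each row of f.
COMBINED_Q = """
    WITH f AS (
        SELECT fl.flight AS flight, j.job AS job, fl.started AS started
          FROM flights fl JOIN jobs j USING (flight)
         WHERE j.job = ? AND j.status IN ('pass', 'fail')
    )
    SELECT flight, job,
           (SELECT max(s.finished) - min(s.started)
              FROM steps s
             WHERE s.flight = f.flight AND s.job = f.job) AS duration
      FROM f
     ORDER BY started DESC
"""

loop_rows, statements = durations_loop("job-a")
combined_rows = conn.execute(COMBINED_Q, ("job-a",)).fetchall()
print(statements, loop_rows == combined_rows, combined_rows)
```

With two matching flights the loop issues three statements where the combined form issues one; against a remote database each of those extra statements would be a round trip.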

I can try to analyze the duration query and see if I can come up with any suggestions, but that would be a different patch anyway.

 -George


