xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* [Bugfix PATCH for-4.15] xen: credit2: fix per-entity load tracking when continuing running
@ 2021-03-19 12:14 Dario Faggioli
  2021-04-27  8:35 ` Ping: " Jan Beulich
  2021-06-07 11:44 ` George Dunlap
  0 siblings, 2 replies; 4+ messages in thread
From: Dario Faggioli @ 2021-03-19 12:14 UTC (permalink / raw)
  To: xen-devel; +Cc: George Dunlap, Ian Jackson

If we schedule, and the current vCPU continues to run, its statistical
load is not properly updated, resulting in something like this, even if
all the 8 vCPUs are 100% busy:

(XEN) Runqueue 0:
(XEN) [...]
(XEN)   aveload            = 2097152 (~800%)
(XEN) [...]
(XEN)   Domain: 0 w 256 c 0 v 8
(XEN)     1: [0.0] flags=2 cpu=4 credit=9996885 [w=256] load=35 (~0%)
(XEN)     2: [0.1] flags=2 cpu=2 credit=9993725 [w=256] load=796 (~0%)
(XEN)     3: [0.2] flags=2 cpu=1 credit=9995885 [w=256] load=883 (~0%)
(XEN)     4: [0.3] flags=2 cpu=5 credit=9998833 [w=256] load=487 (~0%)
(XEN)     5: [0.4] flags=2 cpu=6 credit=9998942 [w=256] load=1595 (~0%)
(XEN)     6: [0.5] flags=2 cpu=0 credit=9994669 [w=256] load=22 (~0%)
(XEN)     7: [0.6] flags=2 cpu=7 credit=9997706 [w=256] load=0 (~0%)
(XEN)     8: [0.7] flags=2 cpu=3 credit=9992440 [w=256] load=0 (~0%)

As we can see, the average load of the runqueue as a whole is, instead,
computed properly.

This issue would, in theory, potentially affect Credit2 load balancing
logic. In practice, however, the problem only manifests (at least with
these characteristics) when there is only 1 runqueue active in the
cpupool, which also means there is no need to do any load-balancing.

Hence its real impact is pretty much limited to wrong per-vCPU load
percentages, when looking at the output of the 'r' debug-key.

With this patch, the load is updated and displayed correctly:

(XEN) Runqueue 0:
(XEN) [...]
(XEN)   aveload            = 2097152 (~800%)
(XEN) [...]
(XEN) Domain info:
(XEN)   Domain: 0 w 256 c 0 v 8
(XEN)     1: [0.0] flags=2 cpu=4 credit=9995584 [w=256] load=262144 (~100%)
(XEN)     2: [0.1] flags=2 cpu=6 credit=9992992 [w=256] load=262144 (~100%)
(XEN)     3: [0.2] flags=2 cpu=3 credit=9998918 [w=256] load=262118 (~99%)
(XEN)     4: [0.3] flags=2 cpu=5 credit=9996867 [w=256] load=262144 (~100%)
(XEN)     5: [0.4] flags=2 cpu=1 credit=9998912 [w=256] load=262144 (~100%)
(XEN)     6: [0.5] flags=2 cpu=2 credit=9997842 [w=256] load=262144 (~100%)
(XEN)     7: [0.6] flags=2 cpu=7 credit=9994623 [w=256] load=262144 (~100%)
(XEN)     8: [0.7] flags=2 cpu=0 credit=9991815 [w=256] load=262144 (~100%)

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <iwj@xenproject.org>
---
Despite the limited effect, it's a bug. So:
- it should be backported;
- I think it should be included in 4.15. The risk is pretty low, for
  the same reasons already explained when describing its limited impact.
---
 xen/common/sched/credit2.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index eb5e5a78c5..b3b5de94cf 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -3646,6 +3646,8 @@ static void csched2_schedule(
             runq_remove(snext);
             __set_bit(__CSFLAG_scheduled, &snext->flags);
         }
+        else
+            update_load(ops, rqd, snext, 0, now);
 
         /* Clear the idle mask if necessary */
         if ( cpumask_test_cpu(sched_cpu, &rqd->idle) )




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Ping: [Bugfix PATCH for-4.15] xen: credit2: fix per-entity load tracking when continuing running
  2021-03-19 12:14 [Bugfix PATCH for-4.15] xen: credit2: fix per-entity load tracking when continuing running Dario Faggioli
@ 2021-04-27  8:35 ` Jan Beulich
  2021-05-28 15:18   ` Dario Faggioli
  2021-06-07 11:44 ` George Dunlap
  1 sibling, 1 reply; 4+ messages in thread
From: Jan Beulich @ 2021-04-27  8:35 UTC (permalink / raw)
  To: Dario Faggioli, George Dunlap; +Cc: Ian Jackson, xen-devel

On 19.03.2021 13:14, Dario Faggioli wrote:
> If we schedule, and the current vCPU continues to run, its statistical
> load is not properly updated, resulting in something like this, even if
> all the 8 vCPUs are 100% busy:
> 
> (XEN) Runqueue 0:
> (XEN) [...]
> (XEN)   aveload            = 2097152 (~800%)
> (XEN) [...]
> (XEN)   Domain: 0 w 256 c 0 v 8
> (XEN)     1: [0.0] flags=2 cpu=4 credit=9996885 [w=256] load=35 (~0%)
> (XEN)     2: [0.1] flags=2 cpu=2 credit=9993725 [w=256] load=796 (~0%)
> (XEN)     3: [0.2] flags=2 cpu=1 credit=9995885 [w=256] load=883 (~0%)
> (XEN)     4: [0.3] flags=2 cpu=5 credit=9998833 [w=256] load=487 (~0%)
> (XEN)     5: [0.4] flags=2 cpu=6 credit=9998942 [w=256] load=1595 (~0%)
> (XEN)     6: [0.5] flags=2 cpu=0 credit=9994669 [w=256] load=22 (~0%)
> (XEN)     7: [0.6] flags=2 cpu=7 credit=9997706 [w=256] load=0 (~0%)
> (XEN)     8: [0.7] flags=2 cpu=3 credit=9992440 [w=256] load=0 (~0%)
> 
> As we can see, the average load of the runqueue as a whole is, instead,
> computed properly.
> 
> This issue would, in theory, potentially affect Credit2 load balancing
> logic. In practice, however, the problem only manifests (at least with
> these characteristics) when there is only 1 runqueue active in the
> cpupool, which also means there is no need to do any load-balancing.
> 
> Hence its real impact is pretty much limited to wrong per-vCPU load
> percentages, when looking at the output of the 'r' debug-key.
> 
> With this patch, the load is updated and displayed correctly:
> 
> (XEN) Runqueue 0:
> (XEN) [...]
> (XEN)   aveload            = 2097152 (~800%)
> (XEN) [...]
> (XEN) Domain info:
> (XEN)   Domain: 0 w 256 c 0 v 8
> (XEN)     1: [0.0] flags=2 cpu=4 credit=9995584 [w=256] load=262144 (~100%)
> (XEN)     2: [0.1] flags=2 cpu=6 credit=9992992 [w=256] load=262144 (~100%)
> (XEN)     3: [0.2] flags=2 cpu=3 credit=9998918 [w=256] load=262118 (~99%)
> (XEN)     4: [0.3] flags=2 cpu=5 credit=9996867 [w=256] load=262144 (~100%)
> (XEN)     5: [0.4] flags=2 cpu=1 credit=9998912 [w=256] load=262144 (~100%)
> (XEN)     6: [0.5] flags=2 cpu=2 credit=9997842 [w=256] load=262144 (~100%)
> (XEN)     7: [0.6] flags=2 cpu=7 credit=9994623 [w=256] load=262144 (~100%)
> (XEN)     8: [0.7] flags=2 cpu=0 credit=9991815 [w=256] load=262144 (~100%)
> 
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Ian Jackson <iwj@xenproject.org>
> ---
> Despite the limited effect, it's a bug. So:
> - it should be backported;
> - I think it should be included in 4.15. The risk is pretty low, for
>   the same reasons already explained when describing its limited impact.

I'm a little puzzled to find this is still in my waiting-to-go-in
folder, for not having had an ack (or otherwise). George?

Jan

> ---
>  xen/common/sched/credit2.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
> index eb5e5a78c5..b3b5de94cf 100644
> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -3646,6 +3646,8 @@ static void csched2_schedule(
>              runq_remove(snext);
>              __set_bit(__CSFLAG_scheduled, &snext->flags);
>          }
> +        else
> +            update_load(ops, rqd, snext, 0, now);
>  
>          /* Clear the idle mask if necessary */
>          if ( cpumask_test_cpu(sched_cpu, &rqd->idle) )
> 
> 
> 



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Ping: [Bugfix PATCH for-4.15] xen: credit2: fix per-entity load tracking when continuing running
  2021-04-27  8:35 ` Ping: " Jan Beulich
@ 2021-05-28 15:18   ` Dario Faggioli
  0 siblings, 0 replies; 4+ messages in thread
From: Dario Faggioli @ 2021-05-28 15:18 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 1145 bytes --]

On Tue, 2021-04-27 at 10:35 +0200, Jan Beulich wrote:
> On 19.03.2021 13:14, Dario Faggioli wrote:
> > 
> > ---
> > Cc: George Dunlap <george.dunlap@citrix.com>
> > Cc: Ian Jackson <iwj@xenproject.org>
> > ---
> > Despite the limited effect, it's a bug. So:
> > - it should be backported;
> > - I think it should be included in 4.15. The risk is pretty low,
> > for
> >   the same reasons already explained when describing its limited
> > impact.
> 
> I'm a little puzzled to find this is still in my waiting-to-go-in
> folder, for not having had an ack (or otherwise). George?
> 
Yeah, and it probably still is, so... George? :-D

BTW, I'm dropping IanJ as, quite obviously,this won't go in 4.15. :-P
It should however (after it goes in) be backported (to 4.15 and
probably even earlier... I can have a look myself if it helps).

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Bugfix PATCH for-4.15] xen: credit2: fix per-entity load tracking when continuing running
  2021-03-19 12:14 [Bugfix PATCH for-4.15] xen: credit2: fix per-entity load tracking when continuing running Dario Faggioli
  2021-04-27  8:35 ` Ping: " Jan Beulich
@ 2021-06-07 11:44 ` George Dunlap
  1 sibling, 0 replies; 4+ messages in thread
From: George Dunlap @ 2021-06-07 11:44 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: xen-devel, Ian Jackson



> On Mar 19, 2021, at 12:14 PM, Dario Faggioli <dfaggioli@suse.com> wrote:
> 
> If we schedule, and the current vCPU continues to run, its statistical
> load is not properly updated, resulting in something like this, even if
> all the 8 vCPUs are 100% busy:
> 
> (XEN) Runqueue 0:
> (XEN) [...]
> (XEN)   aveload            = 2097152 (~800%)
> (XEN) [...]
> (XEN)   Domain: 0 w 256 c 0 v 8
> (XEN)     1: [0.0] flags=2 cpu=4 credit=9996885 [w=256] load=35 (~0%)
> (XEN)     2: [0.1] flags=2 cpu=2 credit=9993725 [w=256] load=796 (~0%)
> (XEN)     3: [0.2] flags=2 cpu=1 credit=9995885 [w=256] load=883 (~0%)
> (XEN)     4: [0.3] flags=2 cpu=5 credit=9998833 [w=256] load=487 (~0%)
> (XEN)     5: [0.4] flags=2 cpu=6 credit=9998942 [w=256] load=1595 (~0%)
> (XEN)     6: [0.5] flags=2 cpu=0 credit=9994669 [w=256] load=22 (~0%)
> (XEN)     7: [0.6] flags=2 cpu=7 credit=9997706 [w=256] load=0 (~0%)
> (XEN)     8: [0.7] flags=2 cpu=3 credit=9992440 [w=256] load=0 (~0%)
> 
> As we can see, the average load of the runqueue as a whole is, instead,
> computed properly.
> 
> This issue would, in theory, potentially affect Credit2 load balancing
> logic. In practice, however, the problem only manifests (at least with
> these characteristics) when there is only 1 runqueue active in the
> cpupool, which also means there is no need to do any load-balancing.
> 
> Hence its real impact is pretty much limited to wrong per-vCPU load
> percentages, when looking at the output of the 'r' debug-key.
> 
> With this patch, the load is updated and displayed correctly:
> 
> (XEN) Runqueue 0:
> (XEN) [...]
> (XEN)   aveload            = 2097152 (~800%)
> (XEN) [...]
> (XEN) Domain info:
> (XEN)   Domain: 0 w 256 c 0 v 8
> (XEN)     1: [0.0] flags=2 cpu=4 credit=9995584 [w=256] load=262144 (~100%)
> (XEN)     2: [0.1] flags=2 cpu=6 credit=9992992 [w=256] load=262144 (~100%)
> (XEN)     3: [0.2] flags=2 cpu=3 credit=9998918 [w=256] load=262118 (~99%)
> (XEN)     4: [0.3] flags=2 cpu=5 credit=9996867 [w=256] load=262144 (~100%)
> (XEN)     5: [0.4] flags=2 cpu=1 credit=9998912 [w=256] load=262144 (~100%)
> (XEN)     6: [0.5] flags=2 cpu=2 credit=9997842 [w=256] load=262144 (~100%)
> (XEN)     7: [0.6] flags=2 cpu=7 credit=9994623 [w=256] load=262144 (~100%)
> (XEN)     8: [0.7] flags=2 cpu=0 credit=9991815 [w=256] load=262144 (~100%)
> 
> Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Ian Jackson <iwj@xenproject.org>
> ---
> Despite the limited effect, it's a bug. So:
> - it should be backported;
> - I think it should be included in 4.15. The risk is pretty low, for
>  the same reasons already explained when describing its limited impact.
> ---
> xen/common/sched/credit2.c |    2 ++
> 1 file changed, 2 insertions(+)
> 
> diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
> index eb5e5a78c5..b3b5de94cf 100644
> --- a/xen/common/sched/credit2.c
> +++ b/xen/common/sched/credit2.c
> @@ -3646,6 +3646,8 @@ static void csched2_schedule(
>             runq_remove(snext);
>             __set_bit(__CSFLAG_scheduled, &snext->flags);
>         }
> +        else
> +            update_load(ops, rqd, snext, 0, now);

I feel like there must be a better way to do this than just bruteforce remember everywhere we could possibly need to update the load.  But at any rate:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-06-07 11:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-19 12:14 [Bugfix PATCH for-4.15] xen: credit2: fix per-entity load tracking when continuing running Dario Faggioli
2021-04-27  8:35 ` Ping: " Jan Beulich
2021-05-28 15:18   ` Dario Faggioli
2021-06-07 11:44 ` George Dunlap

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).