All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Odin Ugedal <odin@uged.al>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	"open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/3] sched/fair: Add tg_load_contrib cfs_rq decay checking
Date: Wed, 26 May 2021 12:50:35 +0200	[thread overview]
Message-ID: <CAKfTPtAFn3=anfTCxKTDXF0wpttpEiAhksLvcEPdSiYZTj38_A@mail.gmail.com> (raw)
In-Reply-To: <CAKfTPtCaZOSEzRXVN9fTR2vTxGiANEARo6iDNMFiQV5=qAA4Tw@mail.gmail.com>

On Tue, 25 May 2021 at 16:30, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Tue, 25 May 2021 at 12:34, Odin Ugedal <odin@uged.al> wrote:
> >
> > Hi,
> >
> > tir. 25. mai 2021 kl. 11:58 skrev Vincent Guittot <vincent.guittot@linaro.org>:
> > > Could you give more details about how cfs_rq->avg.load_avg = 4 but
> > > cfs_rq->avg.load_sum = 0 ?
>
> > >
> > > cfs_rq->avg.load_sum is decayed and can become null when crossing
> > > period which implies an update of cfs_rq->avg.load_avg.  This means
> > > that your case is generated by something outside the pelt formula ...
> > > like maybe the propagation of load in the tree. If this is the case,
> > > we should find the error and fix it
> >
> > Ahh, yeah, that could probably be described better.
> >
> > It is (as far as I have found out) because the pelt divider is changed,
> > and the output from "get_pelt_divider(&cfs_rq->avg)" is changed, resulting
> > in a different value being removed than added.
>
> ok so IIUC, it happens during the adding/removing/propagating
> entities' load in the cfs_rq.
>
> >
> > Inside pelt itself, this cannot happen. When pelt changes the load_sum, it
> > recalculates the load_avg based on load_sum, and not the delta, afaik.
> >
> > And as you say, the "issue" therefore (as I see it) outside of PELT. Due to
> > how the pelt divider is changed, I assume it is hard to pinpoint where the issue
> > is. I can try to find a clear path where where we can see what is added
> > and what is removed from both cfs_rq->avg.load_sum and cfs_rq->avg.load_avg,
> > to better be able to pinpoint what is happening.
> >
> > Previously I thought this was a result of precision loss due to division and
> > multiplication during load add/remove inside fair.c, but I am not sure that
> > is the issue, or is it?
>
> I don't think the precision looss is the problem because
> adding/removing load in fair.c could truncate load_sum but it stays
> sync with load_avg. I will have a llo to see if i can see something
> weird

I have added a trace in cfs_rq_is_decayed() but I'm not able to
reproduce a situation where load_sum == 0 but not load_avg  even with
the script in the cover letter


>
> >
> > If my above line of thought makes sense, do you still view this as an error
> > outside PELT, or do you see another possible/better solution?
> >
> > Will investigate further.
>
> Thanks
>
> >
> > Thanks
> > Odin

WARNING: multiple messages have this Message-ID (diff)
From: Vincent Guittot <vincent.guittot-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
To: Odin Ugedal <odin-RObV4cXtwVA@public.gmane.org>
Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
	Juri Lelli <juri.lelli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Dietmar Eggemann <dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>,
	Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>,
	Ben Segall <bsegall-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Mel Gorman <mgorman-l3A5Bk7waGM@public.gmane.org>,
	Daniel Bristot de Oliveira
	<bristot-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	"open list:CONTROL GROUP (CGROUP)"
	<cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	linux-kernel
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH 1/3] sched/fair: Add tg_load_contrib cfs_rq decay checking
Date: Wed, 26 May 2021 12:50:35 +0200	[thread overview]
Message-ID: <CAKfTPtAFn3=anfTCxKTDXF0wpttpEiAhksLvcEPdSiYZTj38_A@mail.gmail.com> (raw)
In-Reply-To: <CAKfTPtCaZOSEzRXVN9fTR2vTxGiANEARo6iDNMFiQV5=qAA4Tw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tue, 25 May 2021 at 16:30, Vincent Guittot
<vincent.guittot-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>
> On Tue, 25 May 2021 at 12:34, Odin Ugedal <odin-RObV4cXtwVA@public.gmane.org> wrote:
> >
> > Hi,
> >
> > tir. 25. mai 2021 kl. 11:58 skrev Vincent Guittot <vincent.guittot-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>:
> > > Could you give more details about how cfs_rq->avg.load_avg = 4 but
> > > cfs_rq->avg.load_sum = 0 ?
>
> > >
> > > cfs_rq->avg.load_sum is decayed and can become null when crossing
> > > period which implies an update of cfs_rq->avg.load_avg.  This means
> > > that your case is generated by something outside the pelt formula ...
> > > like maybe the propagation of load in the tree. If this is the case,
> > > we should find the error and fix it
> >
> > Ahh, yeah, that could probably be described better.
> >
> > It is (as far as I have found out) because the pelt divider is changed,
> > and the output from "get_pelt_divider(&cfs_rq->avg)" is changed, resulting
> > in a different value being removed than added.
>
> ok so IIUC, it happens during the adding/removing/propagating
> entities' load in the cfs_rq.
>
> >
> > Inside pelt itself, this cannot happen. When pelt changes the load_sum, it
> > recalculates the load_avg based on load_sum, and not the delta, afaik.
> >
> > And as you say, the "issue" therefore (as I see it) outside of PELT. Due to
> > how the pelt divider is changed, I assume it is hard to pinpoint where the issue
> > is. I can try to find a clear path where where we can see what is added
> > and what is removed from both cfs_rq->avg.load_sum and cfs_rq->avg.load_avg,
> > to better be able to pinpoint what is happening.
> >
> > Previously I thought this was a result of precision loss due to division and
> > multiplication during load add/remove inside fair.c, but I am not sure that
> > is the issue, or is it?
>
> I don't think the precision looss is the problem because
> adding/removing load in fair.c could truncate load_sum but it stays
> sync with load_avg. I will have a llo to see if i can see something
> weird

I have added a trace in cfs_rq_is_decayed() but I'm not able to
reproduce a situation where load_sum == 0 but not load_avg  even with
the script in the cover letter


>
> >
> > If my above line of thought makes sense, do you still view this as an error
> > outside PELT, or do you see another possible/better solution?
> >
> > Will investigate further.
>
> Thanks
>
> >
> > Thanks
> > Odin

  reply	other threads:[~2021-05-26 10:51 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-18 12:51 [PATCH 0/3] sched/fair: Fix load decay issues related to throttling Odin Ugedal
2021-05-18 12:51 ` Odin Ugedal
2021-05-18 12:52 ` [PATCH 1/3] sched/fair: Add tg_load_contrib cfs_rq decay checking Odin Ugedal
2021-05-18 12:52   ` Odin Ugedal
2021-05-25  9:58   ` Vincent Guittot
2021-05-25 10:33     ` Odin Ugedal
2021-05-25 10:33       ` Odin Ugedal
2021-05-25 14:30       ` Vincent Guittot
2021-05-25 14:30         ` Vincent Guittot
2021-05-26 10:50         ` Vincent Guittot [this message]
2021-05-26 10:50           ` Vincent Guittot
2021-05-27  7:50           ` Odin Ugedal
2021-05-27  7:50             ` Odin Ugedal
2021-05-27  9:35             ` Vincent Guittot
2021-05-27  9:45               ` Odin Ugedal
2021-05-27 10:49                 ` Vincent Guittot
2021-05-27 10:49                   ` Vincent Guittot
2021-05-27 11:04                   ` Odin Ugedal
2021-05-27 11:04                     ` Odin Ugedal
2021-05-27 12:37                     ` Vincent Guittot
2021-05-27 12:37                       ` Vincent Guittot
2021-05-27 12:37                   ` Odin Ugedal
2021-05-27 12:37                     ` Odin Ugedal
2021-05-27 12:39                     ` Odin Ugedal
2021-05-27 12:39                       ` Odin Ugedal
2021-05-27 12:49                     ` Vincent Guittot
2021-05-18 12:52 ` [PATCH 2/3] sched/fair: Correctly insert cfs_rq's to list on unthrottle Odin Ugedal
2021-05-18 12:52   ` Odin Ugedal
2021-05-28 14:24   ` Vincent Guittot
2021-05-28 15:06     ` Odin Ugedal
2021-05-28 15:27       ` Vincent Guittot
2021-05-28 15:27         ` Vincent Guittot
2021-05-29  9:33         ` Odin Ugedal
2021-05-31 12:14           ` Vincent Guittot
2021-05-31 12:14             ` Vincent Guittot
2021-05-18 12:52 ` [PATCH 3/3] sched/fair: Fix ascii art by relpacing tabs Odin Ugedal
2021-05-27 13:27   ` Vincent Guittot
2021-05-27 13:27     ` Vincent Guittot
2021-06-01 14:04   ` [tip: sched/core] " tip-bot2 for Odin Ugedal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKfTPtAFn3=anfTCxKTDXF0wpttpEiAhksLvcEPdSiYZTj38_A@mail.gmail.com' \
    --to=vincent.guittot@linaro.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=cgroups@vger.kernel.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=odin@uged.al \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.