From: Vincent Guittot <vincent.guittot@linaro.org> To: Odin Ugedal <odin@uged.al> Cc: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Juri Lelli <juri.lelli@redhat.com>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Steven Rostedt <rostedt@goodmis.org>, Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>, Daniel Bristot de Oliveira <bristot@redhat.com>, "open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>, linux-kernel <linux-kernel@vger.kernel.org> Subject: Re: [PATCH 1/3] sched/fair: Add tg_load_contrib cfs_rq decay checking Date: Wed, 26 May 2021 12:50:35 +0200 [thread overview] Message-ID: <CAKfTPtAFn3=anfTCxKTDXF0wpttpEiAhksLvcEPdSiYZTj38_A@mail.gmail.com> (raw) In-Reply-To: <CAKfTPtCaZOSEzRXVN9fTR2vTxGiANEARo6iDNMFiQV5=qAA4Tw@mail.gmail.com> On Tue, 25 May 2021 at 16:30, Vincent Guittot <vincent.guittot@linaro.org> wrote: > > On Tue, 25 May 2021 at 12:34, Odin Ugedal <odin@uged.al> wrote: > > > > Hi, > > > > tir. 25. mai 2021 kl. 11:58 skrev Vincent Guittot <vincent.guittot@linaro.org>: > > > Could you give more details about how cfs_rq->avg.load_avg = 4 but > > > cfs_rq->avg.load_sum = 0 ? > > > > > > > cfs_rq->avg.load_sum is decayed and can become null when crossing > > > period which implies an update of cfs_rq->avg.load_avg. This means > > > that your case is generated by something outside the pelt formula ... > > > like maybe the propagation of load in the tree. If this is the case, > > > we should find the error and fix it > > > > Ahh, yeah, that could probably be described better. > > > > It is (as far as I have found out) because the pelt divider is changed, > > and the output from "get_pelt_divider(&cfs_rq->avg)" is changed, resulting > > in a different value being removed than added. > > ok so IIUC, it happens during the adding/removing/propagating > entities' load in the cfs_rq. > > > > > Inside pelt itself, this cannot happen. When pelt changes the load_sum, it > > recalculates the load_avg based on load_sum, and not the delta, afaik. > > > > And as you say, the "issue" therefore (as I see it) outside of PELT. Due to > > how the pelt divider is changed, I assume it is hard to pinpoint where the issue > > is. I can try to find a clear path where where we can see what is added > > and what is removed from both cfs_rq->avg.load_sum and cfs_rq->avg.load_avg, > > to better be able to pinpoint what is happening. > > > > Previously I thought this was a result of precision loss due to division and > > multiplication during load add/remove inside fair.c, but I am not sure that > > is the issue, or is it? > > I don't think the precision looss is the problem because > adding/removing load in fair.c could truncate load_sum but it stays > sync with load_avg. I will have a llo to see if i can see something > weird I have added a trace in cfs_rq_is_decayed() but I'm not able to reproduce a situation where load_sum == 0 but not load_avg even with the script in the cover letter > > > > > If my above line of thought makes sense, do you still view this as an error > > outside PELT, or do you see another possible/better solution? > > > > Will investigate further. > > Thanks > > > > > Thanks > > Odin
WARNING: multiple messages have this Message-ID (diff)
From: Vincent Guittot <vincent.guittot-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> To: Odin Ugedal <odin-RObV4cXtwVA@public.gmane.org> Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>, Juri Lelli <juri.lelli-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, Dietmar Eggemann <dietmar.eggemann-5wv7dgnIgG8@public.gmane.org>, Steven Rostedt <rostedt-nx8X9YLhiw1AfugRpC6u6w@public.gmane.org>, Ben Segall <bsegall-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>, Mel Gorman <mgorman-l3A5Bk7waGM@public.gmane.org>, Daniel Bristot de Oliveira <bristot-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, "open list:CONTROL GROUP (CGROUP)" <cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>, linux-kernel <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> Subject: Re: [PATCH 1/3] sched/fair: Add tg_load_contrib cfs_rq decay checking Date: Wed, 26 May 2021 12:50:35 +0200 [thread overview] Message-ID: <CAKfTPtAFn3=anfTCxKTDXF0wpttpEiAhksLvcEPdSiYZTj38_A@mail.gmail.com> (raw) In-Reply-To: <CAKfTPtCaZOSEzRXVN9fTR2vTxGiANEARo6iDNMFiQV5=qAA4Tw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> On Tue, 25 May 2021 at 16:30, Vincent Guittot <vincent.guittot-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote: > > On Tue, 25 May 2021 at 12:34, Odin Ugedal <odin-RObV4cXtwVA@public.gmane.org> wrote: > > > > Hi, > > > > tir. 25. mai 2021 kl. 11:58 skrev Vincent Guittot <vincent.guittot-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>: > > > Could you give more details about how cfs_rq->avg.load_avg = 4 but > > > cfs_rq->avg.load_sum = 0 ? > > > > > > > cfs_rq->avg.load_sum is decayed and can become null when crossing > > > period which implies an update of cfs_rq->avg.load_avg. This means > > > that your case is generated by something outside the pelt formula ... > > > like maybe the propagation of load in the tree. If this is the case, > > > we should find the error and fix it > > > > Ahh, yeah, that could probably be described better. > > > > It is (as far as I have found out) because the pelt divider is changed, > > and the output from "get_pelt_divider(&cfs_rq->avg)" is changed, resulting > > in a different value being removed than added. > > ok so IIUC, it happens during the adding/removing/propagating > entities' load in the cfs_rq. > > > > > Inside pelt itself, this cannot happen. When pelt changes the load_sum, it > > recalculates the load_avg based on load_sum, and not the delta, afaik. > > > > And as you say, the "issue" therefore (as I see it) outside of PELT. Due to > > how the pelt divider is changed, I assume it is hard to pinpoint where the issue > > is. I can try to find a clear path where where we can see what is added > > and what is removed from both cfs_rq->avg.load_sum and cfs_rq->avg.load_avg, > > to better be able to pinpoint what is happening. > > > > Previously I thought this was a result of precision loss due to division and > > multiplication during load add/remove inside fair.c, but I am not sure that > > is the issue, or is it? > > I don't think the precision looss is the problem because > adding/removing load in fair.c could truncate load_sum but it stays > sync with load_avg. I will have a llo to see if i can see something > weird I have added a trace in cfs_rq_is_decayed() but I'm not able to reproduce a situation where load_sum == 0 but not load_avg even with the script in the cover letter > > > > > If my above line of thought makes sense, do you still view this as an error > > outside PELT, or do you see another possible/better solution? > > > > Will investigate further. > > Thanks > > > > > Thanks > > Odin
next prev parent reply other threads:[~2021-05-26 10:51 UTC|newest] Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-05-18 12:51 [PATCH 0/3] sched/fair: Fix load decay issues related to throttling Odin Ugedal 2021-05-18 12:51 ` Odin Ugedal 2021-05-18 12:52 ` [PATCH 1/3] sched/fair: Add tg_load_contrib cfs_rq decay checking Odin Ugedal 2021-05-18 12:52 ` Odin Ugedal 2021-05-25 9:58 ` Vincent Guittot 2021-05-25 10:33 ` Odin Ugedal 2021-05-25 10:33 ` Odin Ugedal 2021-05-25 14:30 ` Vincent Guittot 2021-05-25 14:30 ` Vincent Guittot 2021-05-26 10:50 ` Vincent Guittot [this message] 2021-05-26 10:50 ` Vincent Guittot 2021-05-27 7:50 ` Odin Ugedal 2021-05-27 7:50 ` Odin Ugedal 2021-05-27 9:35 ` Vincent Guittot 2021-05-27 9:45 ` Odin Ugedal 2021-05-27 10:49 ` Vincent Guittot 2021-05-27 10:49 ` Vincent Guittot 2021-05-27 11:04 ` Odin Ugedal 2021-05-27 11:04 ` Odin Ugedal 2021-05-27 12:37 ` Vincent Guittot 2021-05-27 12:37 ` Vincent Guittot 2021-05-27 12:37 ` Odin Ugedal 2021-05-27 12:37 ` Odin Ugedal 2021-05-27 12:39 ` Odin Ugedal 2021-05-27 12:39 ` Odin Ugedal 2021-05-27 12:49 ` Vincent Guittot 2021-05-18 12:52 ` [PATCH 2/3] sched/fair: Correctly insert cfs_rq's to list on unthrottle Odin Ugedal 2021-05-18 12:52 ` Odin Ugedal 2021-05-28 14:24 ` Vincent Guittot 2021-05-28 15:06 ` Odin Ugedal 2021-05-28 15:27 ` Vincent Guittot 2021-05-28 15:27 ` Vincent Guittot 2021-05-29 9:33 ` Odin Ugedal 2021-05-31 12:14 ` Vincent Guittot 2021-05-31 12:14 ` Vincent Guittot 2021-05-18 12:52 ` [PATCH 3/3] sched/fair: Fix ascii art by relpacing tabs Odin Ugedal 2021-05-27 13:27 ` Vincent Guittot 2021-05-27 13:27 ` Vincent Guittot 2021-06-01 14:04 ` [tip: sched/core] " tip-bot2 for Odin Ugedal
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to='CAKfTPtAFn3=anfTCxKTDXF0wpttpEiAhksLvcEPdSiYZTj38_A@mail.gmail.com' \ --to=vincent.guittot@linaro.org \ --cc=bristot@redhat.com \ --cc=bsegall@google.com \ --cc=cgroups@vger.kernel.org \ --cc=dietmar.eggemann@arm.com \ --cc=juri.lelli@redhat.com \ --cc=linux-kernel@vger.kernel.org \ --cc=mgorman@suse.de \ --cc=mingo@redhat.com \ --cc=odin@uged.al \ --cc=peterz@infradead.org \ --cc=rostedt@goodmis.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.