From: Chengming Zhou <zhouchengming@bytedance.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: tj@kernel.org, corbet@lwn.net, surenb@google.com, mingo@redhat.com, peterz@infradead.org, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, songmuchun@bytedance.com
Subject: Re: [PATCH v2 09/10] sched/psi: per-cgroup PSI stats disable/re-enable interface
Date: Tue, 16 Aug 2022 21:06:21 +0800
Message-ID: <904851a7-7b01-8689-3ec1-2a61f8244841@bytedance.com>
In-Reply-To: <YvprI6ZL8dVWGyBO@cmpxchg.org>

On 2022/8/15 23:49, Johannes Weiner wrote:
> On Mon, Aug 08, 2022 at 07:03:40PM +0800, Chengming Zhou wrote:
>> +static ssize_t cgroup_psi_write(struct kernfs_open_file *of,
>> +				char *buf, size_t nbytes, loff_t off)
>> +{
>> +	ssize_t ret;
>> +	int enable;
>> +	struct cgroup *cgrp;
>> +	struct psi_group *psi;
>> +
>> +	ret = kstrtoint(strstrip(buf), 0, &enable);
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (enable < 0 || enable > 1)
>> +		return -ERANGE;
>> +
>> +	cgrp = cgroup_kn_lock_live(of->kn, false);
>> +	if (!cgrp)
>> +		return -ENOENT;
>> +
>> +	psi = cgroup_ino(cgrp) == 1 ? &psi_system : &cgrp->psi;
>> +	psi_cgroup_enable(psi, enable);
>
> I think it should also add/remove the pressure files when enabling and
> disabling the aggregation, since their contents would be stale and
> misleading.
>
> Take a look at cgroup_add_dfl_cftypes() and cgroup_rm_cftypes()

Ok, I will look.
>
>> @@ -5115,6 +5152,12 @@ static struct cftype cgroup_base_files[] = {
>>  		.release = cgroup_pressure_release,
>>  	},
>>  #endif
>> +	{
>> +		.name = "cgroup.psi",
>> +		.flags = CFTYPE_PRESSURE,
>> +		.seq_show = cgroup_psi_show,
>> +		.write = cgroup_psi_write,
>> +	},
>>  #endif /* CONFIG_PSI */
>>  	{ }	/* terminate */
>>  };
>> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
>> index 58f8092c938f..9df1686ee02d 100644
>> --- a/kernel/sched/psi.c
>> +++ b/kernel/sched/psi.c
>> @@ -181,6 +181,7 @@ static void group_init(struct psi_group *group)
>>  {
>>  	int cpu;
>>
>> +	group->enabled = true;
>>  	for_each_possible_cpu(cpu)
>>  		seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
>>  	group->avg_last_update = sched_clock();
>> @@ -700,17 +701,16 @@ static void psi_group_change(struct psi_group *group, int cpu,
>>  	groupc = per_cpu_ptr(group->pcpu, cpu);
>>
>>  	/*
>> -	 * First we assess the aggregate resource states this CPU's
>> -	 * tasks have been in since the last change, and account any
>> -	 * SOME and FULL time these may have resulted in.
>> -	 *
>> -	 * Then we update the task counts according to the state
>> +	 * First we update the task counts according to the state
>>  	 * change requested through the @clear and @set bits.
>> +	 *
>> +	 * Then if the cgroup PSI stats accounting enabled, we
>> +	 * assess the aggregate resource states this CPU's tasks
>> +	 * have been in since the last change, and account any
>> +	 * SOME and FULL time these may have resulted in.
>>  	 */
>>  	write_seqcount_begin(&groupc->seq);
>>
>> -	record_times(groupc, now);
>> -
>>  	/*
>>  	 * Start with TSK_ONCPU, which doesn't have a corresponding
>>  	 * task count - it's just a boolean flag directly encoded in
>> @@ -750,6 +750,14 @@ static void psi_group_change(struct psi_group *group, int cpu,
>>  		if (set & (1 << t))
>>  			groupc->tasks[t]++;
>>
>> +	if (!group->enabled) {
>> +		if (groupc->state_mask & (1 << PSI_NONIDLE))
>> +			record_times(groupc, now);
>
> Why record the nonidle time?
> It's only used for aggregation, which is
> stopped as well.

I'm considering this situation: disable at t2 and re-enable at t3.

	state1(t1) --> state2(t2) --> state3(t3)

If an aggregator has called get_recent_times() at some time t in [t1, t2],
groupc->times_prev[aggregator] will include the delta (t - t1).

Then, when we re-enable at t3, the delta (t3 - t1) is discarded, which may
make that aggregator see times < groupc->times_prev[aggregator]?

Maybe I missed something; I'm not sure whether this is a problem.

>
>> @@ -1088,6 +1097,23 @@ void cgroup_move_task(struct task_struct *task, struct css_set *to)
>>
>>  	task_rq_unlock(rq, task, &rf);
>>  }
>> +
>> +void psi_cgroup_enable(struct psi_group *group, bool enable)
>> +{
>> +	struct psi_group_cpu *groupc;
>> +	int cpu;
>> +	u64 now;
>> +
>> +	if (group->enabled == enable)
>> +		return;
>> +	group->enabled = enable;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		groupc = per_cpu_ptr(group->pcpu, cpu);
>> +		now = cpu_clock(cpu);
>> +		psi_group_change(group, cpu, 0, 0, now, true);
>
> This loop deserves a comment, IMO.

I added a comment as below; could you help take a look?

+
+void psi_cgroup_enable(struct psi_group *group, bool enable)
+{
+	int cpu;
+	u64 now;
+
+	if (group->enabled == enable)
+		return;
+	group->enabled = enable;
+
+	/*
+	 * We use psi_group_change() to disable or re-enable the
+	 * record_times(), test_state() loop and averaging worker
+	 * in each psi_group_cpu of the psi_group, use .clear = 0
+	 * and .set = 0 here since no task status really changed.
+	 */
+	for_each_possible_cpu(cpu) {
+		now = cpu_clock(cpu);
+		psi_group_change(group, cpu, 0, 0, now, true);
+	}
+}

Thanks!