* Re: [PATCH v8 -tip 17/26] sched: Split the cookie and setup per-task cookie on fork [not found] <4ff7f030-b797-6711-fb6a-fe39bb02075a@oracle.com> @ 2020-11-09 23:23 ` chris hyser 2020-11-10 14:15 ` Joel Fernandes 2020-11-17 0:45 ` Joel Fernandes 0 siblings, 2 replies; 7+ messages in thread From: chris hyser @ 2020-11-09 23:23 UTC (permalink / raw) To: Joel Fernandes (Google), linux-kernel > On 10/19/20 9:43 PM, Joel Fernandes (Google) wrote: >> In order to prevent interference and clearly support both per-task and CGroup >> APIs, split the cookie into 2 and allow it to be set from either per-task, or >> CGroup API. The final cookie is the combined value of both and is computed when >> the stop-machine executes during a change of cookie. >> >> Also, for the per-task cookie, it will get weird if we use pointers of any >> emphemeral objects. For this reason, introduce a refcounted object who's sole >> purpose is to assign unique cookie value by way of the object's pointer. >> >> While at it, refactor the CGroup code a bit. Future patches will introduce more >> APIs and support. 
>> >> Tested-by: Julien Desfossez <jdesfossez@digitalocean.com> >> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >> --- >> include/linux/sched.h | 2 + >> kernel/sched/core.c | 241 ++++++++++++++++++++++++++++++++++++++++-- >> kernel/sched/debug.c | 4 + >> 3 files changed, 236 insertions(+), 11 deletions(-) >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index fe6f225bfbf9..c6034c00846a 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -688,6 +688,8 @@ struct task_struct { >> #ifdef CONFIG_SCHED_CORE >> struct rb_node core_node; >> unsigned long core_cookie; >> + unsigned long core_task_cookie; >> + unsigned long core_group_cookie; >> unsigned int core_occupation; >> #endif >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index bab4ea2f5cd8..30a9e4cb5ce1 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -346,11 +346,14 @@ void sched_core_put(void) >> mutex_unlock(&sched_core_mutex); >> } >> +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2); >> + >> #else /* !CONFIG_SCHED_CORE */ >> static inline void sched_core_enqueue(struct rq *rq, struct task_struct *p) { } >> static inline void sched_core_dequeue(struct rq *rq, struct task_struct *p) { } >> static bool sched_core_enqueued(struct task_struct *task) { return false; } >> +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2) { } >> #endif /* CONFIG_SCHED_CORE */ >> @@ -3583,6 +3586,20 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) >> #endif >> #ifdef CONFIG_SCHED_CORE >> RB_CLEAR_NODE(&p->core_node); >> + >> + /* >> + * Tag child via per-task cookie only if parent is tagged via per-task >> + * cookie. This is independent of, but can be additive to the CGroup tagging. >> + */ >> + if (current->core_task_cookie) { >> + >> + /* If it is not CLONE_THREAD fork, assign a unique per-task tag. 
*/
>> +		if (!(clone_flags & CLONE_THREAD)) {
>> +			return sched_core_share_tasks(p, p);
>> +		}
>> +		/* Otherwise share the parent's per-task tag. */
>> +		return sched_core_share_tasks(p, current);
>> +	}
>> #endif
>> 	return 0;
>> }

sched_core_share_tasks() looks at the value of the new task's core_task_cookie,
which is non-zero on a process or thread clone, and causes underflow for both
the enable flag itself and for cookie ref counts.

So just zero it in __sched_fork().

-chris

[PATCH] sched: zero out the core scheduling cookie on clone

From: chris hyser <chris.hyser@oracle.com>

As the cookie is reference counted, even if inherited, zero this and allow
explicit sharing.

Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fd3cc03..2af0ea6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3378,6 +3378,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->capture_control = NULL;
 #endif
 	init_numa_balancing(clone_flags, p);
+#ifdef CONFIG_SCHED_CORE
+	p->core_task_cookie = 0;
+#endif
 #ifdef CONFIG_SMP
 	p->wake_entry.u_flags = CSD_TYPE_TTWU;
 #endif
--
1.8.3.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread
* Re: [PATCH v8 -tip 17/26] sched: Split the cookie and setup per-task cookie on fork 2020-11-09 23:23 ` [PATCH v8 -tip 17/26] sched: Split the cookie and setup per-task cookie on fork chris hyser @ 2020-11-10 14:15 ` Joel Fernandes 2020-11-17 0:45 ` Joel Fernandes 1 sibling, 0 replies; 7+ messages in thread From: Joel Fernandes @ 2020-11-10 14:15 UTC (permalink / raw) To: chris hyser; +Cc: linux-kernel On Mon, Nov 09, 2020 at 06:23:13PM -0500, chris hyser wrote: [..] > > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h > > > index fe6f225bfbf9..c6034c00846a 100644 > > > --- a/include/linux/sched.h > > > +++ b/include/linux/sched.h > > > @@ -688,6 +688,8 @@ struct task_struct { > > > #ifdef CONFIG_SCHED_CORE > > > struct rb_node core_node; > > > unsigned long core_cookie; > > > + unsigned long core_task_cookie; > > > + unsigned long core_group_cookie; > > > unsigned int core_occupation; > > > #endif > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > > index bab4ea2f5cd8..30a9e4cb5ce1 100644 > > > --- a/kernel/sched/core.c > > > +++ b/kernel/sched/core.c > > > @@ -346,11 +346,14 @@ void sched_core_put(void) > > > mutex_unlock(&sched_core_mutex); > > > } > > > +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2); > > > + > > > #else /* !CONFIG_SCHED_CORE */ > > > static inline void sched_core_enqueue(struct rq *rq, struct task_struct *p) { } > > > static inline void sched_core_dequeue(struct rq *rq, struct task_struct *p) { } > > > static bool sched_core_enqueued(struct task_struct *task) { return false; } > > > +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2) { } > > > #endif /* CONFIG_SCHED_CORE */ > > > @@ -3583,6 +3586,20 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) > > > #endif > > > #ifdef CONFIG_SCHED_CORE > > > RB_CLEAR_NODE(&p->core_node); > > > + > > > + /* > > > + * Tag child via per-task cookie only if parent is tagged via per-task > > > + 
* cookie. This is independent of, but can be additive to the CGroup tagging. > > > + */ > > > + if (current->core_task_cookie) { > > > + > > > + /* If it is not CLONE_THREAD fork, assign a unique per-task tag. */ > > > + if (!(clone_flags & CLONE_THREAD)) { > > > + return sched_core_share_tasks(p, p); > > > + } > > > + /* Otherwise share the parent's per-task tag. */ > > > + return sched_core_share_tasks(p, current); > > > + } > > > #endif > > > return 0; > > > } > > sched_core_share_tasks() looks at the value of the new tasks core_task_cookie which is non-zero on a > process or thread clone and causes underflow for both the enable flag itself and for cookie ref counts. > > So just zero it in __sched_fork(). Nice catch, applying to my v9 staging tree. Thanks! thanks, - Joel > > -chris > > > PATCH] sched: zero out the core scheduling cookie on clone > > From: chris hyser <chris.hyser@oracle.com> > > As the cookie is reference counted, even if inherited, zero this and allow > explicit sharing. > > Signed-off-by: Chris Hyser <chris.hyser@oracle.com> > --- > kernel/sched/core.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index fd3cc03..2af0ea6 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3378,6 +3378,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p) > p->capture_control = NULL; > #endif > init_numa_balancing(clone_flags, p); > +#ifdef CONFIG_SCHED_CORE > + p->core_task_cookie = 0; > +#endif > #ifdef CONFIG_SMP > p->wake_entry.u_flags = CSD_TYPE_TTWU; > #endif > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v8 -tip 17/26] sched: Split the cookie and setup per-task cookie on fork 2020-11-09 23:23 ` [PATCH v8 -tip 17/26] sched: Split the cookie and setup per-task cookie on fork chris hyser 2020-11-10 14:15 ` Joel Fernandes @ 2020-11-17 0:45 ` Joel Fernandes 1 sibling, 0 replies; 7+ messages in thread From: Joel Fernandes @ 2020-11-17 0:45 UTC (permalink / raw) To: chris hyser; +Cc: linux-kernel Hi Chris, On Mon, Nov 09, 2020 at 06:23:13PM -0500, chris hyser wrote: > > On 10/19/20 9:43 PM, Joel Fernandes (Google) wrote: > > > In order to prevent interference and clearly support both per-task and CGroup > > > APIs, split the cookie into 2 and allow it to be set from either per-task, or > > > CGroup API. The final cookie is the combined value of both and is computed when > > > the stop-machine executes during a change of cookie. > > > > > > Also, for the per-task cookie, it will get weird if we use pointers of any > > > emphemeral objects. For this reason, introduce a refcounted object who's sole > > > purpose is to assign unique cookie value by way of the object's pointer. > > > > > > While at it, refactor the CGroup code a bit. Future patches will introduce more > > > APIs and support. 
> > > > > > Tested-by: Julien Desfossez <jdesfossez@digitalocean.com> > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > --- > > > include/linux/sched.h | 2 + > > > kernel/sched/core.c | 241 ++++++++++++++++++++++++++++++++++++++++-- > > > kernel/sched/debug.c | 4 + > > > 3 files changed, 236 insertions(+), 11 deletions(-) > > > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h > > > index fe6f225bfbf9..c6034c00846a 100644 > > > --- a/include/linux/sched.h > > > +++ b/include/linux/sched.h > > > @@ -688,6 +688,8 @@ struct task_struct { > > > #ifdef CONFIG_SCHED_CORE > > > struct rb_node core_node; > > > unsigned long core_cookie; > > > + unsigned long core_task_cookie; > > > + unsigned long core_group_cookie; > > > unsigned int core_occupation; > > > #endif > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > > index bab4ea2f5cd8..30a9e4cb5ce1 100644 > > > --- a/kernel/sched/core.c > > > +++ b/kernel/sched/core.c > > > @@ -346,11 +346,14 @@ void sched_core_put(void) > > > mutex_unlock(&sched_core_mutex); > > > } > > > +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2); > > > + > > > #else /* !CONFIG_SCHED_CORE */ > > > static inline void sched_core_enqueue(struct rq *rq, struct task_struct *p) { } > > > static inline void sched_core_dequeue(struct rq *rq, struct task_struct *p) { } > > > static bool sched_core_enqueued(struct task_struct *task) { return false; } > > > +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2) { } > > > #endif /* CONFIG_SCHED_CORE */ > > > @@ -3583,6 +3586,20 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) > > > #endif > > > #ifdef CONFIG_SCHED_CORE > > > RB_CLEAR_NODE(&p->core_node); > > > + > > > + /* > > > + * Tag child via per-task cookie only if parent is tagged via per-task > > > + * cookie. This is independent of, but can be additive to the CGroup tagging. 
> > > + */ > > > + if (current->core_task_cookie) { > > > + > > > + /* If it is not CLONE_THREAD fork, assign a unique per-task tag. */ > > > + if (!(clone_flags & CLONE_THREAD)) { > > > + return sched_core_share_tasks(p, p); > > > + } > > > + /* Otherwise share the parent's per-task tag. */ > > > + return sched_core_share_tasks(p, current); > > > + } > > > #endif > > > return 0; > > > } > > sched_core_share_tasks() looks at the value of the new tasks core_task_cookie which is non-zero on a > process or thread clone and causes underflow for both the enable flag itself and for cookie ref counts. > > So just zero it in __sched_fork(). Since I am going to split this into a new patch, could you please send me a detailed commit message explaining the issue? I feel the below one liner is unclear / insufficient as a commit message. thanks, - Joel > -chris > > > PATCH] sched: zero out the core scheduling cookie on clone > > From: chris hyser <chris.hyser@oracle.com> > > As the cookie is reference counted, even if inherited, zero this and allow > explicit sharing. > > Signed-off-by: Chris Hyser <chris.hyser@oracle.com> > --- > kernel/sched/core.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index fd3cc03..2af0ea6 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -3378,6 +3378,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p) > p->capture_control = NULL; > #endif > init_numa_balancing(clone_flags, p); > +#ifdef CONFIG_SCHED_CORE > + p->core_task_cookie = 0; > +#endif > #ifdef CONFIG_SMP > p->wake_entry.u_flags = CSD_TYPE_TTWU; > #endif > -- > 1.8.3.1 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v8 -tip 00/26] Core scheduling
@ 2020-10-20  1:43 Joel Fernandes (Google)
  2020-10-20  1:43 ` [PATCH v8 -tip 17/26] sched: Split the cookie and setup per-task cookie on fork Joel Fernandes (Google)
  0 siblings, 1 reply; 7+ messages in thread
From: Joel Fernandes (Google) @ 2020-10-20  1:43 UTC (permalink / raw)
  To: Nishanth Aravamudan, Julien Desfossez, Peter Zijlstra, Tim Chen,
	Vineeth Pillai, Aaron Lu, Aubrey Li, tglx, linux-kernel
  Cc: mingo, torvalds, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, pjt,
	rostedt, derkling, benbjiang, Alexandre Chartre, James.Bottomley,
	OWeisse, Dhaval Giani, Junaid Shahid, jsbarnes, chris.hyser,
	Aubrey Li, Paul E. McKenney, Tim Chen

Eighth iteration of the Core-Scheduling feature.

Core scheduling is a feature that allows only trusted tasks to run
concurrently on cpus sharing compute resources (eg: hyperthreads on a core).
The goal is to mitigate the core-level side-channel attacks without requiring
SMT to be disabled (which has a significant impact on performance in some
situations). Core scheduling (as of v7) mitigates user-space to user-space
attacks and user-to-kernel attacks when one of the siblings enters the kernel
via interrupts or system call.

By default, the feature doesn't change any of the current scheduler behavior.
The user decides which tasks can run simultaneously on the same core (for now
by having them in the same tagged cgroup). When a tag is enabled in a cgroup
and a task from that cgroup is running on a hardware thread, the scheduler
ensures that only idle or trusted tasks run on the other sibling(s).

Besides security concerns, this feature can also be beneficial for RT and
performance applications where we want to control how tasks make use of SMT
dynamically.

This iteration focuses on the following:
- Redesigned API.
- Rework of Kernel Protection feature based on Thomas's entry work.
- Rework of hotplug fixes.
- Address review comments in v7

Joel: Both a CGroup and Per-task interface via prctl(2) are provided for
configuring core sharing. More details are provided in the documentation
patch. Kselftests are provided to verify the correctness/rules of the
interface.

Julien: TPCC tests showed improvements with core-scheduling. With kernel
protection enabled, it does not show any regression. Possibly ASI will improve
the performance for those who choose kernel protection (can be toggled through
the sched_core_protect_kernel sysctl).

Results:

v8				average		stdev		diff
baseline (SMT on)		1197.272	44.78312824
core sched (kernel protect)	412.9895	45.42734343	-65.51%
core sched (no kernel protect)	686.6515	71.77756931	-42.65%
nosmt				408.667		39.39042872	-65.87%

v8 is rebased on tip/master.

Future work
===========
- Load balancing/Migration fixes for core scheduling. With v6, Load balancing
  is partially coresched aware, but has some issues w.r.t process/taskgroup
  weights: https://lwn.net/ml/linux-kernel/20200225034438.GA617271@z...
- Core scheduling test framework: kselftests, torture tests etc

Changes in v8
=============
- New interface/API implementation - Joel
- Revised kernel protection patch - Joel
- Revised Hotplug fixes - Joel
- Minor bug fixes and address review comments - Vineeth

Changes in v7
=============
- Kernel protection from untrusted usermode tasks - Joel, Vineeth
- Fix for hotplug crashes and hangs - Joel, Vineeth

Changes in v6
=============
- Documentation - Joel
- Pause siblings on entering nmi/irq/softirq - Joel, Vineeth
- Fix for RCU crash - Joel
- Fix for a crash in pick_next_task - Yu Chen, Vineeth
- Minor re-write of core-wide vruntime comparison - Aaron Lu
- Cleanup: Address Review comments
- Cleanup: Remove hotplug support (for now)
- Build fixes: 32 bit, SMT=n, AUTOGROUP=n etc - Joel, Vineeth

Changes in v5
=============
- Fixes for cgroup/process tagging during corner cases like cgroup destroy,
  task moving across cgroups etc - Tim Chen
- Coresched aware task migrations - Aubrey Li
- Other minor stability fixes.

Changes in v4
=============
- Implement a core wide min_vruntime for vruntime comparison of tasks
  across cpus in a core. - Aaron Lu
- Fixes a typo bug in setting the forced_idle cpu. - Aaron Lu

Changes in v3
=============
- Fixes the issue of sibling picking up an incompatible task
  - Aaron Lu
  - Vineeth Pillai
  - Julien Desfossez
- Fixes the issue of starving threads due to forced idle - Peter Zijlstra
- Fixes the refcounting issue when deleting a cgroup with tag
  - Julien Desfossez
- Fixes a crash during cpu offline/online with coresched enabled
  - Vineeth Pillai
- Fixes a comparison logic issue in sched_core_find - Aaron Lu

Changes in v2
=============
- Fixes for couple of NULL pointer dereference crashes
  - Subhra Mazumdar
  - Tim Chen
- Improves priority comparison logic for process in different cpus
  - Peter Zijlstra
  - Aaron Lu
- Fixes a hard lockup in rq locking
  - Vineeth Pillai
  - Julien Desfossez
- Fixes a performance issue seen on IO heavy workloads
  - Vineeth Pillai
  - Julien Desfossez
- Fix for 32bit build - Aubrey Li

Aubrey Li (1):
  sched: migration changes for core scheduling

Joel Fernandes (Google) (13):
  sched/fair: Snapshot the min_vruntime of CPUs on force idle
  arch/x86: Add a new TIF flag for untrusted tasks
  kernel/entry: Add support for core-wide protection of kernel-mode
  entry/idle: Enter and exit kernel protection during idle entry and exit
  sched: Split the cookie and setup per-task cookie on fork
  sched: Add a per-thread core scheduling interface
  sched: Add a second-level tag for nested CGroup usecase
  sched: Release references to the per-task cookie on exit
  sched: Handle task addition to CGroup
  sched/debug: Add CGroup node for printing group cookie if SCHED_DEBUG
  kselftest: Add tests for core-sched interface
  sched: Move core-scheduler interfacing code to a new file
  Documentation: Add core scheduling documentation

Peter Zijlstra (10):
  sched: Wrap rq::lock access
  sched: Introduce sched_class::pick_task()
  sched: Core-wide rq->lock
  sched/fair: Add a few assertions
  sched: Basic tracking of matching tasks
  sched: Add core wide task selection and scheduling.
  sched: Trivial forced-newidle balancer
  irq_work: Cleanup
  sched: cgroup tagging interface for core scheduling
  sched: Debug bits...

Vineeth Pillai (2):
  sched/fair: Fix forced idle sibling starvation corner case
  entry/kvm: Protect the kernel when entering from guest

 .../admin-guide/hw-vuln/core-scheduling.rst   |  312 +++++
 Documentation/admin-guide/hw-vuln/index.rst   |    1 +
 .../admin-guide/kernel-parameters.txt         |    7 +
 arch/x86/include/asm/thread_info.h            |    2 +
 arch/x86/kvm/x86.c                            |    3 +
 drivers/gpu/drm/i915/i915_request.c           |    4 +-
 include/linux/entry-common.h                  |   20 +-
 include/linux/entry-kvm.h                     |   12 +
 include/linux/irq_work.h                      |   33 +-
 include/linux/irqflags.h                      |    4 +-
 include/linux/sched.h                         |   27 +-
 include/uapi/linux/prctl.h                    |    3 +
 kernel/Kconfig.preempt                        |    6 +
 kernel/bpf/stackmap.c                         |    2 +-
 kernel/entry/common.c                         |   25 +-
 kernel/entry/kvm.c                            |   13 +
 kernel/fork.c                                 |    1 +
 kernel/irq_work.c                             |   18 +-
 kernel/printk/printk.c                        |    6 +-
 kernel/rcu/tree.c                             |    3 +-
 kernel/sched/Makefile                         |    1 +
 kernel/sched/core.c                           | 1135 ++++++++++++++++-
 kernel/sched/coretag.c                        |  468 +++++++
 kernel/sched/cpuacct.c                        |   12 +-
 kernel/sched/deadline.c                       |   34 +-
 kernel/sched/debug.c                          |    8 +-
 kernel/sched/fair.c                           |  272 ++--
 kernel/sched/idle.c                           |   24 +-
 kernel/sched/pelt.h                           |    2 +-
 kernel/sched/rt.c                             |   22 +-
 kernel/sched/sched.h                          |  302 ++++-
 kernel/sched/stop_task.c                      |   13 +-
 kernel/sched/topology.c                       |    4 +-
 kernel/sys.c                                  |    3 +
 kernel/time/tick-sched.c                      |    6 +-
 kernel/trace/bpf_trace.c                      |    2 +-
 tools/include/uapi/linux/prctl.h              |    3 +
 tools/testing/selftests/sched/.gitignore      |    1 +
 tools/testing/selftests/sched/Makefile        |   14 +
 tools/testing/selftests/sched/config          |    1 +
 .../testing/selftests/sched/test_coresched.c  |  840 ++++++++++++
 41 files changed, 3437 insertions(+), 232 deletions(-)
 create mode 100644 Documentation/admin-guide/hw-vuln/core-scheduling.rst
 create mode 100644 kernel/sched/coretag.c
 create mode 100644 tools/testing/selftests/sched/.gitignore
 create mode 100644 tools/testing/selftests/sched/Makefile
 create mode 100644 tools/testing/selftests/sched/config
 create mode 100644 tools/testing/selftests/sched/test_coresched.c

--
2.29.0.rc1.297.gfa9743e501-goog

^ permalink raw reply	[flat|nested] 7+ messages in thread
* [PATCH v8 -tip 17/26] sched: Split the cookie and setup per-task cookie on fork
  2020-10-20  1:43 [PATCH v8 -tip 00/26] Core scheduling Joel Fernandes (Google)
@ 2020-10-20  1:43 ` Joel Fernandes (Google)
  2020-11-04 22:30 ` chris hyser
  0 siblings, 1 reply; 7+ messages in thread
From: Joel Fernandes (Google) @ 2020-10-20  1:43 UTC (permalink / raw)
  To: Nishanth Aravamudan, Julien Desfossez, Peter Zijlstra, Tim Chen,
	Vineeth Pillai, Aaron Lu, Aubrey Li, tglx, linux-kernel
  Cc: mingo, torvalds, fweisbec, keescook, kerrnel, Phil Auld,
	Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, joel,
	vineeth, Chen Yu, Christian Brauner, Agata Gruza,
	Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, pjt,
	rostedt, derkling, benbjiang, Alexandre Chartre, James.Bottomley,
	OWeisse, Dhaval Giani, Junaid Shahid, jsbarnes, chris.hyser,
	Aubrey Li, Paul E. McKenney, Tim Chen

In order to prevent interference and clearly support both per-task and CGroup
APIs, split the cookie into 2 and allow it to be set from either per-task, or
CGroup API. The final cookie is the combined value of both and is computed when
the stop-machine executes during a change of cookie.

Also, for the per-task cookie, it will get weird if we use pointers of any
ephemeral objects. For this reason, introduce a refcounted object whose sole
purpose is to assign a unique cookie value by way of the object's pointer.

While at it, refactor the CGroup code a bit. Future patches will introduce more
APIs and support.
Tested-by: Julien Desfossez <jdesfossez@digitalocean.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- include/linux/sched.h | 2 + kernel/sched/core.c | 241 ++++++++++++++++++++++++++++++++++++++++-- kernel/sched/debug.c | 4 + 3 files changed, 236 insertions(+), 11 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index fe6f225bfbf9..c6034c00846a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -688,6 +688,8 @@ struct task_struct { #ifdef CONFIG_SCHED_CORE struct rb_node core_node; unsigned long core_cookie; + unsigned long core_task_cookie; + unsigned long core_group_cookie; unsigned int core_occupation; #endif diff --git a/kernel/sched/core.c b/kernel/sched/core.c index bab4ea2f5cd8..30a9e4cb5ce1 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -346,11 +346,14 @@ void sched_core_put(void) mutex_unlock(&sched_core_mutex); } +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2); + #else /* !CONFIG_SCHED_CORE */ static inline void sched_core_enqueue(struct rq *rq, struct task_struct *p) { } static inline void sched_core_dequeue(struct rq *rq, struct task_struct *p) { } static bool sched_core_enqueued(struct task_struct *task) { return false; } +static inline int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2) { return 0; } #endif /* CONFIG_SCHED_CORE */ @@ -3583,6 +3586,20 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) #endif #ifdef CONFIG_SCHED_CORE RB_CLEAR_NODE(&p->core_node); + + /* + * Tag child via per-task cookie only if parent is tagged via per-task + * cookie. This is independent of, but can be additive to the CGroup tagging. + */ + if (current->core_task_cookie) { + + /* If it is not CLONE_THREAD fork, assign a unique per-task tag.
*/ + return sched_core_share_tasks(p, current); + } #endif return 0; } @@ -9177,6 +9194,217 @@ static u64 cpu_rt_period_read_uint(struct cgroup_subsys_state *css, #endif /* CONFIG_RT_GROUP_SCHED */ #ifdef CONFIG_SCHED_CORE +/* + * A simple wrapper around refcount. An allocated sched_core_cookie's + * address is used to compute the cookie of the task. + */ +struct sched_core_cookie { + refcount_t refcnt; +}; + +/* + * sched_core_tag_requeue - Common helper for all interfaces to set a cookie. + * @p: The task to assign a cookie to. + * @cookie: The cookie to assign. + * @group: is it a group interface or a per-task interface. + * + * This function is typically called from a stop-machine handler. + */ +void sched_core_tag_requeue(struct task_struct *p, unsigned long cookie, bool group) +{ + if (!p) + return; + + if (group) + p->core_group_cookie = cookie; + else + p->core_task_cookie = cookie; + + /* Use up half of the cookie's bits for task cookie and remaining for group cookie. */ + p->core_cookie = (p->core_task_cookie << + (sizeof(unsigned long) * 4)) + p->core_group_cookie; + + if (sched_core_enqueued(p)) { + sched_core_dequeue(task_rq(p), p); + if (!p->core_task_cookie) + return; + } + + if (sched_core_enabled(task_rq(p)) && + p->core_cookie && task_on_rq_queued(p)) + sched_core_enqueue(task_rq(p), p); +} + +/* Per-task interface */ +static unsigned long sched_core_alloc_task_cookie(void) +{ + struct sched_core_cookie *ptr = + kmalloc(sizeof(struct sched_core_cookie), GFP_KERNEL); + + if (!ptr) + return 0; + refcount_set(&ptr->refcnt, 1); + + /* + * NOTE: sched_core_put() is not done by put_task_cookie(). Instead, it + * is done after the stopper runs. + */ + sched_core_get(); + return (unsigned long)ptr; +} + +static bool sched_core_get_task_cookie(unsigned long cookie) +{ + struct sched_core_cookie *ptr = (struct sched_core_cookie *)cookie; + + /* + * NOTE: sched_core_put() is not done by put_task_cookie(). Instead, it + * is done after the stopper runs. 
+ */ + sched_core_get(); + return refcount_inc_not_zero(&ptr->refcnt); +} + +static void sched_core_put_task_cookie(unsigned long cookie) +{ + struct sched_core_cookie *ptr = (struct sched_core_cookie *)cookie; + + if (refcount_dec_and_test(&ptr->refcnt)) + kfree(ptr); +} + +struct sched_core_task_write_tag { + struct task_struct *tasks[2]; + unsigned long cookies[2]; +}; + +/* + * Ensure that the task has been requeued. The stopper ensures that the task cannot + * be migrated to a different CPU while its core scheduler queue state is being updated. + * It also makes sure to requeue a task if it was running actively on another CPU. + */ +static int sched_core_task_join_stopper(void *data) +{ + struct sched_core_task_write_tag *tag = (struct sched_core_task_write_tag *)data; + int i; + + for (i = 0; i < 2; i++) + sched_core_tag_requeue(tag->tasks[i], tag->cookies[i], false /* !group */); + + return 0; +} + +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2) +{ + struct sched_core_task_write_tag wr = {}; /* for stop machine. */ + bool sched_core_put_after_stopper = false; + unsigned long cookie; + int ret = -ENOMEM; + + mutex_lock(&sched_core_mutex); + + /* + * NOTE: sched_core_get() is done by sched_core_alloc_task_cookie() or + * sched_core_put_task_cookie(). However, sched_core_put() is done + * by this function *after* the stopper removes the tasks from the + * core queue, and not before. This is just to play it safe. + */ + if (t2 == NULL) { + if (t1->core_task_cookie) { + sched_core_put_task_cookie(t1->core_task_cookie); + sched_core_put_after_stopper = true; + wr.tasks[0] = t1; /* Keep wr.cookies[0] reset for t1. */ + } + } else if (t1 == t2) { + /* Assign a unique per-task cookie solely for t1. 
*/ + + cookie = sched_core_alloc_task_cookie(); + if (!cookie) + goto out_unlock; + + if (t1->core_task_cookie) { + sched_core_put_task_cookie(t1->core_task_cookie); + sched_core_put_after_stopper = true; + } + wr.tasks[0] = t1; + wr.cookies[0] = cookie; + } else + /* + * t1 joining t2 + * CASE 1: + * before 0 0 + * after new cookie new cookie + * + * CASE 2: + * before X (non-zero) 0 + * after 0 0 + * + * CASE 3: + * before 0 X (non-zero) + * after X X + * + * CASE 4: + * before Y (non-zero) X (non-zero) + * after X X + */ + if (!t1->core_task_cookie && !t2->core_task_cookie) { + /* CASE 1. */ + cookie = sched_core_alloc_task_cookie(); + if (!cookie) + goto out_unlock; + + /* Add another reference for the other task. */ + if (!sched_core_get_task_cookie(cookie)) { + ret = -EINVAL; + goto out_unlock; + } + + wr.tasks[0] = t1; + wr.tasks[1] = t2; + wr.cookies[0] = wr.cookies[1] = cookie; + + } else if (t1->core_task_cookie && !t2->core_task_cookie) { + /* CASE 2. */ + sched_core_put_task_cookie(t1->core_task_cookie); + sched_core_put_after_stopper = true; + + wr.tasks[0] = t1; /* Reset cookie for t1. */ + + } else if (!t1->core_task_cookie && t2->core_task_cookie) { + /* CASE 3. */ + if (!sched_core_get_task_cookie(t2->core_task_cookie)) { + ret = -EINVAL; + goto out_unlock; + } + + wr.tasks[0] = t1; + wr.cookies[0] = t2->core_task_cookie; + + } else { + /* CASE 4.
*/ + if (!sched_core_get_task_cookie(t2->core_task_cookie)) { + ret = -EINVAL; + goto out_unlock; + } + sched_core_put_task_cookie(t1->core_task_cookie); + sched_core_put_after_stopper = true; + + wr.tasks[0] = t1; + wr.cookies[0] = t2->core_task_cookie; + } + + stop_machine(sched_core_task_join_stopper, (void *)&wr, NULL); + + if (sched_core_put_after_stopper) + sched_core_put(); + + ret = 0; +out_unlock: + mutex_unlock(&sched_core_mutex); + return ret; +} + +/* CGroup interface */ static u64 cpu_core_tag_read_u64(struct cgroup_subsys_state *css, struct cftype *cft) { struct task_group *tg = css_tg(css); @@ -9207,18 +9435,9 @@ static int __sched_write_tag(void *data) * when we set cgroup tag to 0 when the loop is done below. */ while ((p = css_task_iter_next(&it))) { - p->core_cookie = !!val ? (unsigned long)tg : 0UL; - - if (sched_core_enqueued(p)) { - sched_core_dequeue(task_rq(p), p); - if (!p->core_cookie) - continue; - } - - if (sched_core_enabled(task_rq(p)) && - p->core_cookie && task_on_rq_queued(p)) - sched_core_enqueue(task_rq(p), p); + unsigned long cookie = !!val ? (unsigned long)tg : 0UL; + sched_core_tag_requeue(p, cookie, true /* group */); } css_task_iter_end(&it); diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index c8fee8d9dfd4..88bf45267672 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -1024,6 +1024,10 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns, __PS("clock-delta", t1-t0); } +#ifdef CONFIG_SCHED_CORE + __PS("core_cookie", p->core_cookie); +#endif + sched_show_numa(p, m); } -- 2.29.0.rc1.297.gfa9743e501-goog ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v8 -tip 17/26] sched: Split the cookie and setup per-task cookie on fork 2020-10-20 1:43 ` [PATCH v8 -tip 17/26] sched: Split the cookie and setup per-task cookie on fork Joel Fernandes (Google) @ 2020-11-04 22:30 ` chris hyser 2020-11-05 14:49 ` Joel Fernandes 2020-11-09 23:30 ` chris hyser 0 siblings, 2 replies; 7+ messages in thread From: chris hyser @ 2020-11-04 22:30 UTC (permalink / raw) To: Joel Fernandes (Google), Nishanth Aravamudan, Julien Desfossez, Peter Zijlstra, Tim Chen, Vineeth Pillai, Aaron Lu, Aubrey Li, tglx, linux-kernel Cc: mingo, torvalds, fweisbec, keescook, kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, pjt, rostedt, derkling, benbjiang, Alexandre Chartre, James.Bottomley, OWeisse, Dhaval Giani, Junaid Shahid, jsbarnes, Aubrey Li, Paul E. McKenney, Tim Chen On 10/19/20 9:43 PM, Joel Fernandes (Google) wrote: > In order to prevent interference and clearly support both per-task and CGroup > APIs, split the cookie into 2 and allow it to be set from either per-task, or > CGroup API. The final cookie is the combined value of both and is computed when > the stop-machine executes during a change of cookie. > > Also, for the per-task cookie, it will get weird if we use pointers of any > emphemeral objects. For this reason, introduce a refcounted object who's sole > purpose is to assign unique cookie value by way of the object's pointer. > > While at it, refactor the CGroup code a bit. Future patches will introduce more > APIs and support. 
> > Tested-by: Julien Desfossez <jdesfossez@digitalocean.com> > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > --- > include/linux/sched.h | 2 + > kernel/sched/core.c | 241 ++++++++++++++++++++++++++++++++++++++++-- > kernel/sched/debug.c | 4 + > 3 files changed, 236 insertions(+), 11 deletions(-) > > diff --git a/include/linux/sched.h b/include/linux/sched.h > index fe6f225bfbf9..c6034c00846a 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -688,6 +688,8 @@ struct task_struct { > #ifdef CONFIG_SCHED_CORE > struct rb_node core_node; > unsigned long core_cookie; > + unsigned long core_task_cookie; > + unsigned long core_group_cookie; > unsigned int core_occupation; > #endif > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index bab4ea2f5cd8..30a9e4cb5ce1 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -346,11 +346,14 @@ void sched_core_put(void) > mutex_unlock(&sched_core_mutex); > } > > +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2); > + > #else /* !CONFIG_SCHED_CORE */ > > static inline void sched_core_enqueue(struct rq *rq, struct task_struct *p) { } > static inline void sched_core_dequeue(struct rq *rq, struct task_struct *p) { } > static bool sched_core_enqueued(struct task_struct *task) { return false; } > +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2) { } > > #endif /* CONFIG_SCHED_CORE */ > > @@ -3583,6 +3586,20 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) > #endif > #ifdef CONFIG_SCHED_CORE > RB_CLEAR_NODE(&p->core_node); > + > + /* > + * Tag child via per-task cookie only if parent is tagged via per-task > + * cookie. This is independent of, but can be additive to the CGroup tagging. > + */ > + if (current->core_task_cookie) { > + > + /* If it is not CLONE_THREAD fork, assign a unique per-task tag. 
*/ > + if (!(clone_flags & CLONE_THREAD)) { > + return sched_core_share_tasks(p, p); > + } > + /* Otherwise share the parent's per-task tag. */ > + return sched_core_share_tasks(p, current); > + } > #endif > return 0; > } > @@ -9177,6 +9194,217 @@ static u64 cpu_rt_period_read_uint(struct cgroup_subsys_state *css, > #endif /* CONFIG_RT_GROUP_SCHED */ > > #ifdef CONFIG_SCHED_CORE > +/* > + * A simple wrapper around refcount. An allocated sched_core_cookie's > + * address is used to compute the cookie of the task. > + */ > +struct sched_core_cookie { > + refcount_t refcnt; > +}; > + > +/* > + * sched_core_tag_requeue - Common helper for all interfaces to set a cookie. > + * @p: The task to assign a cookie to. > + * @cookie: The cookie to assign. > + * @group: is it a group interface or a per-task interface. > + * > + * This function is typically called from a stop-machine handler. > + */ > +void sched_core_tag_requeue(struct task_struct *p, unsigned long cookie, bool group) > +{ > + if (!p) > + return; > + > + if (group) > + p->core_group_cookie = cookie; > + else > + p->core_task_cookie = cookie; > + > + /* Use up half of the cookie's bits for task cookie and remaining for group cookie. */ > + p->core_cookie = (p->core_task_cookie << > + (sizeof(unsigned long) * 4)) + p->core_group_cookie; > + > + if (sched_core_enqueued(p)) { > + sched_core_dequeue(task_rq(p), p); > + if (!p->core_task_cookie) > + return; > + } > + > + if (sched_core_enabled(task_rq(p)) && > + p->core_cookie && task_on_rq_queued(p)) > + sched_core_enqueue(task_rq(p), p); > +} > + > +/* Per-task interface */ > +static unsigned long sched_core_alloc_task_cookie(void) > +{ > + struct sched_core_cookie *ptr = > + kmalloc(sizeof(struct sched_core_cookie), GFP_KERNEL); > + > + if (!ptr) > + return 0; > + refcount_set(&ptr->refcnt, 1); > + > + /* > + * NOTE: sched_core_put() is not done by put_task_cookie(). Instead, it > + * is done after the stopper runs. 
> + */ > + sched_core_get(); > + return (unsigned long)ptr; > +} > + > +static bool sched_core_get_task_cookie(unsigned long cookie) > +{ > + struct sched_core_cookie *ptr = (struct sched_core_cookie *)cookie; > + > + /* > + * NOTE: sched_core_put() is not done by put_task_cookie(). Instead, it > + * is done after the stopper runs. > + */ > + sched_core_get(); > + return refcount_inc_not_zero(&ptr->refcnt); > +} > + > +static void sched_core_put_task_cookie(unsigned long cookie) > +{ > + struct sched_core_cookie *ptr = (struct sched_core_cookie *)cookie; > + > + if (refcount_dec_and_test(&ptr->refcnt)) > + kfree(ptr); > +} > + > +struct sched_core_task_write_tag { > + struct task_struct *tasks[2]; > + unsigned long cookies[2]; > +}; > + > +/* > + * Ensure that the task has been requeued. The stopper ensures that the task cannot > + * be migrated to a different CPU while its core scheduler queue state is being updated. > + * It also makes sure to requeue a task if it was running actively on another CPU. > + */ > +static int sched_core_task_join_stopper(void *data) > +{ > + struct sched_core_task_write_tag *tag = (struct sched_core_task_write_tag *)data; > + int i; > + > + for (i = 0; i < 2; i++) > + sched_core_tag_requeue(tag->tasks[i], tag->cookies[i], false /* !group */); > + > + return 0; > +} > + > +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2) > +{ > + struct sched_core_task_write_tag wr = {}; /* for stop machine. */ > + bool sched_core_put_after_stopper = false; > + unsigned long cookie; > + int ret = -ENOMEM; > + > + mutex_lock(&sched_core_mutex); > + > + /* > + * NOTE: sched_core_get() is done by sched_core_alloc_task_cookie() or > + * sched_core_put_task_cookie(). However, sched_core_put() is done > + * by this function *after* the stopper removes the tasks from the > + * core queue, and not before. This is just to play it safe. 
> + */ > + if (t2 == NULL) { > + if (t1->core_task_cookie) { > + sched_core_put_task_cookie(t1->core_task_cookie); > + sched_core_put_after_stopper = true; > + wr.tasks[0] = t1; /* Keep wr.cookies[0] reset for t1. */ > + } > + } else if (t1 == t2) { > + /* Assign a unique per-task cookie solely for t1. */ > + > + cookie = sched_core_alloc_task_cookie(); > + if (!cookie) > + goto out_unlock; > + > + if (t1->core_task_cookie) { > + sched_core_put_task_cookie(t1->core_task_cookie); > + sched_core_put_after_stopper = true; > + } > + wr.tasks[0] = t1; > + wr.cookies[0] = cookie; > + } else > + /* > + * t1 joining t2 > + * CASE 1: > + * before 0 0 > + * after new cookie new cookie > + * > + * CASE 2: > + * before X (non-zero) 0 > + * after 0 0 > + * > + * CASE 3: > + * before 0 X (non-zero) > + * after X X > + * > + * CASE 4: > + * before Y (non-zero) X (non-zero) > + * after X X > + */ > + if (!t1->core_task_cookie && !t2->core_task_cookie) { > + /* CASE 1. */ > + cookie = sched_core_alloc_task_cookie(); > + if (!cookie) > + goto out_unlock; > + > + /* Add another reference for the other task. */ > + if (!sched_core_get_task_cookie(cookie)) { > + return -EINVAL; should be: ret = -EINVAL; > + goto out_unlock; > + } > + > + wr.tasks[0] = t1; > + wr.tasks[1] = t2; > + wr.cookies[0] = wr.cookies[1] = cookie; > + > + } else if (t1->core_task_cookie && !t2->core_task_cookie) { > + /* CASE 2. */ > + sched_core_put_task_cookie(t1->core_task_cookie); > + sched_core_put_after_stopper = true; > + > + wr.tasks[0] = t1; /* Reset cookie for t1. */ > + > + } else if (!t1->core_task_cookie && t2->core_task_cookie) { > + /* CASE 3. */ > + if (!sched_core_get_task_cookie(t2->core_task_cookie)) { > + ret = -EINVAL; > + goto out_unlock; > + } > + > + wr.tasks[0] = t1; > + wr.cookies[0] = t2->core_task_cookie; > + > + } else { > + /* CASE 4. 
*/ > + if (!sched_core_get_task_cookie(t2->core_task_cookie)) { > + ret = -EINVAL; > + goto out_unlock; > + } > + sched_core_put_task_cookie(t1->core_task_cookie); > + sched_core_put_after_stopper = true; > + > + wr.tasks[0] = t1; > + wr.cookies[0] = t2->core_task_cookie; > + } > + > + stop_machine(sched_core_task_join_stopper, (void *)&wr, NULL); > + > + if (sched_core_put_after_stopper) > + sched_core_put(); > + > + ret = 0; > +out_unlock: > + mutex_unlock(&sched_core_mutex); > + return ret; > +} > + > +/* CGroup interface */ > static u64 cpu_core_tag_read_u64(struct cgroup_subsys_state *css, struct cftype *cft) > { > struct task_group *tg = css_tg(css); > @@ -9207,18 +9435,9 @@ static int __sched_write_tag(void *data) > * when we set cgroup tag to 0 when the loop is done below. > */ > while ((p = css_task_iter_next(&it))) { > - p->core_cookie = !!val ? (unsigned long)tg : 0UL; > - > - if (sched_core_enqueued(p)) { > - sched_core_dequeue(task_rq(p), p); > - if (!p->core_cookie) > - continue; > - } > - > - if (sched_core_enabled(task_rq(p)) && > - p->core_cookie && task_on_rq_queued(p)) > - sched_core_enqueue(task_rq(p), p); > + unsigned long cookie = !!val ? (unsigned long)tg : 0UL; > > + sched_core_tag_requeue(p, cookie, true /* group */); > } > css_task_iter_end(&it); > > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c > index c8fee8d9dfd4..88bf45267672 100644 > --- a/kernel/sched/debug.c > +++ b/kernel/sched/debug.c > @@ -1024,6 +1024,10 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns, > __PS("clock-delta", t1-t0); > } > > +#ifdef CONFIG_SCHED_CORE > + __PS("core_cookie", p->core_cookie); > +#endif > + > sched_show_numa(p, m); > } > > -chrish ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v8 -tip 17/26] sched: Split the cookie and setup per-task cookie on fork 2020-11-04 22:30 ` chris hyser @ 2020-11-05 14:49 ` Joel Fernandes 2020-11-09 23:30 ` chris hyser 1 sibling, 0 replies; 7+ messages in thread From: Joel Fernandes @ 2020-11-05 14:49 UTC (permalink / raw) To: chris hyser Cc: Nishanth Aravamudan, Julien Desfossez, Peter Zijlstra, Tim Chen, Vineeth Pillai, Aaron Lu, Aubrey Li, tglx, linux-kernel, mingo, torvalds, fweisbec, keescook, kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, pjt, rostedt, derkling, benbjiang, Alexandre Chartre, James.Bottomley, OWeisse, Dhaval Giani, Junaid Shahid, jsbarnes, Aubrey Li, Paul E. McKenney, Tim Chen On Wed, Nov 04, 2020 at 05:30:24PM -0500, chris hyser wrote: [..] > + wr.cookies[0] = cookie; > > + } else > > + /* > > + * t1 joining t2 > > + * CASE 1: > > + * before 0 0 > > + * after new cookie new cookie > > + * > > + * CASE 2: > > + * before X (non-zero) 0 > > + * after 0 0 > > + * > > + * CASE 3: > > + * before 0 X (non-zero) > > + * after X X > > + * > > + * CASE 4: > > + * before Y (non-zero) X (non-zero) > > + * after X X > > + */ > > + if (!t1->core_task_cookie && !t2->core_task_cookie) { > > + /* CASE 1. */ > > + cookie = sched_core_alloc_task_cookie(); > > + if (!cookie) > > + goto out_unlock; > > + > > + /* Add another reference for the other task. */ > > + if (!sched_core_get_task_cookie(cookie)) { > > + return -EINVAL; > > should be: ret = -EINVAL; Fixed now, thanks. thanks, - Joel > > > + goto out_unlock; > > + } > > + > > + wr.tasks[0] = t1; > > + wr.tasks[1] = t2; > > + wr.cookies[0] = wr.cookies[1] = cookie; > > + > > + } else if (t1->core_task_cookie && !t2->core_task_cookie) { > > + /* CASE 2. */ > > + sched_core_put_task_cookie(t1->core_task_cookie); > > + sched_core_put_after_stopper = true; > > + > > + wr.tasks[0] = t1; /* Reset cookie for t1. 
*/ > > + > > + } else if (!t1->core_task_cookie && t2->core_task_cookie) { > > + /* CASE 3. */ > > + if (!sched_core_get_task_cookie(t2->core_task_cookie)) { > > + ret = -EINVAL; > > + goto out_unlock; > > + } > > + > > + wr.tasks[0] = t1; > > + wr.cookies[0] = t2->core_task_cookie; > > + > > + } else { > > + /* CASE 4. */ > > + if (!sched_core_get_task_cookie(t2->core_task_cookie)) { > > + ret = -EINVAL; > > + goto out_unlock; > > + } > > + sched_core_put_task_cookie(t1->core_task_cookie); > > + sched_core_put_after_stopper = true; > > + > > + wr.tasks[0] = t1; > > + wr.cookies[0] = t2->core_task_cookie; > > + } > > + > > + stop_machine(sched_core_task_join_stopper, (void *)&wr, NULL); > > + > > + if (sched_core_put_after_stopper) > > + sched_core_put(); > > + > > + ret = 0; > > +out_unlock: > > + mutex_unlock(&sched_core_mutex); > > + return ret; > > +} > > + > > +/* CGroup interface */ > > static u64 cpu_core_tag_read_u64(struct cgroup_subsys_state *css, struct cftype *cft) > > { > > struct task_group *tg = css_tg(css); > > @@ -9207,18 +9435,9 @@ static int __sched_write_tag(void *data) > > * when we set cgroup tag to 0 when the loop is done below. > > */ > > while ((p = css_task_iter_next(&it))) { > > - p->core_cookie = !!val ? (unsigned long)tg : 0UL; > > - > > - if (sched_core_enqueued(p)) { > > - sched_core_dequeue(task_rq(p), p); > > - if (!p->core_cookie) > > - continue; > > - } > > - > > - if (sched_core_enabled(task_rq(p)) && > > - p->core_cookie && task_on_rq_queued(p)) > > - sched_core_enqueue(task_rq(p), p); > > + unsigned long cookie = !!val ? 
(unsigned long)tg : 0UL; > > + sched_core_tag_requeue(p, cookie, true /* group */); > > } > > css_task_iter_end(&it); > > diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c > > index c8fee8d9dfd4..88bf45267672 100644 > > --- a/kernel/sched/debug.c > > +++ b/kernel/sched/debug.c > > @@ -1024,6 +1024,10 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns, > > __PS("clock-delta", t1-t0); > > } > > +#ifdef CONFIG_SCHED_CORE > > + __PS("core_cookie", p->core_cookie); > > +#endif > > + > > sched_show_numa(p, m); > > } > > > > -chrish ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v8 -tip 17/26] sched: Split the cookie and setup per-task cookie on fork 2020-11-04 22:30 ` chris hyser 2020-11-05 14:49 ` Joel Fernandes @ 2020-11-09 23:30 ` chris hyser 1 sibling, 0 replies; 7+ messages in thread From: chris hyser @ 2020-11-09 23:30 UTC (permalink / raw) To: Joel Fernandes (Google), Nishanth Aravamudan, Julien Desfossez, Peter Zijlstra, Tim Chen, Vineeth Pillai, Aaron Lu, Aubrey Li, tglx, linux-kernel Cc: mingo, torvalds, fweisbec, keescook, kerrnel, Phil Auld, Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini, vineeth, Chen Yu, Christian Brauner, Agata Gruza, Antonio Gomez Iglesias, graf, konrad.wilk, dfaggioli, pjt, rostedt, derkling, benbjiang, Alexandre Chartre, James.Bottomley, OWeisse, Dhaval Giani, Junaid Shahid, jsbarnes, Aubrey Li, Paul E. McKenney, Tim Chen On 11/4/20 5:30 PM, chris hyser wrote: > On 10/19/20 9:43 PM, Joel Fernandes (Google) wrote: >> In order to prevent interference and clearly support both per-task and CGroup >> APIs, split the cookie into 2 and allow it to be set from either per-task, or >> CGroup API. The final cookie is the combined value of both and is computed when >> the stop-machine executes during a change of cookie. >> >> Also, for the per-task cookie, it will get weird if we use pointers of any >> emphemeral objects. For this reason, introduce a refcounted object who's sole >> purpose is to assign unique cookie value by way of the object's pointer. >> >> While at it, refactor the CGroup code a bit. Future patches will introduce more >> APIs and support. 
>> >> Tested-by: Julien Desfossez <jdesfossez@digitalocean.com> >> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >> --- >> include/linux/sched.h | 2 + >> kernel/sched/core.c | 241 ++++++++++++++++++++++++++++++++++++++++-- >> kernel/sched/debug.c | 4 + >> 3 files changed, 236 insertions(+), 11 deletions(-) >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h >> index fe6f225bfbf9..c6034c00846a 100644 >> --- a/include/linux/sched.h >> +++ b/include/linux/sched.h >> @@ -688,6 +688,8 @@ struct task_struct { >> #ifdef CONFIG_SCHED_CORE >> struct rb_node core_node; >> unsigned long core_cookie; >> + unsigned long core_task_cookie; >> + unsigned long core_group_cookie; >> unsigned int core_occupation; >> #endif >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c >> index bab4ea2f5cd8..30a9e4cb5ce1 100644 >> --- a/kernel/sched/core.c >> +++ b/kernel/sched/core.c >> @@ -346,11 +346,14 @@ void sched_core_put(void) >> mutex_unlock(&sched_core_mutex); >> } >> +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2); >> + >> #else /* !CONFIG_SCHED_CORE */ >> static inline void sched_core_enqueue(struct rq *rq, struct task_struct *p) { } >> static inline void sched_core_dequeue(struct rq *rq, struct task_struct *p) { } >> static bool sched_core_enqueued(struct task_struct *task) { return false; } >> +static int sched_core_share_tasks(struct task_struct *t1, struct task_struct *t2) { } >> #endif /* CONFIG_SCHED_CORE */ >> @@ -3583,6 +3586,20 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p) >> #endif >> #ifdef CONFIG_SCHED_CORE >> RB_CLEAR_NODE(&p->core_node); >> + >> + /* >> + * Tag child via per-task cookie only if parent is tagged via per-task >> + * cookie. This is independent of, but can be additive to the CGroup tagging. >> + */ >> + if (current->core_task_cookie) { >> + >> + /* If it is not CLONE_THREAD fork, assign a unique per-task tag. 
*/ >> + if (!(clone_flags & CLONE_THREAD)) { >> + return sched_core_share_tasks(p, p); >> + } >> + /* Otherwise share the parent's per-task tag. */ >> + return sched_core_share_tasks(p, current); >> + } >> #endif >> return 0; >> } sched_core_share_tasks() looks at the value of the new tasks core_task_cookie which is non-zero on a process or thread clone and causes underflow for both the enable flag itself and for cookie ref counts. So just zero it in __sched_fork(). -chrish ------ sched: zero out the core scheduling cookie on clone As the cookie is reference counted, even if inherited, zero this and allow explicit sharing. Signed-off-by: Chris Hyser <chris.hyser@oracle.com> --- kernel/sched/core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index fd3cc03..2af0ea6 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3378,6 +3378,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p) p->capture_control = NULL; #endif init_numa_balancing(clone_flags, p); +#ifdef CONFIG_SCHED_CORE + p->core_task_cookie = 0; +#endif #ifdef CONFIG_SMP p->wake_entry.u_flags = CSD_TYPE_TTWU; #endif ^ permalink raw reply related [flat|nested] 7+ messages in thread