From: Joel Fernandes <joel@joelfernandes.org>
To: Nishanth Aravamudan <naravamudan@digitalocean.com>,
	Julien Desfossez <jdesfossez@digitalocean.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Vineeth Pillai <viremana@linux.microsoft.com>,
	Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	tglx@linutronix.de, linux-kernel@vger.kernel.org,
	hongyu.ning@linux.intel.com
Cc: mingo@kernel.org, torvalds@linux-foundation.org,
	fweisbec@gmail.com, keescook@chromium.org, kerrnel@google.com,
	Phil Auld <pauld@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	vineeth@bitbyteword.org, Chen Yu <yu.c.chen@intel.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Agata Gruza <agata.gruza@intel.com>,
	Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>,
	graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com,
	pjt@google.com, rostedt@goodmis.org, derkling@google.com,
	benbjiang@tencent.com,
	Alexandre Chartre <alexandre.chartre@oracle.com>,
	James.Bottomley@hansenpartnership.com, OWeisse@umich.edu,
	Dhaval Giani <dhaval.giani@oracle.com>,
	Junaid Shahid <junaids@google.com>,
	jsbarnes@google.com, chris.hyser@oracle.com,
	Aubrey Li <aubrey.li@linux.intel.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Tim Chen <tim.c.chen@intel.com>
Subject: [RFT for v9] (Was Re: [PATCH v8 -tip 00/26] Core scheduling)
Date: Fri, 6 Nov 2020 15:55:06 -0500
Message-ID: <20201106205506.GA3109656@google.com>
In-Reply-To: <20201020014336.2076526-1-joel@joelfernandes.org>

All,

I am getting ready to send the v9 series, based on the tip/master
branch. Could you please give the tree below a try and report any results from
your testing?
git tree:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git (branch coresched)
git log:
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched

The major changes in this series are two improvements:
(1)
"sched: Make snapshotting of min_vruntime more CGroup-friendly"
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=9a20a6652b3c50fd51faa829f7947004239a04eb

(2)
"sched: Simplify the core pick loop for optimized case"
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=0370117b4fd418cdaaa6b1489bfc14f305691152

And a bug fix:
(1)
"sched: Enqueue task into core queue only after vruntime is updated"
https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/commit/?h=coresched-v9-for-test&id=401dad5536e7e05d1299d0864e6fc5072029f492

I also squashed in 2 more bug fixes, related to kernel protection and
to a crash seen on the tip/master branch.

I hope to send the series out to the list next week.

Have a great weekend, and thanks!

 - Joel


On Mon, Oct 19, 2020 at 09:43:10PM -0400, Joel Fernandes (Google) wrote:
> Eighth iteration of the Core-Scheduling feature.
> 
> Core scheduling is a feature that allows only trusted tasks to run
> concurrently on cpus sharing compute resources (e.g., hyperthreads on a
> core). The goal is to mitigate core-level side-channel attacks
> without having to disable SMT (which has a significant impact on
> performance in some situations). Core scheduling (as of v7) mitigates
> user-space to user-space attacks, and user-to-kernel attacks when one of
> the siblings enters the kernel via an interrupt or system call.
> 
> By default, the feature doesn't change any of the current scheduler
> behavior. The user decides which tasks can run simultaneously on the
> same core (for now by having them in the same tagged cgroup). When a tag
> is enabled in a cgroup and a task from that cgroup is running on a
> hardware thread, the scheduler ensures that only idle or trusted tasks
> run on the other sibling(s). Besides security concerns, this feature can
> also be beneficial for RT and performance applications where we want to
> control how tasks make use of SMT dynamically.
> 
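
A toy model of the rule described above may help readers new to the
feature: a sibling may only run a task that is compatible with what is
already running on the core, and otherwise stays idle. This is only a
sketch under stated assumptions, not the kernel's algorithm. The names
Task, cookie, can_share_core and pick_for_sibling are hypothetical, and
treating cookie 0 as "untagged" is a convention of this sketch. The
series itself does core-wide, priority-based selection, which is not
modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Task:
    name: str
    cookie: int  # hypothetical tag; 0 models an untagged task in this sketch


def can_share_core(running: Optional[Task], candidate: Optional[Task]) -> bool:
    """True if `candidate` may run on a sibling while `running` occupies the core."""
    if running is None or candidate is None:
        return True  # an idle hardware thread is compatible with anything
    return running.cookie == candidate.cookie


def pick_for_sibling(running: Optional[Task], runnable: List[Task]) -> Optional[Task]:
    """Return the first compatible task from the sibling's queue, or None (stay idle)."""
    for task in runnable:
        if can_share_core(running, task):
            return task
    return None


# Two tasks tagged alike and one tagged differently.
vm1_a = Task("vm1-vcpu0", cookie=1)
vm1_b = Task("vm1-vcpu1", cookie=1)
vm2_a = Task("vm2-vcpu0", cookie=2)

chosen = pick_for_sibling(vm1_a, [vm2_a, vm1_b])
forced_idle = pick_for_sibling(vm1_a, [vm2_a])
```

Here `chosen` is the task with the matching tag, and `forced_idle` is
None because the only runnable task carries a different tag.
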
> This iteration focuses on the following:
> - Redesigned API.
> - Rework of Kernel Protection feature based on Thomas's entry work.
> - Rework of hotplug fixes.
> - Address review comments on v7.
> 
> Joel: Both a CGroup interface and a per-task interface via prctl(2) are
> provided for configuring core sharing. More details are provided in the
> documentation patch. Kselftests are provided to verify the
> correctness/rules of the interface.
> 
> Julien: TPCC tests showed improvements with core-scheduling. With kernel
> protection enabled, it does not show any regression. Possibly ASI will improve
> the performance for those who choose kernel protection (can be toggled through
> sched_core_protect_kernel sysctl). Results:
> v8                              average    stdev        diff
> baseline (SMT on)               1197.272   44.78312824
> core sched (   kernel protect)  412.9895   45.42734343  -65.51%
> core sched (no kernel protect)  686.6515   71.77756931  -42.65%
> nosmt                           408.667    39.39042872  -65.87%
> 
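
For anyone cross-checking the table, the "diff" column appears to be the
relative change of each average against the "baseline (SMT on)" average.
That reading is an assumption, but it reproduces the three percentages
listed. A minimal sketch, with the averages copied from the table and
the helper name diff_percent being hypothetical:

```python
# Averages copied from the TPCC table above.
BASELINE = 1197.272  # baseline (SMT on)

averages = {
    "core sched (kernel protect)": 412.9895,
    "core sched (no kernel protect)": 686.6515,
    "nosmt": 408.667,
}


def diff_percent(value: float, baseline: float) -> float:
    """Relative change of `value` versus `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Round to two decimals, matching the precision used in the table.
diffs = {name: round(diff_percent(avg, BASELINE), 2) for name, avg in averages.items()}

for name, diff in diffs.items():
    print(f"{name:32s} {diff:+.2f}%")
```
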
> v8 is rebased on tip/master.
> 
> Future work
> ===========
> - Load balancing/Migration fixes for core scheduling.
>   With v6, Load balancing is partially coresched aware, but has some
>   issues w.r.t process/taskgroup weights:
>   https://lwn.net/ml/linux-kernel/20200225034438.GA617271@z...
> - Core scheduling test framework: kselftests, torture tests etc
> 
> Changes in v8
> =============
> - New interface/API implementation
>   - Joel
> - Revised kernel protection patch
>   - Joel
> - Revised Hotplug fixes
>   - Joel
> - Minor bug fixes and address review comments
>   - Vineeth
> 
> Changes in v7
> =============
> - Kernel protection from untrusted usermode tasks
>   - Joel, Vineeth
> - Fix for hotplug crashes and hangs
>   - Joel, Vineeth
> 
> Changes in v6
> =============
> - Documentation
>   - Joel
> - Pause siblings on entering nmi/irq/softirq
>   - Joel, Vineeth
> - Fix for RCU crash
>   - Joel
> - Fix for a crash in pick_next_task
>   - Yu Chen, Vineeth
> - Minor re-write of core-wide vruntime comparison
>   - Aaron Lu
> - Cleanup: Address Review comments
> - Cleanup: Remove hotplug support (for now)
> - Build fixes: 32 bit, SMT=n, AUTOGROUP=n etc
>   - Joel, Vineeth
> 
> Changes in v5
> =============
> - Fixes for cgroup/process tagging during corner cases like cgroup
>   destroy, task moving across cgroups etc
>   - Tim Chen
> - Coresched aware task migrations
>   - Aubrey Li
> - Other minor stability fixes.
> 
> Changes in v4
> =============
> - Implement a core wide min_vruntime for vruntime comparison of tasks
>   across cpus in a core.
>   - Aaron Lu
> - Fixes a typo bug in setting the forced_idle cpu.
>   - Aaron Lu
> 
> Changes in v3
> =============
> - Fixes the issue of sibling picking up an incompatible task
>   - Aaron Lu
>   - Vineeth Pillai
>   - Julien Desfossez
> - Fixes the issue of starving threads due to forced idle
>   - Peter Zijlstra
> - Fixes the refcounting issue when deleting a cgroup with tag
>   - Julien Desfossez
> - Fixes a crash during cpu offline/online with coresched enabled
>   - Vineeth Pillai
> - Fixes a comparison logic issue in sched_core_find
>   - Aaron Lu
> 
> Changes in v2
> =============
> - Fixes for a couple of NULL pointer dereference crashes
>   - Subhra Mazumdar
>   - Tim Chen
> - Improves priority comparison logic for processes on different cpus
>   - Peter Zijlstra
>   - Aaron Lu
> - Fixes a hard lockup in rq locking
>   - Vineeth Pillai
>   - Julien Desfossez
> - Fixes a performance issue seen on IO heavy workloads
>   - Vineeth Pillai
>   - Julien Desfossez
> - Fix for 32bit build
>   - Aubrey Li
> 
> Aubrey Li (1):
> sched: migration changes for core scheduling
> 
> Joel Fernandes (Google) (13):
> sched/fair: Snapshot the min_vruntime of CPUs on force idle
> arch/x86: Add a new TIF flag for untrusted tasks
> kernel/entry: Add support for core-wide protection of kernel-mode
> entry/idle: Enter and exit kernel protection during idle entry and
> exit
> sched: Split the cookie and setup per-task cookie on fork
> sched: Add a per-thread core scheduling interface
> sched: Add a second-level tag for nested CGroup usecase
> sched: Release references to the per-task cookie on exit
> sched: Handle task addition to CGroup
> sched/debug: Add CGroup node for printing group cookie if SCHED_DEBUG
> kselftest: Add tests for core-sched interface
> sched: Move core-scheduler interfacing code to a new file
> Documentation: Add core scheduling documentation
> 
> Peter Zijlstra (10):
> sched: Wrap rq::lock access
> sched: Introduce sched_class::pick_task()
> sched: Core-wide rq->lock
> sched/fair: Add a few assertions
> sched: Basic tracking of matching tasks
> sched: Add core wide task selection and scheduling.
> sched: Trivial forced-newidle balancer
> irq_work: Cleanup
> sched: cgroup tagging interface for core scheduling
> sched: Debug bits...
> 
> Vineeth Pillai (2):
> sched/fair: Fix forced idle sibling starvation corner case
> entry/kvm: Protect the kernel when entering from guest
> 
> .../admin-guide/hw-vuln/core-scheduling.rst   |  312 +++++
> Documentation/admin-guide/hw-vuln/index.rst   |    1 +
> .../admin-guide/kernel-parameters.txt         |    7 +
> arch/x86/include/asm/thread_info.h            |    2 +
> arch/x86/kvm/x86.c                            |    3 +
> drivers/gpu/drm/i915/i915_request.c           |    4 +-
> include/linux/entry-common.h                  |   20 +-
> include/linux/entry-kvm.h                     |   12 +
> include/linux/irq_work.h                      |   33 +-
> include/linux/irqflags.h                      |    4 +-
> include/linux/sched.h                         |   27 +-
> include/uapi/linux/prctl.h                    |    3 +
> kernel/Kconfig.preempt                        |    6 +
> kernel/bpf/stackmap.c                         |    2 +-
> kernel/entry/common.c                         |   25 +-
> kernel/entry/kvm.c                            |   13 +
> kernel/fork.c                                 |    1 +
> kernel/irq_work.c                             |   18 +-
> kernel/printk/printk.c                        |    6 +-
> kernel/rcu/tree.c                             |    3 +-
> kernel/sched/Makefile                         |    1 +
> kernel/sched/core.c                           | 1135 ++++++++++++++++-
> kernel/sched/coretag.c                        |  468 +++++++
> kernel/sched/cpuacct.c                        |   12 +-
> kernel/sched/deadline.c                       |   34 +-
> kernel/sched/debug.c                          |    8 +-
> kernel/sched/fair.c                           |  272 ++--
> kernel/sched/idle.c                           |   24 +-
> kernel/sched/pelt.h                           |    2 +-
> kernel/sched/rt.c                             |   22 +-
> kernel/sched/sched.h                          |  302 ++++-
> kernel/sched/stop_task.c                      |   13 +-
> kernel/sched/topology.c                       |    4 +-
> kernel/sys.c                                  |    3 +
> kernel/time/tick-sched.c                      |    6 +-
> kernel/trace/bpf_trace.c                      |    2 +-
> tools/include/uapi/linux/prctl.h              |    3 +
> tools/testing/selftests/sched/.gitignore      |    1 +
> tools/testing/selftests/sched/Makefile        |   14 +
> tools/testing/selftests/sched/config          |    1 +
> .../testing/selftests/sched/test_coresched.c  |  840 ++++++++++++
> 41 files changed, 3437 insertions(+), 232 deletions(-)
> create mode 100644 Documentation/admin-guide/hw-vuln/core-scheduling.rst
> create mode 100644 kernel/sched/coretag.c
> create mode 100644 tools/testing/selftests/sched/.gitignore
> create mode 100644 tools/testing/selftests/sched/Makefile
> create mode 100644 tools/testing/selftests/sched/config
> create mode 100644 tools/testing/selftests/sched/test_coresched.c
> 
> --
> 2.29.0.rc1.297.gfa9743e501-goog
> 
