From: chris hyser <chris.hyser@oracle.com>
To: Joel Fernandes <joel@joelfernandes.org>,
	Nishanth Aravamudan <naravamudan@digitalocean.com>,
	JulienDesfossez@google.com, jdesfossez@digitalocean.com,
	Peter Zijlstra <peterz@infradead.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	mingo@kernel.org, tglx@linutronix.de, pjt@google.com,
	linux-kernel@vger.kernel.org, fweisbec@gmail.com,
	keescook@chromium.org, Phil Auld <pauld@redhat.com>,
	Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Joel Fernandes <joelaf@google.com>,
	vineethrp@gmail.com, Chen Yu <yu.c.chen@intel.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	dhaval.giani@gmail.com, paulmck@kernel.org, joshdon@google.com,
	xii@google.com, haoluo@google.com, bsegall@google.com
Subject: Re: [RFC] Design proposal for upstream core-scheduling interface
Date: Mon, 24 Aug 2020 17:42:28 -0400
Message-ID: <cb5432d1-9909-1f16-5e26-ea77efbee713@oracle.com>
In-Reply-To: <6d25f0e8-9894-386e-7669-9ecbc176bd5b@oracle.com>



On 8/24/20 4:53 PM, chris hyser wrote:
> On 8/21/20 11:01 PM, Joel Fernandes wrote:
>> Hello!
>> Core-scheduling aims to make it safe for more than one task that trust
>> each other to share hyperthreads within a CPU core [1]. This results
>> in a performance improvement for workloads that can benefit from using
>> hyperthreading safely while limiting core-sharing when it is not safe.
>>
>> Currently no universally agreed set of interfaces exists, and companies have
>> been hacking up their own interfaces to make use of the patches. This post
>> aims to list usecases that I gathered after talking to various people at Google
>> and Oracle, after which actual development of code to add interfaces can follow.
>>
>> The below text uses the terms cookie and tag interchangeably. Further, cookie
>> of 0 is assumed to indicate a trusted process - such as kernel threads or
>> system daemons. By default, if nothing is tagged then everything is
>> considered trusted since the scheduler assumes all tasks are a match for each
>> other.
>>
>> Usecase 1: Google's cloud group tags CGroups with a 32-bit integer. This
>> int32 is split into 2 parts, the color and the id. The color can only be set
>> by privileged processes and the id can be set by anyone. The CGroup structure
>> looks like:
>>
>>     A         B
>>    / \      / \ \
>>   C   D    E  F  G
>>
>> Here A and B are container CGroups for 2 jobs that are assigned a color by a
>> privileged daemon. The job itself has more sub-CGroups within (for ex, B has
>> E, F and G). When these sub-CGroups are spawned, they inherit the color from
>> the parent. An unprivileged user can then set an id for the sub-CGroup
>> without the knowledge of the privileged daemon if it desires to add further
>> isolation. This setting of id can be an unprivileged operation because the
>> root daemon has already isolated A and B.
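To make the color/id split in Usecase 1 concrete, here is a minimal sketch in C. The proposal does not say how the 32 bits are divided, so the 8-bit color / 24-bit id layout, every function name, the error codes, and the rule that a child starts with id 0 are all assumptions made for illustration, not Google's actual interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED layout (not given in the proposal): high 8 bits = color,
 * low 24 bits = id. */
#define TAG_COLOR_BITS 8u
#define TAG_ID_BITS    24u
#define TAG_ID_MASK    ((UINT32_C(1) << TAG_ID_BITS) - 1u)

static_assert(TAG_COLOR_BITS + TAG_ID_BITS == 32, "tag must fill 32 bits");

static inline uint32_t tag_color(uint32_t tag) { return tag >> TAG_ID_BITS; }
static inline uint32_t tag_id(uint32_t tag)    { return tag & TAG_ID_MASK; }

/* Only a privileged caller may change the color; the id is preserved. */
static inline int tag_set_color(uint32_t *tag, uint32_t color, bool privileged)
{
    if (!privileged)
        return -EPERM;
    if (color >= (UINT32_C(1) << TAG_COLOR_BITS))
        return -EINVAL;
    *tag = (color << TAG_ID_BITS) | tag_id(*tag);
    return 0;
}

/* Anyone may change the id; the color is left untouched. */
static inline int tag_set_id(uint32_t *tag, uint32_t id)
{
    if (id > TAG_ID_MASK)
        return -EINVAL;
    *tag = (*tag & ~TAG_ID_MASK) | id;
    return 0;
}

/* A newly spawned sub-CGroup inherits the parent's color.
 * ASSUMPTION: its id starts out as 0. */
static inline uint32_t tag_inherit(uint32_t parent)
{
    return parent & ~TAG_ID_MASK;
}

/* Two groups may share a core only if their full 32-bit tags are equal. */
static inline bool tag_may_share_core(uint32_t a, uint32_t b)
{
    return a == b;
}
```

With this layout, an unprivileged id change can only split a group further within its color; it can never make a group under A match a group under B, which is why setting the id can safely be unprivileged.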
>>
>> Usecase 2: Chrome browser - tagging renderers. In Chrome, each tab opened
>> spawns a renderer. A renderer is a sandboxed process and it is assumed it
>> could run arbitrary code (Javascript etc). When a renderer is created, a
>> prctl call is made to tag the renderer. Every thread that is spawned by the
>> renderer is also tagged. Essentially this turns SMT off for the renderer, but
>> still gives a performance boost due to privileged system threads being able
>> to share a core. The tagging also forbids the renderer from sharing a core
>> with privileged system processes. In the future, we plan to allow threads to
>> share a core as well (especially once we get syscall-isolation upstreamed.
>> Patches were posted recently for the same [2]).
>>
>> Usecase 3: ChromeOS VMs - each vCPU thread that is created by the VMM is
>> tagged thus disallowing core sharing between the vCPU thread and any other
>> thread on the system. This is because such VMs may run arbitrary user code
>> and attack both the guest and the host systems sharing the core.
>>
>> Usecase 4: Oracle - Setting a sub-CGroup as trusted (cookie 0). Chris Hyser
>> told me on IRC that in a CGroup hierarchy, some CGroups should not have to
>> share their parent CGroup's tag. In fact, it should be possible to untag a
>> child CGroup if needed, thus allowing it to share a core with trusted tasks.
>> Others have had similar requirements.
>>
>> Proposal for tagging
>> --------------------
>> We have to support both CGroup and non-CGroup users. CGroup may be overkill
>> for some and the CGroup v2 unified hierarchy may be too inflexible.
>> Regardless, we must support CGroup due to its ease of use and existing users.
>>
>> For Usecase #1
>> ----------
>> Usecase #1 requires a 2-level tagging mechanism. I propose 2 new files
>> to the CPU controller:
>> - tag : a boolean (0/1). If set, this CGroup and all sub-CGroups will be
>>    tagged.  (In the kernel, the cookie will be derived from the pointer value
>>    of a ref-counted cookie object.) If reset, then the CGroup will inherit
>>    the parent CGroup's cookie if there is one.
>>
>> - color : The ref-counted object will be aligned to, say, a 256-byte boundary;
>>    the lower 8 bits of the pointer can then be used to specify
>>    color. Together, the pointer with the color will form a cookie used by the
>>    scheduler.
>>
>> Note that if 2 CGroups belong to 2 different tagged hierarchies, then setting
>> their color to be the same does not imply that the 2 groups will share a
>> core. This is key.  Also, to support usecase #4, we could add a third tag
>> value -- 2, along with the usual 0 and 1 to suggest that the CGroup can share
>> a core with cookie-0 tasks (Chris Hyser feel free to add any more comments
>> here).
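The pointer-plus-color cookie described above can be sketched in userspace C as follows. This is only an illustration under stated assumptions: struct cookie_obj, cookie_obj_alloc(), make_cookie() and cookies_match() are hypothetical names, and C11 aligned_alloc() stands in for whatever aligned allocator the kernel would actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* 256-byte alignment leaves the low 8 bits of the pointer free for color. */
#define COOKIE_ALIGN      256u
#define COOKIE_COLOR_MASK ((uintptr_t)COOKIE_ALIGN - 1)

/* Hypothetical ref-counted object, one per tagged CGroup hierarchy. */
struct cookie_obj {
    unsigned int refcount;
};

static inline struct cookie_obj *cookie_obj_alloc(void)
{
    /* aligned_alloc() needs a size that is a multiple of the alignment. */
    struct cookie_obj *obj = aligned_alloc(COOKIE_ALIGN, COOKIE_ALIGN);

    if (obj)
        obj->refcount = 1;
    return obj;
}

/* Combine the hierarchy's object address with an 8-bit color. */
static inline uintptr_t make_cookie(const struct cookie_obj *obj, uint8_t color)
{
    uintptr_t base = (uintptr_t)obj;

    /* The low 8 bits must be clear, otherwise color would corrupt the address. */
    assert((base & COOKIE_COLOR_MASK) == 0);
    return base | color;
}

/* Tasks may share a core only when their cookies are identical. */
static inline bool cookies_match(uintptr_t a, uintptr_t b)
{
    return a == b;
}
```

Because the upper bits come from a distinct object per tagged hierarchy, two groups in different hierarchies get different cookies even when they pick the same color, which is the property called out as key above.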
> 
> Let em think about this. This looks like it would support delegation of a cgroup subtree, which I suppose containers are 

s/em/me/

