linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Haitao Huang" <haitao.huang@linux.intel.com>
To: "Sean Christopherson" <seanjc@google.com>
Cc: "Kai Huang" <kai.huang@intel.com>,
	"Bo Zhang" <zhanb@microsoft.com>,
	"linux-sgx@vger.kernel.org" <linux-sgx@vger.kernel.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"yangjie@microsoft.com" <yangjie@microsoft.com>,
	"Zhiquan1 Li" <zhiquan1.li@intel.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"tj@kernel.org" <tj@kernel.org>,
	"anakrish@microsoft.com" <anakrish@microsoft.com>,
	"jarkko@kernel.org" <jarkko@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"mikko.ylinen@linux.intel.com" <mikko.ylinen@linux.intel.com>,
	"Sohil Mehta" <sohil.mehta@intel.com>,
	"bp@alien8.de" <bp@alien8.de>, "x86@kernel.org" <x86@kernel.org>,
	"kristen@linux.intel.com" <kristen@linux.intel.com>
Subject: Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC
Date: Wed, 11 Oct 2023 10:02:25 -0500	[thread overview]
Message-ID: <op.2cnn2bplwjvjmi@hhuan26-mobl.amr.corp.intel.com> (raw)
In-Reply-To: <ZSXl1VXTM0c8qpZj@google.com>

On Tue, 10 Oct 2023 19:01:25 -0500, Sean Christopherson  
<seanjc@google.com> wrote:

> On Tue, Oct 10, 2023, Haitao Huang wrote:
>> On Mon, 09 Oct 2023 21:23:12 -0500, Huang, Kai <kai.huang@intel.com>  
>> wrote:
>>
>> > On Mon, 2023-10-09 at 20:42 -0500, Haitao Huang wrote:
>> > > Hi Sean
>> > >
>> > > On Mon, 09 Oct 2023 19:23:04 -0500, Sean Christopherson
>> > > <seanjc@google.com> wrote:
>> > > > I can see userspace wanting to explicitly terminate the VM  
>> instead of
>> > > > "silently"
>> > > > the VM's enclaves, but that seems like it should be a knob in the
>> > > > virtual EPC
>> > > > code.
>> > >
>> > > If my understanding above is correct and understanding your  
>> statement
>> > > above correctly, then don't see we really need separate knob for  
>> vEPC
>> > > code. Reaching a cgroup limit by a running guest (assuming dynamic
>> > > allocation implemented) should not translate automatically killing
>> > > the VM.
>> > > Instead, it's user space job to work with guest to handle allocation
>> > > failure. Guest could page and kill enclaves.
>> > >
>> >
>> > IIUC Sean was talking about changing misc.max _after_ you launch SGX  
>> VMs:
>> >
>> > 1) misc.max = 100M
>> > 2) Launch VMs with total virtual EPC size = 100M	<- success
>> > 3) misc.max = 50M
>> >
>> > 3) will also succeed, but nothing will happen, the VMs will be still
>> > holding 100M EPC.
>> >
>> > You need to somehow track virtual EPC and kill VM instead.
>> >
>> > (or somehow fail to do 3) if it is also an acceptable option.)
>> >
>> Thanks for explaining it.
>>
>> There is an error code to return from max_write. I can add that too to  
>> the
>> callback definition and fail it when it can't be enforced for any  
>> reason.
>> Would like some community feedback if this is acceptable though.
>
> That likely isn't acceptable.  E.g. create a cgroup with both a host  
> enclave and
> virtual EPC, set the hard limit to 100MiB.  Virtual EPC consumes 50MiB,  
> and the
> host enclave consumes 50MiB.  Userspace lowers the limit to 49MiB.  The  
> cgroup
> code would reclaim all of the enclave's reclaimable EPC, and then kill  
> the enclave
> because it's still over the limit.  And then fail the max_write because  
> the cgroup
> is *still* over the limit.  So in addition to burning a lot of cycles,  
> from
> userspace's perspective its enclave was killed for no reason, as the new  
> limit
> wasn't actually set.

I was thinking before reclaiming enclave pages, if we know the untracked  
vepc pages (current-tracked) is larger than limit, we just return error  
without enforcing the limit. That way user also knows something is wrong.

But I get that we want to be able to kill VMs to enforce the newer lower  
limit. I assume we can't optimize efficiency/priority of killing: enclave  
pages will be reclaimed first no matter what just because they are  
reclaimable; no good rules to choose victim between enclave and VMs in  
your example so enclaves could be killed still before VMs.

>> I think to solve it ultimately, we need be able to adjust 'capacity' of  
>> VMs
>> not to just kill them, which is basically the same as dynamic allocation
>> support for VMs (being able to increase/decrease epc size when it is
>> running). For now, we only have static allocation so max can't be  
>> enforced
>> once it is launched.
>
> No, reclaiming virtual EPC is not a requirement.  VMM EPC  
> oversubscription is
> insanely complex, and I highly doubt any users actually want to  
> oversubcribe VMs.
>
> There are use cases for cgroups beyond oversubscribing/swapping, e.g.  
> privileged
> userspace may set limits on a container to ensure the container doesn't  
> *accidentally*
> consume more EPC than it was allotted, e.g. due to a configuration bug  
> that created
> a VM with more EPC than it was supposed to have.
>
> My comments on virtual EPC vs. cgroups is much more about having sane,  
> well-defined
> behavior, not about saying the kernel actually needs to support  
> oversubscribing EPC
> for KVM guests.

So here are the steps I see now, let me know if this is aligned with your  
thinking or I'm off track.

0) when all left are enclave VA, SECS pages and VEPC in a cgroup, the OOM  
kill process starts.
1) The cgroup identifies a victim vepc for OOM kill(this could be before  
or after enclaves being killed), sets a new flag VEPC_NO_MEMORY in the  
vepc.
2) call sgx_vepc_remove_all(), ignore failure counts returned. This will  
perform best effort to eremove all normal pages used by the vepc.
3) Guest may trigger fault again, return SIGBUS in sgx_vepc_fault when  
VEPC_NO_MEMORY is set. Do the same for mmap.
4) That would eventually cause sgx_vepc_release() before VM dies or killed  
by user space, which does the final cleanup.

Q: should we reset VEPC_NO_MEMORY flag if #4 never happens and cgroup  
usage is below limit? I suppose we can do it, but not sure it is sane  
because VM would try to load as much pages as configured originally.

I'm still thinking about using one unreclaimable list, justification is  
simplicity and lack of rules to select items from separate lists, but may  
change my mind if I found it's inconvenient.

Not sure how age/youngness can be accounted for guest pages though Kai  
indicated we don't need that for first version. So I assume we can deal  
with it later and add separate list for vEPC if needed
for that reason.

Thanks a lot for your input.
Haitao

  reply	other threads:[~2023-10-11 15:02 UTC|newest]

Thread overview: 119+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-23  3:06 [PATCH v5 00/18] Add Cgroup support for SGX EPC memory Haitao Huang
2023-09-23  3:06 ` [PATCH v5 01/18] cgroup/misc: Add per resource callbacks for CSS events Haitao Huang
2023-09-25 17:09   ` Jarkko Sakkinen
2023-09-26  3:04     ` Haitao Huang
2023-09-26 13:10       ` Jarkko Sakkinen
2023-09-26 13:13         ` Jarkko Sakkinen
2023-09-27  1:56           ` Haitao Huang
2023-10-02 22:47             ` Jarkko Sakkinen
2023-10-02 22:55               ` Jarkko Sakkinen
2023-10-04 15:45                 ` Haitao Huang
2023-10-04 17:18                   ` Tejun Heo
2023-09-27  9:20   ` Huang, Kai
2023-10-03 14:29     ` Haitao Huang
2023-10-17 18:55   ` Michal Koutný
2023-09-23  3:06 ` [PATCH v5 02/18] cgroup/misc: Add SGX EPC resource type and export APIs for SGX driver Haitao Huang
2023-09-25 18:50   ` Tejun Heo
2023-09-28  3:59   ` Huang, Kai
2023-10-03  7:00     ` Haitao Huang
2023-10-03 19:33       ` Huang, Kai
2023-09-23  3:06 ` [PATCH v5 03/18] x86/sgx: Add sgx_epc_lru_lists to encapsulate LRU lists Haitao Huang
2023-09-23  3:06 ` [PATCH v5 04/18] x86/sgx: Use sgx_epc_lru_lists for existing active page list Haitao Huang
2023-09-23  3:06 ` [PATCH v5 05/18] x86/sgx: Store reclaimable EPC pages in sgx_epc_lru_lists Haitao Huang
2023-09-27 10:14   ` Huang, Kai
2023-09-23  3:06 ` [PATCH v5 06/18] x86/sgx: Introduce EPC page states Haitao Huang
2023-09-25 17:11   ` Jarkko Sakkinen
2023-09-27 10:28   ` Huang, Kai
2023-10-03  4:49     ` Haitao Huang
2023-10-03 20:03       ` Huang, Kai
2023-10-04 15:24         ` Haitao Huang
2023-10-04 21:05           ` Huang, Kai
2023-09-23  3:06 ` [PATCH v5 07/18] x86/sgx: Introduce RECLAIM_IN_PROGRESS state Haitao Huang
2023-09-25 17:13   ` Jarkko Sakkinen
2023-09-27 10:42   ` Huang, Kai
2023-09-23  3:06 ` [PATCH v5 08/18] x86/sgx: Use a list to track to-be-reclaimed pages Haitao Huang
2023-09-28  9:28   ` Huang, Kai
2023-10-03  5:09     ` Haitao Huang
2023-09-23  3:06 ` [PATCH v5 09/18] x86/sgx: Store struct sgx_encl when allocating new VA pages Haitao Huang
2023-09-27 11:14   ` Huang, Kai
2023-09-27 15:35     ` Haitao Huang
2023-09-27 21:21       ` Huang, Kai
2023-09-29 15:06         ` Haitao Huang
2023-10-02 11:05           ` Huang, Kai
2023-09-27 11:35   ` Huang, Kai
2023-10-03  6:45     ` Haitao Huang
2023-10-03 20:07       ` Huang, Kai
2023-10-04 15:03         ` Haitao Huang
2023-10-04 21:13           ` Huang, Kai
2023-10-05  4:22             ` Haitao Huang
2023-10-05  6:49               ` Huang, Kai
2023-09-23  3:06 ` [PATCH v5 10/18] x86/sgx: Add EPC page flags to identify owner types Haitao Huang
2023-09-23  3:06 ` [PATCH v5 11/18] x86/sgx: store unreclaimable pages in LRU lists Haitao Huang
2023-09-27 11:57   ` Huang, Kai
2023-10-03  5:42     ` Haitao Huang
2023-09-28  9:41   ` Huang, Kai
2023-10-03  5:15     ` Haitao Huang
2023-10-03 20:12       ` Huang, Kai
2023-09-23  3:06 ` [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC Haitao Huang
2023-10-09 23:45   ` Huang, Kai
2023-10-10  0:23     ` Sean Christopherson
2023-10-10  0:50       ` Huang, Kai
2023-10-10  1:34         ` Huang, Kai
2023-10-10 16:49           ` Haitao Huang
2023-10-11  0:51             ` Huang, Kai
2023-10-12 13:27               ` Haitao Huang
2023-10-16 10:57                 ` Huang, Kai
2023-10-16 19:52                   ` Haitao Huang
2023-10-16 21:09                     ` Huang, Kai
2023-10-17  0:10                       ` Haitao Huang
2023-10-17  1:34                         ` Huang, Kai
2023-10-17 12:58                           ` Haitao Huang
2023-10-17 18:54                             ` Michal Koutný
2023-10-17 19:13                               ` Michal Koutný
2023-10-18  4:39                                 ` Haitao Huang
2023-10-18  4:37                               ` Haitao Huang
2023-10-18 13:55                                 ` Dave Hansen
2023-10-18 15:26                                   ` Haitao Huang
2023-10-18 15:37                                     ` Dave Hansen
2023-10-18 15:52                                       ` Michal Koutný
2023-10-18 16:25                                         ` Haitao Huang
2023-10-16 21:32                     ` Sean Christopherson
2023-10-17  0:09                       ` Haitao Huang
2023-10-17 15:43                         ` Sean Christopherson
2023-10-17 11:49                       ` Mikko Ylinen
2023-10-11  1:14             ` Huang, Kai
2023-10-16 11:02               ` Huang, Kai
2023-10-10  1:42       ` Haitao Huang
2023-10-10  2:23         ` Huang, Kai
2023-10-10 13:26           ` Haitao Huang
2023-10-11  0:01             ` Sean Christopherson
2023-10-11 15:02               ` Haitao Huang [this message]
2023-10-10  1:04     ` Haitao Huang
2023-10-10  1:18       ` Huang, Kai
2023-10-10  1:38         ` Haitao Huang
2023-10-10  2:12           ` Huang, Kai
2023-10-10 17:05             ` Haitao Huang
2023-10-11  0:31               ` Huang, Kai
2023-10-11 16:04                 ` Haitao Huang
2023-09-23  3:06 ` [PATCH v5 13/18] x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup Haitao Huang
2023-10-05 12:24   ` Huang, Kai
2023-10-05 19:23     ` Haitao Huang
2023-10-05 20:25       ` Huang, Kai
2023-09-23  3:06 ` [PATCH v5 14/18] x86/sgx: Add helper to grab pages from an arbitrary EPC LRU Haitao Huang
2023-09-23  3:06 ` [PATCH v5 15/18] x86/sgx: Prepare for multiple LRUs Haitao Huang
2023-10-05 12:30   ` Huang, Kai
2023-10-05 19:33     ` Haitao Huang
2023-10-05 20:38       ` Huang, Kai
2023-09-23  3:06 ` [PATCH v5 16/18] x86/sgx: Limit process EPC usage with misc cgroup controller Haitao Huang
2023-09-25 17:15   ` Jarkko Sakkinen
2023-10-05 21:01   ` Huang, Kai
2023-10-10  0:12   ` Huang, Kai
2023-10-10  0:16   ` Huang, Kai
2023-10-10  0:26   ` Huang, Kai
2023-10-22 18:26     ` Haitao Huang
2023-10-10  9:19   ` Huang, Kai
2023-10-10  9:32   ` Huang, Kai
2023-10-17 18:54   ` Michal Koutný
2023-10-19 16:05     ` Haitao Huang
2023-09-23  3:06 ` [PATCH v5 17/18] Docs/x86/sgx: Add description for cgroup support Haitao Huang
2023-09-23  3:06 ` [PATCH v5 18/18] selftests/sgx: Add scripts for EPC cgroup testing Haitao Huang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=op.2cnn2bplwjvjmi@hhuan26-mobl.amr.corp.intel.com \
    --to=haitao.huang@linux.intel.com \
    --cc=anakrish@microsoft.com \
    --cc=bp@alien8.de \
    --cc=cgroups@vger.kernel.org \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jarkko@kernel.org \
    --cc=kai.huang@intel.com \
    --cc=kristen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=mikko.ylinen@linux.intel.com \
    --cc=mingo@redhat.com \
    --cc=seanjc@google.com \
    --cc=sohil.mehta@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=x86@kernel.org \
    --cc=yangjie@microsoft.com \
    --cc=zhanb@microsoft.com \
    --cc=zhiquan1.li@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).