All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Huang, Kai" <kai.huang@intel.com>
To: "Christopherson,, Sean" <seanjc@google.com>
Cc: "linux-sgx@vger.kernel.org" <linux-sgx@vger.kernel.org>,
	"Chatre, Reinette" <reinette.chatre@intel.com>,
	"jarkko@kernel.org" <jarkko@kernel.org>,
	"Dhanraj, Vijay" <vijay.dhanraj@intel.com>,
	"haitao.huang@linux.intel.com" <haitao.huang@linux.intel.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Subject: Re: [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED
Date: Mon, 19 Jun 2023 11:17:45 +0000	[thread overview]
Message-ID: <fc214963a01ca975d6de2991b63c93f04114c562.camel@intel.com> (raw)
In-Reply-To: <ZIzcrmU1RA7oEeQ7@google.com>

On Fri, 2023-06-16 at 15:05 -0700, Sean Christopherson wrote:
> On Fri, Jun 16, 2023, Kai Huang wrote:
> > On Tue, 2023-06-06 at 04:11 +0000, Huang, Kai wrote:
> > > On Fri, 2023-05-26 at 19:32 -0500, Haitao Huang wrote:
> > > > Hi Kai, Jarkko and Dave
> > > > 
> > > > On Thu, 09 Mar 2023 05:31:29 -0600, Huang, Kai <kai.huang@intel.com> wrote:
> > > > > 
> > > > > So I am still a little bit confused about where does "SGX driver uses
> > > > > MAP_ANONYMOUS semantics for fd-based mmap()" come from.
> > > > > 
> > > > > Anyway, we certainly don't want to break userspace.  However, IIUC,
> > > > > even  from now on we change the driver to depend on userspace to pass
> > > > > the correct  pgoff in mmap(), this won't break userspace, because old
> > > > > userspace which doesn't  use fadvice() and pgoff actually doesn't
> > > > > matter.  For new userspace which  uses fadvice(), it needs to pass the
> > > > > correct pgoff.
> > > > > 
> > > > > I am not saying we should do this, but it doesn't seem we can break  
> > > > > userspace?
> > > > > 
> > > > 
> > > > Sorry for delayed update but I thought about this more and likely to  
> > > > propose a new EAUG ioctl for this and for enabling SGX-CET shadow stack  
> > > > pages. But regardless, I'd like to wrap up this discussion to just clarify  
> > > > this anonymous semantics design in documentation so people won't get  
> > > > confused in future.
> > > > 
> > > > I think we all agree to keep this semantics so no user space would need  
> > > > specify 'offset' for mmap with enclave fd. And here is my proposed  
> > > > documentation changes.
> > > > 
> > > > --- a/Documentation/x86/sgx.rst
> > > > +++ b/Documentation/x86/sgx.rst
> > > > @@ -100,6 +100,23 @@ pages and establish enclave page permissions.
> > > >                  sgx_ioc_enclave_init
> > > >                  sgx_ioc_enclave_provision
> > > > 
> > > > +Enclave memory mapping
> > > > +----------------------
> > > > +
> > > > +A file descriptor created from opening **/dev/sgx_enclave** represents an
> > > > +enclave object. The mmap() syscall with enclave file descriptors does not
> > > > +support non-zero value for the 'offset' parameter.
> > > 
> > > I think we all need to understand better why SGX driver requires anonymous
> > > semantics mmap() against /dev/sgx_enclave, and as a result of that, requires
> > > mmap() to pass  0 as pgoff (which looks wasn't even discussed when upstreaming
> > > the driver).
> > > 
> > > I'll do some investigation and try to summerize and report back.  Thanks.
> > > 
> > 
> > + Sean.
> > 
> > Hi Sean,
> > 
> > If you see this and have time, please help to comment.  Thanks.
> > 
> > I've spent plenty of time to look into the discussions around v20/v28/v29 and
> > roughly v38/v39 to find out why SGX driver requires MAP_ANONYMOUS semantics,
> > AFAICT it turns out it was never explicitly discussed.  Or perhaps the
> > "MAP_ANONYMOUS semantics" actually just means "MAP_SHARED | MAP_FIXED + pgoff is
> > ignored", and everyone believed there was no need to explain what does "SGX
> > driver uses MAP_ANONYMOUS semantics for mmap()" mean.
> > 
> > Details:
> > 
> > The v20 story (that I spent most of my time on) mentioned by Haitao was actually
> > about how to make SGX and LSM work together but not related to SGX driver mmap()
> > semantic. 
> > 
> > Also Haitao mentioned "the use of anonymous mapping can be traced back to v29"
> > but this actually was just about how to use the first mmap() to "reserve the
> > ELRANGE before ECREATE".  It wasn't about to changing mmap(/dev/sgx_enclave)
> > semantics at all.
> > 
> > Sean actually suggested to explicitly document "how does SGX driver recommend
> > the user to reserve ELRANGE", but Jarkko didn't think we should do:
> > 
> > https://lore.kernel.org/linux-sgx/20200528111910.GB1666298@linux.intel.com/
> > 
> > which is a pity IMHO, because I believe for anyone, naturally, the first
> > instinct to reserve ELRANGE is to use mmap(/dev/sgx_enclave) but not
> > mmap(MAP_ANONYMOUS).  If we suggest user to use the latter then there must be
> > some reason and IMHO such suggestion and reason should be documented.
> 
> Ya, the use of mmap() on fd=-1 is done in order to find an available, naturally
> aligned chunk of virtual memory[*].  IIRC, there was a (very brief) discussion
> about enhancing .mmap() so that userspace wouldn't be responsible for doing the
> alignment, but I think we didn't pursue that idea very because we had bigger fish
> to fry.

Thanks for your time :)

Yeah I noticed in v20, the sgx_get_unmapped_area() internally would allocate
2*len and then manually return the aligned address.  This wouldn't work for
mmap()ing the small chunks with an offset smaller than the size 

> 
> But I think this is unrelated to what you really care about, e.g. a userspace that
> tightly controls its virtual memory could hardcode enclave placement (IIRC graphene
> did/does do that).  I.e. the alignment issue is a completely different discussion.
> 
> [*] https://lore.kernel.org/all/20190522153836.GA24833@linux.intel.com

Right.  For the sake of making progress of this Haitao's series, we want to
understand where did "SGX driver requires mmap(/dev/sgx_enclave) to use
MAP_ANONYMOUS semantics" come from.

But for this topic (how to reserve ELRANGE).  I am not sure the reason that we
prefer mmap(MAP_ANONYMOUS) still stands.  For instance, the discussion in your
above link was old implementation/assumption -- e.g., MAP_FIXED wasn't even
used/supported for mmap()ing enclave chunks.

So I am wondering now whether we suggest user to use mmap(MAP_ANONYMOUS) to get
a size-aligned address still stands?  The current SGX driver's
get_unmapped_area:

static unsigned long sgx_get_unmapped_area(struct file *file,                  
                                           unsigned long addr,
                                           unsigned long len,                  
                                           unsigned long pgoff,                
                                           unsigned long flags)                
{       
        if ((flags & MAP_TYPE) == MAP_PRIVATE)                                 
                return -EINVAL;
        
        if (flags & MAP_FIXED)
                return addr;
                        
        return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);  
}               
 

As you can see if we don't pass MAP_FIXED, which is the case for mmap() to
reserve ELRANGE, looks there's no difference between mmap(MAP_ANONYMOUS) and
mmap(/dev/sgx_enclave)?

> 
> > Also, if I am not missing something, the current driver doesn't prevent using
> > mmap(/dev/sgx_enclave, PROT_NONE) to reserve ELANGE.  So having clear
> > documentation is helpful for SGX users to choose how to write their apps.
> > 
> > Go back to the "SGX driver uses MAP_ANONYMOUS semantics for mmap()", I believe
> > this just is "SGX driver requires mmap() after ECREATE/EINIT to use MAP_SHARED |
> > MAP_FIXED and pgoff is ignored".  Or more precisely, pgoff is "not _used_ by SGX
> > driver".
> > 
> > In fact, I think "pgoff is ignored/not used" is technically wrong for enclave.
> 
> Yeah, it's wrong.  It works because, until now apparently, there was never a reason
> a need to care about the file offset since ELRANGE base always provided the necessary
> information.  It wasn't a deliberate design choice, we simply overlooked that detail
> (unless Jarkko was holding back on me all those years ago :-) ).

Yeah.  But I am not sure whether there are corner cases that we can have
potential bug around here, since those VMA's are not aligned between the core-mm
and the driver.

I haven't thought carefully though.  I guess from core-mm's perspective there
are multi-VMAs mapping to the same/overlapping part of the enclave, while the
SGX driver treats them as mapping to different non-overlapping parts.  Perhaps
it's OK as long as SGX driver doesn't use vma->pgoff to do something???

>  
> > IMHO we should stop saying SGX driver uses MAP_ANONYMOUS semantics, because the
> > truth is it just takes advantage of MAP_FIXED and carelessly ignores the pgoff
> > due to the nature of SGX w/o considering from core-MM's perspective.
> >   
> > And IMHO there are two ways to fix:
> > 
> > 1) From now on, we ask SGX apps to use the correct pgoff in their
> > mmap(/dev/sgx_enclave).  This shouldn't impact the existing SGX apps because SGX
> > driver doesn't use vma->pgoff anyway.
> 
> Heh, just "asking" won't help.  And breaking userspace, i.e. requiring all apps
> to update, will just get you yelled at :-)

We can document properly and assume the new apps will follow.  As I mentioned
above, the old apps, which doesn't/shouldn't have been using fadvice() anyway,
doesn't need to be changed, i.e.,  they should continue to work. :)

No?

> 
> > 2) For the sake of avoiding having to ask existing SGX apps to change their
> > mmap()s, we _officially_ say that userspace isn't required to pass a correct
> > pgoff to mmap() (i.e. passing 0 as did in existing apps), but the kernel should
> > fix the vma->pgoff internally.
> 
> I recommend you don't do this.  Overwriting vm_pgoff would probably work, but it's
> going to make a flawed API even messier.  E.g. you'll have painted SGX into a
> corner if you ever want to decouple vma->start/end from encl->base.  I highly
> doubt that will ever happen given how ELRANGE works, but I don't think a hack-a-fix
> buys you enough to justify any more ugliness.

I also found it's not feasible to cleanly fix the pgoff from userspace (I
thought the pgoff could be fixed at very early stage of do_mmap() so everything
later just worked, but it's not the case).  In do_mmap() the pgoff from
userspace is already used to VMA merging/splitting etc before creating the
target VMA.

Hmm.. Now I think we shouldn't silently fix pgoff in SGX driver as it may
surprise the core-mm later because the core-mm has already done some job around
VMAs before vma->pgoff is changed.

> 
> > I do prefer option 2) because it has no harm to anyone: 1) No changes to
> > existing SGX apps; 2) It aligns with the core-MM to so all existing mmap()
> > operations should work as expected, meaning no surprise; 3) And this patchset
> > from Haitao to use fadvice() to accelerate EAUG flow just works.
> 
> I think you can have your cake and eat it too.  IIUC, the goal is to get fadvise()
> working, and to do that without creating a truly heinous uAPI, you need an accurate
> vm_pgoff.  So, use a carrot and stick approach.
> 
> If userspace properly defines vm_pgoff during mmap() and doesn't specify MAP_ANONYMOUS,
> then they get to use fadvise() (give 'em a carrot).  But if *any* mmap() on the
> enclave doesn't followo those rules, mark the enclave as tainted or whatever and
> disallow fadvise() (hit 'em with a stick).

If we are supposed to use mmap(MAP_ANONYMOUS) to reserve ELRANGE, then I think
userspace will just use MAP_FIXED for all mmap()s to /dev/sgx_enclave.  To
detect whether a VMA has the correct pgoff, we need to somehow mark it in
sgx_mmap(), but currently we don't have facility to do that.  Even we can do,
the core madvice() -> vfs_fadvice() cannot be aware of such VMA either, and in
sgx_fadvice() we may have problem locating the correct VMA to do EAUG (because
the vfs_fadvice()  uses  0 pgoff to calculate the file offset).

So, now I think we should just let userspace itself to pass the correct pgoff
when it wants to use fadvice(), which is the option 1) I mentioned above.  The
kernel doesn't/shouldn't need to fix vma->pgoff.

In another words, I think:

 - For the old apps, doesn't matter, continue to work;
 - For new apps which want to use fadvice() to accelerate EAUG, it should pass 
   the correct pgoff in mmap(/dev/sgx_enclave) even with  MAP_FIXED.

And we don't say "SGX driver uses MAP_ANONYMOUS semantics for
mmap(/dev/sgx_enclave) any more".

Does this make sense?


  reply	other threads:[~2023-06-19 11:17 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-28  4:55 [RFC PATCH v4 0/4] x86/sgx: implement support for MADV_WILLNEED Haitao Huang
2023-01-28  4:55 ` [RFC PATCH v4 1/4] x86/sgx: Export sgx_encl_eaug_page Haitao Huang
2023-01-28  4:55   ` [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED Haitao Huang
2023-01-28  4:55     ` [RFC PATCH v4 3/4] selftests/sgx: add len field for EACCEPT op Haitao Huang
2023-01-28  4:55       ` [RFC PATCH v4 4/4] selftests/sgx: Add test for madvise(..., WILLNEED) Haitao Huang
2023-02-07 23:30         ` Jarkko Sakkinen
2023-02-15  2:38         ` Huang, Kai
2023-02-15  4:42           ` Haitao Huang
2023-02-15  8:46             ` Huang, Kai
2023-02-17 22:29               ` jarkko
2023-02-07 23:29       ` [RFC PATCH v4 3/4] selftests/sgx: add len field for EACCEPT op Jarkko Sakkinen
2023-02-07 23:28     ` [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED Jarkko Sakkinen
2023-02-14  9:47     ` Huang, Kai
2023-02-14 19:18       ` Haitao Huang
2023-02-14 20:54         ` Huang, Kai
2023-02-14 21:42           ` Haitao Huang
2023-02-14 22:36             ` Huang, Kai
2023-02-15  3:59               ` Haitao Huang
2023-02-15  8:51                 ` Huang, Kai
2023-02-15 15:42                   ` Haitao Huang
2023-02-16  7:53                     ` Huang, Kai
2023-02-16 17:12                       ` Haitao Huang
2023-02-17 22:32                     ` jarkko
2023-02-17 23:03                       ` Haitao Huang
2023-02-21 22:10                         ` jarkko
2023-02-22  1:37                           ` Haitao Huang
2023-03-07 23:32                             ` Huang, Kai
2023-03-09  0:50                               ` Haitao Huang
2023-03-09 11:31                                 ` Huang, Kai
2023-03-14 14:54                                   ` Haitao Huang
2023-03-19 13:26                                     ` jarkko
2023-03-20  9:36                                       ` Huang, Kai
2023-03-20 14:04                                         ` jarkko
2023-05-27  0:32                                   ` Haitao Huang
2023-06-06  4:11                                     ` Huang, Kai
2023-06-07 16:59                                       ` Haitao Huang
2023-06-16  3:49                                       ` Huang, Kai
2023-06-16 22:05                                         ` Sean Christopherson
2023-06-19 11:17                                           ` Huang, Kai [this message]
2023-06-22 22:01                                             ` Sean Christopherson
2023-06-22 23:21                                               ` Huang, Kai
2023-06-26 22:28                                                 ` Sean Christopherson
2023-06-27 11:43                                                   ` Huang, Kai
2023-06-27 14:50                                                     ` Sean Christopherson
2023-06-28  9:37                                                       ` Huang, Kai
2023-06-28 14:57                                                         ` Sean Christopherson
2023-06-29  3:10                                                           ` Huang, Kai
2023-06-29 14:23                                                             ` Sean Christopherson
2023-06-29 23:29                                                               ` Huang, Kai
2023-06-30  0:14                                                                 ` Sean Christopherson
2023-06-30  0:56                                                                   ` Huang, Kai
2023-06-30  1:54                                                                 ` Jarkko Sakkinen
2023-06-30  1:57                                                                   ` Jarkko Sakkinen
2023-06-30  4:26                                                                     ` Huang, Kai
2023-06-30  9:35                                                                       ` Jarkko Sakkinen
2023-03-12  1:25                               ` jarkko
2023-03-12 22:25                                 ` Huang, Kai
2023-02-17 22:07           ` jarkko
2023-02-17 21:50       ` jarkko
2023-02-07 23:26   ` [RFC PATCH v4 1/4] x86/sgx: Export sgx_encl_eaug_page Jarkko Sakkinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fc214963a01ca975d6de2991b63c93f04114c562.camel@intel.com \
    --to=kai.huang@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=haitao.huang@linux.intel.com \
    --cc=jarkko@kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=reinette.chatre@intel.com \
    --cc=seanjc@google.com \
    --cc=vijay.dhanraj@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.