linux-sgx.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Huang, Kai" <kai.huang@intel.com>
To: "Christopherson,, Sean" <seanjc@google.com>
Cc: "linux-sgx@vger.kernel.org" <linux-sgx@vger.kernel.org>,
	"Chatre, Reinette" <reinette.chatre@intel.com>,
	"jarkko@kernel.org" <jarkko@kernel.org>,
	"Dhanraj, Vijay" <vijay.dhanraj@intel.com>,
	"haitao.huang@linux.intel.com" <haitao.huang@linux.intel.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Subject: Re: [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED
Date: Thu, 22 Jun 2023 23:21:59 +0000	[thread overview]
Message-ID: <66f22efc5a920a805d6863dcc152c3c12e8fb6fb.camel@intel.com> (raw)
In-Reply-To: <ZJTEpH2yYtHmTRWr@google.com>

On Thu, 2023-06-22 at 15:01 -0700, Sean Christopherson wrote:
> On Mon, Jun 19, 2023, Kai Huang wrote:
> > On Fri, 2023-06-16 at 15:05 -0700, Sean Christopherson wrote:
> > > But I think this is unrelated to what you really care about, e.g. a userspace that
> > > tightly controls its virtual memory could hardcode enclave placement (IIRC graphene
> > > did/does do that).  I.e. the alignment issue is a completely different discussion.
> > > 
> > > [*] https://lore.kernel.org/all/20190522153836.GA24833@linux.intel.com
> > 
> > Right.  For the sake of making progress of this Haitao's series, we want to
> > understand where did "SGX driver requires mmap(/dev/sgx_enclave) to use
> > MAP_ANONYMOUS semantics" come from.
> > 
> > But for this topic (how to reserve ELRANGE).  I am not sure the reason that we
> > prefer mmap(MAP_ANONYMOUS) still stands.  For instance, the discussion in your
> > above link was old implementation/assumption -- e.g., MAP_FIXED wasn't even
> > used/supported for mmap()ing enclave chunks.
> > 
> > So I am wondering now whether we suggest user to use mmap(MAP_ANONYMOUS) to get
> > a size-aligned address still stands?  The current SGX driver's
> > get_unmapped_area:
> > 
> > static unsigned long sgx_get_unmapped_area(struct file *file,                  
> >                                            unsigned long addr,
> >                                            unsigned long len,                  
> >                                            unsigned long pgoff,                
> >                                            unsigned long flags)                
> > {       
> >         if ((flags & MAP_TYPE) == MAP_PRIVATE)                                 
> >                 return -EINVAL;
> >         
> >         if (flags & MAP_FIXED)
> >                 return addr;
> >                         
> >         return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);  
> > }               
> >  
> > 
> > As you can see if we don't pass MAP_FIXED, which is the case for mmap() to
> > reserve ELRANGE, looks there's no difference between mmap(MAP_ANONYMOUS) and
> > mmap(/dev/sgx_enclave)?
> 
> In practice, there's *probably* no significant difference.  Using an anonymous
> mapping is advantageous because there's no additional overhead, e.g. for locking
> the file, it can be done in advance of actually opening /dev/sgx_enclave, helps
> document (in userspace) that the code isn't actually creating an enclave, just
> finding a naturally aligned area in virtual memory (which isn't SGX specific), etc.

Sure, and I agree this perhaps is cleaner and simpler for SGX.  But for this
purpose I think we should just enforce this policy in SGX driver, i.e., we
should disable mmap(/dev/sgx_enclave) before ECREATE.

To me at least the SGX driver should be clear whether it supports userspace to
use mmap(/dev/sgx_enclave) to reserve ELRANGE.  There should be no ambiguity
here.

The current driver only disallows mmap(/dev/sgx_enclave) outside of enclave
range after EINIT:

int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
                     unsigned long end, unsigned long vm_flags)
{
	...

        /* Disallow mapping outside enclave's address range. */
        if (test_bit(SGX_ENCL_INITIALIZED, &encl->flags) &&
            (start < encl->base || end > encl->base + encl->size))
                return -EACCES;
}

which has what value btw?

> 
> There are definitely differences, e.g. an LSM could restrict access to
> /dev/sgx_enclave.  That particular one is obviously a "don't care", 
> 

I think you meant "RW->RX" requires EXECMOD, etc?

Yeah I am not sure whether this matters to reserving ELRANGE.  That part should
apply to the mmap(/dev/sgx_enclave) after ECREATE.

> but it's quite
> difficult to say that mmap(/dev/sgx_enclave) and mmap(NULL) are equivalent due to
> the amount of code that's involved in handling mmap().

Sure.  I was talking about only from "allocating size-aligned" part.  Arguably,
in terms of reserving ELRANGE, userspace only cares about:

 1) The ELRANGE is big enough to hold the enclave
 2) The ELRANGE meets alignment requirement
 3) Usperspace can mmap(/dev/sgx_enclave) and/or mprotect() small areas to get
    the correct enclave addr within ELRANGE with desired permission

> 
> > > > Also, if I am not missing something, the current driver doesn't prevent using
> > > > mmap(/dev/sgx_enclave, PROT_NONE) to reserve ELANGE.  So having clear
> > > > documentation is helpful for SGX users to choose how to write their apps.
> > > > 
> > > > Go back to the "SGX driver uses MAP_ANONYMOUS semantics for mmap()", I believe
> > > > this just is "SGX driver requires mmap() after ECREATE/EINIT to use MAP_SHARED |
> > > > MAP_FIXED and pgoff is ignored".  Or more precisely, pgoff is "not _used_ by SGX
> > > > driver".
> > > > 
> > > > In fact, I think "pgoff is ignored/not used" is technically wrong for enclave.
> > > 
> > > Yeah, it's wrong.  It works because, until now apparently, there was never a reason
> > > a need to care about the file offset since ELRANGE base always provided the necessary
> > > information.  It wasn't a deliberate design choice, we simply overlooked that detail
> > > (unless Jarkko was holding back on me all those years ago :-) ).
> > 
> > Yeah.  But I am not sure whether there are corner cases that we can have
> > potential bug around here, since those VMA's are not aligned between the core-mm
> > and the driver.
> > 
> > I haven't thought carefully though.  I guess from core-mm's perspective there
> > are multi-VMAs mapping to the same/overlapping part of the enclave, while the
> > SGX driver treats them as mapping to different non-overlapping parts.  Perhaps
> > it's OK as long as SGX driver doesn't use vma->pgoff to do something???
> 
> Ya, exactly.
> 
> > > > IMHO we should stop saying SGX driver uses MAP_ANONYMOUS semantics, because the
> > > > truth is it just takes advantage of MAP_FIXED and carelessly ignores the pgoff
> > > > due to the nature of SGX w/o considering from core-MM's perspective.
> > > >   
> > > > And IMHO there are two ways to fix:
> > > > 
> > > > 1) From now on, we ask SGX apps to use the correct pgoff in their
> > > > mmap(/dev/sgx_enclave).  This shouldn't impact the existing SGX apps because SGX
> > > > driver doesn't use vma->pgoff anyway.
> > > 
> > > Heh, just "asking" won't help.  And breaking userspace, i.e. requiring all apps
> > > to update, will just get you yelled at :-)
> > 
> > We can document properly and assume the new apps will follow.  As I mentioned
> > above, the old apps, which doesn't/shouldn't have been using fadvice() anyway,
> > doesn't need to be changed, i.e.,  they should continue to work. :)
> > 
> > No?
> 
> You can't build an ABI on assumptions.  E.g. even if userspace *intends* to behave,
> it wouldn't take much to compute the wrong offset (math is hard).

But I don't think we have an well established ABI now?  Nothing is documented.  

No?

> 
> > > > 2) For the sake of avoiding having to ask existing SGX apps to change their
> > > > mmap()s, we _officially_ say that userspace isn't required to pass a correct
> > > > pgoff to mmap() (i.e. passing 0 as did in existing apps), but the kernel should
> > > > fix the vma->pgoff internally.
> > > 
> > > I recommend you don't do this.  Overwriting vm_pgoff would probably work, but it's
> > > going to make a flawed API even messier.  E.g. you'll have painted SGX into a
> > > corner if you ever want to decouple vma->start/end from encl->base.  I highly
> > > doubt that will ever happen given how ELRANGE works, but I don't think a hack-a-fix
> > > buys you enough to justify any more ugliness.
> > 
> > I also found it's not feasible to cleanly fix the pgoff from userspace (I
> > thought the pgoff could be fixed at very early stage of do_mmap() so everything
> > later just worked, but it's not the case).  In do_mmap() the pgoff from
> > userspace is already used to VMA merging/splitting etc before creating the
> > target VMA.
> > 
> > Hmm.. Now I think we shouldn't silently fix pgoff in SGX driver as it may
> > surprise the core-mm later because the core-mm has already done some job around
> > VMAs before vma->pgoff is changed.
> > 
> > > 
> > > > I do prefer option 2) because it has no harm to anyone: 1) No changes to
> > > > existing SGX apps; 2) It aligns with the core-MM to so all existing mmap()
> > > > operations should work as expected, meaning no surprise; 3) And this patchset
> > > > from Haitao to use fadvice() to accelerate EAUG flow just works.
> > > 
> > > I think you can have your cake and eat it too.  IIUC, the goal is to get fadvise()
> > > working, and to do that without creating a truly heinous uAPI, you need an accurate
> > > vm_pgoff.  So, use a carrot and stick approach.
> > > 
> > > If userspace properly defines vm_pgoff during mmap() and doesn't specify MAP_ANONYMOUS,
> > > then they get to use fadvise() (give 'em a carrot).  But if *any* mmap() on the
> > > enclave doesn't followo those rules, mark the enclave as tainted or whatever and
> > > disallow fadvise() (hit 'em with a stick).
> > 
> > If we are supposed to use mmap(MAP_ANONYMOUS) to reserve ELRANGE, then I think
> > userspace will just use MAP_FIXED for all mmap()s to /dev/sgx_enclave.
> 
> There is no "supposed" to.  Using mmap(NULL) is purely a suggestion to avoid
> running into overheads and checks that are ultimately unnecessary.  The only
> requirement is that userspace provide a compatible chunk for virtual memory,
> *how* userspace finds that chunk can be done in a number of ways.  mmap(NULL)
> just happens to be the most convenient one (IMO).

Even this, I think we should document it IMO, at least whether
mmap(/dev/sgx_enclave) can be used to reserve ELRANGE.

> 
> > To detect whether a VMA has the correct pgoff, we need to somehow mark it in
> > sgx_mmap(), but currently we don't have facility to do that.  Even we can do,
> > the core madvice() -> vfs_fadvice() cannot be aware of such VMA either, and
> > in sgx_fadvice() we may have problem locating the correct VMA to do EAUG
> > (because the vfs_fadvice()  uses  0 pgoff to calculate the file offset).
> > 
> > So, now I think we should just let userspace itself to pass the correct pgoff
> > when it wants to use fadvice(), which is the option 1) I mentioned above.  The
> > kernel doesn't/shouldn't need to fix vma->pgoff.
> > 
> > In another words, I think:
> > 
> >  - For the old apps, doesn't matter, continue to work;
> >  - For new apps which want to use fadvice() to accelerate EAUG, it should pass�
> >    the correct pgoff in mmap(/dev/sgx_enclave) even with  MAP_FIXED.
> > 
> > And we don't say "SGX driver uses MAP_ANONYMOUS semantics for
> > mmap(/dev/sgx_enclave) any more".
> > 
> > Does this make sense?
> 
> Yes, but as above, IMO the kernel needs to enforce that userspace has properly
> set the offset, otherwise userspace will end up with completely nonsensical
> behavior if it fails to set the correct offset.  The proposed sgx_fadvise() already
> takes mmap_lock for read, i.e. is mutually exclusive with sgx_mmap(), so encforcing
> the requirement should be quite straightforward, e.g.
> 
>         mmap_read_lock(current->mm);
> 
>         vma = find_vma(current->mm, start);

The "start" here may already be wrong because before vfs_fadvice() uses VMA's pgoff
to calculate the start before passing it to sgx_fadvice().

>         if (!vma)
>                 goto unlock;
>         if (vma->vm_private_data != encl)
>                 goto unlock;
> 	if (encl->has_mismatched_offsets)  <======
> 		goto unlock;

Sorry I am a little bit slow, how do you set "has_mismatched_offsests" to true?

> 
>         pos = start;
>         if (pos < vma->vm_start || end > vma->vm_end) {
>                 /* Don't allow any gaps */
>                 goto unlock;
>         }



  reply	other threads:[~2023-06-22 23:22 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-28  4:55 [RFC PATCH v4 0/4] x86/sgx: implement support for MADV_WILLNEED Haitao Huang
2023-01-28  4:55 ` [RFC PATCH v4 1/4] x86/sgx: Export sgx_encl_eaug_page Haitao Huang
2023-01-28  4:55   ` [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED Haitao Huang
2023-01-28  4:55     ` [RFC PATCH v4 3/4] selftests/sgx: add len field for EACCEPT op Haitao Huang
2023-01-28  4:55       ` [RFC PATCH v4 4/4] selftests/sgx: Add test for madvise(..., WILLNEED) Haitao Huang
2023-02-07 23:30         ` Jarkko Sakkinen
2023-02-15  2:38         ` Huang, Kai
2023-02-15  4:42           ` Haitao Huang
2023-02-15  8:46             ` Huang, Kai
2023-02-17 22:29               ` jarkko
2023-02-07 23:29       ` [RFC PATCH v4 3/4] selftests/sgx: add len field for EACCEPT op Jarkko Sakkinen
2023-02-07 23:28     ` [RFC PATCH v4 2/4] x86/sgx: Implement support for MADV_WILLNEED Jarkko Sakkinen
2023-02-14  9:47     ` Huang, Kai
2023-02-14 19:18       ` Haitao Huang
2023-02-14 20:54         ` Huang, Kai
2023-02-14 21:42           ` Haitao Huang
2023-02-14 22:36             ` Huang, Kai
2023-02-15  3:59               ` Haitao Huang
2023-02-15  8:51                 ` Huang, Kai
2023-02-15 15:42                   ` Haitao Huang
2023-02-16  7:53                     ` Huang, Kai
2023-02-16 17:12                       ` Haitao Huang
2023-02-17 22:32                     ` jarkko
2023-02-17 23:03                       ` Haitao Huang
2023-02-21 22:10                         ` jarkko
2023-02-22  1:37                           ` Haitao Huang
2023-03-07 23:32                             ` Huang, Kai
2023-03-09  0:50                               ` Haitao Huang
2023-03-09 11:31                                 ` Huang, Kai
2023-03-14 14:54                                   ` Haitao Huang
2023-03-19 13:26                                     ` jarkko
2023-03-20  9:36                                       ` Huang, Kai
2023-03-20 14:04                                         ` jarkko
2023-05-27  0:32                                   ` Haitao Huang
2023-06-06  4:11                                     ` Huang, Kai
2023-06-07 16:59                                       ` Haitao Huang
2023-06-16  3:49                                       ` Huang, Kai
2023-06-16 22:05                                         ` Sean Christopherson
2023-06-19 11:17                                           ` Huang, Kai
2023-06-22 22:01                                             ` Sean Christopherson
2023-06-22 23:21                                               ` Huang, Kai [this message]
2023-06-26 22:28                                                 ` Sean Christopherson
2023-06-27 11:43                                                   ` Huang, Kai
2023-06-27 14:50                                                     ` Sean Christopherson
2023-06-28  9:37                                                       ` Huang, Kai
2023-06-28 14:57                                                         ` Sean Christopherson
2023-06-29  3:10                                                           ` Huang, Kai
2023-06-29 14:23                                                             ` Sean Christopherson
2023-06-29 23:29                                                               ` Huang, Kai
2023-06-30  0:14                                                                 ` Sean Christopherson
2023-06-30  0:56                                                                   ` Huang, Kai
2023-06-30  1:54                                                                 ` Jarkko Sakkinen
2023-06-30  1:57                                                                   ` Jarkko Sakkinen
2023-06-30  4:26                                                                     ` Huang, Kai
2023-06-30  9:35                                                                       ` Jarkko Sakkinen
2023-03-12  1:25                               ` jarkko
2023-03-12 22:25                                 ` Huang, Kai
2023-02-17 22:07           ` jarkko
2023-02-17 21:50       ` jarkko
2023-02-07 23:26   ` [RFC PATCH v4 1/4] x86/sgx: Export sgx_encl_eaug_page Jarkko Sakkinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=66f22efc5a920a805d6863dcc152c3c12e8fb6fb.camel@intel.com \
    --to=kai.huang@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=haitao.huang@linux.intel.com \
    --cc=jarkko@kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=reinette.chatre@intel.com \
    --cc=seanjc@google.com \
    --cc=vijay.dhanraj@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).