linux-sgx.vger.kernel.org archive mirror
* Unable to load large enclave
@ 2020-09-29 15:52 Jethro Beekman
  2020-09-30  1:16 ` Jarkko Sakkinen
  0 siblings, 1 reply; 14+ messages in thread
From: Jethro Beekman @ 2020-09-29 15:52 UTC (permalink / raw)
  To: linux-sgx

Since the latest API changes, I'm unable to load a large enclave. The test program at https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs always fails with ENOMEM after loading 0xffd6 pages.

I've tested this with v36; if there's reason to believe it has been fixed, I'd be happy to try it out on a newer patch set.

-- 
Jethro Beekman | Fortanix



* Re: Unable to load large enclave
  2020-09-29 15:52 Unable to load large enclave Jethro Beekman
@ 2020-09-30  1:16 ` Jarkko Sakkinen
  2020-09-30  7:12   ` Jethro Beekman
  0 siblings, 1 reply; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-09-30  1:16 UTC (permalink / raw)
  To: Jethro Beekman; +Cc: linux-sgx

On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> Since the latest API changes, I'm unable to load a large enclave. The
> test program at
> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> always fails with ENOMEM after loading 0xffd6 pages.
> 
> I've tested this with v36, if there's reason to believe it has been
> fixed I'd be happy to try it out on a newer patch set.

I recommend using the v39-rc1 tag that I created for testing, because its
API has been reverted to be compatible with v36.

The repository also has a new location now:

git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git

/Jarkko


* Re: Unable to load large enclave
  2020-09-30  1:16 ` Jarkko Sakkinen
@ 2020-09-30  7:12   ` Jethro Beekman
  2020-09-30 11:45     ` Jarkko Sakkinen
  0 siblings, 1 reply; 14+ messages in thread
From: Jethro Beekman @ 2020-09-30  7:12 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: linux-sgx

On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
>> Since the latest API changes, I'm unable to load a large enclave. The
>> test program at
>> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
>> always fails with ENOMEM after loading 0xffd6 pages.
>>
>> I've tested this with v36, if there's reason to believe it has been
>> fixed I'd be happy to try it out on a newer patch set.
> 
> I recommend using v39-rc1 tag that I created for testing because API is
> reverted back to be compatible with v36.

Not sure what you're saying. I tested with v36. You're saying v39-rc1 will be the same? Or did you fix the issue since v36?

--
Jethro Beekman | Fortanix


* Re: Unable to load large enclave
  2020-09-30  7:12   ` Jethro Beekman
@ 2020-09-30 11:45     ` Jarkko Sakkinen
  2020-10-03 13:12       ` Jarkko Sakkinen
  2020-10-05 22:56       ` Sean Christopherson
  0 siblings, 2 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-09-30 11:45 UTC (permalink / raw)
  To: Jethro Beekman; +Cc: linux-sgx

On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
> Not sure what you're saying. I tested with v36. You're saying v39-rc1
> will be the same? Or did you fix the issue since v36?

v37 and v38 have an API change that is reverted in v39:

https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/

I'm not sure of the root cause yet, but you offered to try out a newer
patch set, and v39-rc1 is the best option.

There was an off-by-one error in the enclave maximum size calculation
that was fixed in v37 (it was actually a bug in the SDM that the code
inherited), but that should not result in the situation you just
described.

/Jarkko


* Re: Unable to load large enclave
  2020-09-30 11:45     ` Jarkko Sakkinen
@ 2020-10-03 13:12       ` Jarkko Sakkinen
  2020-10-05 22:56       ` Sean Christopherson
  1 sibling, 0 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-03 13:12 UTC (permalink / raw)
  To: Jethro Beekman; +Cc: linux-sgx, dave.hansen, sean.j.christopherson

On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> v37 and v38 has an API change that is reverted in v39:
> 
> https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/
> 
> I'm not sure of the root cause yet but you asked to try to out a newer
> patch set and v39-rc1 is the best option.
> 
> There was off-by-one error in enclave maximum size calculation fixed in
> v37 (it was actually a bug in SDM inherited to the code) but that should
> not result the situation you just described.

Jethro,

I'll try to set up your environment and start looking into this, but in
the meantime can you provide a trivial ftrace dump?

Here's what you should do:

1. Install trace-cmd. It's a tool that works as a frontend for ftrace,
   among other things. ftrace is one of the many tracing frameworks
   in the Linux kernel.
2. Run trace-cmd start -p function -l 'sgx*'. This starts tracing
   sgx-prefixed functions.
3. Run your test.
4. Dump the trace-cmd show output to a text file and send that to me.
5. Run trace-cmd stop to stop the tracing framework.
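
The steps above can also be scripted. What follows is a minimal sketch in
Python, assuming trace-cmd is installed and that the real run happens as
root. The helper names (trace_commands, capture_trace), the output file
name and the test binary path in the usage note are made up for
illustration and do not come from this thread. By default the sketch only
prints the commands instead of running them.

```python
import shlex
import subprocess


def trace_commands(test_argv, pattern="sgx*"):
    """Return the commands from steps 2-5, in the order they are run."""
    return [
        ["trace-cmd", "start", "-p", "function", "-l", pattern],  # step 2
        list(test_argv),                                          # step 3
        ["trace-cmd", "show"],                                    # step 4
        ["trace-cmd", "stop"],                                    # step 5
    ]


def capture_trace(test_argv, out_path="sgx-trace.txt", dry_run=True):
    """Run (or, with dry_run, just print) the tracing sequence."""
    start, test, show, stop = trace_commands(test_argv)
    if dry_run:
        for cmd in (start, test, show, stop):
            print(shlex.join(cmd))
        return
    subprocess.run(start, check=True)
    try:
        # The test is expected to fail, so its exit status is ignored.
        subprocess.run(test, check=False)
        with open(out_path, "w") as out:
            subprocess.run(show, check=True, stdout=out)
    finally:
        # Always stop tracing, even if dumping the buffer failed.
        subprocess.run(stop, check=True)
```

For example, capture_trace(["./sgx-load-large-enclave-test"], dry_run=False)
would write the trace buffer to sgx-trace.txt; the binary path here is a
placeholder.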

Thank you.

/Jarkko



* Re: Unable to load large enclave
  2020-09-30 11:45     ` Jarkko Sakkinen
  2020-10-03 13:12       ` Jarkko Sakkinen
@ 2020-10-05 22:56       ` Sean Christopherson
  2020-10-06 15:13         ` Jarkko Sakkinen
  1 sibling, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2020-10-05 22:56 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: Jethro Beekman, linux-sgx

On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> v37 and v38 has an API change that is reverted in v39:
> 
> https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/
> 
> I'm not sure of the root cause yet but you asked to try to out a newer
> patch set and v39-rc1 is the best option.
> 
> There was off-by-one error in enclave maximum size calculation fixed in
> v37 (it was actually a bug in SDM inherited to the code) but that should
> not result the situation you just described.

My money is on the XArray changes; that's the most notable change in v36 and,
IIRC, the only thing that touched EPC/memory management.


* Re: Unable to load large enclave
  2020-10-05 22:56       ` Sean Christopherson
@ 2020-10-06 15:13         ` Jarkko Sakkinen
  2020-10-07 15:49           ` Jarkko Sakkinen
  0 siblings, 1 reply; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-06 15:13 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Jethro Beekman, linux-sgx

On Mon, Oct 05, 2020 at 03:56:52PM -0700, Sean Christopherson wrote:
> My money is on the XArray changes, that's the most notable change in v36 and
> IIRC the only thing that touched EPC/memory management.

Yeah, that's what we've been speculating for some days now. That email
is somewhat out of date. It all started to unravel when I asked
Haitao to turn CONFIG_PROVE_LOCKING on, and we got the information
required to root-cause the bug.

/Jarkko


* Re: Unable to load large enclave
  2020-10-06 15:13         ` Jarkko Sakkinen
@ 2020-10-07 15:49           ` Jarkko Sakkinen
  2020-10-07 16:13             ` Jethro Beekman
  0 siblings, 1 reply; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-07 15:49 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Jethro Beekman, linux-sgx

On Tue, Oct 06, 2020 at 06:13:28PM +0300, Jarkko Sakkinen wrote:
> Yeah, that's what we've been speculating for some days now. That's
> somewhat deprecated email. It all started to enroll when I asked
> Haitao to turn CONFIG_PROVE_LOCKING on, and we got the information
> required to root cause the bug.

I ran the failing test and filtered SGX mmaps and ioctls with this
eBPF script:

kretprobe:sgx_ioctl /retval != 0/
{
        printf("sgx_ioctl: %d\n", retval)
}

kretprobe:sgx_mmap /retval != 0/
{
        printf("sgx_mmap: %d\n", retval)
}

This yields zero hits, i.e. empty output, when run with bpftrace.

I'd look into RLIMIT_AS [*] instead.

With these conclusions, I'm done with this bug.

[*] https://man7.org/linux/man-pages/man2/getrlimit.2.html

/Jarkko


* Re: Unable to load large enclave
  2020-10-07 15:49           ` Jarkko Sakkinen
@ 2020-10-07 16:13             ` Jethro Beekman
  2020-10-07 17:20               ` Jarkko Sakkinen
  0 siblings, 1 reply; 14+ messages in thread
From: Jethro Beekman @ 2020-10-07 16:13 UTC (permalink / raw)
  To: Jarkko Sakkinen, Sean Christopherson; +Cc: linux-sgx

On 2020-10-07 17:49, Jarkko Sakkinen wrote:
> I run the failing test and filtered SGX mmap's and ioctl's with this
> eBPF script:
> 
> kretprobe:sgx_ioctl /retval != 0/
> {
>         printf("sgx_ioctl: %d\n", retval)
> }
> 
> kretprobe:sgx_mmap /retval != 0/
> {
>         printf("sgx_mmap: %d\n", retval)
> }
> 
> This results zero positives, i.e. empty output, when run with bpftrace.
> 
> I'd go instead after RLIMIT_AS [*].
> 
> With these conclusions, I'm done with this bug.
> 

How can it be RLIMIT_AS? With the current flow, you mmap the whole range before mmaping the individual pages over it?

Also, I can easily load a 1GB enclave with the old driver.

Also:

$ ulimit -v
unlimited
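
The same check can be made from inside a process. This is a small sketch
using Python's standard resource module; describe_limit is a helper name
invented here. Note that ulimit -v reports KiB, whereas getrlimit()
returns bytes.

```python
import resource


def describe_limit(value):
    """Render an rlimit value the way `ulimit` does for infinity."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


# RLIMIT_AS caps the total virtual address space of the process (bytes).
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("RLIMIT_AS soft:", describe_limit(soft))
print("RLIMIT_AS hard:", describe_limit(hard))
```

If both values print as unlimited, the address-space limit is not what
makes mmap() return ENOMEM.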


--
Jethro Beekman | Fortanix



* Re: Unable to load large enclave
  2020-10-07 16:13             ` Jethro Beekman
@ 2020-10-07 17:20               ` Jarkko Sakkinen
  2020-10-07 18:14                 ` Jethro Beekman
  2020-10-07 18:25                 ` Jarkko Sakkinen
  0 siblings, 2 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-07 17:20 UTC (permalink / raw)
  To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx

On Wed, Oct 07, 2020 at 06:13:49PM +0200, Jethro Beekman wrote:
> How can it be RLIMIT_AS? With the current flow, you mmap the whole range before mmaping the individual pages over it?
> 
> Also, I can easily load a 1GB enclave with the old driver.
> 
> Also:
> 
> $ ulimit -v
> unlimited

➜  ~ (master) ✔ sudo bpftrace sgx_ret.bt
Attaching 3 probes...
ksys_mmap_pgoff: -12
^C

~ (master) ✔ cat sgx_ret.bt
kretprobe:sgx_ioctl /retval != 0/
{
        printf("sgx_ioctl: %d\n", retval)
}

kretprobe:sgx_mmap /retval != 0/
{
        printf("sgx_mmap: %d\n", retval)
}

kretprobe:ksys_mmap_pgoff /retval == (uint64)-12/
{
        printf("ksys_mmap_pgoff: %d\n", retval)
}

This shows that it fails before reaching sgx_mmap().
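
For readers unfamiliar with the (uint64)-12 comparison: the probe sees the
return value as an unsigned 64-bit number, and -12 is the kernel's
-ENOMEM. The sketch below, in Python with invented helper names
(to_signed64, errno_name), shows the conversion; it illustrates the
arithmetic only and is not part of the bpftrace script.

```python
import errno
import os


def to_signed64(value):
    """Reinterpret a 64-bit register value as a signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def errno_name(retval):
    """Map a negative kernel return value to its errno name, if any."""
    signed = to_signed64(retval)
    if signed >= 0:
        return None
    return errno.errorcode.get(-signed)


raw = (1 << 64) - 12        # what (uint64)-12 looks like as an unsigned value
print(to_signed64(raw))     # -12
print(errno_name(raw))      # ENOMEM on Linux
print(os.strerror(12))      # the human-readable message for errno 12
```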

/Jarkko


* Re: Unable to load large enclave
  2020-10-07 17:20               ` Jarkko Sakkinen
@ 2020-10-07 18:14                 ` Jethro Beekman
  2020-10-07 18:34                   ` Jarkko Sakkinen
  2020-10-07 18:25                 ` Jarkko Sakkinen
  1 sibling, 1 reply; 14+ messages in thread
From: Jethro Beekman @ 2020-10-07 18:14 UTC (permalink / raw)
  To: Jarkko Sakkinen; +Cc: Sean Christopherson, linux-sgx

On 2020-10-07 19:20, Jarkko Sakkinen wrote:
> ➜  ~ (master) ✔ sudo bpftrace sgx_ret.bt
> Attaching 3 probes...
> ksys_mmap_pgoff: -12
> ^C
> 
> ~ (master) ✔ cat sgx_ret.bt
> kretprobe:sgx_ioctl /retval != 0/
> {
>         printf("sgx_ioctl: %d\n", retval)
> }
> 
> kretprobe:sgx_mmap /retval != 0/
> {
>         printf("sgx_mmap: %d\n", retval)
> }
> 
> kretprobe:ksys_mmap_pgoff /retval == (uint64)-12/
> {
>         printf("ksys_mmap_pgoff: %d\n", retval)
> }
> 
> This shows that it fails before reaching sgx_mmap().
> 
> /Jarkko
> 

It's this one in do_mmap():

	/* Too many mappings? */
	if (mm->map_count > sysctl_max_map_count)
		return -ENOMEM;

I've verified that the problem goes away when I increase /proc/sys/vm/max_map_count. Why do I now need to raise this above the default, when I didn't before?
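
To see how close a process is to this limit, the two numbers in the check
can be read from procfs. The following is a rough sketch in Python; the
function names are invented for illustration, and the line count of
/proc/<pid>/maps is only an approximation of mm->map_count.

```python
from pathlib import Path


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Return the system-wide per-process mapping limit."""
    return int(Path(path).read_text().strip())


def count_mappings(maps_path="/proc/self/maps"):
    """Count mappings by counting lines (roughly one line per VMA)."""
    with open(maps_path) as maps:
        return sum(1 for _ in maps)


def headroom(maps_path="/proc/self/maps",
             limit_path="/proc/sys/vm/max_map_count"):
    """Return how many more mappings fit before do_mmap() refuses."""
    return read_max_map_count(limit_path) - count_mappings(maps_path)
```

On a Linux machine, headroom("/proc/<pid>/maps") with the loader's PID
filled in should shrink towards zero as pages are mapped one by one.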

--
Jethro Beekman | Fortanix




* Re: Unable to load large enclave
  2020-10-07 17:20               ` Jarkko Sakkinen
  2020-10-07 18:14                 ` Jethro Beekman
@ 2020-10-07 18:25                 ` Jarkko Sakkinen
  1 sibling, 0 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-07 18:25 UTC (permalink / raw)
  To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx

On Wed, Oct 07, 2020 at 08:20:58PM +0300, Jarkko Sakkinen wrote:
> ➜  ~ (master) ✔ sudo bpftrace sgx_ret.bt
> Attaching 3 probes...
> ksys_mmap_pgoff: -12
> ^C
> 
> ~ (master) ✔ cat sgx_ret.bt
> kretprobe:sgx_ioctl /retval != 0/
> {
>         printf("sgx_ioctl: %d\n", retval)
> }
> 
> kretprobe:sgx_mmap /retval != 0/
> {
>         printf("sgx_mmap: %d\n", retval)
> }
> 
> kretprobe:ksys_mmap_pgoff /retval == (uint64)-12/
> {
>         printf("ksys_mmap_pgoff: %d\n", retval)
> }
> 
> This shows that it fails before reaching sgx_mmap().

➜  ~ (master) ✔ sudo bpftrace -e 'kr:ksys_mmap_pgoff { @[comm] = count(); }'
Attaching 1 probe...
^C

@[zsh]: 44
@[git]: 47
@[date]: 48
@[network.sh]: 48
@[battery.sh]: 56
@[which]: 84
@[cargo]: 94
@[head]: 96
@[iw]: 126
@[uname]: 144
@[cat]: 168
@[sh]: 175
@[sed]: 198
@[bash]: 216
@[ping]: 222
@[ls]: 324
@[sgx-load-large-]: 65510

65510 is just under the default value for /proc/sys/vm/max_map_count, which is 65530 [*].

[*] https://www.kernel.org/doc/Documentation/sysctl/vm.txt
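
A quick arithmetic check ties the numbers in this thread together. The
sketch below is in Python and assumes 4 KiB enclave pages and a kernel
default of 65530 for max_map_count (USHRT_MAX minus a margin of 5). The
reading that each loaded page costs one mapping is an inference from the
mmap flow described earlier in the thread, not something measured here.

```python
PAGE_SIZE = 4096                      # assumed: 4 KiB enclave pages
pages_loaded = 0xFFD6                 # from the original report
mmap_calls = 65510                    # from the bpftrace count above
default_max_map_count = 65535 - 5     # assumed kernel default

print(pages_loaded)                          # 65494
print(pages_loaded * PAGE_SIZE / 2**20)      # enclave size reached, in MiB
print(default_max_map_count - pages_loaded)  # 36
print(mmap_calls - pages_loaded)             # 16
```

So the load stops roughly 256 MiB in, about 36 mappings short of the
assumed default limit, which is plausibly the number of mappings the
process already holds for its binary, libraries, stack and heap.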

/Jarkko


* Re: Unable to load large enclave
  2020-10-07 18:14                 ` Jethro Beekman
@ 2020-10-07 18:34                   ` Jarkko Sakkinen
  2020-10-07 18:36                     ` Jarkko Sakkinen
  0 siblings, 1 reply; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-07 18:34 UTC (permalink / raw)
  To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx

On Wed, Oct 07, 2020 at 08:14:48PM +0200, Jethro Beekman wrote:
> It's this one in do_mmap():
> 
> 	/* Too many mappings? */
> 	if (mm->map_count > sysctl_max_map_count)
> 		return -ENOMEM;
> 
> I've verified that I'm no longer getting the problem when increasing
> /proc/sys/vm/max_map_count . Why do I need to change this from the
> default compared to before?

Yes, you are correct. I came to the same conclusion and responded (once
again) to my own email after running this:

➜  ~ (master) ✔ sudo bpftrace -e 'kr:ksys_mmap_pgoff { @[comm] = count(); }' &> log.txt
^C
➜  ~ (master) ✔ cat log.txt
Attaching 1 probe...


@[cat]: 18
@[git]: 47
@[zsh]: 49
@[cargo]: 94
@[sgx-load-large-]: 65510

That is just under the default value for /proc/sys/vm/max_map_count.

Re-responding just in case, because I thought that the bpftrace snippet
might have some value for you. I don't know why I cannot see my email at
lore.kernel.org.

/Jarkko


* Re: Unable to load large enclave
  2020-10-07 18:34                   ` Jarkko Sakkinen
@ 2020-10-07 18:36                     ` Jarkko Sakkinen
  0 siblings, 0 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-07 18:36 UTC (permalink / raw)
  To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx

On Wed, Oct 07, 2020 at 09:35:06PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 07, 2020 at 08:14:48PM +0200, Jethro Beekman wrote:
> > It's this one in do_mmap():
> > 
> > 	/* Too many mappings? */
> > 	if (mm->map_count > sysctl_max_map_count)
> > 		return -ENOMEM;
> > 
> > I've verified that I'm no longer getting the problem when increasing
> > /proc/sys/vm/max_map_count . Why do I need to change this from the
> > default compared to before?
> 
> Yes, you are correct. I came into same conclusion and responded (once
> again) to my own email after running this:
> 
> ➜  ~ (master) ✔ sudo bpftrace -e 'kr:ksys_mmap_pgoff { @[comm] = count(); }' &> log.txt
> ^C
> ➜  ~ (master) ✔ cat log.txt
> Attaching 1 probe...
> 
> 
> @[cat]: 18
> @[git]: 47
> @[zsh]: 49
> @[cargo]: 94
> @[sgx-load-large-]: 65510
> 
> That is the default value for /proc/sys/vm/max_map_count.
> 
> Re-responding just in case because I thought that the bpftrace snippet
> might have some value for you. I don't why I cannot see my email at
> lore.kernel.org.

... I'm also looking forward to running this test and the test suite
(which I have not tried yet) on the master branch. Everything you are
doing is really useful to me.

/Jarkko

Thread overview: 14+ messages
2020-09-29 15:52 Unable to load large enclave Jethro Beekman
2020-09-30  1:16 ` Jarkko Sakkinen
2020-09-30  7:12   ` Jethro Beekman
2020-09-30 11:45     ` Jarkko Sakkinen
2020-10-03 13:12       ` Jarkko Sakkinen
2020-10-05 22:56       ` Sean Christopherson
2020-10-06 15:13         ` Jarkko Sakkinen
2020-10-07 15:49           ` Jarkko Sakkinen
2020-10-07 16:13             ` Jethro Beekman
2020-10-07 17:20               ` Jarkko Sakkinen
2020-10-07 18:14                 ` Jethro Beekman
2020-10-07 18:34                   ` Jarkko Sakkinen
2020-10-07 18:36                     ` Jarkko Sakkinen
2020-10-07 18:25                 ` Jarkko Sakkinen
