* Unable to load large enclave
@ 2020-09-29 15:52 Jethro Beekman
2020-09-30 1:16 ` Jarkko Sakkinen
0 siblings, 1 reply; 14+ messages in thread
From: Jethro Beekman @ 2020-09-29 15:52 UTC (permalink / raw)
To: linux-sgx
Since the latest API changes, I'm unable to load a large enclave. The test program at https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs always fails with ENOMEM after loading 0xffd6 pages.
I've tested this with v36; if there's reason to believe it has been fixed, I'd be happy to try it out on a newer patch set.
--
Jethro Beekman | Fortanix
* Re: Unable to load large enclave
2020-09-29 15:52 Unable to load large enclave Jethro Beekman
@ 2020-09-30 1:16 ` Jarkko Sakkinen
2020-09-30 7:12 ` Jethro Beekman
0 siblings, 1 reply; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-09-30 1:16 UTC (permalink / raw)
To: Jethro Beekman; +Cc: linux-sgx
On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> Since the latest API changes, I'm unable to load a large enclave. The
> test program at
> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> always fails with ENOMEM after loading 0xffd6 pages.
>
> I've tested this with v36, if there's reason to believe it has been
> fixed I'd be happy to try it out on a newer patch set.
I recommend using the v39-rc1 tag that I created for testing, because its API
has been reverted to be compatible with v36.
The repository also has a new location now:
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git
/Jarkko
* Re: Unable to load large enclave
2020-09-30 1:16 ` Jarkko Sakkinen
@ 2020-09-30 7:12 ` Jethro Beekman
2020-09-30 11:45 ` Jarkko Sakkinen
0 siblings, 1 reply; 14+ messages in thread
From: Jethro Beekman @ 2020-09-30 7:12 UTC (permalink / raw)
To: Jarkko Sakkinen; +Cc: linux-sgx
On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
>> Since the latest API changes, I'm unable to load a large enclave. The
>> test program at
>> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
>> always fails with ENOMEM after loading 0xffd6 pages.
>>
>> I've tested this with v36, if there's reason to believe it has been
>> fixed I'd be happy to try it out on a newer patch set.
>
> I recommend using v39-rc1 tag that I created for testing because API is
> reverted back to be compatible with v36.
Not sure what you're saying. I tested with v36. You're saying v39-rc1 will be the same? Or did you fix the issue since v36?
--
Jethro Beekman | Fortanix
* Re: Unable to load large enclave
2020-09-30 7:12 ` Jethro Beekman
@ 2020-09-30 11:45 ` Jarkko Sakkinen
2020-10-03 13:12 ` Jarkko Sakkinen
2020-10-05 22:56 ` Sean Christopherson
0 siblings, 2 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-09-30 11:45 UTC (permalink / raw)
To: Jethro Beekman; +Cc: linux-sgx
On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
> On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> > On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> >> Since the latest API changes, I'm unable to load a large enclave. The
> >> test program at
> >> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> >> always fails with ENOMEM after loading 0xffd6 pages.
> >>
> >> I've tested this with v36, if there's reason to believe it has been
> >> fixed I'd be happy to try it out on a newer patch set.
> >
> > I recommend using v39-rc1 tag that I created for testing because API is
> > reverted back to be compatible with v36.
>
> Not sure what you're saying. I tested with v36. You're saying v39-rc1
> will be the same? Or did you fix the issue since v36?
v37 and v38 have an API change that is reverted in v39:
https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/
I'm not sure of the root cause yet, but you asked to try out a newer
patch set, and v39-rc1 is the best option.
There was an off-by-one error in the enclave maximum size calculation, fixed in
v37 (it was actually a bug in the SDM that the code inherited), but that should
not result in the situation you just described.
/Jarkko
* Re: Unable to load large enclave
2020-09-30 11:45 ` Jarkko Sakkinen
@ 2020-10-03 13:12 ` Jarkko Sakkinen
2020-10-05 22:56 ` Sean Christopherson
1 sibling, 0 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-03 13:12 UTC (permalink / raw)
To: Jethro Beekman; +Cc: linux-sgx, dave.hansen, sean.j.christopherson
On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
> > On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> > > On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> > >> Since the latest API changes, I'm unable to load a large enclave. The
> > >> test program at
> > >> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> > >> always fails with ENOMEM after loading 0xffd6 pages.
> > >>
> > >> I've tested this with v36, if there's reason to believe it has been
> > >> fixed I'd be happy to try it out on a newer patch set.
> > >
> > > I recommend using v39-rc1 tag that I created for testing because API is
> > > reverted back to be compatible with v36.
> >
> > Not sure what you're saying. I tested with v36. You're saying v39-rc1
> > will be the same? Or did you fix the issue since v36?
>
> v37 and v38 has an API change that is reverted in v39:
>
> https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/
>
> I'm not sure of the root cause yet but you asked to try to out a newer
> patch set and v39-rc1 is the best option.
>
> There was off-by-one error in enclave maximum size calculation fixed in
> v37 (it was actually a bug in SDM inherited to the code) but that should
> not result the situation you just described.
Jethro,
I'll try to set up your environment and start looking into this, but in
the meantime, can you provide a trivial ftrace dump?
Here's what you should do:
1. Install trace-cmd. It's a tool that works as a frontend for ftrace,
   among other things. ftrace is one of the many tracing frameworks
   in the Linux kernel.
2. Run trace-cmd start -p function -l 'sgx*'. This will start
   tracing the exported sgx-prefixed functions.
3. Run your test.
4. Dump the trace-cmd show output to a text file and send that to me.
5. Run trace-cmd stop to stop the tracing framework.
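The steps above could be scripted roughly as follows. This is only a sketch: the trace-cmd invocations are the ones listed in the steps, while the function name, the output file name and the injectable `run` parameter are made up for illustration. It assumes trace-cmd is installed and the script runs as root.

```python
import subprocess

# Commands taken from the steps above. No shell is involved, so the
# 'sgx*' filter needs no quoting here.
START_CMD = ["trace-cmd", "start", "-p", "function", "-l", "sgx*"]
SHOW_CMD = ["trace-cmd", "show"]
STOP_CMD = ["trace-cmd", "stop"]


def capture_sgx_trace(test_cmd, out_path="sgx-trace.txt", run=subprocess.run):
    """Trace sgx* functions while test_cmd runs and save the trace to out_path.

    `run` is injectable (hypothetical convenience) so the command sequence
    can be checked without touching the tracing framework.
    """
    run(START_CMD, check=True)
    try:
        # The test is expected to fail with ENOMEM, so do not raise on failure.
        run(test_cmd, check=False)
        with open(out_path, "w") as out:
            run(SHOW_CMD, check=True, stdout=out)
    finally:
        # Always stop tracing, even if the dump failed.
        run(STOP_CMD, check=True)
    return out_path
```

Calling capture_sgx_trace() with the test program's command line as a list would leave the dump in sgx-trace.txt, ready to attach to a reply.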
Thank you.
/Jarkko
* Re: Unable to load large enclave
2020-09-30 11:45 ` Jarkko Sakkinen
2020-10-03 13:12 ` Jarkko Sakkinen
@ 2020-10-05 22:56 ` Sean Christopherson
2020-10-06 15:13 ` Jarkko Sakkinen
1 sibling, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2020-10-05 22:56 UTC (permalink / raw)
To: Jarkko Sakkinen; +Cc: Jethro Beekman, linux-sgx
On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
> > On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> > > On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> > >> Since the latest API changes, I'm unable to load a large enclave. The
> > >> test program at
> > >> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> > >> always fails with ENOMEM after loading 0xffd6 pages.
> > >>
> > >> I've tested this with v36, if there's reason to believe it has been
> > >> fixed I'd be happy to try it out on a newer patch set.
> > >
> > > I recommend using v39-rc1 tag that I created for testing because API is
> > > reverted back to be compatible with v36.
> >
> > Not sure what you're saying. I tested with v36. You're saying v39-rc1
> > will be the same? Or did you fix the issue since v36?
>
> v37 and v38 has an API change that is reverted in v39:
>
> https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/
>
> I'm not sure of the root cause yet but you asked to try to out a newer
> patch set and v39-rc1 is the best option.
>
> There was off-by-one error in enclave maximum size calculation fixed in
> v37 (it was actually a bug in SDM inherited to the code) but that should
> not result the situation you just described.
My money is on the XArray changes; that's the most notable change in v36 and,
IIRC, the only thing that touched EPC/memory management.
* Re: Unable to load large enclave
2020-10-05 22:56 ` Sean Christopherson
@ 2020-10-06 15:13 ` Jarkko Sakkinen
2020-10-07 15:49 ` Jarkko Sakkinen
0 siblings, 1 reply; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-06 15:13 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Jethro Beekman, linux-sgx
On Mon, Oct 05, 2020 at 03:56:52PM -0700, Sean Christopherson wrote:
> On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
> > > On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> > > > On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> > > >> Since the latest API changes, I'm unable to load a large enclave. The
> > > >> test program at
> > > >> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> > > >> always fails with ENOMEM after loading 0xffd6 pages.
> > > >>
> > > >> I've tested this with v36, if there's reason to believe it has been
> > > >> fixed I'd be happy to try it out on a newer patch set.
> > > >
> > > > I recommend using v39-rc1 tag that I created for testing because API is
> > > > reverted back to be compatible with v36.
> > >
> > > Not sure what you're saying. I tested with v36. You're saying v39-rc1
> > > will be the same? Or did you fix the issue since v36?
> >
> > v37 and v38 has an API change that is reverted in v39:
> >
> > https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/
> >
> > I'm not sure of the root cause yet but you asked to try to out a newer
> > patch set and v39-rc1 is the best option.
> >
> > There was off-by-one error in enclave maximum size calculation fixed in
> > v37 (it was actually a bug in SDM inherited to the code) but that should
> > not result the situation you just described.
>
> My money is on the XArray changes, that's the most notable change in v36 and
> IIRC the only thing that touched EPC/memory management.
Yeah, that's what we've been speculating for some days now. That email is
somewhat outdated. It all started to unravel when I asked
Haitao to turn CONFIG_PROVE_LOCKING on, and we got the information
required to root-cause the bug.
/Jarkko
* Re: Unable to load large enclave
2020-10-06 15:13 ` Jarkko Sakkinen
@ 2020-10-07 15:49 ` Jarkko Sakkinen
2020-10-07 16:13 ` Jethro Beekman
0 siblings, 1 reply; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-07 15:49 UTC (permalink / raw)
To: Sean Christopherson; +Cc: Jethro Beekman, linux-sgx
On Tue, Oct 06, 2020 at 06:13:28PM +0300, Jarkko Sakkinen wrote:
> On Mon, Oct 05, 2020 at 03:56:52PM -0700, Sean Christopherson wrote:
> > On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> > > On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
> > > > On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> > > > > On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> > > > >> Since the latest API changes, I'm unable to load a large enclave. The
> > > > >> test program at
> > > > >> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> > > > >> always fails with ENOMEM after loading 0xffd6 pages.
> > > > >>
> > > > >> I've tested this with v36, if there's reason to believe it has been
> > > > >> fixed I'd be happy to try it out on a newer patch set.
> > > > >
> > > > > I recommend using v39-rc1 tag that I created for testing because API is
> > > > > reverted back to be compatible with v36.
> > > >
> > > > Not sure what you're saying. I tested with v36. You're saying v39-rc1
> > > > will be the same? Or did you fix the issue since v36?
> > >
> > > v37 and v38 has an API change that is reverted in v39:
> > >
> > > https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/
> > >
> > > I'm not sure of the root cause yet but you asked to try to out a newer
> > > patch set and v39-rc1 is the best option.
> > >
> > > There was off-by-one error in enclave maximum size calculation fixed in
> > > v37 (it was actually a bug in SDM inherited to the code) but that should
> > > not result the situation you just described.
> >
> > My money is on the XArray changes, that's the most notable change in v36 and
> > IIRC the only thing that touched EPC/memory management.
>
> Yeah, that's what we've been speculating for some days now. That's
> somewhat deprecated email. It all started to enroll when I asked
> Haitao to turn CONFIG_PROVE_LOCKING on, and we got the information
> required to root cause the bug.
I ran the failing test and filtered the SGX mmaps and ioctls with this
bpftrace (eBPF) script:
kretprobe:sgx_ioctl /retval != 0/
{
    printf("sgx_ioctl: %d\n", retval)
}
kretprobe:sgx_mmap /retval != 0/
{
    printf("sgx_mmap: %d\n", retval)
}
This yields zero positives, i.e. empty output, when run with bpftrace:
neither sgx_ioctl() nor sgx_mmap() returned an error.
I'd instead go after RLIMIT_AS [*].
With these conclusions, I'm done with this bug.
[*] https://man7.org/linux/man-pages/man2/getrlimit.2.html
/Jarkko
* Re: Unable to load large enclave
2020-10-07 15:49 ` Jarkko Sakkinen
@ 2020-10-07 16:13 ` Jethro Beekman
2020-10-07 17:20 ` Jarkko Sakkinen
0 siblings, 1 reply; 14+ messages in thread
From: Jethro Beekman @ 2020-10-07 16:13 UTC (permalink / raw)
To: Jarkko Sakkinen, Sean Christopherson; +Cc: linux-sgx
On 2020-10-07 17:49, Jarkko Sakkinen wrote:
> On Tue, Oct 06, 2020 at 06:13:28PM +0300, Jarkko Sakkinen wrote:
>> On Mon, Oct 05, 2020 at 03:56:52PM -0700, Sean Christopherson wrote:
>>> On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
>>>> On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
>>>>> On 2020-09-30 03:16, Jarkko Sakkinen wrote:
>>>>>> On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
>>>>>>> Since the latest API changes, I'm unable to load a large enclave. The
>>>>>>> test program at
>>>>>>> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
>>>>>>> always fails with ENOMEM after loading 0xffd6 pages.
>>>>>>>
>>>>>>> I've tested this with v36, if there's reason to believe it has been
>>>>>>> fixed I'd be happy to try it out on a newer patch set.
>>>>>>
>>>>>> I recommend using v39-rc1 tag that I created for testing because API is
>>>>>> reverted back to be compatible with v36.
>>>>>
>>>>> Not sure what you're saying. I tested with v36. You're saying v39-rc1
>>>>> will be the same? Or did you fix the issue since v36?
>>>>
>>>> v37 and v38 has an API change that is reverted in v39:
>>>>
>>>> https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/
>>>>
>>>> I'm not sure of the root cause yet but you asked to try to out a newer
>>>> patch set and v39-rc1 is the best option.
>>>>
>>>> There was off-by-one error in enclave maximum size calculation fixed in
>>>> v37 (it was actually a bug in SDM inherited to the code) but that should
>>>> not result the situation you just described.
>>>
>>> My money is on the XArray changes, that's the most notable change in v36 and
>>> IIRC the only thing that touched EPC/memory management.
>>
>> Yeah, that's what we've been speculating for some days now. That's
>> somewhat deprecated email. It all started to enroll when I asked
>> Haitao to turn CONFIG_PROVE_LOCKING on, and we got the information
>> required to root cause the bug.
>
> I run the failing test and filtered SGX mmap's and ioctl's with this
> eBPF script:
>
> kretprobe:sgx_ioctl /retval != 0/
> {
> printf("sgx_ioctl: %d\n", retval)
> }
>
> kretprobe:sgx_mmap /retval != 0/
> {
> printf("sgx_mmap: %d\n", retval)
> }
>
> This results zero positives, i.e. empty output, when run with bpftrace.
>
> I'd go instead after RLIMIT_AS [*].
>
> With these conclusions, I'm done with this bug.
>
How can it be RLIMIT_AS? With the current flow, you mmap the whole range before mmaping the individual pages over it?
Also, I can easily load a 1GB enclave with the old driver.
Also:
$ ulimit -v
unlimited
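The same check can be made from inside a process rather than from the shell. A minimal sketch using Python's standard resource module; the function name is made up for illustration, and note that getrlimit() reports bytes whereas `ulimit -v` in bash reports KiB:

```python
import resource


def address_space_limit():
    """Return the soft and hard RLIMIT_AS values as printable strings."""
    def fmt(value):
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        # getrlimit() reports bytes; convert to KiB to match `ulimit -v`.
        return f"{value // 1024} KiB"

    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = address_space_limit()
    print(f"RLIMIT_AS soft={soft} hard={hard}")
```

If both values come back as "unlimited", RLIMIT_AS is unlikely to be the source of the ENOMEM.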
--
Jethro Beekman | Fortanix
* Re: Unable to load large enclave
2020-10-07 16:13 ` Jethro Beekman
@ 2020-10-07 17:20 ` Jarkko Sakkinen
2020-10-07 18:14 ` Jethro Beekman
2020-10-07 18:25 ` Jarkko Sakkinen
0 siblings, 2 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-07 17:20 UTC (permalink / raw)
To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx
On Wed, Oct 07, 2020 at 06:13:49PM +0200, Jethro Beekman wrote:
> On 2020-10-07 17:49, Jarkko Sakkinen wrote:
> > On Tue, Oct 06, 2020 at 06:13:28PM +0300, Jarkko Sakkinen wrote:
> >> On Mon, Oct 05, 2020 at 03:56:52PM -0700, Sean Christopherson wrote:
> >>> On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> >>>> On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
> >>>>> On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> >>>>>> On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> >>>>>>> Since the latest API changes, I'm unable to load a large enclave. The
> >>>>>>> test program at
> >>>>>>> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> >>>>>>> always fails with ENOMEM after loading 0xffd6 pages.
> >>>>>>>
> >>>>>>> I've tested this with v36, if there's reason to believe it has been
> >>>>>>> fixed I'd be happy to try it out on a newer patch set.
> >>>>>>
> >>>>>> I recommend using v39-rc1 tag that I created for testing because API is
> >>>>>> reverted back to be compatible with v36.
> >>>>>
> >>>>> Not sure what you're saying. I tested with v36. You're saying v39-rc1
> >>>>> will be the same? Or did you fix the issue since v36?
> >>>>
> >>>> v37 and v38 has an API change that is reverted in v39:
> >>>>
> >>>> https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/
> >>>>
> >>>> I'm not sure of the root cause yet but you asked to try to out a newer
> >>>> patch set and v39-rc1 is the best option.
> >>>>
> >>>> There was off-by-one error in enclave maximum size calculation fixed in
> >>>> v37 (it was actually a bug in SDM inherited to the code) but that should
> >>>> not result the situation you just described.
> >>>
> >>> My money is on the XArray changes, that's the most notable change in v36 and
> >>> IIRC the only thing that touched EPC/memory management.
> >>
> >> Yeah, that's what we've been speculating for some days now. That's
> >> somewhat deprecated email. It all started to enroll when I asked
> >> Haitao to turn CONFIG_PROVE_LOCKING on, and we got the information
> >> required to root cause the bug.
> >
> > I run the failing test and filtered SGX mmap's and ioctl's with this
> > eBPF script:
> >
> > kretprobe:sgx_ioctl /retval != 0/
> > {
> > printf("sgx_ioctl: %d\n", retval)
> > }
> >
> > kretprobe:sgx_mmap /retval != 0/
> > {
> > printf("sgx_mmap: %d\n", retval)
> > }
> >
> > This results zero positives, i.e. empty output, when run with bpftrace.
> >
> > I'd go instead after RLIMIT_AS [*].
> >
> > With these conclusions, I'm done with this bug.
> >
>
> How can it be RLIMIT_AS? With the current flow, you mmap the whole range before mmaping the individual pages over it?
>
> Also, I can easily load a 1GB enclave with the old driver.
>
> Also:
>
> $ ulimit -v
> unlimited
➜ ~ (master) ✔ sudo bpftrace sgx_ret.bt
Attaching 3 probes...
ksys_mmap_pgoff: -12
^C
~ (master) ✔ cat sgx_ret.bt
kretprobe:sgx_ioctl /retval != 0/
{
    printf("sgx_ioctl: %d\n", retval)
}
kretprobe:sgx_mmap /retval != 0/
{
    printf("sgx_mmap: %d\n", retval)
}
kretprobe:ksys_mmap_pgoff /retval == (uint64)-12/
{
    printf("ksys_mmap_pgoff: %d\n", retval)
}
This shows that it fails before reaching sgx_mmap().
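For readers decoding the output: -12 is a negated errno value, and the probe compares against (uint64)-12 because the return value is read as an unsigned 64-bit number. A small sketch of both conversions, with helper names made up for illustration:

```python
import errno
import os


def to_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed one."""
    return value - (1 << 64) if value >= (1 << 63) else value


def describe_retval(retval):
    """Map a negative kernel return value such as -12 to its errno name."""
    if retval >= 0:
        return "success"
    name = errno.errorcode.get(-retval, "unknown")
    return f"{name}: {os.strerror(-retval)}"


if __name__ == "__main__":
    raw = (1 << 64) - 12  # what (uint64)-12 looks like as an unsigned value
    print(describe_retval(to_signed64(raw)))
```

On Linux, errno 12 is ENOMEM, which matches the error reported by the test program.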
/Jarkko
* Re: Unable to load large enclave
2020-10-07 17:20 ` Jarkko Sakkinen
@ 2020-10-07 18:14 ` Jethro Beekman
2020-10-07 18:34 ` Jarkko Sakkinen
2020-10-07 18:25 ` Jarkko Sakkinen
1 sibling, 1 reply; 14+ messages in thread
From: Jethro Beekman @ 2020-10-07 18:14 UTC (permalink / raw)
To: Jarkko Sakkinen; +Cc: Sean Christopherson, linux-sgx
On 2020-10-07 19:20, Jarkko Sakkinen wrote:
> On Wed, Oct 07, 2020 at 06:13:49PM +0200, Jethro Beekman wrote:
>> On 2020-10-07 17:49, Jarkko Sakkinen wrote:
>>> On Tue, Oct 06, 2020 at 06:13:28PM +0300, Jarkko Sakkinen wrote:
>>>> On Mon, Oct 05, 2020 at 03:56:52PM -0700, Sean Christopherson wrote:
>>>>> On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
>>>>>> On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
>>>>>>> On 2020-09-30 03:16, Jarkko Sakkinen wrote:
>>>>>>>> On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
>>>>>>>>> Since the latest API changes, I'm unable to load a large enclave. The
>>>>>>>>> test program at
>>>>>>>>> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
>>>>>>>>> always fails with ENOMEM after loading 0xffd6 pages.
>>>>>>>>>
>>>>>>>>> I've tested this with v36, if there's reason to believe it has been
>>>>>>>>> fixed I'd be happy to try it out on a newer patch set.
>>>>>>>>
>>>>>>>> I recommend using v39-rc1 tag that I created for testing because API is
>>>>>>>> reverted back to be compatible with v36.
>>>>>>>
>>>>>>> Not sure what you're saying. I tested with v36. You're saying v39-rc1
>>>>>>> will be the same? Or did you fix the issue since v36?
>>>>>>
>>>>>> v37 and v38 has an API change that is reverted in v39:
>>>>>>
>>>>>> https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/
>>>>>>
>>>>>> I'm not sure of the root cause yet but you asked to try to out a newer
>>>>>> patch set and v39-rc1 is the best option.
>>>>>>
>>>>>> There was off-by-one error in enclave maximum size calculation fixed in
>>>>>> v37 (it was actually a bug in SDM inherited to the code) but that should
>>>>>> not result the situation you just described.
>>>>>
>>>>> My money is on the XArray changes, that's the most notable change in v36 and
>>>>> IIRC the only thing that touched EPC/memory management.
>>>>
>>>> Yeah, that's what we've been speculating for some days now. That's
>>>> somewhat deprecated email. It all started to enroll when I asked
>>>> Haitao to turn CONFIG_PROVE_LOCKING on, and we got the information
>>>> required to root cause the bug.
>>>
>>> I run the failing test and filtered SGX mmap's and ioctl's with this
>>> eBPF script:
>>>
>>> kretprobe:sgx_ioctl /retval != 0/
>>> {
>>> printf("sgx_ioctl: %d\n", retval)
>>> }
>>>
>>> kretprobe:sgx_mmap /retval != 0/
>>> {
>>> printf("sgx_mmap: %d\n", retval)
>>> }
>>>
>>> This results zero positives, i.e. empty output, when run with bpftrace.
>>>
>>> I'd go instead after RLIMIT_AS [*].
>>>
>>> With these conclusions, I'm done with this bug.
>>>
>>
>> How can it be RLIMIT_AS? With the current flow, you mmap the whole range before mmaping the individual pages over it?
>>
>> Also, I can easily load a 1GB enclave with the old driver.
>>
>> Also:
>>
>> $ ulimit -v
>> unlimited
>
> ➜ ~ (master) ✔ sudo bpftrace sgx_ret.bt
> Attaching 3 probes...
> ksys_mmap_pgoff: -12
> ^C
>
> ~ (master) ✔ cat sgx_ret.bt
> kretprobe:sgx_ioctl /retval != 0/
> {
> printf("sgx_ioctl: %d\n", retval)
> }
>
> kretprobe:sgx_mmap /retval != 0/
> {
> printf("sgx_mmap: %d\n", retval)
> }
>
> kretprobe:ksys_mmap_pgoff /retval == (uint64)-12/
> {
> printf("ksys_mmap_pgoff: %d\n", retval)
> }
>
> This shows that it fails before reaching sgx_mmap().
>
> /Jarkko
>
It's this check in do_mmap():
    /* Too many mappings? */
    if (mm->map_count > sysctl_max_map_count)
        return -ENOMEM;
I've verified that the problem goes away when I increase /proc/sys/vm/max_map_count. Why do I need to change this from the default now, when I didn't before?
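To see how close a process is to this limit, one can compare the sysctl value with the number of lines in /proc/<pid>/maps, which roughly corresponds to the number of mappings. A minimal sketch, assuming a Linux /proc layout; the function names and the `proc_root` parameter are made up for illustration:

```python
def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the per-process mapping limit from the sysctl file."""
    with open(path) as f:
        return int(f.read().strip())


def count_mappings(pid="self", proc_root="/proc"):
    """Count lines in <proc_root>/<pid>/maps; each line is one mapping."""
    with open(f"{proc_root}/{pid}/maps") as f:
        return sum(1 for _ in f)


def remaining_mappings(limit, used):
    """Number of further mappings available before the limit is reached."""
    return limit - used


if __name__ == "__main__":
    limit = read_max_map_count()
    used = count_mappings()
    print(f"limit={limit} used={used} left={remaining_mappings(limit, used)}")
```

Running count_mappings() against the PID of the loader while it adds pages would show the count climbing toward the limit.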
--
Jethro Beekman | Fortanix
* Re: Unable to load large enclave
2020-10-07 17:20 ` Jarkko Sakkinen
2020-10-07 18:14 ` Jethro Beekman
@ 2020-10-07 18:25 ` Jarkko Sakkinen
1 sibling, 0 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-07 18:25 UTC (permalink / raw)
To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx
On Wed, Oct 07, 2020 at 08:20:58PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 07, 2020 at 06:13:49PM +0200, Jethro Beekman wrote:
> > On 2020-10-07 17:49, Jarkko Sakkinen wrote:
> > > On Tue, Oct 06, 2020 at 06:13:28PM +0300, Jarkko Sakkinen wrote:
> > >> On Mon, Oct 05, 2020 at 03:56:52PM -0700, Sean Christopherson wrote:
> > >>> On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> > >>>> On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
> > >>>>> On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> > >>>>>> On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> > >>>>>>> Since the latest API changes, I'm unable to load a large enclave. The
> > >>>>>>> test program at
> > >>>>>>> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> > >>>>>>> always fails with ENOMEM after loading 0xffd6 pages.
> > >>>>>>>
> > >>>>>>> I've tested this with v36, if there's reason to believe it has been
> > >>>>>>> fixed I'd be happy to try it out on a newer patch set.
> > >>>>>>
> > >>>>>> I recommend using v39-rc1 tag that I created for testing because API is
> > >>>>>> reverted back to be compatible with v36.
> > >>>>>
> > >>>>> Not sure what you're saying. I tested with v36. You're saying v39-rc1
> > >>>>> will be the same? Or did you fix the issue since v36?
> > >>>>
> > >>>> v37 and v38 has an API change that is reverted in v39:
> > >>>>
> > >>>> https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/
> > >>>>
> > >>>> I'm not sure of the root cause yet but you asked to try to out a newer
> > >>>> patch set and v39-rc1 is the best option.
> > >>>>
> > >>>> There was off-by-one error in enclave maximum size calculation fixed in
> > >>>> v37 (it was actually a bug in SDM inherited to the code) but that should
> > >>>> not result the situation you just described.
> > >>>
> > >>> My money is on the XArray changes, that's the most notable change in v36 and
> > >>> IIRC the only thing that touched EPC/memory management.
> > >>
> > >> Yeah, that's what we've been speculating for some days now. That's
> > >> somewhat deprecated email. It all started to enroll when I asked
> > >> Haitao to turn CONFIG_PROVE_LOCKING on, and we got the information
> > >> required to root cause the bug.
> > >
> > > I run the failing test and filtered SGX mmap's and ioctl's with this
> > > eBPF script:
> > >
> > > kretprobe:sgx_ioctl /retval != 0/
> > > {
> > > printf("sgx_ioctl: %d\n", retval)
> > > }
> > >
> > > kretprobe:sgx_mmap /retval != 0/
> > > {
> > > printf("sgx_mmap: %d\n", retval)
> > > }
> > >
> > > This results zero positives, i.e. empty output, when run with bpftrace.
> > >
> > > I'd go instead after RLIMIT_AS [*].
> > >
> > > With these conclusions, I'm done with this bug.
> > >
> >
> > How can it be RLIMIT_AS? With the current flow, you mmap the whole range before mmaping the individual pages over it?
> >
> > Also, I can easily load a 1GB enclave with the old driver.
> >
> > Also:
> >
> > $ ulimit -v
> > unlimited
>
> ➜ ~ (master) ✔ sudo bpftrace sgx_ret.bt
> Attaching 3 probes...
> ksys_mmap_pgoff: -12
> ^C
>
> ~ (master) ✔ cat sgx_ret.bt
> kretprobe:sgx_ioctl /retval != 0/
> {
> printf("sgx_ioctl: %d\n", retval)
> }
>
> kretprobe:sgx_mmap /retval != 0/
> {
> printf("sgx_mmap: %d\n", retval)
> }
>
> kretprobe:ksys_mmap_pgoff /retval == (uint64)-12/
> {
> printf("ksys_mmap_pgoff: %d\n", retval)
> }
>
> This shows that it fails before reaching sgx_mmap().
➜ ~ (master) ✔ sudo bpftrace -e 'kr:ksys_mmap_pgoff { @[comm] = count(); }'
Attaching 1 probe...
^C
@[zsh]: 44
@[git]: 47
@[date]: 48
@[network.sh]: 48
@[battery.sh]: 56
@[which]: 84
@[cargo]: 94
@[head]: 96
@[iw]: 126
@[uname]: 144
@[cat]: 168
@[sh]: 175
@[sed]: 198
@[bash]: 216
@[ping]: 222
@[ls]: 324
@[sgx-load-large-]: 65510
65510 is the default value for /proc/sys/vm/max_map_count [*].
[*] https://www.kernel.org/doc/Documentation/sysctl/vm.txt
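The numbers in this thread line up as follows. This is only an arithmetic sketch using the two figures reported above (0xffd6 pages loaded, 65510 mmap returns counted); the 4 KiB page size is an assumption, and the names are made up for illustration:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

PAGES_LOADED = 0xFFD6  # pages loaded before ENOMEM, from the original report
MMAP_CALLS = 65510     # ksys_mmap_pgoff returns counted by bpftrace


def summarize(pages, calls, page_size=PAGE_SIZE):
    """Relate the number of loaded pages to the number of mmap calls."""
    return {
        "pages_loaded": pages,
        "bytes_loaded": pages * page_size,
        # mmap calls that were not one-per-page loads
        "other_mmap_calls": calls - pages,
    }


if __name__ == "__main__":
    for key, value in summarize(PAGES_LOADED, MMAP_CALLS).items():
        print(f"{key}: {value}")
```

The small difference between the two figures is consistent with roughly one mmap() call per loaded enclave page, plus a handful of calls for the rest of the process.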
/Jarkko
* Re: Unable to load large enclave
2020-10-07 18:14 ` Jethro Beekman
@ 2020-10-07 18:34 ` Jarkko Sakkinen
2020-10-07 18:36 ` Jarkko Sakkinen
0 siblings, 1 reply; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-07 18:34 UTC (permalink / raw)
To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx
On Wed, Oct 07, 2020 at 08:14:48PM +0200, Jethro Beekman wrote:
> It's this one in do_mmap():
>
> /* Too many mappings? */
> if (mm->map_count > sysctl_max_map_count)
> return -ENOMEM;
>
> I've verified that I'm no longer getting the problem when increasing
> /proc/sys/vm/max_map_count . Why do I need to change this from the
> default compared to before?
Yes, you are correct. I came to the same conclusion and responded (once
again) to my own email after running this:
➜ ~ (master) ✔ sudo bpftrace -e 'kr:ksys_mmap_pgoff { @[comm] = count(); }' &> log.txt
^C
➜ ~ (master) ✔ cat log.txt
Attaching 1 probe...
@[cat]: 18
@[git]: 47
@[zsh]: 49
@[cargo]: 94
@[sgx-load-large-]: 65510
That is the default value for /proc/sys/vm/max_map_count.
Re-responding just in case, because I thought that the bpftrace snippet
might have some value for you. I don't know why I cannot see my email at
lore.kernel.org.
/Jarkko
* Re: Unable to load large enclave
2020-10-07 18:34 ` Jarkko Sakkinen
@ 2020-10-07 18:36 ` Jarkko Sakkinen
0 siblings, 0 replies; 14+ messages in thread
From: Jarkko Sakkinen @ 2020-10-07 18:36 UTC (permalink / raw)
To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx
On Wed, Oct 07, 2020 at 09:35:06PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 07, 2020 at 08:14:48PM +0200, Jethro Beekman wrote:
> > It's this one in do_mmap():
> >
> > /* Too many mappings? */
> > if (mm->map_count > sysctl_max_map_count)
> > return -ENOMEM;
> >
> > I've verified that I'm no longer getting the problem when increasing
> > /proc/sys/vm/max_map_count . Why do I need to change this from the
> > default compared to before?
>
> Yes, you are correct. I came into same conclusion and responded (once
> again) to my own email after running this:
>
> ➜ ~ (master) ✔ sudo bpftrace -e 'kr:ksys_mmap_pgoff { @[comm] = count(); }' &> log.txt
> ^C
> ➜ ~ (master) ✔ cat log.txt
> Attaching 1 probe...
>
>
> @[cat]: 18
> @[git]: 47
> @[zsh]: 49
> @[cargo]: 94
> @[sgx-load-large-]: 65510
>
> That is the default value for /proc/sys/vm/max_map_count.
>
> Re-responding just in case because I thought that the bpftrace snippet
> might have some value for you. I don't why I cannot see my email at
> lore.kernel.org.
... I'm also looking forward to running this test and the test suite (which
I have not tried yet) on the master branch. Everything you are doing is
really useful for me.
/Jarkko