* Unable to load large enclave
From: Jethro Beekman @ 2020-09-29 15:52 UTC
To: linux-sgx

Since the latest API changes, I'm unable to load a large enclave. The
test program at
https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
always fails with ENOMEM after loading 0xffd6 pages.

I've tested this with v36. If there's reason to believe it has been
fixed, I'd be happy to try it out on a newer patch set.

--
Jethro Beekman | Fortanix
* Re: Unable to load large enclave
From: Jarkko Sakkinen @ 2020-09-30 1:16 UTC
To: Jethro Beekman; +Cc: linux-sgx

On Tue, Sep 29, 2020 at 05:52:48PM +0200, Jethro Beekman wrote:
> Since the latest API changes, I'm unable to load a large enclave. The
> test program at
> https://github.com/fortanix/rust-sgx/blob/sgx-load-large-enclave-test/src/main.rs
> always fails with ENOMEM after loading 0xffd6 pages.
>
> I've tested this with v36. If there's reason to believe it has been
> fixed, I'd be happy to try it out on a newer patch set.

I recommend using the v39-rc1 tag that I created for testing, because the
API has been reverted to be compatible with v36.

The repository also has a new location now:

git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git

/Jarkko
* Re: Unable to load large enclave
From: Jethro Beekman @ 2020-09-30 7:12 UTC
To: Jarkko Sakkinen; +Cc: linux-sgx

On 2020-09-30 03:16, Jarkko Sakkinen wrote:
> I recommend using the v39-rc1 tag that I created for testing, because the
> API has been reverted to be compatible with v36.

Not sure what you're saying. I tested with v36. You're saying v39-rc1
will be the same? Or did you fix the issue since v36?

--
Jethro Beekman | Fortanix
* Re: Unable to load large enclave
From: Jarkko Sakkinen @ 2020-09-30 11:45 UTC
To: Jethro Beekman; +Cc: linux-sgx

On Wed, Sep 30, 2020 at 09:12:06AM +0200, Jethro Beekman wrote:
> Not sure what you're saying. I tested with v36. You're saying v39-rc1
> will be the same? Or did you fix the issue since v36?

v37 and v38 have an API change that is reverted in v39:

https://lore.kernel.org/linux-sgx/20200921195822.GA58176@linux.intel.com/

I'm not sure of the root cause yet, but you asked to try out a newer
patch set, and v39-rc1 is the best option.

There was an off-by-one error in the enclave maximum size calculation
that was fixed in v37 (it was actually a bug in the SDM that the code
inherited), but that should not result in the situation you just
described.

/Jarkko
* Re: Unable to load large enclave
From: Jarkko Sakkinen @ 2020-10-03 13:12 UTC
To: Jethro Beekman; +Cc: linux-sgx, dave.hansen, sean.j.christopherson

On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> I'm not sure of the root cause yet, but you asked to try out a newer
> patch set, and v39-rc1 is the best option.
>
> There was an off-by-one error in the enclave maximum size calculation
> that was fixed in v37 (it was actually a bug in the SDM that the code
> inherited), but that should not result in the situation you just
> described.

Jethro, I'll try to set up your environment and start looking into this,
but in the meantime can you provide a trivial ftrace dump?

Here's what you should do:

1. Install trace-cmd. It is a tool that works as a frontend for ftrace,
   among other things. ftrace is one of the many tracing frameworks in
   the Linux kernel.
2. Run trace-cmd start -p function -l 'sgx*'. This starts tracing the
   exported sgx-prefixed functions.
3. Run your test.
4. Dump the trace-cmd show output to a text file and send that to me.
5. Run trace-cmd stop to stop the tracing framework.

Thank you.

/Jarkko
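The tracing steps above could be scripted roughly as follows. This is a minimal sketch in Python, assuming trace-cmd is installed and the caller has the privileges ftrace needs. The helper names (`ftrace_commands`, `capture_sgx_trace`) and the default output file name are hypothetical; only the trace-cmd invocations quoted in the steps are used.

```python
import subprocess
from typing import Callable, Dict, List


def ftrace_commands(pattern: str = "sgx*") -> Dict[str, List[str]]:
    """Return the trace-cmd invocations from the steps above, keyed by phase."""
    return {
        "start": ["trace-cmd", "start", "-p", "function", "-l", pattern],
        "show": ["trace-cmd", "show"],
        "stop": ["trace-cmd", "stop"],
    }


def capture_sgx_trace(test_argv: List[str],
                      out_path: str = "sgx-trace.txt",
                      run: Callable = subprocess.run) -> None:
    """Start tracing, run the test, save the trace to out_path, stop tracing."""
    cmds = ftrace_commands()
    run(cmds["start"], check=True)
    try:
        # The test is expected to fail with ENOMEM, so do not raise on failure.
        run(test_argv, check=False)
        with open(out_path, "w") as out:
            run(cmds["show"], check=True, stdout=out)
    finally:
        # Stop tracing even if one of the steps above raised.
        run(cmds["stop"], check=True)
```

Calling `capture_sgx_trace(["./path/to/test-binary"])` (the path is a placeholder) would leave the dump in `sgx-trace.txt`, ready to attach to a reply.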
* Re: Unable to load large enclave
From: Sean Christopherson @ 2020-10-05 22:56 UTC
To: Jarkko Sakkinen; +Cc: Jethro Beekman, linux-sgx

On Wed, Sep 30, 2020 at 02:45:54PM +0300, Jarkko Sakkinen wrote:
> I'm not sure of the root cause yet, but you asked to try out a newer
> patch set, and v39-rc1 is the best option.
>
> There was an off-by-one error in the enclave maximum size calculation
> that was fixed in v37 (it was actually a bug in the SDM that the code
> inherited), but that should not result in the situation you just
> described.

My money is on the XArray changes, that's the most notable change in v36 and
IIRC the only thing that touched EPC/memory management.
* Re: Unable to load large enclave
From: Jarkko Sakkinen @ 2020-10-06 15:13 UTC
To: Sean Christopherson; +Cc: Jethro Beekman, linux-sgx

On Mon, Oct 05, 2020 at 03:56:52PM -0700, Sean Christopherson wrote:
> My money is on the XArray changes, that's the most notable change in v36 and
> IIRC the only thing that touched EPC/memory management.

Yeah, that's what we've been speculating for some days now. That email
is somewhat outdated. It all started to unfold when I asked Haitao to
turn CONFIG_PROVE_LOCKING on, and we got the information required to
root cause the bug.

/Jarkko
* Re: Unable to load large enclave
From: Jarkko Sakkinen @ 2020-10-07 15:49 UTC
To: Sean Christopherson; +Cc: Jethro Beekman, linux-sgx

On Tue, Oct 06, 2020 at 06:13:28PM +0300, Jarkko Sakkinen wrote:
> Yeah, that's what we've been speculating for some days now. That email
> is somewhat outdated. It all started to unfold when I asked Haitao to
> turn CONFIG_PROVE_LOCKING on, and we got the information required to
> root cause the bug.

I ran the failing test and filtered SGX mmaps and ioctls with this
eBPF script:

kretprobe:sgx_ioctl /retval != 0/
{
        printf("sgx_ioctl: %d\n", retval)
}

kretprobe:sgx_mmap /retval != 0/
{
        printf("sgx_mmap: %d\n", retval)
}

This yields zero hits, i.e. empty output, when run with bpftrace.

I'd go after RLIMIT_AS instead [*].

With these conclusions, I'm done with this bug.

[*] https://man7.org/linux/man-pages/man2/getrlimit.2.html

/Jarkko
* Re: Unable to load large enclave
From: Jethro Beekman @ 2020-10-07 16:13 UTC
To: Jarkko Sakkinen, Sean Christopherson; +Cc: linux-sgx

On 2020-10-07 17:49, Jarkko Sakkinen wrote:
> This yields zero hits, i.e. empty output, when run with bpftrace.
>
> I'd go after RLIMIT_AS instead [*].
>
> With these conclusions, I'm done with this bug.

How can it be RLIMIT_AS? With the current flow, you mmap the whole range
before mmapping the individual pages over it?

Also, I can easily load a 1GB enclave with the old driver.

Also:

$ ulimit -v
unlimited

--
Jethro Beekman | Fortanix
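The same RLIMIT_AS check can be made from inside a process rather than through the shell. A minimal sketch in Python, assuming a Unix system where the standard `resource` module is available; the helper names `format_limit` and `address_space_limit` are hypothetical, and the formatting simply imitates what `ulimit -v` prints (KiB, or "unlimited").

```python
import resource
from typing import Tuple


def format_limit(value: int) -> str:
    """Render an RLIMIT_AS value like `ulimit -v`: KiB, or 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return str(value // 1024)


def address_space_limit() -> Tuple[str, str]:
    """Return the formatted (soft, hard) RLIMIT_AS pair of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return format_limit(soft), format_limit(hard)
```

If `address_space_limit()` reports "unlimited" for the soft limit, as the shell output above does, RLIMIT_AS is unlikely to be what makes mmap() return ENOMEM.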
* Re: Unable to load large enclave
From: Jarkko Sakkinen @ 2020-10-07 17:20 UTC
To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx

On Wed, Oct 07, 2020 at 06:13:49PM +0200, Jethro Beekman wrote:
> How can it be RLIMIT_AS? With the current flow, you mmap the whole range
> before mmapping the individual pages over it?
>
> Also, I can easily load a 1GB enclave with the old driver.
>
> Also:
>
> $ ulimit -v
> unlimited

➜ ~ (master) ✔ sudo bpftrace sgx_ret.bt
Attaching 3 probes...
ksys_mmap_pgoff: -12
^C

~ (master) ✔ cat sgx_ret.bt
kretprobe:sgx_ioctl /retval != 0/
{
        printf("sgx_ioctl: %d\n", retval)
}

kretprobe:sgx_mmap /retval != 0/
{
        printf("sgx_mmap: %d\n", retval)
}

kretprobe:ksys_mmap_pgoff /retval == (uint64)-12/
{
        printf("ksys_mmap_pgoff: %d\n", retval)
}

This shows that it fails before reaching sgx_mmap().

/Jarkko
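The `(uint64)-12` filter and the printed `-12` are the same value seen through two integer interpretations. A small Python sketch of the conversion, assuming Linux errno numbering (where 12 is ENOMEM); the helper names `to_signed64` and `errno_name` are hypothetical.

```python
import errno


def to_signed64(value: int) -> int:
    """Reinterpret a 64-bit register value as a signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def errno_name(retval: int) -> str:
    """Map a negative kernel return value such as -12 to its errno symbol."""
    code = -to_signed64(retval)
    return errno.errorcode.get(code, "unknown")
```

Under that assumption, `errno_name((1 << 64) - 12)` and `errno_name(-12)` both resolve to the ENOMEM reported by the test program.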
* Re: Unable to load large enclave
From: Jethro Beekman @ 2020-10-07 18:14 UTC
To: Jarkko Sakkinen; +Cc: Sean Christopherson, linux-sgx

On 2020-10-07 19:20, Jarkko Sakkinen wrote:
> ➜ ~ (master) ✔ sudo bpftrace sgx_ret.bt
> Attaching 3 probes...
> ksys_mmap_pgoff: -12
> ^C
>
> [...]
>
> This shows that it fails before reaching sgx_mmap().

It's this one in do_mmap():

        /* Too many mappings? */
        if (mm->map_count > sysctl_max_map_count)
                return -ENOMEM;

I've verified that I no longer get the problem after increasing
/proc/sys/vm/max_map_count. Why do I now need to change this from the
default when I did not need to before?

--
Jethro Beekman | Fortanix
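To see how close a process is to that check, its mapping count can be compared with the sysctl. A minimal sketch in Python, assuming Linux procfs and treating each line of /proc/<pid>/maps as one mapping (an approximation of the kernel's `map_count`); all helper names are hypothetical.

```python
def count_mappings(maps_text: str) -> int:
    """Count mappings in text formatted like /proc/<pid>/maps (one per line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def would_exceed(map_count: int, max_map_count: int) -> bool:
    """Mirror the quoted do_mmap() test: fail once the count exceeds the limit."""
    return map_count > max_map_count


def read_text(path: str) -> str:
    """Read a whole procfs file as text."""
    with open(path) as f:
        return f.read()


def mapping_headroom(pid: str = "self") -> int:
    """Mappings the process can still create before the check above fails."""
    limit = int(read_text("/proc/sys/vm/max_map_count"))
    used = count_mappings(read_text(f"/proc/{pid}/maps"))
    return limit - used
```

Running `mapping_headroom()` against the loader's pid shortly before the failure should show the value approaching zero if the per-page mmap() calls are each leaving a separate mapping behind.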
* Re: Unable to load large enclave
From: Jarkko Sakkinen @ 2020-10-07 18:34 UTC
To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx

On Wed, Oct 07, 2020 at 08:14:48PM +0200, Jethro Beekman wrote:
> It's this one in do_mmap():
>
>         /* Too many mappings? */
>         if (mm->map_count > sysctl_max_map_count)
>                 return -ENOMEM;
>
> I've verified that I no longer get the problem after increasing
> /proc/sys/vm/max_map_count. Why do I now need to change this from the
> default when I did not need to before?

Yes, you are correct. I came to the same conclusion and responded (once
again) to my own email after running this:

➜ ~ (master) ✔ sudo bpftrace -e 'kr:ksys_mmap_pgoff { @[comm] = count(); }' &> log.txt
^C
➜ ~ (master) ✔ cat log.txt
Attaching 1 probe...


@[cat]: 18
@[git]: 47
@[zsh]: 49
@[cargo]: 94
@[sgx-load-large-]: 65510

That is just under the default value for /proc/sys/vm/max_map_count,
which is 65530.

Re-responding just in case, because I thought that the bpftrace snippet
might have some value for you. I don't know why I cannot see my email at
lore.kernel.org.

/Jarkko
* Re: Unable to load large enclave
From: Jarkko Sakkinen @ 2020-10-07 18:36 UTC
To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx

On Wed, Oct 07, 2020 at 09:35:06PM +0300, Jarkko Sakkinen wrote:
> Re-responding just in case, because I thought that the bpftrace snippet
> might have some value for you. I don't know why I cannot see my email at
> lore.kernel.org.

... I'm also looking forward to running this test and the test suite (I
have not tried yet) on the master branch. What you are doing is really
useful for me.

/Jarkko
* Re: Unable to load large enclave
From: Jarkko Sakkinen @ 2020-10-07 18:25 UTC
To: Jethro Beekman; +Cc: Sean Christopherson, linux-sgx

On Wed, Oct 07, 2020 at 08:20:58PM +0300, Jarkko Sakkinen wrote:
> ➜ ~ (master) ✔ sudo bpftrace sgx_ret.bt
> Attaching 3 probes...
> ksys_mmap_pgoff: -12
> ^C
>
> [...]
>
> This shows that it fails before reaching sgx_mmap().

➜ ~ (master) ✔ sudo bpftrace -e 'kr:ksys_mmap_pgoff { @[comm] = count(); }'
Attaching 1 probe...
^C

@[zsh]: 44
@[git]: 47
@[date]: 48
@[network.sh]: 48
@[battery.sh]: 56
@[which]: 84
@[cargo]: 94
@[head]: 96
@[iw]: 126
@[uname]: 144
@[cat]: 168
@[sh]: 175
@[sed]: 198
@[bash]: 216
@[ping]: 222
@[ls]: 324
@[sgx-load-large-]: 65510

65510 is just under the default value for /proc/sys/vm/max_map_count,
which is 65530 [*].

[*] https://www.kernel.org/doc/Documentation/sysctl/vm.txt

/Jarkko
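The numbers reported in this thread can be lined up with a little arithmetic. The sketch below is a back-of-the-envelope check in Python, not a measurement: it takes 65530 as the limit (the documented kernel default for vm.max_map_count, an assumption here since the value on the test machine is not shown) and assumes each loaded page corresponds to one mmap() call. The constant and function names are hypothetical.

```python
PAGES_LOADED = 0xFFD6           # pages loaded before ENOMEM (first message)
MMAP_CALLS = 65510              # ksys_mmap_pgoff calls counted by bpftrace
DEFAULT_MAX_MAP_COUNT = 65530   # assumed: default vm.max_map_count


def summarize() -> dict:
    """Relate the page count, the mmap() call count and the assumed limit."""
    return {
        "pages_loaded": PAGES_LOADED,
        # mmap() calls left over if every loaded page took exactly one call
        "calls_not_loading_pages": MMAP_CALLS - PAGES_LOADED,
        # room left under the limit for every other mapping in the process
        "limit_minus_pages_loaded": DEFAULT_MAX_MAP_COUNT - PAGES_LOADED,
    }
```

Under those assumptions, 0xffd6 is 65494 pages, which leaves 16 other mmap() calls and only 36 mappings under the limit for the rest of the process (binary, libraries, stack, heap). That is roughly what one would expect if the per-page mappings are what exhausts max_map_count.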
end of thread, other threads:[~2020-10-07 19:59 UTC | newest]

Thread overview: 14+ messages
2020-09-29 15:52 Unable to load large enclave Jethro Beekman
2020-09-30 1:16 ` Jarkko Sakkinen
2020-09-30 7:12   ` Jethro Beekman
2020-09-30 11:45     ` Jarkko Sakkinen
2020-10-03 13:12       ` Jarkko Sakkinen
2020-10-05 22:56       ` Sean Christopherson
2020-10-06 15:13         ` Jarkko Sakkinen
2020-10-07 15:49           ` Jarkko Sakkinen
2020-10-07 16:13             ` Jethro Beekman
2020-10-07 17:20               ` Jarkko Sakkinen
2020-10-07 18:14                 ` Jethro Beekman
2020-10-07 18:34                   ` Jarkko Sakkinen
2020-10-07 18:36                     ` Jarkko Sakkinen
2020-10-07 18:25                 ` Jarkko Sakkinen