linux-sgx.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* x86/sgx: v23-rc1
@ 2019-10-09 14:07 Jarkko Sakkinen
  2019-10-09 14:35 ` Jarkko Sakkinen
  0 siblings, 1 reply; 5+ messages in thread
From: Jarkko Sakkinen @ 2019-10-09 14:07 UTC (permalink / raw)
  To: linux-sgx; +Cc: sean.j.christopherson, shay.katz-zamir, serge.ayoun

Baseline before adding Sean's updates. This contains only my updates. I
spent this day mostly fixing diff's. Especially these two were somewhat
unclean:

1. x86/sgx: Add a page reclaimer
2. x86/sgx: Linux Enclave Driver

Now they pile up nicely (I think). So I decided to do this tag since now
commit's in the sense of form and shape are legit. And also because
things, well, work.

I'll continue from this by integrating Sean's changes. You can see below
what has been already changed.

/Jarkko

tag v23-rc1
Tagger: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Date:   Wed Oct 9 16:59:10 2019 +0300

x86/sgx: v23-rc1 patch set

* Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page.
* In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail
  (because of OOM) even in legit behaviour and after EBLOCK the reclaiming
  flow can be only reverted by killing the whole enclave.
* Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact
  it should have been bit 6 (Table 37-3 in the SDM).
* Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX
  error code.
-----BEGIN PGP SIGNATURE-----

iJYEABYIAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCXZ3nxCAcamFya2tvLnNh
a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0mKVAQDcmIGs2f8y8hDY
b7zaQdNbaAMgsEkQ3ohMA88fbm2UQwD+P7y5AcAxzdccbgh++7RDy6XR2Ow2pluW
vCGUvRAhgwY=
=LCI3
-----END PGP SIGNATURE-----

/Jarkko

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: x86/sgx: v23-rc1
  2019-10-09 14:07 x86/sgx: v23-rc1 Jarkko Sakkinen
@ 2019-10-09 14:35 ` Jarkko Sakkinen
  2019-10-10  7:34   ` Jarkko Sakkinen
  0 siblings, 1 reply; 5+ messages in thread
From: Jarkko Sakkinen @ 2019-10-09 14:35 UTC (permalink / raw)
  To: linux-sgx; +Cc: sean.j.christopherson, shay.katz-zamir, serge.ayoun

On Wed, Oct 09, 2019 at 05:07:23PM +0300, Jarkko Sakkinen wrote:
> Baseline before adding Sean's updates. This contains only my updates. I
> spent this day mostly fixing diff's. Especially these two were somewhat
> unclean:
> 
> 1. x86/sgx: Add a page reclaimer
> 2. x86/sgx: Linux Enclave Driver
> 
> Now they pile up nicely (I think). So I decided to do this tag since now
> commit's in the sense of form and shape are legit. And also because
> things, well, work.
> 
> I'll continue from this by integrating Sean's changes. You can see below
> what has been already changed.
> 
> /Jarkko
> 
> tag v23-rc1
> Tagger: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> Date:   Wed Oct 9 16:59:10 2019 +0300
> 
> x86/sgx: v23-rc1 patch set
> 
> * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page.
> * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail
>   (because of OOM) even in legit behaviour and after EBLOCK the reclaiming
>   flow can be only reverted by killing the whole enclave.
> * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact
>   it should have been bit 6 (Table 37-3 in the SDM).
> * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX
>   error code.
> -----BEGIN PGP SIGNATURE-----
> 
> iJYEABYIAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCXZ3nxCAcamFya2tvLnNh
> a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0mKVAQDcmIGs2f8y8hDY
> b7zaQdNbaAMgsEkQ3ohMA88fbm2UQwD+P7y5AcAxzdccbgh++7RDy6XR2Ow2pluW
> vCGUvRAhgwY=
> =LCI3
> -----END PGP SIGNATURE-----
> 
> /Jarkko

Getting this with rc1 (after running selftest). Leaving from office.
No time to check this today but here are anyway logs.

[   96.906523] ============================================
[   96.906600] WARNING: possible recursive locking detected
[   96.906679] 5.4.0-rc1-custom #66 Not tainted
[   96.906741] --------------------------------------------
[   96.906817] test_sgx/1297 is trying to acquire lock:
[   96.906889] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x424/0x4f0
[   96.907009]
               but task is already holding lock:
[   96.907091] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
[   96.907217]
               other info that might help us debug this:
[   96.907308]  Possible unsafe locking scenario:

[   96.907391]        CPU0
[   96.907428]        ----
[   96.907464]   lock(&mm->mmap_sem#2);
[   96.907516]   lock(&mm->mmap_sem#2);
[   96.907569]
                *** DEADLOCK ***

[   96.907650]  May be due to missing lock nesting notation

[   96.907745] 2 locks held by test_sgx/1297:
[   96.907804]  #0: ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
[   96.907935]  #1: ffff990322de0080 (&encl->lock){+.+.}, at: sgx_ioc_enclave_add_page+0x212/0x620
[   96.910109]
               stack backtrace:
[   96.914616] CPU: 0 PID: 1297 Comm: test_sgx Not tainted 5.4.0-rc1-custom #66
[   96.918182] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0047.2018.1219.1246 12/19/2018
[   96.921795] Call Trace:
[   96.925209]  dump_stack+0x8e/0xd5
[   96.928462]  __lock_acquire+0xeab/0x1470
[   96.931648]  ? __do_fault+0x57/0x11d
[   96.934761]  lock_acquire+0xa3/0x180
[   96.937820]  ? __do_page_fault+0x424/0x4f0
[   96.940866]  down_read+0x30/0x150
[   96.943867]  ? __do_page_fault+0x424/0x4f0
[   96.946889]  __do_page_fault+0x424/0x4f0
[   96.949792]  do_page_fault+0x2c/0x1a0
[   96.952602]  page_fault+0x39/0x40
[   96.955400] RIP: 0010:sgx_ioc_enclave_add_page+0x3aa/0x620
[   96.958181] Code: 9d 10 ff ff ff 48 89 c8 48 81 e1 00 f0 ff ff 83 e0 0f 48 8d 14 40 48 8d 04 90 49 8d 04 c0 48 2b 08 48 03 48 08 b8 01 00 00 00 <0f> 01 cf 31 c0 0f 01 ca 85 c0 0f 85 0b 02 00 00 65 48 8b 04 25 c0
[   96.964133] RSP: 0018:ffffb46640df3c80 EFLAGS: 00050286
[   96.967141] RAX: 0000000000000001 RBX: ffffb46640df3cc0 RCX: ffffb4664ddfd000
[   96.970254] RDX: 0000000000000000 RSI: 00007f2d7a30f000 RDI: 0000000000000246
[   96.973440] RBP: ffffb46640df3db0 R08: ffffffffa8faf8a0 R09: 0000000000000000
[   96.976698] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990322de0000
[   96.980048] R13: ffff99032ba198c0 R14: 0000000000000000 R15: ffff99033a10cfe0
[   96.983425]  ? avc_has_extended_perms+0x1f6/0x610
[   96.986830]  sgx_ioctl+0x87/0x470
[   96.990247]  ? sgx_ioctl+0x87/0x470
[   96.993696]  do_vfs_ioctl+0xa9/0x6d0
[   96.997151]  ? tomoyo_file_ioctl+0x19/0x20
[   97.000571]  ksys_ioctl+0x75/0x80
[   97.003983]  ? do_syscall_64+0x17/0x230
[   97.007285]  __x64_sys_ioctl+0x1a/0x20
[   97.010487]  do_syscall_64+0x5f/0x230
[   97.013612]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   97.016791] RIP: 0033:0x7f2d79e135d7
[   97.019897] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
[   97.026332] RSP: 002b:00007ffd983721f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[   97.029673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2d79e135d7
[   97.033037] RDX: 00007ffd98372260 RSI: 000000004020a401 RDI: 0000000000000003
[   97.036426] RBP: 00007ffd98372330 R08: 0000000000000003 R09: 0000000000000000
[   97.039813] R10: 00007ffd98372350 R11: 0000000000000202 R12: 00005565ae6e89e0
[   97.043190] R13: 00007ffd98373c40 R14: 0000000000000000 R15: 0000000000000000
[  681.794211] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

$ sudo cat /sys/kernel/debug/kmemleak
[sudo] password for jsakkine:
unreferenced object 0xffff990325b69eb0 (size 16):
  comm "kworker/u8:1", pid 31, jiffies 4294895395 (age 1718.288s)
  hex dump (first 16 bytes):
    6d 65 6d 73 74 69 63 6b 30 00 01 00 00 00 00 00  memstick0.......
  backtrace:
    [<0000000010512df5>] __kmalloc_track_caller+0x139/0x280
    [<00000000a5374cb0>] kstrdup+0x31/0x60
    [<00000000c59be911>] kstrdup_const+0x24/0x30
    [<00000000ff88e957>] kvasprintf_const+0x86/0xa0
    [<0000000050affb9a>] kobject_set_name_vargs+0x23/0x90
    [<00000000839b8dd7>] dev_set_name+0x4e/0x70
    [<0000000069897a8c>] memstick_check+0xdf/0x3a3 [memstick]
    [<00000000dffb0c9f>] process_one_work+0x281/0x5c0
    [<00000000090981e2>] worker_thread+0x34/0x400
    [<00000000bb117b3c>] kthread+0x121/0x140
    [<000000004d2f4c32>] ret_from_fork+0x24/0x50

/Jarkko

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: x86/sgx: v23-rc1
  2019-10-09 14:35 ` Jarkko Sakkinen
@ 2019-10-10  7:34   ` Jarkko Sakkinen
  2019-10-10  7:38     ` Jarkko Sakkinen
  0 siblings, 1 reply; 5+ messages in thread
From: Jarkko Sakkinen @ 2019-10-10  7:34 UTC (permalink / raw)
  To: linux-sgx; +Cc: sean.j.christopherson, shay.katz-zamir, serge.ayoun

On Wed, Oct 09, 2019 at 05:35:29PM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 09, 2019 at 05:07:23PM +0300, Jarkko Sakkinen wrote:
> > Baseline before adding Sean's updates. This contains only my updates. I
> > spent this day mostly fixing diff's. Especially these two were somewhat
> > unclean:
> > 
> > 1. x86/sgx: Add a page reclaimer
> > 2. x86/sgx: Linux Enclave Driver
> > 
> > Now they pile up nicely (I think). So I decided to do this tag since now
> > commit's in the sense of form and shape are legit. And also because
> > things, well, work.
> > 
> > I'll continue from this by integrating Sean's changes. You can see below
> > what has been already changed.
> > 
> > /Jarkko
> > 
> > tag v23-rc1
> > Tagger: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> > Date:   Wed Oct 9 16:59:10 2019 +0300
> > 
> > x86/sgx: v23-rc1 patch set
> > 
> > * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page.
> > * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail
> >   (because of OOM) even in legit behaviour and after EBLOCK the reclaiming
> >   flow can be only reverted by killing the whole enclave.
> > * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact
> >   it should have been bit 6 (Table 37-3 in the SDM).
> > * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX
> >   error code.
> > -----BEGIN PGP SIGNATURE-----
> > 
> > iJYEABYIAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCXZ3nxCAcamFya2tvLnNh
> > a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0mKVAQDcmIGs2f8y8hDY
> > b7zaQdNbaAMgsEkQ3ohMA88fbm2UQwD+P7y5AcAxzdccbgh++7RDy6XR2Ow2pluW
> > vCGUvRAhgwY=
> > =LCI3
> > -----END PGP SIGNATURE-----
> > 
> > /Jarkko
> 
> Getting this with rc1 (after running selftest). Leaving from office.
> No time to check this today but here are anyway logs.
> 
> [   96.906523] ============================================
> [   96.906600] WARNING: possible recursive locking detected
> [   96.906679] 5.4.0-rc1-custom #66 Not tainted
> [   96.906741] --------------------------------------------
> [   96.906817] test_sgx/1297 is trying to acquire lock:
> [   96.906889] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x424/0x4f0
> [   96.907009]
>                but task is already holding lock:
> [   96.907091] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
> [   96.907217]
>                other info that might help us debug this:
> [   96.907308]  Possible unsafe locking scenario:
> 
> [   96.907391]        CPU0
> [   96.907428]        ----
> [   96.907464]   lock(&mm->mmap_sem#2);
> [   96.907516]   lock(&mm->mmap_sem#2);
> [   96.907569]
>                 *** DEADLOCK ***
> 
> [   96.907650]  May be due to missing lock nesting notation
> 
> [   96.907745] 2 locks held by test_sgx/1297:
> [   96.907804]  #0: ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
> [   96.907935]  #1: ffff990322de0080 (&encl->lock){+.+.}, at: sgx_ioc_enclave_add_page+0x212/0x620
> [   96.910109]
>                stack backtrace:
> [   96.914616] CPU: 0 PID: 1297 Comm: test_sgx Not tainted 5.4.0-rc1-custom #66
> [   96.918182] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0047.2018.1219.1246 12/19/2018
> [   96.921795] Call Trace:
> [   96.925209]  dump_stack+0x8e/0xd5
> [   96.928462]  __lock_acquire+0xeab/0x1470
> [   96.931648]  ? __do_fault+0x57/0x11d
> [   96.934761]  lock_acquire+0xa3/0x180
> [   96.937820]  ? __do_page_fault+0x424/0x4f0
> [   96.940866]  down_read+0x30/0x150
> [   96.943867]  ? __do_page_fault+0x424/0x4f0
> [   96.946889]  __do_page_fault+0x424/0x4f0
> [   96.949792]  do_page_fault+0x2c/0x1a0
> [   96.952602]  page_fault+0x39/0x40
> [   96.955400] RIP: 0010:sgx_ioc_enclave_add_page+0x3aa/0x620
> [   96.958181] Code: 9d 10 ff ff ff 48 89 c8 48 81 e1 00 f0 ff ff 83 e0 0f 48 8d 14 40 48 8d 04 90 49 8d 04 c0 48 2b 08 48 03 48 08 b8 01 00 00 00 <0f> 01 cf 31 c0 0f 01 ca 85 c0 0f 85 0b 02 00 00 65 48 8b 04 25 c0
> [   96.964133] RSP: 0018:ffffb46640df3c80 EFLAGS: 00050286
> [   96.967141] RAX: 0000000000000001 RBX: ffffb46640df3cc0 RCX: ffffb4664ddfd000
> [   96.970254] RDX: 0000000000000000 RSI: 00007f2d7a30f000 RDI: 0000000000000246
> [   96.973440] RBP: ffffb46640df3db0 R08: ffffffffa8faf8a0 R09: 0000000000000000
> [   96.976698] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990322de0000
> [   96.980048] R13: ffff99032ba198c0 R14: 0000000000000000 R15: ffff99033a10cfe0
> [   96.983425]  ? avc_has_extended_perms+0x1f6/0x610
> [   96.986830]  sgx_ioctl+0x87/0x470
> [   96.990247]  ? sgx_ioctl+0x87/0x470
> [   96.993696]  do_vfs_ioctl+0xa9/0x6d0
> [   96.997151]  ? tomoyo_file_ioctl+0x19/0x20
> [   97.000571]  ksys_ioctl+0x75/0x80
> [   97.003983]  ? do_syscall_64+0x17/0x230
> [   97.007285]  __x64_sys_ioctl+0x1a/0x20
> [   97.010487]  do_syscall_64+0x5f/0x230
> [   97.013612]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [   97.016791] RIP: 0033:0x7f2d79e135d7
> [   97.019897] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
> [   97.026332] RSP: 002b:00007ffd983721f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> [   97.029673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2d79e135d7
> [   97.033037] RDX: 00007ffd98372260 RSI: 000000004020a401 RDI: 0000000000000003
> [   97.036426] RBP: 00007ffd98372330 R08: 0000000000000003 R09: 0000000000000000
> [   97.039813] R10: 00007ffd98372350 R11: 0000000000000202 R12: 00005565ae6e89e0
> [   97.043190] R13: 00007ffd98373c40 R14: 0000000000000000 R15: 0000000000000000
> [  681.794211] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> 
> $ sudo cat /sys/kernel/debug/kmemleak
> [sudo] password for jsakkine:
> unreferenced object 0xffff990325b69eb0 (size 16):
>   comm "kworker/u8:1", pid 31, jiffies 4294895395 (age 1718.288s)
>   hex dump (first 16 bytes):
>     6d 65 6d 73 74 69 63 6b 30 00 01 00 00 00 00 00  memstick0.......
>   backtrace:
>     [<0000000010512df5>] __kmalloc_track_caller+0x139/0x280
>     [<00000000a5374cb0>] kstrdup+0x31/0x60
>     [<00000000c59be911>] kstrdup_const+0x24/0x30
>     [<00000000ff88e957>] kvasprintf_const+0x86/0xa0
>     [<0000000050affb9a>] kobject_set_name_vargs+0x23/0x90
>     [<00000000839b8dd7>] dev_set_name+0x4e/0x70
>     [<0000000069897a8c>] memstick_check+0xdf/0x3a3 [memstick]
>     [<00000000dffb0c9f>] process_one_work+0x281/0x5c0
>     [<00000000090981e2>] worker_thread+0x34/0x400
>     [<00000000bb117b3c>] kthread+0x121/0x140
>     [<000000004d2f4c32>] ret_from_fork+0x24/0x50
> 
> /Jarkko

The locking order is all wrong:

	up_read(&current->mm->mmap_sem);

	if (ret)
		goto err_out;

	ret = __sgx_encl_extend(encl, epc_page, addp->mrmask);
	if (ret)
		goto err_out;

	encl_page->encl = encl;
	encl_page->epc_page = epc_page;
	encl->secs_child_cnt++;

	sgx_mark_page_reclaimable(encl_page->epc_page);

	mutex_unlock(&encl->lock);

Sean: what might be reason for this? Probably is caused by add
page worker changes. Is this just something that has happend when
squashing patches by accident?

/Jarkko

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: x86/sgx: v23-rc1
  2019-10-10  7:34   ` Jarkko Sakkinen
@ 2019-10-10  7:38     ` Jarkko Sakkinen
  2019-10-10  8:07       ` Jarkko Sakkinen
  0 siblings, 1 reply; 5+ messages in thread
From: Jarkko Sakkinen @ 2019-10-10  7:38 UTC (permalink / raw)
  To: linux-sgx; +Cc: sean.j.christopherson, shay.katz-zamir, serge.ayoun

On Thu, Oct 10, 2019 at 10:34:48AM +0300, Jarkko Sakkinen wrote:
> On Wed, Oct 09, 2019 at 05:35:29PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Oct 09, 2019 at 05:07:23PM +0300, Jarkko Sakkinen wrote:
> > > Baseline before adding Sean's updates. This contains only my updates. I
> > > spent this day mostly fixing diff's. Especially these two were somewhat
> > > unclean:
> > > 
> > > 1. x86/sgx: Add a page reclaimer
> > > 2. x86/sgx: Linux Enclave Driver
> > > 
> > > Now they pile up nicely (I think). So I decided to do this tag since now
> > > commit's in the sense of form and shape are legit. And also because
> > > things, well, work.
> > > 
> > > I'll continue from this by integrating Sean's changes. You can see below
> > > what has been already changed.
> > > 
> > > /Jarkko
> > > 
> > > tag v23-rc1
> > > Tagger: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> > > Date:   Wed Oct 9 16:59:10 2019 +0300
> > > 
> > > x86/sgx: v23-rc1 patch set
> > > 
> > > * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page.
> > > * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail
> > >   (because of OOM) even in legit behaviour and after EBLOCK the reclaiming
> > >   flow can be only reverted by killing the whole enclave.
> > > * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact
> > >   it should have been bit 6 (Table 37-3 in the SDM).
> > > * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX
> > >   error code.
> > > -----BEGIN PGP SIGNATURE-----
> > > 
> > > iJYEABYIAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCXZ3nxCAcamFya2tvLnNh
> > > a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0mKVAQDcmIGs2f8y8hDY
> > > b7zaQdNbaAMgsEkQ3ohMA88fbm2UQwD+P7y5AcAxzdccbgh++7RDy6XR2Ow2pluW
> > > vCGUvRAhgwY=
> > > =LCI3
> > > -----END PGP SIGNATURE-----
> > > 
> > > /Jarkko
> > 
> > Getting this with rc1 (after running selftest). Leaving from office.
> > No time to check this today but here are anyway logs.
> > 
> > [   96.906523] ============================================
> > [   96.906600] WARNING: possible recursive locking detected
> > [   96.906679] 5.4.0-rc1-custom #66 Not tainted
> > [   96.906741] --------------------------------------------
> > [   96.906817] test_sgx/1297 is trying to acquire lock:
> > [   96.906889] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x424/0x4f0
> > [   96.907009]
> >                but task is already holding lock:
> > [   96.907091] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
> > [   96.907217]
> >                other info that might help us debug this:
> > [   96.907308]  Possible unsafe locking scenario:
> > 
> > [   96.907391]        CPU0
> > [   96.907428]        ----
> > [   96.907464]   lock(&mm->mmap_sem#2);
> > [   96.907516]   lock(&mm->mmap_sem#2);
> > [   96.907569]
> >                 *** DEADLOCK ***
> > 
> > [   96.907650]  May be due to missing lock nesting notation
> > 
> > [   96.907745] 2 locks held by test_sgx/1297:
> > [   96.907804]  #0: ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
> > [   96.907935]  #1: ffff990322de0080 (&encl->lock){+.+.}, at: sgx_ioc_enclave_add_page+0x212/0x620
> > [   96.910109]
> >                stack backtrace:
> > [   96.914616] CPU: 0 PID: 1297 Comm: test_sgx Not tainted 5.4.0-rc1-custom #66
> > [   96.918182] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0047.2018.1219.1246 12/19/2018
> > [   96.921795] Call Trace:
> > [   96.925209]  dump_stack+0x8e/0xd5
> > [   96.928462]  __lock_acquire+0xeab/0x1470
> > [   96.931648]  ? __do_fault+0x57/0x11d
> > [   96.934761]  lock_acquire+0xa3/0x180
> > [   96.937820]  ? __do_page_fault+0x424/0x4f0
> > [   96.940866]  down_read+0x30/0x150
> > [   96.943867]  ? __do_page_fault+0x424/0x4f0
> > [   96.946889]  __do_page_fault+0x424/0x4f0
> > [   96.949792]  do_page_fault+0x2c/0x1a0
> > [   96.952602]  page_fault+0x39/0x40
> > [   96.955400] RIP: 0010:sgx_ioc_enclave_add_page+0x3aa/0x620
> > [   96.958181] Code: 9d 10 ff ff ff 48 89 c8 48 81 e1 00 f0 ff ff 83 e0 0f 48 8d 14 40 48 8d 04 90 49 8d 04 c0 48 2b 08 48 03 48 08 b8 01 00 00 00 <0f> 01 cf 31 c0 0f 01 ca 85 c0 0f 85 0b 02 00 00 65 48 8b 04 25 c0
> > [   96.964133] RSP: 0018:ffffb46640df3c80 EFLAGS: 00050286
> > [   96.967141] RAX: 0000000000000001 RBX: ffffb46640df3cc0 RCX: ffffb4664ddfd000
> > [   96.970254] RDX: 0000000000000000 RSI: 00007f2d7a30f000 RDI: 0000000000000246
> > [   96.973440] RBP: ffffb46640df3db0 R08: ffffffffa8faf8a0 R09: 0000000000000000
> > [   96.976698] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990322de0000
> > [   96.980048] R13: ffff99032ba198c0 R14: 0000000000000000 R15: ffff99033a10cfe0
> > [   96.983425]  ? avc_has_extended_perms+0x1f6/0x610
> > [   96.986830]  sgx_ioctl+0x87/0x470
> > [   96.990247]  ? sgx_ioctl+0x87/0x470
> > [   96.993696]  do_vfs_ioctl+0xa9/0x6d0
> > [   96.997151]  ? tomoyo_file_ioctl+0x19/0x20
> > [   97.000571]  ksys_ioctl+0x75/0x80
> > [   97.003983]  ? do_syscall_64+0x17/0x230
> > [   97.007285]  __x64_sys_ioctl+0x1a/0x20
> > [   97.010487]  do_syscall_64+0x5f/0x230
> > [   97.013612]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > [   97.016791] RIP: 0033:0x7f2d79e135d7
> > [   97.019897] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
> > [   97.026332] RSP: 002b:00007ffd983721f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> > [   97.029673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2d79e135d7
> > [   97.033037] RDX: 00007ffd98372260 RSI: 000000004020a401 RDI: 0000000000000003
> > [   97.036426] RBP: 00007ffd98372330 R08: 0000000000000003 R09: 0000000000000000
> > [   97.039813] R10: 00007ffd98372350 R11: 0000000000000202 R12: 00005565ae6e89e0
> > [   97.043190] R13: 00007ffd98373c40 R14: 0000000000000000 R15: 0000000000000000
> > [  681.794211] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > 
> > $ sudo cat /sys/kernel/debug/kmemleak
> > [sudo] password for jsakkine:
> > unreferenced object 0xffff990325b69eb0 (size 16):
> >   comm "kworker/u8:1", pid 31, jiffies 4294895395 (age 1718.288s)
> >   hex dump (first 16 bytes):
> >     6d 65 6d 73 74 69 63 6b 30 00 01 00 00 00 00 00  memstick0.......
> >   backtrace:
> >     [<0000000010512df5>] __kmalloc_track_caller+0x139/0x280
> >     [<00000000a5374cb0>] kstrdup+0x31/0x60
> >     [<00000000c59be911>] kstrdup_const+0x24/0x30
> >     [<00000000ff88e957>] kvasprintf_const+0x86/0xa0
> >     [<0000000050affb9a>] kobject_set_name_vargs+0x23/0x90
> >     [<00000000839b8dd7>] dev_set_name+0x4e/0x70
> >     [<0000000069897a8c>] memstick_check+0xdf/0x3a3 [memstick]
> >     [<00000000dffb0c9f>] process_one_work+0x281/0x5c0
> >     [<00000000090981e2>] worker_thread+0x34/0x400
> >     [<00000000bb117b3c>] kthread+0x121/0x140
> >     [<000000004d2f4c32>] ret_from_fork+0x24/0x50
> > 
> > /Jarkko
> 
> The locking order is all wrong:
> 
> 	up_read(&current->mm->mmap_sem);
> 
> 	if (ret)
> 		goto err_out;
> 
> 	ret = __sgx_encl_extend(encl, epc_page, addp->mrmask);
> 	if (ret)
> 		goto err_out;
> 
> 	encl_page->encl = encl;
> 	encl_page->epc_page = epc_page;
> 	encl->secs_child_cnt++;
> 
> 	sgx_mark_page_reclaimable(encl_page->epc_page);
> 
> 	mutex_unlock(&encl->lock);
> 
> Sean: what might be reason for this? Probably is caused by add
> page worker changes. Is this just something that has happend when
> squashing patches by accident?

I guess it is a real regression as the same happens in the
error path.

/Jarkko

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: x86/sgx: v23-rc1
  2019-10-10  7:38     ` Jarkko Sakkinen
@ 2019-10-10  8:07       ` Jarkko Sakkinen
  0 siblings, 0 replies; 5+ messages in thread
From: Jarkko Sakkinen @ 2019-10-10  8:07 UTC (permalink / raw)
  To: linux-sgx; +Cc: sean.j.christopherson, shay.katz-zamir, serge.ayoun

On Thu, Oct 10, 2019 at 10:38:15AM +0300, Jarkko Sakkinen wrote:
> On Thu, Oct 10, 2019 at 10:34:48AM +0300, Jarkko Sakkinen wrote:
> > On Wed, Oct 09, 2019 at 05:35:29PM +0300, Jarkko Sakkinen wrote:
> > > On Wed, Oct 09, 2019 at 05:07:23PM +0300, Jarkko Sakkinen wrote:
> > > > Baseline before adding Sean's updates. This contains only my updates. I
> > > > spent this day mostly fixing diff's. Especially these two were somewhat
> > > > unclean:
> > > > 
> > > > 1. x86/sgx: Add a page reclaimer
> > > > 2. x86/sgx: Linux Enclave Driver
> > > > 
> > > > Now they pile up nicely (I think). So I decided to do this tag since now
> > > > commit's in the sense of form and shape are legit. And also because
> > > > things, well, work.
> > > > 
> > > > I'll continue from this by integrating Sean's changes. You can see below
> > > > what has been already changed.
> > > > 
> > > > /Jarkko
> > > > 
> > > > tag v23-rc1
> > > > Tagger: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
> > > > Date:   Wed Oct 9 16:59:10 2019 +0300
> > > > 
> > > > x86/sgx: v23-rc1 patch set
> > > > 
> > > > * Return -EIO instead of -ECANCELED when ptrace() fails to read a TCS page.
> > > > * In the reclaimer, pin page before ENCLS[EBLOCK] because pinning can fail
> > > >   (because of OOM) even in legit behaviour and after EBLOCK the reclaiming
> > > >   flow can be only reverted by killing the whole enclave.
> > > > * Fixed SGX_ATTR_RESERVED_MASK. Bit 7 was marked as reserved while in fact
> > > >   it should have been bit 6 (Table 37-3 in the SDM).
> > > > * Return -EPERM from SGX_IOC_ENCLAVE_INIT when ENCLS[EINIT] returns an SGX
> > > >   error code.
> > > > -----BEGIN PGP SIGNATURE-----
> > > > 
> > > > iJYEABYIAD4WIQRE6pSOnaBC00OEHEIaerohdGur0gUCXZ3nxCAcamFya2tvLnNh
> > > > a2tpbmVuQGxpbnV4LmludGVsLmNvbQAKCRAaerohdGur0mKVAQDcmIGs2f8y8hDY
> > > > b7zaQdNbaAMgsEkQ3ohMA88fbm2UQwD+P7y5AcAxzdccbgh++7RDy6XR2Ow2pluW
> > > > vCGUvRAhgwY=
> > > > =LCI3
> > > > -----END PGP SIGNATURE-----
> > > > 
> > > > /Jarkko
> > > 
> > > Getting this with rc1 (after running selftest). Leaving from office.
> > > No time to check this today but here are anyway logs.
> > > 
> > > [   96.906523] ============================================
> > > [   96.906600] WARNING: possible recursive locking detected
> > > [   96.906679] 5.4.0-rc1-custom #66 Not tainted
> > > [   96.906741] --------------------------------------------
> > > [   96.906817] test_sgx/1297 is trying to acquire lock:
> > > [   96.906889] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: __do_page_fault+0x424/0x4f0
> > > [   96.907009]
> > >                but task is already holding lock:
> > > [   96.907091] ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
> > > [   96.907217]
> > >                other info that might help us debug this:
> > > [   96.907308]  Possible unsafe locking scenario:
> > > 
> > > [   96.907391]        CPU0
> > > [   96.907428]        ----
> > > [   96.907464]   lock(&mm->mmap_sem#2);
> > > [   96.907516]   lock(&mm->mmap_sem#2);
> > > [   96.907569]
> > >                 *** DEADLOCK ***
> > > 
> > > [   96.907650]  May be due to missing lock nesting notation
> > > 
> > > [   96.907745] 2 locks held by test_sgx/1297:
> > > [   96.907804]  #0: ffff99032aebdb18 (&mm->mmap_sem#2){++++}, at: sgx_ioc_enclave_add_page+0x1fc/0x620
> > > [   96.907935]  #1: ffff990322de0080 (&encl->lock){+.+.}, at: sgx_ioc_enclave_add_page+0x212/0x620
> > > [   96.910109]
> > >                stack backtrace:
> > > [   96.914616] CPU: 0 PID: 1297 Comm: test_sgx Not tainted 5.4.0-rc1-custom #66
> > > [   96.918182] Hardware name: Intel Corporation NUC7CJYH/NUC7JYB, BIOS JYGLKCPX.86A.0047.2018.1219.1246 12/19/2018
> > > [   96.921795] Call Trace:
> > > [   96.925209]  dump_stack+0x8e/0xd5
> > > [   96.928462]  __lock_acquire+0xeab/0x1470
> > > [   96.931648]  ? __do_fault+0x57/0x11d
> > > [   96.934761]  lock_acquire+0xa3/0x180
> > > [   96.937820]  ? __do_page_fault+0x424/0x4f0
> > > [   96.940866]  down_read+0x30/0x150
> > > [   96.943867]  ? __do_page_fault+0x424/0x4f0
> > > [   96.946889]  __do_page_fault+0x424/0x4f0
> > > [   96.949792]  do_page_fault+0x2c/0x1a0
> > > [   96.952602]  page_fault+0x39/0x40
> > > [   96.955400] RIP: 0010:sgx_ioc_enclave_add_page+0x3aa/0x620
> > > [   96.958181] Code: 9d 10 ff ff ff 48 89 c8 48 81 e1 00 f0 ff ff 83 e0 0f 48 8d 14 40 48 8d 04 90 49 8d 04 c0 48 2b 08 48 03 48 08 b8 01 00 00 00 <0f> 01 cf 31 c0 0f 01 ca 85 c0 0f 85 0b 02 00 00 65 48 8b 04 25 c0
> > > [   96.964133] RSP: 0018:ffffb46640df3c80 EFLAGS: 00050286
> > > [   96.967141] RAX: 0000000000000001 RBX: ffffb46640df3cc0 RCX: ffffb4664ddfd000
> > > [   96.970254] RDX: 0000000000000000 RSI: 00007f2d7a30f000 RDI: 0000000000000246
> > > [   96.973440] RBP: ffffb46640df3db0 R08: ffffffffa8faf8a0 R09: 0000000000000000
> > > [   96.976698] R10: 0000000000000000 R11: 0000000000000000 R12: ffff990322de0000
> > > [   96.980048] R13: ffff99032ba198c0 R14: 0000000000000000 R15: ffff99033a10cfe0
> > > [   96.983425]  ? avc_has_extended_perms+0x1f6/0x610
> > > [   96.986830]  sgx_ioctl+0x87/0x470
> > > [   96.990247]  ? sgx_ioctl+0x87/0x470
> > > [   96.993696]  do_vfs_ioctl+0xa9/0x6d0
> > > [   96.997151]  ? tomoyo_file_ioctl+0x19/0x20
> > > [   97.000571]  ksys_ioctl+0x75/0x80
> > > [   97.003983]  ? do_syscall_64+0x17/0x230
> > > [   97.007285]  __x64_sys_ioctl+0x1a/0x20
> > > [   97.010487]  do_syscall_64+0x5f/0x230
> > > [   97.013612]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > > [   97.016791] RIP: 0033:0x7f2d79e135d7
> > > [   97.019897] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48
> > > [   97.026332] RSP: 002b:00007ffd983721f8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
> > > [   97.029673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2d79e135d7
> > > [   97.033037] RDX: 00007ffd98372260 RSI: 000000004020a401 RDI: 0000000000000003
> > > [   97.036426] RBP: 00007ffd98372330 R08: 0000000000000003 R09: 0000000000000000
> > > [   97.039813] R10: 00007ffd98372350 R11: 0000000000000202 R12: 00005565ae6e89e0
> > > [   97.043190] R13: 00007ffd98373c40 R14: 0000000000000000 R15: 0000000000000000
> > > [  681.794211] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
> > > 
> > > $ sudo cat /sys/kernel/debug/kmemleak
> > > [sudo] password for jsakkine:
> > > unreferenced object 0xffff990325b69eb0 (size 16):
> > >   comm "kworker/u8:1", pid 31, jiffies 4294895395 (age 1718.288s)
> > >   hex dump (first 16 bytes):
> > >     6d 65 6d 73 74 69 63 6b 30 00 01 00 00 00 00 00  memstick0.......
> > >   backtrace:
> > >     [<0000000010512df5>] __kmalloc_track_caller+0x139/0x280
> > >     [<00000000a5374cb0>] kstrdup+0x31/0x60
> > >     [<00000000c59be911>] kstrdup_const+0x24/0x30
> > >     [<00000000ff88e957>] kvasprintf_const+0x86/0xa0
> > >     [<0000000050affb9a>] kobject_set_name_vargs+0x23/0x90
> > >     [<00000000839b8dd7>] dev_set_name+0x4e/0x70
> > >     [<0000000069897a8c>] memstick_check+0xdf/0x3a3 [memstick]
> > >     [<00000000dffb0c9f>] process_one_work+0x281/0x5c0
> > >     [<00000000090981e2>] worker_thread+0x34/0x400
> > >     [<00000000bb117b3c>] kthread+0x121/0x140
> > >     [<000000004d2f4c32>] ret_from_fork+0x24/0x50
> > > 
> > > /Jarkko
> > 
> > The locking order is all wrong:
> > 
> > 	up_read(&current->mm->mmap_sem);
> > 
> > 	if (ret)
> > 		goto err_out;
> > 
> > 	ret = __sgx_encl_extend(encl, epc_page, addp->mrmask);
> > 	if (ret)
> > 		goto err_out;
> > 
> > 	encl_page->encl = encl;
> > 	encl_page->epc_page = epc_page;
> > 	encl->secs_child_cnt++;
> > 
> > 	sgx_mark_page_reclaimable(encl_page->epc_page);
> > 
> > 	mutex_unlock(&encl->lock);
> > 
> > Sean: what might be reason for this? Probably is caused by add
> > page worker changes. Is this just something that has happend when
> > squashing patches by accident?
> 
> I guess it is a real regression as the same happens in the
> error path.

The problem is that __uaccess_begin() can ignite #PF, which causes the
deadlock.

When I saw this first time I proposed using get_user_pages() because the
code looked at first sight a bit flakky (e.g. you don't ignite fault
handlers explicitly). Now I think it is the only acceptable way to fix
this.

/Jarkko

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-10-10  8:07 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-09 14:07 x86/sgx: v23-rc1 Jarkko Sakkinen
2019-10-09 14:35 ` Jarkko Sakkinen
2019-10-10  7:34   ` Jarkko Sakkinen
2019-10-10  7:38     ` Jarkko Sakkinen
2019-10-10  8:07       ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).