* Oops with tip/x86/fpu
@ 2015-03-04 18:30 Dave Hansen
  2015-03-04 19:06 ` Oleg Nesterov
                   ` (6 more replies)
  0 siblings, 7 replies; 126+ messages in thread
From: Dave Hansen @ 2015-03-04 18:30 UTC (permalink / raw)
  To: Andy Lutomirski, Borislav Petkov, Ingo Molnar, Linus Torvalds,
	Oleg Nesterov, Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML,
	Yu, Fenghua

I'm running a commit from the tip/x86/fpu branch: ae486033b98.  It's on
a system which I normally boot with 'noxsaves'.  When I boot without
'noxsaves' it is getting a GPF around the time that init is forked off.

The full oops is below, but addr2line points to the "alternative_input("
line in xrstor_state().
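
For reference, the helper in question looks roughly like this (a simplified
sketch of the xsave.h helper from this era, written from memory with the
exception-table error handling omitted, so not the exact source):

	static inline int xrstor_state(struct xsave_struct *fx, u64 mask)
	{
		int err = 0;			/* set via the extable fixup, omitted here */
		u32 lmask = mask;		/* low 32 bits of the requested feature mask */
		u32 hmask = mask >> 32;		/* high 32 bits */

		/*
		 * alternative_input() emits XRSTOR by default; at boot,
		 * apply_alternatives() patches it into XRSTORS if
		 * X86_FEATURE_XSAVES is set.  Only XRSTORS understands the
		 * compacted xsave format.
		 */
		alternative_input(XRSTOR, XRSTORS, X86_FEATURE_XSAVES,
				  "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask));
		return err;
	}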

The one that oopses has this in bootup:

   xsave: enabled xstate_bv 0x1f, cntxt size 0x3c0 using compacted form

The one that works says:

   xsave: enabled xstate_bv 0x1f, cntxt size 0x440 using standard form

I bisected it down to:

> commit 110d7f7513bbb916b8654da9e2973ac5bed929a9
> Author: Oleg Nesterov <oleg@redhat.com>
> Date:   Mon Jan 19 19:52:12 2015 +0100
> 
>     x86/fpu: Don't abuse FPU in kernel threads if use_eager_fpu()
>     
>     AFAICS, there is no reason why kernel threads should have FPU context
>     even if use_eager_fpu() == T. Now that interrupted_kernel_fpu_idle()
>     does not check __thread_has_fpu() in the use_eager_fpu() case, we
>     can remove the init_fpu() code from eager_fpu_init() and change
>     flush_thread() called by do_execve() to initialize FPU.
>     
>     Note: of course, the change in flush_thread() is horrible and must be
>     cleanuped. We need the new helper, and flush_thread() should return the
>     error if init_fpu() fails.

It disassembles to:

> All code
> ========
>    0:	00 00                	add    %al,(%rax)
>    2:	48 c7 c7 58 a4 12 82 	mov    $0xffffffff8212a458,%rdi
>    9:	e8 03 13 14 00       	callq  0x141311
>    e:	db e2                	fnclex 
>   10:	0f 77                	emms   
>   12:	db 83 3c 05 00 00    	fildl  0x53c(%rbx)
>   18:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>   1d:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
>   22:	48 8b bb 40 05 00 00 	mov    0x540(%rbx),%rdi
>   29:	89 c2                	mov    %eax,%edx
>   2b:*	48 0f c7 1f          	xrstors64 (%rdi)		<-- trapping instruction
>   2f:	31 c0                	xor    %eax,%eax
>   31:	45 31 e4             	xor    %r12d,%r12d
>   34:	85 c0                	test   %eax,%eax
>   36:	48 c7 c7 a8 a4 12 82 	mov    $0xffffffff8212a4a8,%rdi
>   3d:	41                   	rex.B
>   3e:	0f                   	.byte 0xf
>   3f:	95                   	xchg   %eax,%ebp

...
> [   14.193801] Freeing unused kernel memory: 560K (ffff880001974000 - ffff880001a00000)
> [   14.203661] Freeing unused kernel memory: 1916K (ffff880001e21000 - ffff880002000000)
> [   14.213132] general protection fault: 0000 [#1] SMP 
> [   14.218786] Modules linked in:
> [   14.222273] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.19.0-00430-gae48603-dirty #1428
> [   14.231375] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2P1.86C.X062.R00.1411270820 11/27/2014
> [   14.245698] task: ffff8801485a8000 ti: ffff880148620000 task.ti: ffff880148620000
> [   14.254189] RIP: 0010:[<ffffffff81004eda>]  [<ffffffff81004eda>] math_state_restore+0x13a/0x380
> [   14.264076] RSP: 0000:ffff880148623b98  EFLAGS: 00010296
> [   14.270090] RAX: 00000000ffffffff RBX: ffff8801485a8000 RCX: 0000000000000000
> [   14.278186] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff88007f5f0000
> [   14.286277] RBP: ffff880148623bb8 R08: 0000000000000000 R09: ffff88007f5f0000
> [   14.294371] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8801485a8000
> [   14.302468] R13: ffff88007f5e0000 R14: ffff8801485a8000 R15: ffffffff821ca800
> [   14.310574] FS:  0000000000000000(0000) GS:ffff88014e440000(0000) knlGS:0000000000000000
> [   14.319794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   14.326323] CR2: 0000000000000000 CR3: 000000007f820000 CR4: 00000000003407e0
> [   14.334420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   14.342516] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   14.350612] Stack:
> [   14.352896]  ffff8801485a8000 0000000000000000 ffff8801485a8000 ffff88007f5e0000
> [   14.361366]  ffff880148623be8 ffffffff8101210d 0000000000000000 ffff88007f590db0
> [   14.369810]  ffff8801485a8000 ffff88007f5e0000 ffff880148623c58 ffffffff811f5074
> [   14.378267] Call Trace:
> [   14.381056]  [<ffffffff8101210d>] flush_thread+0x1ad/0x270
> [   14.387281]  [<ffffffff811f5074>] flush_old_exec+0x774/0xee0
> [   14.393702]  [<ffffffff81256703>] load_elf_binary+0x353/0x1870
> [   14.400317]  [<ffffffff811f3f47>] ? search_binary_handler+0x97/0x1f0
> [   14.407532]  [<ffffffff810c491c>] ? do_raw_read_unlock+0x2c/0x50
> [   14.414361]  [<ffffffff811f3f38>] search_binary_handler+0x88/0x1f0
> [   14.421374]  [<ffffffff81255fc4>] load_script+0x274/0x2b0
> [   14.427503]  [<ffffffff811f3ee8>] ? search_binary_handler+0x38/0x1f0
> [   14.434722]  [<ffffffff810c491c>] ? do_raw_read_unlock+0x2c/0x50
> [   14.441563]  [<ffffffff811f3f38>] search_binary_handler+0x88/0x1f0
> [   14.448577]  [<ffffffff811f6436>] do_execveat_common.isra.32+0x746/0xa30
> [   14.456184]  [<ffffffff811f6386>] ? do_execveat_common.isra.32+0x696/0xa30
> [   14.463988]  [<ffffffff8194ad50>] ? rest_init+0x150/0x150
> [   14.470115]  [<ffffffff811f674c>] do_execve+0x2c/0x30
> [   14.475848]  [<ffffffff8100023b>] run_init_process+0x2b/0x30
> [   14.482264]  [<ffffffff8194ad92>] kernel_init+0x42/0xf0
> [   14.488222]  [<ffffffff8196b67c>] ret_from_fork+0x7c/0xb0
> [   14.494351]  [<ffffffff8194ad50>] ? rest_init+0x150/0x150
> [   14.500481] Code: 00 00 48 c7 c7 58 a4 12 82 e8 03 13 14 00 db e2 0f 77 db 83 3c 05 00 00 0f 1f 44 00 00 b8 ff ff ff ff 48 8b bb 40 05 00 00 89 c2 <48> 0f c7 1f 31 c0 45 31 e4 85 c0 48 c7 c7 a8 a4 12 82 41 0f 95 
> [   14.522792] RIP  [<ffffffff81004eda>] math_state_restore+0x13a/0x380
> [   14.530031]  RSP <ffff880148623b98>
> [   14.534061] ---[ end trace f99d58de7d83269b ]---
> [   14.539711] usb 1-5: New USB device found, idVendor=14dd, idProduct=1007
> [   14.549577] usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=7
> [   14.560957] usb 1-5: Product: D2CIM-DVUSB
> [   14.567717] usb 1-5: Manufacturer: Raritan
> [   14.573636] usb 1-5: SerialNumber: HUX45017210000007
> [   14.579421] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [   14.579421] 
> [   14.580548] usb 1-5: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
> [   14.580595] usb 1-5: ep 0x82 - rounding interval to 64 microframes, ep desc says 80 microframes
> [   14.580634] usb 1-5: ep 0x83 - rounding interval to 64 microframes, ep desc says 80 microframes
> [   14.592305] input: Raritan D2CIM-DVUSB as /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5:1.0/0003:14DD:1007.0001/input/input7
> [   14.632243] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> [   14.656356] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [   14.656356] 
> 

Config is here:

https://www.sr71.net/~dave/intel/config-20150303


* Re: Oops with tip/x86/fpu
  2015-03-04 18:30 Oops with tip/x86/fpu Dave Hansen
@ 2015-03-04 19:06 ` Oleg Nesterov
  2015-03-04 19:12   ` Dave Hansen
                     ` (2 more replies)
  2015-03-05 19:51 ` [PATCH 0/1] x86/fpu: math_state_restore() should not blindly disable irqs Oleg Nesterov
                   ` (5 subsequent siblings)
  6 siblings, 3 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-04 19:06 UTC (permalink / raw)
  To: Dave Hansen, Quentin Casasnovas
  Cc: Andy Lutomirski, Borislav Petkov, Ingo Molnar, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua

Thanks. I'll try to investigate tomorrow.

Well, the kernel crashes because xrstor_state() is buggy, Quentin already
has a fix.

But #GP should be explained...

On 03/04, Dave Hansen wrote:
>
> I'm running a commit from the tip/x86/fpu branch: ae486033b98.  It's on
> a system which I normally boot with 'noxsaves'.  When I boot without
> 'noxsaves' it is getting a GPF around the time that init is forked off.

And I assume that (before this commit) the kernel runs fine if you boot
without 'noxsaves'?

> 
> The full oops is below, but addr2line points to the "alternative_input("
> line in xrstor_state().
> 
> The one that oopses has this in bootup:
> 
>    xsave: enabled xstate_bv 0x1f, cntxt size 0x3c0 using compacted form
> 
> The one that works says:
> 
>    xsave: enabled xstate_bv 0x1f, cntxt size 0x440 using standard form
> 
> I bisected it down to:
> 
> > commit 110d7f7513bbb916b8654da9e2973ac5bed929a9
> > Author: Oleg Nesterov <oleg@redhat.com>
> > Date:   Mon Jan 19 19:52:12 2015 +0100
> > 
> >     x86/fpu: Don't abuse FPU in kernel threads if use_eager_fpu()
> >     
> >     AFAICS, there is no reason why kernel threads should have FPU context
> >     even if use_eager_fpu() == T. Now that interrupted_kernel_fpu_idle()
> >     does not check __thread_has_fpu() in the use_eager_fpu() case, we
> >     can remove the init_fpu() code from eager_fpu_init() and change
> >     flush_thread() called by do_execve() to initialize FPU.
> >     
> >     Note: of course, the change in flush_thread() is horrible and must be
> >     cleanuped. We need the new helper, and flush_thread() should return the
> >     error if init_fpu() fails.
> 
> It disassembles to:
> 
> > All code
> > ========
> >    0:	00 00                	add    %al,(%rax)
> >    2:	48 c7 c7 58 a4 12 82 	mov    $0xffffffff8212a458,%rdi
> >    9:	e8 03 13 14 00       	callq  0x141311
> >    e:	db e2                	fnclex 
> >   10:	0f 77                	emms   
> >   12:	db 83 3c 05 00 00    	fildl  0x53c(%rbx)
> >   18:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
> >   1d:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
> >   22:	48 8b bb 40 05 00 00 	mov    0x540(%rbx),%rdi
> >   29:	89 c2                	mov    %eax,%edx
> >   2b:*	48 0f c7 1f          	xrstors64 (%rdi)		<-- trapping instruction
> >   2f:	31 c0                	xor    %eax,%eax
> >   31:	45 31 e4             	xor    %r12d,%r12d
> >   34:	85 c0                	test   %eax,%eax
> >   36:	48 c7 c7 a8 a4 12 82 	mov    $0xffffffff8212a4a8,%rdi
> >   3d:	41                   	rex.B
> >   3e:	0f                   	.byte 0xf
> >   3f:	95                   	xchg   %eax,%ebp
> 
> ...
> > [   14.193801] Freeing unused kernel memory: 560K (ffff880001974000 - ffff880001a00000)
> > [   14.203661] Freeing unused kernel memory: 1916K (ffff880001e21000 - ffff880002000000)
> > [   14.213132] general protection fault: 0000 [#1] SMP 
> > [   14.218786] Modules linked in:
> > [   14.222273] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.19.0-00430-gae48603-dirty #1428
> > [   14.231375] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2P1.86C.X062.R00.1411270820 11/27/2014
> > [   14.245698] task: ffff8801485a8000 ti: ffff880148620000 task.ti: ffff880148620000
> > [   14.254189] RIP: 0010:[<ffffffff81004eda>]  [<ffffffff81004eda>] math_state_restore+0x13a/0x380
> > [   14.264076] RSP: 0000:ffff880148623b98  EFLAGS: 00010296
> > [   14.270090] RAX: 00000000ffffffff RBX: ffff8801485a8000 RCX: 0000000000000000
> > [   14.278186] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff88007f5f0000
> > [   14.286277] RBP: ffff880148623bb8 R08: 0000000000000000 R09: ffff88007f5f0000
> > [   14.294371] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8801485a8000
> > [   14.302468] R13: ffff88007f5e0000 R14: ffff8801485a8000 R15: ffffffff821ca800
> > [   14.310574] FS:  0000000000000000(0000) GS:ffff88014e440000(0000) knlGS:0000000000000000
> > [   14.319794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   14.326323] CR2: 0000000000000000 CR3: 000000007f820000 CR4: 00000000003407e0
> > [   14.334420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [   14.342516] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [   14.350612] Stack:
> > [   14.352896]  ffff8801485a8000 0000000000000000 ffff8801485a8000 ffff88007f5e0000
> > [   14.361366]  ffff880148623be8 ffffffff8101210d 0000000000000000 ffff88007f590db0
> > [   14.369810]  ffff8801485a8000 ffff88007f5e0000 ffff880148623c58 ffffffff811f5074
> > [   14.378267] Call Trace:
> > [   14.381056]  [<ffffffff8101210d>] flush_thread+0x1ad/0x270
> > [   14.387281]  [<ffffffff811f5074>] flush_old_exec+0x774/0xee0
> > [   14.393702]  [<ffffffff81256703>] load_elf_binary+0x353/0x1870
> > [   14.400317]  [<ffffffff811f3f47>] ? search_binary_handler+0x97/0x1f0
> > [   14.407532]  [<ffffffff810c491c>] ? do_raw_read_unlock+0x2c/0x50
> > [   14.414361]  [<ffffffff811f3f38>] search_binary_handler+0x88/0x1f0
> > [   14.421374]  [<ffffffff81255fc4>] load_script+0x274/0x2b0
> > [   14.427503]  [<ffffffff811f3ee8>] ? search_binary_handler+0x38/0x1f0
> > [   14.434722]  [<ffffffff810c491c>] ? do_raw_read_unlock+0x2c/0x50
> > [   14.441563]  [<ffffffff811f3f38>] search_binary_handler+0x88/0x1f0
> > [   14.448577]  [<ffffffff811f6436>] do_execveat_common.isra.32+0x746/0xa30
> > [   14.456184]  [<ffffffff811f6386>] ? do_execveat_common.isra.32+0x696/0xa30
> > [   14.463988]  [<ffffffff8194ad50>] ? rest_init+0x150/0x150
> > [   14.470115]  [<ffffffff811f674c>] do_execve+0x2c/0x30
> > [   14.475848]  [<ffffffff8100023b>] run_init_process+0x2b/0x30
> > [   14.482264]  [<ffffffff8194ad92>] kernel_init+0x42/0xf0
> > [   14.488222]  [<ffffffff8196b67c>] ret_from_fork+0x7c/0xb0
> > [   14.494351]  [<ffffffff8194ad50>] ? rest_init+0x150/0x150
> > [   14.500481] Code: 00 00 48 c7 c7 58 a4 12 82 e8 03 13 14 00 db e2 0f 77 db 83 3c 05 00 00 0f 1f 44 00 00 b8 ff ff ff ff 48 8b bb 40 05 00 00 89 c2 <48> 0f c7 1f 31 c0 45 31 e4 85 c0 48 c7 c7 a8 a4 12 82 41 0f 95 
> > [   14.522792] RIP  [<ffffffff81004eda>] math_state_restore+0x13a/0x380
> > [   14.530031]  RSP <ffff880148623b98>
> > [   14.534061] ---[ end trace f99d58de7d83269b ]---
> > [   14.539711] usb 1-5: New USB device found, idVendor=14dd, idProduct=1007
> > [   14.549577] usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=7
> > [   14.560957] usb 1-5: Product: D2CIM-DVUSB
> > [   14.567717] usb 1-5: Manufacturer: Raritan
> > [   14.573636] usb 1-5: SerialNumber: HUX45017210000007
> > [   14.579421] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [   14.579421] 
> > [   14.580548] usb 1-5: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
> > [   14.580595] usb 1-5: ep 0x82 - rounding interval to 64 microframes, ep desc says 80 microframes
> > [   14.580634] usb 1-5: ep 0x83 - rounding interval to 64 microframes, ep desc says 80 microframes
> > [   14.592305] input: Raritan D2CIM-DVUSB as /devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5:1.0/0003:14DD:1007.0001/input/input7
> > [   14.632243] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
> > [   14.656356] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> > [   14.656356] 
> > 
> 
> Config is here:
> 
> https://www.sr71.net/~dave/intel/config-20150303



* Re: Oops with tip/x86/fpu
  2015-03-04 19:06 ` Oleg Nesterov
@ 2015-03-04 19:12   ` Dave Hansen
  2015-03-04 20:06   ` Borislav Petkov
  2015-03-05  8:38   ` Quentin Casasnovas
  2 siblings, 0 replies; 126+ messages in thread
From: Dave Hansen @ 2015-03-04 19:12 UTC (permalink / raw)
  To: Oleg Nesterov, Quentin Casasnovas
  Cc: Andy Lutomirski, Borislav Petkov, Ingo Molnar, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua

On 03/04/2015 11:06 AM, Oleg Nesterov wrote:
> On 03/04, Dave Hansen wrote:
>> > I'm running a commit from the tip/x86/fpu branch: ae486033b98.  It's on
>> > a system which I normally boot with 'noxsaves'.  When I boot without
>> > 'noxsaves' it is getting a GPF around the time that init is forked off.
> And I assume that (before this commit) the kernel runs fine if you boot
> without 'noxsaves'?

Before this commit without 'noxsaves': GOOD
Before this commit with 'noxsaves': GOOD
After  this commit with 'noxsaves': GOOD
After  this commit without 'noxsaves': BAD




* Re: Oops with tip/x86/fpu
  2015-03-04 19:06 ` Oleg Nesterov
  2015-03-04 19:12   ` Dave Hansen
@ 2015-03-04 20:06   ` Borislav Petkov
  2015-03-05 15:14     ` Oleg Nesterov
  2015-03-05  8:38   ` Quentin Casasnovas
  2 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-04 20:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Quentin Casasnovas, Andy Lutomirski, Ingo Molnar,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua

On Wed, Mar 04, 2015 at 08:06:51PM +0100, Oleg Nesterov wrote:
> Thanks. I'll try to investigate tomorrow.
> 
> Well, the kernel crashes because xrstor_state() is buggy, Quentin already
> has a fix.
> 
> But #GP should be explained...

Could it be one of those conditions for which XRSTORS #GPs, like

"If XRSTORS attempts to load MXCSR with an illegal value, a
general-protection exception (#GP) occurs."

for example? I'm looking at the SDM section for XRSTORS.
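
To make that condition concrete (rough sketch, not actual kernel code): the
MXCSR image in the xsave area must not set bits outside what the CPU reports
via FXSAVE, which the kernel keeps in mxcsr_feature_mask:

	/* Illustrative only: XRSTOR/XRSTORS #GP if reserved MXCSR bits are set. */
	static bool mxcsr_image_valid(u32 mxcsr)
	{
		return !(mxcsr & ~mxcsr_feature_mask);
	}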

I mean, math_state_restore() does init_fpu() and down that road we're
allocating an FPU state ... but we did init_fpu() before too, in
eager_fpu_init(). So what changed?

Maybe I'm looking in a totally wrong direction, it is too late here to
stare at FPU code anyway...

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: Oops with tip/x86/fpu
  2015-03-04 19:06 ` Oleg Nesterov
  2015-03-04 19:12   ` Dave Hansen
  2015-03-04 20:06   ` Borislav Petkov
@ 2015-03-05  8:38   ` Quentin Casasnovas
  2015-03-05 15:13     ` Oleg Nesterov
  2 siblings, 1 reply; 126+ messages in thread
From: Quentin Casasnovas @ 2015-03-05  8:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Quentin Casasnovas, Andy Lutomirski,
	Borislav Petkov, Ingo Molnar, Linus Torvalds, Pekka Riikonen,
	Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua

On Wed, Mar 04, 2015 at 08:06:51PM +0100, Oleg Nesterov wrote:
> On 03/04, Dave Hansen wrote:
> >
> > I'm running a commit from the tip/x86/fpu branch: ae486033b98.  It's on
> > a system which I normally boot with 'noxsaves'.  When I boot without
> > 'noxsaves' it is getting a GPF around the time that init is forked off.
> 
> And I assume that (before this commit) the kernel runs fine if you boot
> without 'noxsaves'?
> 
> > 
> > The full oops is below, but addr2line points to the "alternative_input("
> > line in xrstor_state().
> > 
> > The one that oopses has this in bootup:
> > 
> >    xsave: enabled xstate_bv 0x1f, cntxt size 0x3c0 using compacted form
> > 
> > The one that works says:
> > 
> >    xsave: enabled xstate_bv 0x1f, cntxt size 0x440 using standard form
> > 
> Thanks. I'll try to investigate tomorrow.
> 
> Well, the kernel crashes because xrstor_state() is buggy, Quentin already
> has a fix.
> 
> But #GP should be explained...
> 

Hopefully the couple of fixes to prevent the #GP should be merged soon, but
they only cure the symptoms and not the root cause of this issue, I think.

Quentin


* Re: Oops with tip/x86/fpu
  2015-03-05  8:38   ` Quentin Casasnovas
@ 2015-03-05 15:13     ` Oleg Nesterov
  2015-03-05 18:42       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-05 15:13 UTC (permalink / raw)
  To: Quentin Casasnovas
  Cc: Dave Hansen, Andy Lutomirski, Borislav Petkov, Ingo Molnar,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua

On 03/05, Quentin Casasnovas wrote:
>
> On Wed, Mar 04, 2015 at 08:06:51PM +0100, Oleg Nesterov wrote:
> >
> > Well, the kernel crashes because xrstor_state() is buggy, Quentin already
> > has a fix.
> >
> > But #GP should be explained...
> >
>
> Hopefully the couple of fixes to prevent the #GP should be merged soon, but
> they only cure the symptoms and not the root cause of this issue, I think.

Yes, yes, sure. That is what I meant, sorry for confusion.

Oleg.



* Re: Oops with tip/x86/fpu
  2015-03-04 20:06   ` Borislav Petkov
@ 2015-03-05 15:14     ` Oleg Nesterov
       [not found]       ` <20150305182203.GA4203@redhat.com>
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-05 15:14 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Quentin Casasnovas, Andy Lutomirski, Ingo Molnar,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua

On 03/04, Borislav Petkov wrote:
>
> On Wed, Mar 04, 2015 at 08:06:51PM +0100, Oleg Nesterov wrote:
> > Thanks. I'll try to investigate tomorrow.
> >
> > Well, the kernel crashes because xrstor_state() is buggy, Quentin already
> > has a fix.
> >
> > But #GP should be explained...
>
> Could it be one of those conditions for which XRSTORS #GPs, like
>
> "If XRSTORS attempts to load MXCSR with an illegal value, a
> general-protection exception (#GP) occurs."
>
> for example? I'm looking at the SDM section for XRSTORS.
>
> I mean, math_state_restore() does init_fpu() and down that road we're
> allocating an FPU state ... but we did init_fpu() before too, in
> eager_fpu_init(). So what changed?

I _think_ that the difference is that eager_fpu_init()->xrstor_state()
was called before apply_alternatives(), so it used XRSTOR.

Note also that (before this commit) restore_fpu_checking() was almost
never called right after init_fpu(). If use_eager_fpu() == T.

After this commit the first xrstor_state() uses XRSTORS. And that is
how (I think) 'noxsaves' makes the difference.
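
Rough sketch of why 'noxsaves' changes which instruction ends up in
xrstor_state() (illustrative, reconstructed from the command-line handling,
not quoted kernel source):

	/*
	 * "noxsaves" clears the XSAVES feature bit early, so
	 * apply_alternatives() leaves the original XRSTOR in place.
	 * Without "noxsaves" the bit stays set and, once alternatives
	 * have been applied, xrstor_state() executes XRSTORS on the
	 * compacted-format buffer.
	 */
	static __init int x86_xsaves_setup(char *s)
	{
		setup_clear_cpu_cap(X86_FEATURE_XSAVES);
		return 1;
	}
	__setup("noxsaves", x86_xsaves_setup);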


So. I can be easily wrong, but so far I _think_ that this commit disclosed
another problem. And even if I am wrong and this commit is buggy, we need
to understand why ;)

I'll try to think about debugging patch, I can't reproduce this problem
on my machine...

Oleg.



* Re: Oops with tip/x86/fpu
       [not found]       ` <20150305182203.GA4203@redhat.com>
@ 2015-03-05 18:34         ` Dave Hansen
  2015-03-05 18:46           ` Oleg Nesterov
  2015-03-05 18:41         ` Dave Hansen
  2015-03-26 22:37         ` Yu, Fenghua
  2 siblings, 1 reply; 126+ messages in thread
From: Dave Hansen @ 2015-03-05 18:34 UTC (permalink / raw)
  To: Oleg Nesterov, Borislav Petkov
  Cc: Quentin Casasnovas, Andy Lutomirski, Ingo Molnar, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua

On 03/05/2015 10:22 AM, Oleg Nesterov wrote:
> On 03/05, Oleg Nesterov wrote:
>> I _think_ that the difference is that eager_fpu_init()->xrstor_state()
>> was called before apply_alternatives(), so it used XRSTOR.
>>
>> Note also that (before this commit) restore_fpu_checking() was almost
>> never called right after init_fpu(). If use_eager_fpu() == T.
>>
>> After this commit the first xrstor_state() uses XRSTORS. And that is
>> how (I think) 'noxsaves' makes the difference.
>>
>>
>> So. I can be easily wrong, but so far I _think_ that this commit disclosed
>> another problem. And even if I am wrong and this commit is buggy, we need
>> to understand why ;)
>>
>> I'll try to think about debugging patch, I can't reproduce this problem
>> on my machine...
> 
> Dave, could you please run the test-case below?
> 
> Without 'noxsaves', and without my commit.

So you want it tested at 4b2e762e2e5 in tip/x86/fpu?




* Re: Oops with tip/x86/fpu
       [not found]       ` <20150305182203.GA4203@redhat.com>
  2015-03-05 18:34         ` Dave Hansen
@ 2015-03-05 18:41         ` Dave Hansen
  2015-03-26 22:37         ` Yu, Fenghua
  2 siblings, 0 replies; 126+ messages in thread
From: Dave Hansen @ 2015-03-05 18:41 UTC (permalink / raw)
  To: Oleg Nesterov, Borislav Petkov
  Cc: Quentin Casasnovas, Andy Lutomirski, Ingo Molnar, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua

On 03/05/2015 10:22 AM, Oleg Nesterov wrote:
> On 03/05, Oleg Nesterov wrote:
>>
>> I _think_ that the difference is that eager_fpu_init()->xrstor_state()
>> was called before apply_alternatives(), so it used XRSTOR.
>>
>> Note also that (before this commit) restore_fpu_checking() was almost
>> never called right after init_fpu(). If use_eager_fpu() == T.
>>
>> After this commit the first xrstor_state() uses XRSTORS. And that is
>> how (I think) 'noxsaves' makes the difference.
>>
>>
>> So. I can be easily wrong, but so far I _think_ that this commit disclosed
>> another problem. And even if I am wrong and this commit is buggy, we need
>> to understand why ;)
>>
>> I'll try to think about debugging patch, I can't reproduce this problem
>> on my machine...
> 
> Dave, could you please run the test-case below?
> 
> Without 'noxsaves', and without my commit.
> 
> Please compile it "cc --static -m32 -Wall T.c". In case you do not have
> the 32-bit libs, I also attached the static binary.
> 
> It should trigger another known problem which I was going to fix later,
> math_state_restore() wrongly does cli/sti. Please ignore the "sleeping
> function called from invalid context" warning in dmesg.
> 
> Does it trigger something else on your machine?

Triggers this:

> [  125.384358] general protection fault: 0000 [#1] SMP 
> [  125.390033] Modules linked in:
> [  125.393521] CPU: 0 PID: 1417 Comm: oleg-test Not tainted 3.19.0-00428-g4b2e762 #774
> [  125.402222] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2P1.86C.X062.R00.1411270820 11/27/2014
> [  125.416537] task: ffff88009af0e100 ti: ffff88009bf24000 task.ti: ffff88009bf24000
> [  125.425034] RIP: 0010:[<ffffffff810556cb>]  [<ffffffff810556cb>] math_state_restore+0x8b/0x1c0
> [  125.434839] RSP: 0000:ffff88009bf27e08  EFLAGS: 00010046
> [  125.440873] RAX: 00000000ffffffff RBX: ffff88009af0e100 RCX: 0000000000000000
> [  125.448972] RDX: 00000000ffffffff RSI: 0000000000000000 RDI: ffff880148020780
> [  125.457073] RBP: ffff88009bf27e18 R08: 0000000000000000 R09: ffff880148020780
> [  125.465175] R10: 0000000000000001 R11: ffffffff817a0829 R12: ffff88009af0e100
> [  125.473274] R13: 0000000000000071 R14: 0000000000000200 R15: 0000000000000000
> [  125.481367] FS:  0000000000000000(0003) GS:ffff88014e400000(0063) knlGS:00000000089ea840
> [  125.490551] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
> [  125.497075] CR2: 0000000000000071 CR3: 000000007f8ce000 CR4: 00000000003407f0
> [  125.505177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  125.513283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  125.521381] Stack:
> [  125.523664]  0000000000000000 00000000ffffffff ffff88009bf27ed8 ffffffff81061106
> [  125.532114]  ffff880148020780 000000019be24928 0000000000000000 0000000000000000
> [  125.540571]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [  125.549032] Call Trace:
> [  125.551814]  [<ffffffff81061106>] __restore_xstate_sig+0x246/0x6c0
> [  125.558841]  [<ffffffff810ace1f>] ? recalc_sigpending+0x1f/0x60
> [  125.565555]  [<ffffffff8109d054>] ia32_restore_sigcontext+0x194/0x1b0
> [  125.572874]  [<ffffffff8109d4bd>] sys32_rt_sigreturn+0xad/0xd0
> [  125.579505]  [<ffffffff817a0985>] ia32_ptregs_common+0x25/0x4b
> [  125.586129] Code: fb 7e e9 11 00 00 00 db e2 0f 77 db 83 44 06 00 00 0f 1f 80 00 00 00 00 0f 1f 44 00 00 b8 ff ff ff ff 48 8b bb 48 06 00 00 89 c2 <48> 0f c7 1f 31 c0 eb 20 0f 1f 44 00 00 bf 9a 00 00 00 e8 ce 11 
> [  125.608467] RIP  [<ffffffff810556cb>] math_state_restore+0x8b/0x1c0
> [  125.615599]  RSP <ffff88009bf27e08>
> [  125.619563] ---[ end trace 71f0a6784c4b2590 ]---



* Re: Oops with tip/x86/fpu
  2015-03-05 15:13     ` Oleg Nesterov
@ 2015-03-05 18:42       ` Borislav Petkov
  2015-03-05 22:16         ` Dave Hansen
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-05 18:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Quentin Casasnovas, Dave Hansen, Andy Lutomirski, Ingo Molnar,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua

On Thu, Mar 05, 2015 at 04:13:38PM +0100, Oleg Nesterov wrote:
> Yes, yes, sure. That is what I meant, sorry for confusion.

It might be worth trying Quentin's patch which fixes the exception
tables. I can imagine that, with wrong exception tables, we end up jumping
somewhere out in the fields, so the #GP gets reported with XRSTORS at the
rIP even though XRSTORS is not what actually caused it.
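
For context, the xstate restore relies on the usual exception-table fixup
shape, roughly like this (illustrative only, with fx/lmask/hmask/err assumed
declared as in xrstor_state(); this is not Quentin's actual patch):

	asm volatile("1: xrstor (%[fx])\n"
		     "2:\n"
		     ".section .fixup,\"ax\"\n"
		     "3: movl $-1, %[err]\n"	/* fault recovery lands here...   */
		     "   jmp 2b\n"
		     ".previous\n"
		     _ASM_EXTABLE(1b, 3b)	/* ...only if this entry is right */
		     : [err] "+r" (err)
		     : [fx] "r" (fx), "a" (lmask), "d" (hmask)
		     : "memory");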

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* Re: Oops with tip/x86/fpu
  2015-03-05 18:34         ` Dave Hansen
@ 2015-03-05 18:46           ` Oleg Nesterov
  0 siblings, 0 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-05 18:46 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Borislav Petkov, Quentin Casasnovas, Andy Lutomirski,
	Ingo Molnar, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua

On 03/05, Dave Hansen wrote:
>
> On 03/05/2015 10:22 AM, Oleg Nesterov wrote:
> > On 03/05, Oleg Nesterov wrote:
> >> I _think_ that the difference is that eager_fpu_init()->xrstor_state()
> >> was called before apply_alternatives(), so it used XRSTOR.
> >>
> >> Note also that (before this commit) restore_fpu_checking() was almost
> >> never called right after init_fpu(). If use_eager_fpu() == T.
> >>
> >> After this commit the first xrstor_state() uses XRSTORS. And that is
> >> how (I think) 'noxsaves' makes the difference.
> >>
> >>
> >> So. I can be easily wrong, but so far I _think_ that this commit disclosed
> >> another problem. And even if I am wrong and this commit is buggy, we need
> >> to understand why ;)
> >>
> >> I'll try to think about debugging patch, I can't reproduce this problem
> >> on my machine...
> >
> > Dave, could you please run the test-case below?
> >
> > Without 'noxsaves', and without my commit.
>
> So you want it tested at 4b2e762e2e5 in tip/x86/fpu?

Yes, or even before, this doesn't really matter I think.

Thanks,

Oleg.



* [PATCH 0/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-04 18:30 Oops with tip/x86/fpu Dave Hansen
  2015-03-04 19:06 ` Oleg Nesterov
@ 2015-03-05 19:51 ` Oleg Nesterov
  2015-03-05 19:51   ` [PATCH 1/1] " Oleg Nesterov
  2015-03-07 15:38   ` [PATCH 0/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig() Oleg Nesterov
  2015-03-05 20:35 ` [PATCH 0/1] x86/fpu: math_state_restore() should not blindly disable irqs Oleg Nesterov
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-05 19:51 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

On 03/05, Oleg Nesterov wrote:
>
> And this also means I have another off-topic fix for 4.0/stable, will
> send a patch in a minute...

I knew about this problem, but I didn't realize that restore_sigcontext()
can call a sleeping function after math_state_restore(). So I was going
to fix this later, because we need a lot more cleanups in these paths.

But it turns out, it is trivial to trigger the "BUG: sleeping function
called from invalid context" warning, see the test-case I sent to Dave.

To avoid the confusion, this has nothing to do with the problems we
discuss in other threads, or with the recent changes in tip/x86/fpu.

The patch is horrible, yes. But simple, and math_state_restore/init_fpu
are already horrible and need the cleanups. Hopefully I'll send them "soon".

Oleg.



* [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-05 19:51 ` [PATCH 0/1] x86/fpu: math_state_restore() should not blindly disable irqs Oleg Nesterov
@ 2015-03-05 19:51   ` Oleg Nesterov
  2015-03-05 20:11     ` Ingo Molnar
  2015-03-07 15:38   ` [PATCH 0/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig() Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-05 19:51 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

math_state_restore() assumes it is called with irqs disabled, but
this is not true if the caller is __restore_xstate_sig().

This means that if ia32_fxstate == T and __copy_from_user() fails
__restore_xstate_sig() returns with irqs disabled too. This trgiggers

	BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
	[<ffffffff81381499>] dump_stack+0x59/0xa0
	[<ffffffff8106bd05>] ___might_sleep+0x105/0x110
	[<ffffffff8138786d>] ? _raw_spin_unlock_irqrestore+0x3d/0x70
	[<ffffffff8106bd8d>] __might_sleep+0x7d/0xb0
	[<ffffffff81385426>] down_read+0x26/0xa0
	[<ffffffff8138788a>] ? _raw_spin_unlock_irqrestore+0x5a/0x70
	[<ffffffff81136038>] print_vma_addr+0x58/0x130
	[<ffffffff8100239e>] signal_fault+0xbe/0xf0
	[<ffffffff810419aa>] sys32_rt_sigreturn+0xba/0xd0

Change math_state_restore() to check irqs_disabled().

Note: this is the minimal fix for -stable, it is horrible but simple.
We need to rewrite math_state_restore(), init_fpu(), and cleanup their
users.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
---
 arch/x86/kernel/traps.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 51c4658..7310e0e 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -774,7 +774,10 @@ void math_state_restore(void)
 	struct task_struct *tsk = current;
 
 	if (!tsk_used_math(tsk)) {
-		local_irq_enable();
+		bool disabled = irqs_disabled();
+
+		if (disabled)
+			local_irq_enable();
 		/*
 		 * does a slab alloc which can sleep
 		 */
@@ -785,7 +788,9 @@ void math_state_restore(void)
 			do_group_exit(SIGKILL);
 			return;
 		}
-		local_irq_disable();
+
+		if (disabled)
+			local_irq_disable();
 	}
 
 	/* Avoid __kernel_fpu_begin() right after __thread_fpu_begin() */
-- 
1.5.5.1




* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-05 19:51   ` [PATCH 1/1] " Oleg Nesterov
@ 2015-03-05 20:11     ` Ingo Molnar
  2015-03-05 21:25       ` Oleg Nesterov
  0 siblings, 1 reply; 126+ messages in thread
From: Ingo Molnar @ 2015-03-05 20:11 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas


* Oleg Nesterov <oleg@redhat.com> wrote:

> math_state_restore() assumes it is called with irqs disabled, but
> this is not true if the caller is __restore_xstate_sig().
> 
> This means that if ia32_fxstate == T and __copy_from_user() fails
> __restore_xstate_sig() returns with irqs disabled too. This triggers
> 
> 	BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
> 	[<ffffffff81381499>] dump_stack+0x59/0xa0
> 	[<ffffffff8106bd05>] ___might_sleep+0x105/0x110
> 	[<ffffffff8138786d>] ? _raw_spin_unlock_irqrestore+0x3d/0x70
> 	[<ffffffff8106bd8d>] __might_sleep+0x7d/0xb0
> 	[<ffffffff81385426>] down_read+0x26/0xa0
> 	[<ffffffff8138788a>] ? _raw_spin_unlock_irqrestore+0x5a/0x70
> 	[<ffffffff81136038>] print_vma_addr+0x58/0x130
> 	[<ffffffff8100239e>] signal_fault+0xbe/0xf0
> 	[<ffffffff810419aa>] sys32_rt_sigreturn+0xba/0xd0
> 
> Change math_state_restore() to check irqs_disabled().
> 
> Note: this is the minimal fix for -stable, it is horrible but simple.
> We need to rewrite math_state_restore(), init_fpu(), and cleanup their
> users.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Cc: <stable@vger.kernel.org>
> ---
>  arch/x86/kernel/traps.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 51c4658..7310e0e 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -774,7 +774,10 @@ void math_state_restore(void)
>  	struct task_struct *tsk = current;
>  
>  	if (!tsk_used_math(tsk)) {
> -		local_irq_enable();
> +		bool disabled = irqs_disabled();
> +
> +		if (disabled)
> +			local_irq_enable();
>  		/*
>  		 * does a slab alloc which can sleep
>  		 */
> @@ -785,7 +788,9 @@ void math_state_restore(void)
>  			do_group_exit(SIGKILL);
>  			return;
>  		}
> -		local_irq_disable();
> +
> +		if (disabled)
> +			local_irq_disable();
>  	}

Yuck!

Is there a fundamental reason why we cannot simply enable irqs and 
leave them enabled? Math state restore is not atomic and cannot really 
be atomic.

[ A potential worry would be kernel code using vector instructions in
  irqs-off regions - but that's totally broken anyway so not a big
  worry IMO, we might even want to warn about it. ]

But I might be missing something?

Thanks,

	Ingo


* [PATCH 0/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-04 18:30 Oops with tip/x86/fpu Dave Hansen
  2015-03-04 19:06 ` Oleg Nesterov
  2015-03-05 19:51 ` [PATCH 0/1] x86/fpu: math_state_restore() should not blindly disable irqs Oleg Nesterov
@ 2015-03-05 20:35 ` Oleg Nesterov
  2015-03-09 17:10 ` [PATCH] x86/fpu: drop_fpu() should not assume that tsk == current Oleg Nesterov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-05 20:35 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

Sorry if you see this message twice, but it seems that today my messages
go to /dev/null sometimes...

On 03/05, Oleg Nesterov wrote:
>
> And this also means I have another off-topic fix for 4.0/stable, will
> send a patch in a minute...

I knew about this problem, but I didn't realize that restore_sigcontext()
can call a sleeping function after math_state_restore(). So I was going
to fix this later, because we need a lot more cleanups in these paths.

But it turns out, it is trivial to trigger the "BUG: sleeping function
called from invalid context" warning, see the test-case I sent to Dave.

To avoid the confusion, this has nothing to do with the problems we
discuss in other threads, or with the recent changes in tip/x86/fpu.

The patch is horrible, yes. But simple, and math_state_restore/init_fpu
are already horrible and need the cleanups. Hopefully I'll send them "soon".

Oleg.



* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-05 20:11     ` Ingo Molnar
@ 2015-03-05 21:25       ` Oleg Nesterov
  2015-03-06  7:58         ` Ingo Molnar
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-05 21:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dave Hansen, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/05, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -774,7 +774,10 @@ void math_state_restore(void)
> >  	struct task_struct *tsk = current;
> >
> >  	if (!tsk_used_math(tsk)) {
> > -		local_irq_enable();
> > +		bool disabled = irqs_disabled();
> > +
> > +		if (disabled)
> > +			local_irq_enable();
> >  		/*
> >  		 * does a slab alloc which can sleep
> >  		 */
> > @@ -785,7 +788,9 @@ void math_state_restore(void)
> >  			do_group_exit(SIGKILL);
> >  			return;
> >  		}
> > -		local_irq_disable();
> > +
> > +		if (disabled)
> > +			local_irq_disable();
> >  	}
>
> Yuck!
>
> Is there a fundamental reason why we cannot simply enable irqs and
> leave them enabled? Math state restore is not atomic and cannot really
> be atomic.

You know, I didn't even try to verify ;) but see below.

Most probably we can simply enable irqs, yes. But what about older kernels,
how can we check?

And let me repeat, I strongly believe that this !tsk_used_math() case in
math_state_restore() must die. And unlazy_fpu() in init_fpu(). And both
__restore_xstate_sig() and flush_thread() should not use math_state_restore()
at all. At least in its current form.

But this is obviously not -stable material.

That said, I'll try to look into git history tomorrow. The patch above
looks "obviously safe", but perhaps I am paranoid too much...

Oleg.



* Re: Oops with tip/x86/fpu
  2015-03-05 18:42       ` Borislav Petkov
@ 2015-03-05 22:16         ` Dave Hansen
  0 siblings, 0 replies; 126+ messages in thread
From: Dave Hansen @ 2015-03-05 22:16 UTC (permalink / raw)
  To: Borislav Petkov, Oleg Nesterov
  Cc: Quentin Casasnovas, Andy Lutomirski, Ingo Molnar, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua

On 03/05/2015 10:42 AM, Borislav Petkov wrote:
> On Thu, Mar 05, 2015 at 04:13:38PM +0100, Oleg Nesterov wrote:
>> > Yes, yes, sure. That is what I meant, sorry for confusion.
> It might be worth to try with Quentin's patch which fixes the exception
> tables. I can imagine with wrong exception tables us jumping somewhere
> in the fields and causing this #GP and XRSTORS being there at rIP but
> not really causing the #GP itself.

Which patch of Quentin's is this that you want to see tested?


* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-05 21:25       ` Oleg Nesterov
@ 2015-03-06  7:58         ` Ingo Molnar
  2015-03-06 13:26           ` Oleg Nesterov
  2015-03-06 17:33           ` Linus Torvalds
  0 siblings, 2 replies; 126+ messages in thread
From: Ingo Molnar @ 2015-03-06  7:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas


* Oleg Nesterov <oleg@redhat.com> wrote:

> On 03/05, Ingo Molnar wrote:
> >
> > * Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > --- a/arch/x86/kernel/traps.c
> > > +++ b/arch/x86/kernel/traps.c
> > > @@ -774,7 +774,10 @@ void math_state_restore(void)
> > >  	struct task_struct *tsk = current;
> > >
> > >  	if (!tsk_used_math(tsk)) {
> > > -		local_irq_enable();
> > > +		bool disabled = irqs_disabled();
> > > +
> > > +		if (disabled)
> > > +			local_irq_enable();
> > >  		/*
> > >  		 * does a slab alloc which can sleep
> > >  		 */
> > > @@ -785,7 +788,9 @@ void math_state_restore(void)
> > >  			do_group_exit(SIGKILL);
> > >  			return;
> > >  		}
> > > -		local_irq_disable();
> > > +
> > > +		if (disabled)
> > > +			local_irq_disable();
> > >  	}
> >
> > Yuck!
> >
> > Is there a fundamental reason why we cannot simply enable irqs and
> > leave them enabled? Math state restore is not atomic and cannot really
> > be atomic.
> 
> You know, I didn't even try to verify ;) but see below.

So I'm thinking about the attached patch.

> Most probably we can simply enable irqs, yes. But what about older 
> kernels, how can we check?
>
> And let me repeat, I strongly believe that this !tsk_used_math() 
> case in math_state_restore() must die. And unlazy_fpu() in 
> init_fpu(). And both __restore_xstate_sig() and flush_thread() 
> should not use math_state_restore() at all. At least in its current 
> form.

Agreed.

> But this is obviously not -stable material.
> 
> That said, I'll try to look into git history tomorrow.

So I think the reasons are:

 - historic: because math_state_restore() started out as an interrupt 
   routine (from the IRQ13 days)

 - hardware imposed: the handler is executed with irqs off

 - it's probably the fastest implementation: we just run with the 
   natural irqs-off state the handler executes with.

So there's nothing outright wrong about executing with irqs off in a 
trap handler.

> [...] The patch above looks "obviously safe", but perhaps I am 
> paranoid too much...

IMHO your hack above isn't really acceptable, even for a backport.
So lets test the patch below (assuming it's the right thing to do)
and move forward?

Thanks,

	Ingo

======================>
From: Ingo Molnar <mingo@kernel.org>
Date: Fri, 6 Mar 2015 08:37:57 +0100
Subject: [PATCH] x86/fpu: Don't disable irqs in math_state_restore()

math_state_restore() was historically called with irqs disabled, 
because that's how the hardware generates the trap, and also because 
back in the days it was possible for it to be an asynchronous 
interrupt and interrupt handlers run with irqs off.

These days it's always an instruction trap, and furthermore it does 
inevitably complex things such as memory allocation and signal 
processing, which is not done with irqs disabled.

So keep irqs enabled.

This might surprise in-kernel FPU users that somehow relied on
interrupts being disabled across FPU usage - but that's
fundamentally fragile anyway due to the inatomicity of FPU state
restores. The trap return will restore interrupts to its previous 
state, but if FPU ops trigger math_state_restore() there's no
guarantee of atomicity anymore.

To warn about in-kernel irqs-off users of FPU state we might want to 
pass 'struct pt_regs' to math_state_restore() and check the trapped 
state for irqs disabled (flags has IF cleared) and kernel context - 
but that's for a later patch.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/traps.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 950815a138e1..52f9e4057cee 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -844,8 +844,9 @@ void math_state_restore(void)
 {
 	struct task_struct *tsk = current;
 
+	local_irq_enable();
+
 	if (!tsk_used_math(tsk)) {
-		local_irq_enable();
 		/*
 		 * does a slab alloc which can sleep
 		 */
@@ -856,7 +857,6 @@ void math_state_restore(void)
 			do_group_exit(SIGKILL);
 			return;
 		}
-		local_irq_disable();
 	}
 
 	/* Avoid __kernel_fpu_begin() right after __thread_fpu_begin() */


* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06  7:58         ` Ingo Molnar
@ 2015-03-06 13:26           ` Oleg Nesterov
  2015-03-06 13:39             ` Oleg Nesterov
  2015-03-06 13:46             ` Ingo Molnar
  2015-03-06 17:33           ` Linus Torvalds
  1 sibling, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-06 13:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dave Hansen, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/06, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > [...] The patch above looks "obviously safe", but perhaps I am
> > paranoid too much...
>
> IMHO your hack above isn't really acceptable, even for a backport.
> So lets test the patch below (assuming it's the right thing to do)
> and move forward?

OK, but please note that this patch is not backportable. If you think
that -stable doesn't need this fix, then I agree.

If the caller is do_device_not_available(), then we can not enable
irqs before __thread_fpu_begin() + restore_fpu_checking().

1. Preemption in between can destroy ->fpu.state initialized by
   fpu_finit(), __switch_to() will save the live (wrong) FPU state
   again.

2. kernel_fpu_begin() from irq right after __thread_fpu_begin() is
   not nice too. It will do __save_init_fpu() and this overwrites
   ->fpu.state too.

Starting from v4.0 it does kernel_fpu_disable(), but the older kernels
do not.
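
Rough sketch of the window, using the helpers named above (illustrative, not
verbatim kernel source):

	static void math_state_restore_sketch(struct task_struct *tsk)
	{
		fpu_finit(&tsk->thread.fpu);	/* fresh init state written to memory */

		/*
		 * If irqs/preemption were enabled here, __switch_to() or a
		 * kernel_fpu_begin() user running from an irq could
		 * __save_init_fpu() the live registers on top of the
		 * freshly written ->fpu.state before we restore from it.
		 */
		__thread_fpu_begin(tsk);
		restore_fpu_checking(tsk);	/* XRSTOR(S) from ->fpu.state */
	}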

Ingo, this code is really horrible and fragile. We need to clean it up
step-by-step, imho.

> ======================>
> From: Ingo Molnar <mingo@kernel.org>
> Date: Fri, 6 Mar 2015 08:37:57 +0100
> Subject: [PATCH] x86/fpu: Don't disable irqs in math_state_restore()
>
> math_state_restore() was historically called with irqs disabled,
> because that's how the hardware generates the trap, and also because
> back in the days it was possible for it to be an asynchronous
> interrupt and interrupt handlers run with irqs off.
>
> These days it's always an instruction trap, and furthermore it does
> inevitably complex things such as memory allocation and signal
> processing, which is not done with irqs disabled.
>
> So keep irqs enabled.
>
> This might surprise in-kernel FPU users that somehow relied on
> interrupts being disabled across FPU usage - but that's
> fundamentally fragile anyway due to the inatomicity of FPU state
> restores. The trap return will restore interrupts to its previous
> state, but if FPU ops trigger math_state_restore() there's no
> guarantee of atomicity anymore.
>
> To warn about in-kernel irqs-off users of FPU state we might want to
> pass 'struct pt_regs' to math_state_restore() and check the trapped
> state for irqs disabled (flags has IF cleared) and kernel context -
> but that's for a later patch.
>
> Cc: Andy Lutomirski <luto@amacapital.net>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Fenghua Yu <fenghua.yu@intel.com>
> Cc: H. Peter Anvin <hpa@zytor.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  arch/x86/kernel/traps.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> index 950815a138e1..52f9e4057cee 100644
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -844,8 +844,9 @@ void math_state_restore(void)
>  {
>  	struct task_struct *tsk = current;
>
> +	local_irq_enable();
> +
>  	if (!tsk_used_math(tsk)) {
> -		local_irq_enable();
>  		/*
>  		 * does a slab alloc which can sleep
>  		 */
> @@ -856,7 +857,6 @@ void math_state_restore(void)
>  			do_group_exit(SIGKILL);
>  			return;
>  		}
> -		local_irq_disable();
>  	}
>
>  	/* Avoid __kernel_fpu_begin() right after __thread_fpu_begin() */



* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 13:26           ` Oleg Nesterov
@ 2015-03-06 13:39             ` Oleg Nesterov
  2015-03-06 13:46             ` Ingo Molnar
  1 sibling, 0 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-06 13:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Dave Hansen, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/06, Oleg Nesterov wrote:
>
> OK, but please note that this patch is not backportable. If you think
> that -stable doesn't need this fix, then I agree.
>
> If the caller is do_device_not_available(), then we can not enable
> irqs before __thread_fpu_begin() + restore_fpu_checking().
>
> 1. Preemption in between can destroy ->fpu.state initialized by
>    fpu_finit(), __switch_to() will save the live (wrong) FPU state
>    again.
>
> 2. kernel_fpu_begin() from irq right after __thread_fpu_begin() is
>    not nice too. It will do __save_init_fpu() and this overwrites
>    ->fpu.state too.
>
> Starting from v4.0 it does kernel_fpu_disable(), but the older kernels
> do not.
>
> Ingo, this code is really horrible and fragile. We need to clean it up
> step-by-step, imho.

Forgot to mention...

And, otoh, if we are not going to backport this change, then I think this
irq_enable() should be called by do_device_not_available().
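
Something like this, roughly (illustrative sketch only, with context tracking
and the other #NM cases omitted; not a real patch):

	dotraplinkage void
	do_device_not_available(struct pt_regs *regs, long error_code)
	{
		local_irq_enable();	/* #NM arrives with irqs off */
		math_state_restore();	/* may then sleep in init_fpu() */
	}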

>
> > ======================>
> > From: Ingo Molnar <mingo@kernel.org>
> > Date: Fri, 6 Mar 2015 08:37:57 +0100
> > Subject: [PATCH] x86/fpu: Don't disable irqs in math_state_restore()
> >
> > math_state_restore() was historically called with irqs disabled,
> > because that's how the hardware generates the trap, and also because
> > back in the days it was possible for it to be an asynchronous
> > interrupt and interrupt handlers run with irqs off.
> >
> > These days it's always an instruction trap, and furthermore it does
> > inevitably complex things such as memory allocation and signal
> > processing, which is not done with irqs disabled.
> >
> > So keep irqs enabled.
> >
> > This might surprise in-kernel FPU users that somehow relied on
> > interrupts being disabled across FPU usage - but that's
> > fundamentally fragile anyway due to the inatomicity of FPU state
> > restores. The trap return will restore interrupts to its previous
> > state, but if FPU ops trigger math_state_restore() there's no
> > guarantee of atomicity anymore.
> >
> > To warn about in-kernel irqs-off users of FPU state we might want to
> > pass 'struct pt_regs' to math_state_restore() and check the trapped
> > state for irqs disabled (flags has IF cleared) and kernel context -
> > but that's for a later patch.
> >
> > Cc: Andy Lutomirski <luto@amacapital.net>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: Fenghua Yu <fenghua.yu@intel.com>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Ingo Molnar <mingo@kernel.org>
> > ---
> >  arch/x86/kernel/traps.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
> > index 950815a138e1..52f9e4057cee 100644
> > --- a/arch/x86/kernel/traps.c
> > +++ b/arch/x86/kernel/traps.c
> > @@ -844,8 +844,9 @@ void math_state_restore(void)
> >  {
> >  	struct task_struct *tsk = current;
> >
> > +	local_irq_enable();
> > +
> >  	if (!tsk_used_math(tsk)) {
> > -		local_irq_enable();
> >  		/*
> >  		 * does a slab alloc which can sleep
> >  		 */
> > @@ -856,7 +857,6 @@ void math_state_restore(void)
> >  			do_group_exit(SIGKILL);
> >  			return;
> >  		}
> > -		local_irq_disable();
> >  	}
> >
> >  	/* Avoid __kernel_fpu_begin() right after __thread_fpu_begin() */



* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 13:26           ` Oleg Nesterov
  2015-03-06 13:39             ` Oleg Nesterov
@ 2015-03-06 13:46             ` Ingo Molnar
  2015-03-06 14:01               ` Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: Ingo Molnar @ 2015-03-06 13:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas


* Oleg Nesterov <oleg@redhat.com> wrote:

> On 03/06, Ingo Molnar wrote:
> >
> > * Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > [...] The patch above looks "obviously safe", but perhaps I am
> > > paranoid too much...
> >
> > IMHO your hack above isn't really acceptable, even for a backport.
> > So lets test the patch below (assuming it's the right thing to do)
> > and move forward?
> 
> OK, but please note that this patch is not backportable. If you think
> that -stable doesn't need this fix, then I agree.
> 
> If the caller is do_device_not_available(), then we can not enable
> irqs before __thread_fpu_begin() + restore_fpu_checking().
> 
> 1. Preemption in between can destroy ->fpu.state initialized by
>    fpu_finit(), __switch_to() will save the live (wrong) FPU state
>    again.
> 
> 2. kernel_fpu_begin() from irq right after __thread_fpu_begin() is
>    not nice either. It will do __save_init_fpu() and this overwrites
>    ->fpu.state too.
> 
> Starting from v4.0 it does kernel_fpu_disable(), but the older kernels
> do not.
> 
> Ingo, this code is really horrible and fragile. We need to clean it up
> step by step, imho.

How about the patch from David Vrabel? That seems to solve the 
irq-disable problem too, right?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 13:46             ` Ingo Molnar
@ 2015-03-06 14:01               ` Oleg Nesterov
  2015-03-06 14:17                 ` Oleg Nesterov
  2015-03-06 15:00                 ` David Vrabel
  0 siblings, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-06 14:01 UTC (permalink / raw)
  To: Ingo Molnar, David Vrabel
  Cc: Dave Hansen, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/06, Ingo Molnar wrote:
>
> * Oleg Nesterov <oleg@redhat.com> wrote:
>
> > On 03/06, Ingo Molnar wrote:
> > >
> > > * Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > > [...] The patch above looks "obviously safe", but perhaps I am
> > > > paranoid too much...
> > >
> > > IMHO your hack above isn't really acceptable, even for a backport.
> > > So let's test the patch below (assuming it's the right thing to do)
> > > and move forward?
> >
> > OK, but please note that this patch is not backportable. If you think
> > that -stable doesn't need this fix, then I agree.
> >
> > If the caller is do_device_not_available(), then we can not enable
> > irqs before __thread_fpu_begin() + restore_fpu_checking().
> >
> > 1. Preemption in between can destroy ->fpu.state initialized by
> >    fpu_finit(), __switch_to() will save the live (wrong) FPU state
> >    again.
> >
> > 2. kernel_fpu_begin() from irq right after __thread_fpu_begin() is
> >    not nice either. It will do __save_init_fpu() and this overwrites
> >    ->fpu.state too.
> >
> > Starting from v4.0 it does kernel_fpu_disable(), but the older kernels
> > do not.
> >
> > Ingo, this code is really horrible and fragile. We need to clean it up
> > step by step, imho.
>
> How about the patch from David Vrabel? That seems to solve the
> irq-disable problem too, right?

I wasn't cc'ed, I guess you mean

	[PATCHv4] x86, fpu: remove the logic of non-eager fpu mem allocation at the first usage
	http://marc.info/?l=linux-kernel&m=142564237705311&w=2

Not sure I understand it correctly after the first quick look, but

1. It conflicts with the recent changes in tip/x86/fpu

2. fpu_ini() initializes current->thread.fpu.state. This looks unneeded,
   the kernel threads no longer have FPU context and do not abuse CPU.

3. I can be easily wrong, but it looks buggy... Note that
   arch_dup_task_struct() doesn't allocate child->fpu.state if
   !tsk_used_math(parent).

Add David...

No, I do not think this patch is a good idea. Perhaps I am wrong, but I
think we need other changes. And they should start from init_fpu().

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 14:01               ` Oleg Nesterov
@ 2015-03-06 14:17                 ` Oleg Nesterov
  2015-03-06 15:00                 ` David Vrabel
  1 sibling, 0 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-06 14:17 UTC (permalink / raw)
  To: Ingo Molnar, David Vrabel
  Cc: Dave Hansen, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/06, Oleg Nesterov wrote:
>
> On 03/06, Ingo Molnar wrote:
> >
> > How about the patch from David Vrabel? That seems to solve the
> > irq-disable problem too, right?
>
> I wasn't cc'ed, I guess you mean
>
> 	[PATCHv4] x86, fpu: remove the logic of non-eager fpu mem allocation at the first usage
> 	http://marc.info/?l=linux-kernel&m=142564237705311&w=2
>
> Not sure I understand it correctly after the first quick look, but
>
> 1. It conflicts with the recent changes in tip/x86/fpu
>
> 2. fpu_ini() initializes current->thread.fpu.state. This looks unneeded,
>    the kernel threads no longer have FPU context and do not abuse CPU.
>
> 3. I can be easily wrong, but it looks buggy... Note that
>    arch_dup_task_struct() doesn't allocate child->fpu.state if
>    !tsk_used_math(parent).
>
> Add David...
>
> No, I do not think this patch is a good idea. Perhaps I am wrong, but I
> think we need other changes. And they should start from init_fpu().

But the change in eager_fpu_init_bp() looks good (on top of tip/x86/fpu),
at least I was going to do the same ;)

In any case, I do not think this patch can target -stable.

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 14:01               ` Oleg Nesterov
  2015-03-06 14:17                 ` Oleg Nesterov
@ 2015-03-06 15:00                 ` David Vrabel
  2015-03-06 15:36                   ` Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: David Vrabel @ 2015-03-06 15:00 UTC (permalink / raw)
  To: Oleg Nesterov, Ingo Molnar
  Cc: Dave Hansen, Borislav Petkov, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 06/03/15 14:01, Oleg Nesterov wrote:
> On 03/06, Ingo Molnar wrote:
>>
>> * Oleg Nesterov <oleg@redhat.com> wrote:
>>
>>> On 03/06, Ingo Molnar wrote:
>>>>
>>>> * Oleg Nesterov <oleg@redhat.com> wrote:
>>>>
>>>>> [...] The patch above looks "obviously safe", but perhaps I am
>>>>> paranoid too much...
>>>>
>>>> IMHO your hack above isn't really acceptable, even for a backport.
>>>> So let's test the patch below (assuming it's the right thing to do)
>>>> and move forward?
>>>
>>> OK, but please note that this patch is not backportable. If you think
>>> that -stable doesn't need this fix, then I agree.
>>>
>>> If the caller is do_device_not_available(), then we can not enable
>>> irqs before __thread_fpu_begin() + restore_fpu_checking().
>>>
>>> 1. Preemption in between can destroy ->fpu.state initialized by
>>>    fpu_finit(), __switch_to() will save the live (wrong) FPU state
>>>    again.
>>>
>>> 2. kernel_fpu_begin() from irq right after __thread_fpu_begin() is
>>>    not nice either. It will do __save_init_fpu() and this overwrites
>>>    ->fpu.state too.
>>>
>>> Starting from v4.0 it does kernel_fpu_disable(), but the older kernels
>>> do not.
>>>
>>> Ingo, this code is really horrible and fragile. We need to clean it up
>>> step by step, imho.
>>
>> How about the patch from David Vrabel? That seems to solve the
>> irq-disable problem too, right?
> 
> I wasn't cc'ed, I guess you mean
> 
> 	[PATCHv4] x86, fpu: remove the logic of non-eager fpu mem allocation at the first usage
> 	http://marc.info/?l=linux-kernel&m=142564237705311&w=2

This patch is from Suresh, and was originally against 3.10, so...

> Not sure I understand it correctly after the first quick look, but
> 
> 1. It conflicts with the recent changes in tip/x86/fpu
> 
> 2. fpu_ini() initializes current->thread.fpu.state. This looks unneeded,
>    the kernel threads no longer have FPU context and do not abuse CPU.
> 
> 3. I can be easily wrong, but it looks buggy... Note that
>    arch_dup_task_struct() doesn't allocate child->fpu.state if
>    !tsk_used_math(parent).

...yes. It's bit-rotted a bit.

> No, I do not think this patch is a good idea. Perhaps I am wrong, but I
> think we need other changes. And they should start from init_fpu().

But the general principle of avoiding the allocation in the #NM handler
and hence avoiding the need to re-enable IRQs is still a good idea, yes?

David

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 15:00                 ` David Vrabel
@ 2015-03-06 15:36                   ` Oleg Nesterov
  2015-03-06 16:15                     ` David Vrabel
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-06 15:36 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ingo Molnar, Dave Hansen, Borislav Petkov, Andy Lutomirski,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua, Quentin Casasnovas

On 03/06, David Vrabel wrote:
>
> On 06/03/15 14:01, Oleg Nesterov wrote:
>
> > Not sure I understand it correctly after the first quick look, but
> >
> > 1. It conflicts with the recent changes in tip/x86/fpu
> >
> > 2. fpu_ini() initializes current->thread.fpu.state. This looks unneeded,
> >    the kernel threads no longer have FPU context and do not abuse CPU.
> >
> > 3. I can be easily wrong, but it looks buggy... Note that
> >    arch_dup_task_struct() doesn't allocate child->fpu.state if
> >    !tsk_used_math(parent).
>
> ...yes. It's bit-rotted a bit.
>
> > No, I do not think this patch is a good idea. Perhaps I am wrong, but I
> > think we need other changes. And they should start from init_fpu().
>
> But the general principle of avoiding the allocation in the #NM handler
> and hence avoiding the need to re-enable IRQs is still a good idea, yes?

This needs more discussion, but in short so far I think that fpu_alloc()
from #NM exception is fine if user_mode(regs) == T.

Just do_device_not_available() should simply do conditional_sti(), I think.
Perhaps it can even enable irqs unconditionally, but we need to verify that
this is 100% correct.
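
For reference, conditional_sti() is a tiny helper in arch/x86/kernel/traps.c,
roughly:

    static inline void conditional_sti(struct pt_regs *regs)
    {
            if (regs->flags & X86_EFLAGS_IF)
                    local_irq_enable();
    }

i.e. it re-enables irqs only when the trapped context had them enabled, so
doing it in do_device_not_available() cannot turn irqs on in a context that
ran with them off.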

And I agree that "if (!tsk_used_math(tsk))" code in math_state_restore()
should be removed. But not to avoid the allocation, and this needs other
changes.

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 15:36                   ` Oleg Nesterov
@ 2015-03-06 16:15                     ` David Vrabel
  2015-03-06 16:31                       ` Oleg Nesterov
  0 siblings, 1 reply; 126+ messages in thread
From: David Vrabel @ 2015-03-06 16:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Ingo Molnar, Dave Hansen, Borislav Petkov, Andy Lutomirski,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua, Quentin Casasnovas

On 06/03/15 15:36, Oleg Nesterov wrote:
> On 03/06, David Vrabel wrote:
>>
>> On 06/03/15 14:01, Oleg Nesterov wrote:
>>
>>> Not sure I understand it correctly after the first quick look, but
>>>
>>> 1. It conflicts with the recent changes in tip/x86/fpu
>>>
>>> 2. fpu_ini() initializes current->thread.fpu.state. This looks unneeded,
>>>    the kernel threads no longer have FPU context and do not abuse CPU.
>>>
>>> 3. I can be easily wrong, but it looks buggy... Note that
>>>    arch_dup_task_struct() doesn't allocate child->fpu.state if
>>>    !tsk_used_math(parent).
>>
>> ...yes. It's bit-rotted a bit.
>>
>>> No, I do not think this patch is a good idea. Perhaps I am wrong, but I
>>> think we need other changes. And they should start from init_fpu().
>>
>> But the general principle of avoiding the allocation in the #NM handler
>> and hence avoiding the need to re-enable IRQs is still a good idea, yes?
> 
> This needs more discussion, but in short so far I think that fpu_alloc()
> from #NM exception is fine if user_mode(regs) == T.

I think a memory allocation here, where the only behaviour on a failure
is to kill the task, is (and has always been) a crazy idea.

Additionally, in a Xen PV guest the #NM handler is called with TS
already cleared by the hypervisor so the handler must not enable
interrupts (and thus potentially schedule another task) until after the
current task's fpu state has been restored.  If a task was scheduled
before restoring the FPU state, TS would be clear and that task will use
fpu state from a previous task.

> Just do_device_not_available() should simply do conditional_sti(), I think.
> Perhaps it can even enable irqs unconditionally, but we need to verify that
> this is 100% correct.
> 
> And I agree that "if (!tsk_used_math(tsk))" code in math_state_restore()
> should be removed. But not to avoid the allocation, and this needs other
> changes.
> 
> Oleg.
> 


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 16:15                     ` David Vrabel
@ 2015-03-06 16:31                       ` Oleg Nesterov
  0 siblings, 0 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-06 16:31 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ingo Molnar, Dave Hansen, Borislav Petkov, Andy Lutomirski,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua, Quentin Casasnovas

On 03/06, David Vrabel wrote:
>
> On 06/03/15 15:36, Oleg Nesterov wrote:
> >
> > This needs more discussion, but in short so far I think that fpu_alloc()
> > from #NM exception is fine if user_mode(regs) == T.
>
> I think a memory allocation here, where the only behaviour on a failure
> is to kill the task, is (and has always been) a crazy idea.

Well, I do not agree. But lets discuss this later. This code should be
rewritten in any case. It has more problems.

> Additionally, in a Xen PV guest the #NM handler is called with TS
> already cleared by the hypervisor so the handler must not enable
> interrupts (and thus potentially schedule another task) until after the
> current task's fpu state has been restored.  If a task was scheduled
> before restoring the FPU state, TS would be clear and that task will use
> fpu state from a previous task.

I can be easily wrong (especially because I know nothing about Xen ;), but
I do not think this is true.

Yes sure, we need to avoid preemption, but we need this in any case, even
without Xen.

Again, lets discuss this a bit later?

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06  7:58         ` Ingo Molnar
  2015-03-06 13:26           ` Oleg Nesterov
@ 2015-03-06 17:33           ` Linus Torvalds
  2015-03-06 18:15             ` Oleg Nesterov
                               ` (2 more replies)
  1 sibling, 3 replies; 126+ messages in thread
From: Linus Torvalds @ 2015-03-06 17:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Oleg Nesterov, Dave Hansen, Borislav Petkov, Andy Lutomirski,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

[-- Attachment #1: Type: text/plain, Size: 1677 bytes --]

On Thu, Mar 5, 2015 at 11:58 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> math_state_restore() was historically called with irqs disabled,
> because that's how the hardware generates the trap, and also because
> back in the days it was possible for it to be an asynchronous
> interrupt and interrupt handlers run with irqs off.
>
> These days it's always an instruction trap, and furthermore it
> inevitably does complex things such as memory allocation and signal
> processing, which are not done with irqs disabled.
>
> So keep irqs enabled.

I agree with the "keep irqs enabled".

However, I do *not* agree with the actual patch, which doesn't do that at all.
> @@ -844,8 +844,9 @@ void math_state_restore(void)
>  {
>         struct task_struct *tsk = current;
>
> +       local_irq_enable();
> +

There's a big difference between "keep interrupts enabled" (ok) and
"explicitly enable interrupts in random contexts" (*NOT* ok).

So get rid of the "local_irq_enable()" entirely, and replace it with a

   WARN_ON_ONCE(irqs_disabled());

and let's just fix the cases where this actually gets called with
interrupts off. In particular, let's just make the
device_not_available thing use a trap gate, not an interrupt gate. And
then remove the "conditional_sti()" stuff.

IOW, I think the starting point should be something like the attached
(which doesn't do the WARN_ON_ONCE() - it should be added for
debugging).

*NOT* some kind of "let's just enable interrupts blindly" approach.

This is completely untested, of course. But I don't see why we should
use an interrupt gate for this. Is there anything in
"exception_enter()" that requires it?

                       Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 1852 bytes --]

 arch/x86/kernel/traps.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 9d2073e2ecc9..f045ac026ff1 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -836,16 +836,12 @@ asmlinkage __visible void __attribute__((weak)) smp_threshold_interrupt(void)
  *
  * Careful.. There are problems with IBM-designed IRQ13 behaviour.
  * Don't touch unless you *really* know how it works.
- *
- * Must be called with kernel preemption disabled (eg with local
- * local interrupts as in the case of do_device_not_available).
  */
 void math_state_restore(void)
 {
 	struct task_struct *tsk = current;
 
 	if (!tsk_used_math(tsk)) {
-		local_irq_enable();
 		/*
 		 * does a slab alloc which can sleep
 		 */
@@ -856,7 +852,6 @@ void math_state_restore(void)
 			do_group_exit(SIGKILL);
 			return;
 		}
-		local_irq_disable();
 	}
 
 	/* Avoid __kernel_fpu_begin() right after __thread_fpu_begin() */
@@ -884,18 +879,13 @@ do_device_not_available(struct pt_regs *regs, long error_code)
 	if (read_cr0() & X86_CR0_EM) {
 		struct math_emu_info info = { };
 
-		conditional_sti(regs);
-
 		info.regs = regs;
 		math_emulate(&info);
 		exception_exit(prev_state);
 		return;
 	}
 #endif
-	math_state_restore(); /* interrupts still off */
-#ifdef CONFIG_X86_32
-	conditional_sti(regs);
-#endif
+	math_state_restore();
 	exception_exit(prev_state);
 }
 NOKPROBE_SYMBOL(do_device_not_available);
@@ -959,7 +949,7 @@ void __init trap_init(void)
 	set_system_intr_gate(X86_TRAP_OF, &overflow);
 	set_intr_gate(X86_TRAP_BR, bounds);
 	set_intr_gate(X86_TRAP_UD, invalid_op);
-	set_intr_gate(X86_TRAP_NM, device_not_available);
+	set_trap_gate(X86_TRAP_NM, device_not_available);
 #ifdef CONFIG_X86_32
 	set_task_gate(X86_TRAP_DF, GDT_ENTRY_DOUBLEFAULT_TSS);
 #else

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 17:33           ` Linus Torvalds
@ 2015-03-06 18:15             ` Oleg Nesterov
  2015-03-06 19:23             ` Andy Lutomirski
  2015-03-07 10:32             ` Ingo Molnar
  2 siblings, 0 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-06 18:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Dave Hansen, Borislav Petkov, Andy Lutomirski,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/06, Linus Torvalds wrote:
>
> On Thu, Mar 5, 2015 at 11:58 PM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > math_state_restore() was historically called with irqs disabled,
> > because that's how the hardware generates the trap, and also because
> > back in the days it was possible for it to be an asynchronous
> > interrupt and interrupt handlers run with irqs off.
> >
> > These days it's always an instruction trap, and furthermore it
> > inevitably does complex things such as memory allocation and signal
> > processing, which are not done with irqs disabled.
> >
> > So keep irqs enabled.
>
> I agree with the "keep irqs enabled".

Me too, but not for stable. This patch is wrong without other changes.

> IOW, I think the starting point should be something like the attached
> (which doesn't do the WARN_ON_ONCE() - it should be added for
> debugging).

Yes, agreed.

And. Even if we forget about stable, we need some minor changes before
this one. At least we need to add preempt_disable() into kernel_fpu_disable().


So I still think that the horrible hack I sent makes sense for -stable.
We just need to clean it up (kill it) "immediately".

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 17:33           ` Linus Torvalds
  2015-03-06 18:15             ` Oleg Nesterov
@ 2015-03-06 19:23             ` Andy Lutomirski
  2015-03-06 22:00               ` Linus Torvalds
  2015-03-07 10:32             ` Ingo Molnar
  2 siblings, 1 reply; 126+ messages in thread
From: Andy Lutomirski @ 2015-03-06 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Oleg Nesterov, Dave Hansen, Borislav Petkov,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Fri, Mar 6, 2015 at 9:33 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Mar 5, 2015 at 11:58 PM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> math_state_restore() was historically called with irqs disabled,
>> because that's how the hardware generates the trap, and also because
>> back in the days it was possible for it to be an asynchronous
>> interrupt and interrupt handlers run with irqs off.
>>
>> These days it's always an instruction trap, and furthermore it
>> inevitably does complex things such as memory allocation and signal
>> processing, which are not done with irqs disabled.
>>
>> So keep irqs enabled.
>
> I agree with the "keep irqs enabled".
>
> However, I do *not* agree with the actual patch, which doesn't do that at all.
>> @@ -844,8 +844,9 @@ void math_state_restore(void)
>>  {
>>         struct task_struct *tsk = current;
>>
>> +       local_irq_enable();
>> +
>
> There's a big difference between "keep interrupts enabled" (ok) and
> "explicitly enable interrupts in random contexts" (*NOT* ok).
>
> So get rid of the "local_irq_enable()" entirely, and replace it with a
>
>    WARN_ON_ONCE(irqs_disabled());

I like this a lot better.  Ingo's patch scares me because in-kernel
FPU users are fundamentally atomic: if we're using kernel FPU *and*
current has important FPU state, we can't schedule safely because
there's nowhere to save the in-kernel FPU state.  I don't see how we
would end up in this situation with CR0.TS set, but if we somehow do,
then we'll corrupt something with your patch.
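
The in-kernel FPU users in question are the kernel_fpu_begin()/kernel_fpu_end()
pairs (crypto, RAID checksumming and the like); the pattern is, roughly:

    kernel_fpu_begin();   /* disables preemption, saves the user FPU state if it is live */
    /* ... use SSE/AVX here: must not sleep, schedule or re-trap into the FPU code ... */
    kernel_fpu_end();

so if anything schedules in the middle there is nowhere to park the state.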

>
> and let's just fix the cases where this actually gets called with
> interrupts off. In particular, let's just make the
> device_not_available thing use a trap gate, not an interrupt gate. And
> then remove the "conditional_sti()" stuff.

Please don't.  IMO it's really nice that we don't use trap gates at
all on x86_64, and I find the conditional_sti thing much nicer than
having to audit all of the entry code to see whether it's safe to run
it with IRQs on.

This isn't to say that using trap gates would be a terrible idea in
general.  I think I'd be okay with saying that non-IST
synchronous-only never-from-random-places entries should be trap gates
in general.  We could then audit the entry code and convert various
entries.

Things that maybe shouldn't be trap gates:

 - Anything asynchronous.
 - Page faults.  The entry code can cause page faults recursively due
to lazy vmap page table fills.  Maybe this doesn't matter.
 - Anything using IST.  That way lies madness.

FWIW, I just started auditing the entry code a bit, and it's currently
unsafe to use trap gates.  The IRQ entry code in the "interrupt" macro
does this:

    testl $3, CS-RBP(%rsp)
    je 1f
    SWAPGS

That will go boom quite nicely if it happens very early in device_not_available.

I suspect that fixing that will be slow and unpleasant and will
significantly outweigh any benefit.

--Andy

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 19:23             ` Andy Lutomirski
@ 2015-03-06 22:00               ` Linus Torvalds
  2015-03-06 22:28                 ` Andy Lutomirski
  0 siblings, 1 reply; 126+ messages in thread
From: Linus Torvalds @ 2015-03-06 22:00 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Oleg Nesterov, Dave Hansen, Borislav Petkov,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Fri, Mar 6, 2015 at 11:23 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> Please don't.  IMO it's really nice that we don't use trap gates at
> all on x86_64, and I find the conditional_sti thing much nicer than
> having to audit all of the entry code to see whether it's safe to run
> it with IRQs on.

So I'm not sure I see much difference, but I'd certainly be ok with
just moving the "conditional_sti()" up unconditionally to be the first
thing in do_device_not_available().

The point being that we still do *not* just randomly enable interrupts
because we decide that the callers are wrong.
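
Concretely, the reordering would be something like this (untested sketch):

    dotraplinkage void
    do_device_not_available(struct pt_regs *regs, long error_code)
    {
            enum ctx_state prev_state = exception_enter();

            /* first thing: re-enable irqs, but only if the trapped context had them on */
            conditional_sti(regs);

            /* ... math_emulate() / math_state_restore() as before ... */

            exception_exit(prev_state);
    }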

                       Linus

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 22:00               ` Linus Torvalds
@ 2015-03-06 22:28                 ` Andy Lutomirski
  2015-03-07 10:36                   ` Ingo Molnar
  0 siblings, 1 reply; 126+ messages in thread
From: Andy Lutomirski @ 2015-03-06 22:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Oleg Nesterov, Dave Hansen, Borislav Petkov,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Fri, Mar 6, 2015 at 2:00 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Mar 6, 2015 at 11:23 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>>
>> Please don't.  IMO it's really nice that we don't use trap gates at
>> all on x86_64, and I find the conditional_sti thing much nicer than
>> having to audit all of the entry code to see whether it's safe to run
>> it with IRQs on.
>
> So I'm not sure I see much difference, but I'd certainly be ok with
> just moving the "conditional_sti()" up unconditionally to be the first
> thing in do_device_not_available().

I'd be fine with that.  The important difference is that it's after swapgs.

--Andy

>
> The point being that we still do *not* just randomly enable interrupts
> because we decide that the callers are wrong.
>
>                        Linus



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 17:33           ` Linus Torvalds
  2015-03-06 18:15             ` Oleg Nesterov
  2015-03-06 19:23             ` Andy Lutomirski
@ 2015-03-07 10:32             ` Ingo Molnar
  2 siblings, 0 replies; 126+ messages in thread
From: Ingo Molnar @ 2015-03-07 10:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Dave Hansen, Borislav Petkov, Andy Lutomirski,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Mar 5, 2015 at 11:58 PM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > math_state_restore() was historically called with irqs disabled,
> > because that's how the hardware generates the trap, and also because
> > back in the days it was possible for it to be an asynchronous
> > interrupt and interrupt handlers run with irqs off.
> >
> > These days it's always an instruction trap, and furthermore it
> > inevitably does complex things such as memory allocation and signal
> > processing, which are not done with irqs disabled.
> >
> > So keep irqs enabled.
> 
> I agree with the "keep irqs enabled".
> 
> However, I do *not* agree with the actual patch, which doesn't do that at all.
> > @@ -844,8 +844,9 @@ void math_state_restore(void)
> >  {
> >         struct task_struct *tsk = current;
> >
> > +       local_irq_enable();
> > +
> 
> There's a big difference between "keep interrupts enabled" (ok) and
> "explicitly enable interrupts in random contexts" (*NOT* ok).

Agreed, so I thought that we already kind of did that:

   if (!tsk_used_math(tsk)) {
           local_irq_enable();

But yeah, my patch brought that to a whole new level by always doing 
it, without adding a warning first.

> 
> So get rid of the "local_irq_enable()" entirely, and replace it with a
>
>    WARN_ON_ONCE(irqs_disabled());

Yeah, agreed absolutely - sorry about scaring (or annoying) you with a 
Signed-off-by patch, that was silly of me.

> and let's just fix the cases where this actually gets called with 
> interrupts off. [...]

Yes. I was a bit blinded by the 'easy to backport' aspect, so I 
concentrated on that, but it's more important to not break stuff.

> @@ -959,7 +949,7 @@ void __init trap_init(void)
>  	set_system_intr_gate(X86_TRAP_OF, &overflow);
>  	set_intr_gate(X86_TRAP_BR, bounds);
>  	set_intr_gate(X86_TRAP_UD, invalid_op);
> -	set_intr_gate(X86_TRAP_NM, device_not_available);
> +	set_trap_gate(X86_TRAP_NM, device_not_available);

So I wasn't this brave.

Historically modern x86 entry code ran with irqs off, because that's 
what the hardware gave us on most entry types. I'm not 100% sure we 
are ready to allow preemption of sensitive entry code on 
CONFIG_PREEMPT=y kernels. But we could try.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-06 22:28                 ` Andy Lutomirski
@ 2015-03-07 10:36                   ` Ingo Molnar
  2015-03-07 20:11                     ` Linus Torvalds
  0 siblings, 1 reply; 126+ messages in thread
From: Ingo Molnar @ 2015-03-07 10:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Oleg Nesterov, Dave Hansen, Borislav Petkov,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas


* Andy Lutomirski <luto@amacapital.net> wrote:

> On Fri, Mar 6, 2015 at 2:00 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Fri, Mar 6, 2015 at 11:23 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> >>
> >> Please don't.  IMO it's really nice that we don't use trap gates at
> >> all on x86_64, and I find the conditional_sti thing much nicer than
> >> having to audit all of the entry code to see whether it's safe to run
> >> it with IRQs on.
> >
> > So I'm not sure I see much difference, but I'd certainly be ok with
> > just moving the "conditional_sti()" up unconditionally to be the first
> > thing in do_device_not_available().
> 
> I'd be fine with that.  The important difference is that it's after swapgs.

The thing is, we have to be careful about NMI contexts anyway. So how 
about being careful in irq contexts as well?

We could shave a good 10 cycles from the FPU trap overhead, maybe 
more?

We could save the same 10 cycles from page fault overhead as well, 
AFAICS.

Hm?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH 0/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig()
  2015-03-05 19:51 ` [PATCH 0/1] x86/fpu: math_state_restore() should not blindly disable irqs Oleg Nesterov
  2015-03-05 19:51   ` [PATCH 1/1] " Oleg Nesterov
@ 2015-03-07 15:38   ` Oleg Nesterov
  2015-03-07 15:38     ` [PATCH 1/1] " Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-07 15:38 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

On 03/05, Oleg Nesterov wrote:
>
> The patch is horrible, yes. But simple, and math_state_restore/init_fpu
> are already horrible and need the cleanups.

OK, nobody liked it. How about this one for stable?

Ingo, Linus, Andy. I do agree, math_state_restore() should be called with
irqs enabled. And I was going to do this too. But this is wrong without
other changes. I hope to send them soon.

We need to remove this !tsk_used_math() code from math_state_restore().
And init_fpu() should die. Just look at __restore_xstate_sig() changed
by this patch. Why does it call init_fpu()? We only need fpu_alloc().
fpu_finit() is pointless; we are going to overwrite fpu->state. used_math()
makes no sense at this point. user_fpu_begin() and math_state_restore()
should set this flag. And other changes.

Could you please review?

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH 1/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig()
  2015-03-07 15:38   ` [PATCH 0/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig() Oleg Nesterov
@ 2015-03-07 15:38     ` Oleg Nesterov
  2015-03-09 14:07       ` Borislav Petkov
  2015-03-16 12:07       ` [tip:x86/urgent] x86/fpu: Avoid " tip-bot for Oleg Nesterov
  0 siblings, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-07 15:38 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

math_state_restore() assumes it is called with irqs disabled, but
this is not true if the caller is __restore_xstate_sig().

This means that if ia32_fxstate == T and __copy_from_user() fails,
__restore_xstate_sig() returns with irqs disabled too. This triggers

	BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
	[<ffffffff81381499>] dump_stack+0x59/0xa0
	[<ffffffff8106bd05>] ___might_sleep+0x105/0x110
	[<ffffffff8138786d>] ? _raw_spin_unlock_irqrestore+0x3d/0x70
	[<ffffffff8106bd8d>] __might_sleep+0x7d/0xb0
	[<ffffffff81385426>] down_read+0x26/0xa0
	[<ffffffff8138788a>] ? _raw_spin_unlock_irqrestore+0x5a/0x70
	[<ffffffff81136038>] print_vma_addr+0x58/0x130
	[<ffffffff8100239e>] signal_fault+0xbe/0xf0
	[<ffffffff810419aa>] sys32_rt_sigreturn+0xba/0xd0

Change __restore_xstate_sig() to call set_used_math() unconditionally;
this avoids sti/cli in math_state_restore(). If copy_from_user() fails
we can simply do fpu_finit() by hand.

Note: this is only the first step. math_state_restore() should not check
used_math(), it should set this flag. While init_fpu() should simply die.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
---
 arch/x86/kernel/xsave.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index de9dcf8..dff0ec2 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -378,7 +378,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
 		 * thread's fpu state, reconstruct fxstate from the fsave
 		 * header. Sanitize the copied state etc.
 		 */
-		struct xsave_struct *xsave = &tsk->thread.fpu.state->xsave;
+		struct fpu *fpu = &tsk->thread.fpu;
 		struct user_i387_ia32_struct env;
 		int err = 0;
 
@@ -392,14 +392,15 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
 		 */
 		drop_fpu(tsk);
 
-		if (__copy_from_user(xsave, buf_fx, state_size) ||
+		if (__copy_from_user(&fpu->state->xsave, buf_fx, state_size) ||
 		    __copy_from_user(&env, buf, sizeof(env))) {
+			fpu_finit(fpu);
 			err = -1;
 		} else {
 			sanitize_restored_xstate(tsk, &env, xstate_bv, fx_only);
-			set_used_math();
 		}
 
+		set_used_math();
 		if (use_eager_fpu()) {
 			preempt_disable();
 			math_state_restore();
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-07 10:36                   ` Ingo Molnar
@ 2015-03-07 20:11                     ` Linus Torvalds
  2015-03-08  8:55                       ` Ingo Molnar
  0 siblings, 1 reply; 126+ messages in thread
From: Linus Torvalds @ 2015-03-07 20:11 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, Oleg Nesterov, Dave Hansen, Borislav Petkov,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Sat, Mar 7, 2015 at 2:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> We could save the same 10 cycles from page fault overhead as well,
> AFAICS.

Are trap gates actually noticeably faster? Or is it just the
"conditional_sti()" you're worried about?

Anyway, for page faulting, we traditionally actually wanted an
interrupt gate, because of how we wanted to avoid interrupts coming in
and possibly messing up %cr2 due to vmalloc faults, but more
importantly for preemption. vmalloc faults are "harmless" because
we'll notice that it's already done, return, and then re-take the real
fault. But a preemption event before we read %cr2 can cause bad things
to happen:

 - page fault pushes error code on stack, address in %cr2

 - we don't have interrupts disabled, and some interrupt comes in and
causes preemption

 - some other process runs, take another page fault. %cr2 now is the
wrong address

 - we go back to the original thread (perhaps on another cpu), which
now reads %cr2 for the wrong address

 - we send the process a SIGSEGV because we think it's accessing
memory that it has no place touching

So the page fault code actually *needs* interrupts disabled until we
read %cr2. Stupid x86 trap semantics where the error code is on the
thread-safe stack, but %cr2 is not.
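
Which is why do_page_fault() snapshots %cr2 as the very first thing it does,
while irqs are still off - roughly:

    dotraplinkage void notrace
    do_page_fault(struct pt_regs *regs, unsigned long error_code)
    {
            /* must read %cr2 before anything can preempt us or re-fault */
            unsigned long address = read_cr2();

            /* ... exception_enter(), __do_page_fault(regs, error_code, address) ... */
    }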

Maybe there is some trick I'm missing, but on the whole I think
"interrupt gate + conditional_sti()" does have things going for it.
Yes, it still leaves NMI as being special, but NMI really *is*
special.

                        Linus

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-07 20:11                     ` Linus Torvalds
@ 2015-03-08  8:55                       ` Ingo Molnar
  2015-03-08 11:38                         ` Ingo Molnar
  2015-03-08 13:59                         ` Andy Lutomirski
  0 siblings, 2 replies; 126+ messages in thread
From: Ingo Molnar @ 2015-03-08  8:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Oleg Nesterov, Dave Hansen, Borislav Petkov,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas, Denys Vlasenko


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sat, Mar 7, 2015 at 2:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > We could save the same 10 cycles from page fault overhead as well,
> > AFAICS.
> 
> Are trap gates actually noticeably faster? Or is it just the
> "conditional_sti()" you're worried about?

( I'll talk about the CR2 complication later, please ignore that 
  problem for a moment. )

So I base my thinking on the following hierarchy of fast paths. In a 
typical x86 system there are 4 main types of 'context switches', in 
order of decreasing frequency:

   - syscall    'context switch': entry/exit from syscall
   - trap/fault 'context switch': entry/exit from trap/fault
   - irq        'context switch': entry/exit from irqs
   - task       'context switch': scheduler context-switch

Where each successive level is about an order of magnitude less 
frequently executed on a typical system than the previous one, so to 
optimize a level we are willing to push overhead to the next one(s).

So the primary payoff in executing much of the entry code with irqs 
enabled would be to allow 64-bit *syscalls* to be made without irqs 
disabled: the first, most important level of context entries.

Doing that would give us four (theoretical) performance advantages:

  - No implicit irq disabling overhead when the syscall instruction is
    executed: we could change MSR_SYSCALL_MASK from 0xc0000084 to
    0xc0000284, which removes the implicit CLI on syscall entry.

  - No explicit irq enabling overhead via ENABLE_INTERRUPTS() [STI] in
    system_call.

  - No explicit irq disabling overhead in the ret_from_sys_call fast 
    path, i.e. no DISABLE_INTERRUPTS() [CLI].

  - No implicit irq enabling overhead in ret_from_sys_call's 
    USERGS_SYSRET64: the SYSRETQ instruction would not have to 
    re-enable irqs as the user-space IF in R11 would match that of the 
    current IF.

whether that's an actual performance win in practice as well needs to 
be measured, but I'd be (very!) shocked if it wasn't in the 20+ cycles 
range: which is absolutely huge in terms of system_call optimizations.
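
For the first item above: the implicit CLI exists because X86_EFLAGS_IF is
part of the flag mask syscall_init() writes into the SFMASK MSR (which is
what MSR_SYSCALL_MASK names); every flag set in that mask is cleared by the
CPU when SYSCALL executes. A sketch of the setup - the exact flag list is
from memory and may differ between kernel versions:

    wrmsrl(MSR_SYSCALL_MASK,
           X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_IF |
           X86_EFLAGS_IOPL | X86_EFLAGS_AC | X86_EFLAGS_NT);

Keeping X86_EFLAGS_IF out of that mask is what 'no implicit CLI on syscall
entry' means here.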

Now do I think this could be done realistically? I think yes (by 
re-using the NMI code's paranoid entry codepaths for the irq code as 
well, essentially fixing up the effects of any partial entries in an 
irq entry slow path), but I could be wrong about that.

My main worry isn't even feasibility but maintainability and general 
robustness: all these asm cleanups we are doing now could enable such 
a trick to be pulled off robustly.

But I think it could be done technically, because the NMI code already 
has to be careful about 'preempting' partial entries, so we have the 
logic.

Complications:

 - We now enable double partial entries: partial syscall interrupted
   by an irq interrupted by an NMI context. I think it should all work
   out fine but it's a new scenario.

 - We'd enable interruptible return from system call, which caused
   trouble in the past. Solvable IMHO, by being careful in the irq 
   entry code.

 - We'd now have to be extra careful about the possibility of 
   exceptions triggered in the entry/exit code itself, triggered by 
   any sort of unusual register content or MMU fault.

Simplifications:

 - I'd ruthlessly disable IRQs for any sort of non fast path: for 
   example 32-bit compat entries, ptrace or any other slowpath - at 
   least initially until we map out the long term effects of this 
   optimization.

Does this scare me? Yes, I think it should scare any sane person, but 
I don't think it's all that bad: all the NMI paranoidentry work has 
already the trail blazed, and most of the races will be readily 
triggerable via regular irq loads, so it's not like we'll leave subtle 
bugs in there.

Being able to do the same with certain traps, because irq entry is 
careful about partial entry state, would just be a secondary bonus.

Regarding the CR2 value on page faults:

> Anyway, for page faulting, we traditionally actually wanted an 
> interrupt gate, because of how we wanted to avoid interrupts coming 
> in and possibly messing up %cr2 due to vmalloc faults, but more 
> importantly for preemption. [...]

Here too I think we could take a page from the NMI code: save cr2 in 
the page fault asm code, recognize from the irq code when we are 
interrupting that and dropping into a slowpath that saves cr2 right 
there. Potentially task-context-switching will be safe after that.

Saving cr2 in the early page fault code should be much less of an 
overhead than what the IRQ disabling/enabling costs, so this should be 
a win. The potential win could be similar to that of system calls:

  - Removal of an implicit 'CLI' in irq gates

  - Removal of the explicit 'STI' in conditional_sti in your proposed 
    code

  - Removal of an explicit 'CLI' (DISABLE_INTERRUPTS) in 
    error_exit/retint_swapgs.

  - Removal of an implicit 'STI' when SYSRET enables interrupts from R11

and the same savings would apply to FPU traps as well. I'd leave all 
other low frequency traps as interrupt gates: #GP, debug, int3, etc.

> [...] vmalloc faults are "harmless" because we'll notice that it's 
> already done, return, and then re-take the real fault. But a 
> preemption event before we read %cr2 can cause bad things to happen:
> 
>  - page fault pushes error code on stack, address in %cr2
> 
>  - we don't have interrupts disabled, and some interrupt comes in and
> causes preemption
> 
>  - some other process runs, take another page fault. %cr2 now is the
> wrong address
> 
>  - we go back to the original thread (perhaps on another cpu), which
> now reads %cr2 for the wrong address
> 
>  - we send the process a SIGSEGV because we think it's accessing
> memory that it has no place touching

I think none of this corruption happens if an interrupting context is 
aware of this and 'fixes up' the entry state accordingly. Am I missing 
anything subtle perhaps?

This would be arguably new (and tricky) code, as today the NMI code 
solves this problem by trying to never fault and thus never corrupt 
CR2. But ... unlike NMIs, this triggers so often via a mix of regular 
traps and regular irqs that I'm pretty sure it can be pulled off 
robustly.

> So the page fault code actually *needs* interrupts disabled until we 
> read %cr2. Stupid x86 trap semantics where the error code is on the 
> thread-safe stack, but %cr2 is not.
> 
> Maybe there is some trick I'm missing, but on the whole I think 
> "interrupt gate + conditional_sti()" does have things going for it. 
> Yes, it still leaves NMI as being special, but NMI really *is* 
> special.

So I think it's all doable, the payoffs are significant in terms of 
entry speedups.

But only on a clean code base, as the x86 entry assembly code was way 
beyond any sane maintainability threshold IMHO - it's fortunately 
improving rapidly these days due to the nice work from Andy and Denys!

What do you think?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-08  8:55                       ` Ingo Molnar
@ 2015-03-08 11:38                         ` Ingo Molnar
  2015-03-08 13:59                         ` Andy Lutomirski
  1 sibling, 0 replies; 126+ messages in thread
From: Ingo Molnar @ 2015-03-08 11:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Oleg Nesterov, Dave Hansen, Borislav Petkov,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas, Denys Vlasenko


* Ingo Molnar <mingo@kernel.org> wrote:

> Doing that would give us four (theoretical) performance advantages:
> 
>   - No implicit irq disabling overhead when the syscall instruction is
>     executed: we could change MSR_SYSCALL_MASK from 0xc0000084 to
>     0xc0000284, which removes the implicit CLI on syscall entry.
> 
>   - No explicit irq enabling overhead via ENABLE_INTERRUPTS() [STI] in
>     system_call.
> 
>   - No explicit irq disabling overhead in the ret_from_sys_call fast 
>     path, i.e. no DISABLE_INTERRUPTS() [CLI].
> 
>   - No implicit irq enabling overhead in ret_from_sys_call's 
>     USERGS_SYSRET64: the SYSRETQ instruction would not have to 
>     re-enable irqs as the user-space IF in R11 would match that of the 
>     current IF.
> 
> whether that's an actual performance win in practice as well needs 
> to be measured, but I'd be (very!) shocked if it wasn't in the 20+ 
> cycles range: which is absolutely huge in terms of system_call 
> optimizations.

So just to quantify the potential 64-bit system call entry fast path 
performance savings a bit, I tried to simulate the effects in 
user-space via a 'best case' simulation, where we do a PUSHFQ+CLI+STI 
... CLI+POPFQ simulated syscall sequence (beginning and end 
sufficiently far from each other to not be interacting), on Intel 
family 6 model 62 CPUs (slightly dated but still relevant):

with irq disabling/enabling:

  new best speed: 2710739 loops (158 cycles per iteration).

fully preemptible:

  new best speed: 3389503 loops (113 cycles per iteration).

now that's about a 40-cycle difference, but admittedly the cost very 
much depends on the way we save flags and on the way we restore flags 
and depends on how intelligently the CPU can hide the irq disabling 
and the restoration amongst other processing it has to do on 
entry/exit, which it can do pretty well in a number of important 
cases.
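
A user-space approximation of that measurement could look like the sketch
below (hypothetical reconstruction, not the test program used above; CLI/STI
in ring 3 need iopl(3), i.e. root):

    #include <stdio.h>
    #include <stdint.h>
    #include <sys/io.h>

    static inline uint64_t rdtsc(void)
    {
            uint32_t lo, hi;

            asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
            return ((uint64_t)hi << 32) | lo;
    }

    int main(void)
    {
            const long loops = 10 * 1000 * 1000;
            uint64_t t0, t1;
            long i;

            if (iopl(3))                        /* allow CLI/STI from user mode */
                    return 1;

            t0 = rdtsc();
            for (i = 0; i < loops; i++)
                    asm volatile("add $-128, %%rsp    \n\t" /* step over the red zone    */
                                 "pushfq; cli; sti    \n\t" /* simulated irqs-off entry  */
                                 ".rept 32; nop; .endr\n\t" /* keep entry and exit apart */
                                 "cli; popfq          \n\t" /* simulated irqs-off exit   */
                                 "sub $-128, %%rsp"
                                 ::: "memory", "cc");
            t1 = rdtsc();

            printf("%.1f cycles per iteration\n", (double)(t1 - t0) / loops);
            return 0;
    }

The absolute numbers will differ from the ones above (the nop filler and loop
overhead are included); the irqs-off vs. fully-preemptible delta comes from
running it once as is and once with the pushfq/cli/sti and cli/popfq parts
removed.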

I don't think I can simulate the real thing in user-space:

  - The hardest bit to simulate is SYSRET: POPFQ is expensive, but 
    SYSRET might be able to 'cheat' on the enabling side

  - I _think_ it cannot cheat because user-space might have come in 
    with irqs disabled itself (we still have iopl(3)), so it's a POPFQ
    equivalent instruction.

  - OTOH the CPU might be able to hide the latency of the POPFQ 
    amongst other SYSRET return work (which is significant) - so this 
    is really hard to estimate.

So "we'll have to try it to see it" :-/ [and maybe Intel knows.]

But even if just half of the suspected savings can be realized: a 20 
cycles speedup is very tempting IMHO, given that our 64-bit system 
calls cost around 110 cycles these days.

Yes, it's scary, crazy, potentially fragile, might not even work, etc. 
- but it's also very tempting nevertheless ...

So I'll try to write a prototype of this, just to be able to get some 
numbers - but shoot me down if you think I'm being stupid and if the 
concept is an absolute non-starter to begin with!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-08  8:55                       ` Ingo Molnar
  2015-03-08 11:38                         ` Ingo Molnar
@ 2015-03-08 13:59                         ` Andy Lutomirski
  2015-03-08 14:38                           ` Andy Lutomirski
  1 sibling, 1 reply; 126+ messages in thread
From: Andy Lutomirski @ 2015-03-08 13:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Fenghua Yu, Quentin Casasnovas, Suresh Siddha, Dave Hansen,
	Linus Torvalds, Denys Vlasenko, Borislav Petkov, Oleg Nesterov,
	Rik van Riel, Pekka Riikonen, LKML

On Mar 8, 2015 4:55 AM, "Ingo Molnar" <mingo@kernel.org> wrote:
>
>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > On Sat, Mar 7, 2015 at 2:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > > We could save the same 10 cycles from page fault overhead as well,
> > > AFAICS.
> >
> > Are trap gates actually noticeably faster? Or is it just the
> > "conditional_sti()" you're worried about?
>
> ( I'll talk about the CR2 complication later, please ignore that
>   problem for a moment. )
>
> So I base my thinking on the following hierarchy of fast paths. In a
> typical x86 system there are 4 main types of 'context switches', in
> order of decreasing frequency:
>
>    - syscall    'context switch': entry/exit from syscall
>    - trap/fault 'context switch': entry/exit from trap/fault
>    - irq        'context switch': entry/exit from irqs
>    - task       'context switch': scheduler context-switch
>
> Where each successive level is about an order of magnitude less
> frequently executed on a typical system than the previous one, so to
> optimize a level we are willing to push overhead to the next one(s).
>
> So the primary payoff in executing much of the entry code with irqs
> enabled would be to allow 64-bit *syscalls* to be made without irqs
> disabled: the first, most important level of context entries.
>
> Doing that would give us four (theoretical) performance advantages:
>
>   - No implicit irq disabling overhead when the syscall instruction is
>     executed: we could change MSR_SYSCALL_MASK from 0xc0000084 to
>     0xc0000284, which removes the implicit CLI on syscall entry.
>
>   - No explicit irq enabling overhead via ENABLE_INTERRUPTS() [STI] in
>     system_call.
>
>   - No explicit irq disabling overhead in the ret_from_sys_call fast
>     path, i.e. no DISABLE_INTERRUPTS() [CLI].
>
>   - No implicit irq enabling overhead in ret_from_sys_call's
>     USERGS_SYSRET64: the SYSRETQ instruction would not have to
>     re-enable irqs as the user-space IF in R11 would match that of the
>     current IF.
>
> whether that's an actual performance win in practice as well needs to
> be measured, but I'd be (very!) shocked if it wasn't in the 20+ cycles
> range: which is absolutely huge in terms of system_call optimizations.
>
> Now do I think this could be done realistically? I think yes (by
> re-using the NMI code's paranoid entry codepaths for the irq code as
> well, essentially fixing up the effects of any partial entries in an
> irq entry slow path), but I could be wrong about that.
>
> My main worry isn't even feasibility but maintainability and general
> robustness: all these asm cleanups we are doing now could enable such
> a trick to be pulled off robustly.
>
> But I think it could be done technically, because the NMI code already
> has to be careful about 'preempting' partial entries, so we have the
> logic.
>
> Complications:
>
>  - We now enable double partial entries: partial syscall interrupted
>    by an irq interrupted by an NMI context. I think it should all work
>    out fine but it's a new scenario.
>
>  - We'd enable interruptible return from system call, which caused
>    trouble in the past. Solvable IMHO, by being careful in the irq
>    entry code.
>
>  - We'd now have to be extra careful about the possibility of
>    exceptions triggered in the entry/exit code itself, triggered by
>    any sort of unusual register content or MMU fault.
>
> Simplifications:
>
>  - I'd ruthlessly disable IRQs for any sort of non fast path: for
>    example 32-bit compat entries, ptrace or any other slowpath - at
>    least initially until we map out the long term effects of this
>    optimization.
>
> Does this scare me? Yes, I think it should scare any sane person, but
> I don't think it's all that bad: all the NMI paranoidentry work has
> already blazed the trail, and most of the races will be readily
> triggerable via regular irq loads, so it's not like we'll leave subtle
> bugs in there.
>
> Being able to do the same with certain traps, because irq entry is
> careful about partial entry state, would just be a secondary bonus.
>
> Regarding the CR2 value on page faults:
>
> > Anyway, for page faulting, we traditionally actually wanted an
> > interrupt gate, because of how we wanted to avoid interrupts coming
> > in and possibly messing up %cr2 due to vmalloc faults, but more
> > importantly for preemption. [...]
>
> Here too I think we could take a page from the NMI code: save cr2 in
> the page fault asm code, recognize from the irq code when we are
> interrupting that and dropping into a slowpath that saves cr2 right
> there. Potentially task-context-switching will be safe after that.
>
> Saving cr2 in the early page fault code should be much less of an
> overhead than what the IRQ disabling/enabling costs, so this should be
> a win. The potential win could be similar to that of system calls:
>
>   - Removal of an implicit 'CLI' in irq gates
>
>   - Removal of the explicit 'STI' in conditional_sti in your proposed
>     code
>
>   - Removal of an explicit 'CLI' (DISABLE_INTERRUPTS) in
>     error_exit/retint_swapgs.
>
>   - Removal of an implicit 'STI' when SYSRET enables interrupts from R11
>
> and the same savings would apply to FPU traps as well. I'd leave all
> other low frequency traps as interrupt gates: #GP, debug, int3, etc.
>
> > [...] vmalloc faults are "harmless" because we'll notice that it's
> > already done, return, and then re-take the real fault. But a
> > preemption event before we read %cr2 can cause bad things to happen:
> >
> >  - page fault pushes error code on stack, address in %cr2
> >
> >  - we don't have interrupts disabled, and some interrupt comes in and
> > causes preemption
> >
> >  - some other process runs, take another page fault. %cr2 now is the
> > wrong address
> >
> >  - we go back to the original thread (perhaps on another cpu), which
> > now reads %cr2 for the wrong address
> >
> >  - we send the process a SIGSEGV because we think it's accessing
> > memory that it has no place touching
>
> I think none of this corruption happens if an interrupting context is
> aware of this and 'fixes up' the entry state accordingly. Am I missing
> anything subtle perhaps?
>
> This would be arguably new (and tricky) code, as today the NMI code
> solves this problem by trying to never fault and thus never corrupt
> CR2. But ... unlike NMIs, this triggers so often via a mix of regular
> traps and regular irqs that I'm pretty sure it can be pulled off
> robustly.
>
> > So the page fault code actually *needs* interrupts disabled until we
> > read %cr2. Stupid x86 trap semantics where the error code is on the
> > thread-safe stack, but %cr2 is not.
> >
> > Maybe there is some trick I'm missing, but on the whole I think
> > "interrupt gate + conditional_sti()" does have things going for it.
> > Yes, it still leaves NMI as being special, but NMI really *is*
> > special.
>
> So I think it's all doable, the payoffs are significant in terms of
> entry speedups.
>
> But only on a clean code base, as the x86 entry assembly code was way
> beyond any sane maintainability threshold IMHO - it's fortunately
> improving rapidly these days due to the nice work from Andy and Denys!
>
> What do you think?

From recent memory, rdmsr of the gs base is around 60 cycles.  I
haven't benchmarked cli and sti lately, but I found some references
suggesting 2-3 cycles each.
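
For reference, numbers of that sort can be reproduced with a throwaway
module along these lines (a sketch, not something from this thread):

	#include <linux/module.h>
	#include <linux/kernel.h>
	#include <linux/preempt.h>
	#include <linux/timex.h>	/* get_cycles() */
	#include <asm/msr.h>		/* rdmsrl(), MSR_GS_BASE */

	static int __init msr_bench_init(void)
	{
		unsigned long gs = 0;
		cycles_t t0, t1;
		int i;

		preempt_disable();
		t0 = get_cycles();
		for (i = 0; i < 1000; i++)
			rdmsrl(MSR_GS_BASE, gs);
		t1 = get_cycles();
		preempt_enable();

		pr_info("rdmsr(MSR_GS_BASE): ~%llu cycles/iteration (gs=%lx)\n",
			(unsigned long long)(t1 - t0) / 1000, gs);
		return -ENODEV;		/* nothing to keep loaded */
	}
	module_init(msr_bench_init);
	MODULE_LICENSE("GPL");
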

There's another problem, though:  We don't have a real stack pointer
just after syscall and just before sysexit, and therefore we *must*
use IST for anything that can happen while we still have
user-controlled rsp.  That includes #DB, #NM, and #MC.  With your
proposal, we'd need to add all the IRQs to that list.  We could do
that -- maybe we'd just switch back off the IST stack immediately, but
this gets nasty with paravirt, where we don't currently know the full
set of RIP values that have funny stack pointers.

Hmm.  We already use weird IRQ stacks, though.  Maybe we could bounce
directly from IST to IRQ stack on IRQ entry.  Of course, this gets
rather confused if we have nested IRQs, but that seems to already work
correctly, so maybe it's not a big deal.  The really gross part would
be scheduling on IRQ exit.

A different optimization that could be interesting: stop using swapgs
entirely. Instead we could use per-cpu pgds.  This hurts context
switches, possibly quite a lot, but it completely removes the swapgs
overhead and shrinks the kernel (no gs prefixes).

We could also use a GPR as a percpu pointer.  We'd just have to fish
it out somewhere on syscall entry -- maybe we could use percpu syscall
entries :)

This changes somewhat with rdgsbase and wrgsbase.  Andi was working on
that -- what happened to it?  It should be much simpler now with the
entry cleanups.
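
For reference, with CR4.FSGSBASE set those become plain, even
unprivileged, instructions, roughly:

	unsigned long base;

	/* needs CR4.FSGSBASE enabled and an assembler that knows the mnemonics */
	asm volatile("rdgsbase %0" : "=r" (base));
	/* ... */
	asm volatile("wrgsbase %0" : : "r" (base));
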

Grumble, AMD really messed up syscalls.  Also, swapgs is lousy.  They
should have given us separate user/kernel gsbase or separate
positive/negative pgds.

--Andy

>
> Thanks,
>
>         Ingo

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: math_state_restore() should not blindly disable irqs
  2015-03-08 13:59                         ` Andy Lutomirski
@ 2015-03-08 14:38                           ` Andy Lutomirski
  0 siblings, 0 replies; 126+ messages in thread
From: Andy Lutomirski @ 2015-03-08 14:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Fenghua Yu, Quentin Casasnovas, Suresh Siddha, Dave Hansen,
	Linus Torvalds, Denys Vlasenko, Borislav Petkov, Oleg Nesterov,
	Rik van Riel, Pekka Riikonen, LKML

On Sun, Mar 8, 2015 at 6:59 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> There's another problem, though:  We don't have a real stack pointer
> just after syscall and just before sysexit, and therefore we *must*
> use IST for anything that can happen while we still have
> user-controlled rsp.  That includes #DB, #NM, and #MC.

I think I faked myself out here.  Why do #DB and #BP use IST?  We
could remove a decent (large?) amount of crud if we changed that.

--Andy

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig()
  2015-03-07 15:38     ` [PATCH 1/1] " Oleg Nesterov
@ 2015-03-09 14:07       ` Borislav Petkov
  2015-03-09 14:34         ` Oleg Nesterov
  2015-03-16 12:07       ` [tip:x86/urgent] x86/fpu: Avoid " tip-bot for Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-09 14:07 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Sat, Mar 07, 2015 at 04:38:44PM +0100, Oleg Nesterov wrote:
> math_state_restore() assumes it is called with irqs disabled, but
> this is not true if the caller is __restore_xstate_sig().
> 
> This means that if ia32_fxstate == T and __copy_from_user() fails
> __restore_xstate_sig() returns with irqs disabled too. This triggers
> 
> 	BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
> 	[<ffffffff81381499>] dump_stack+0x59/0xa0
> 	[<ffffffff8106bd05>] ___might_sleep+0x105/0x110
> 	[<ffffffff8138786d>] ? _raw_spin_unlock_irqrestore+0x3d/0x70
> 	[<ffffffff8106bd8d>] __might_sleep+0x7d/0xb0
> 	[<ffffffff81385426>] down_read+0x26/0xa0
> 	[<ffffffff8138788a>] ? _raw_spin_unlock_irqrestore+0x5a/0x70
> 	[<ffffffff81136038>] print_vma_addr+0x58/0x130
> 	[<ffffffff8100239e>] signal_fault+0xbe/0xf0
> 	[<ffffffff810419aa>] sys32_rt_sigreturn+0xba/0xd0
> 
> Change __restore_xstate_sig() to call set_used_math() unconditionally,
> this avoids sti/cli in math_state_restore(). If copy_from_user() fails
> we can simply do fpu_finit() by hand.
> 
> Note: this is only the first step. math_state_restore() should not check
> used_math(), it should set this flag. While init_fpu() should simply die.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Cc: <stable@vger.kernel.org>

Makes sense to me. I guess we should wait for Dave to test first though...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig()
  2015-03-09 14:07       ` Borislav Petkov
@ 2015-03-09 14:34         ` Oleg Nesterov
  2015-03-09 15:18           ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-09 14:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/09, Borislav Petkov wrote:
>
> On Sat, Mar 07, 2015 at 04:38:44PM +0100, Oleg Nesterov wrote:
> > math_state_restore() assumes it is called with irqs disabled, but
> > this is not true if the caller is __restore_xstate_sig().
> >
> > This means that if ia32_fxstate == T and __copy_from_user() fails
> > __restore_xstate_sig() returns with irqs disabled too. This triggers
> >
> > 	BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
> > 	[<ffffffff81381499>] dump_stack+0x59/0xa0
> > 	[<ffffffff8106bd05>] ___might_sleep+0x105/0x110
> > 	[<ffffffff8138786d>] ? _raw_spin_unlock_irqrestore+0x3d/0x70
> > 	[<ffffffff8106bd8d>] __might_sleep+0x7d/0xb0
> > 	[<ffffffff81385426>] down_read+0x26/0xa0
> > 	[<ffffffff8138788a>] ? _raw_spin_unlock_irqrestore+0x5a/0x70
> > 	[<ffffffff81136038>] print_vma_addr+0x58/0x130
> > 	[<ffffffff8100239e>] signal_fault+0xbe/0xf0
> > 	[<ffffffff810419aa>] sys32_rt_sigreturn+0xba/0xd0
> >
> > Change __restore_xstate_sig() to call set_used_math() unconditionally,
> > this avoids sti/cli in math_state_restore(). If copy_from_user() fails
> > we can simply do fpu_finit() by hand.
> >
> > Note: this is only the first step. math_state_restore() should not check
> > used_math(), it should set this flag. While init_fpu() should simply die.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> > Cc: <stable@vger.kernel.org>
>
> Makes sense to me. I guess we should wait for Dave to test first though...

The patch only fixes the problem with irqs disabled, I tested this.

The problem with fpu_init/XRSTORS is another thing...

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig()
  2015-03-09 14:34         ` Oleg Nesterov
@ 2015-03-09 15:18           ` Borislav Petkov
  2015-03-09 16:24             ` Oleg Nesterov
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-09 15:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Mon, Mar 09, 2015 at 03:34:36PM +0100, Oleg Nesterov wrote:
> The patch only fixes the problem with irqs disabled, I tested this.
> 
> The problem with fpu_init/XRSTORS is another thing...

Yet another thing?! Oh boy.

So first Dave reported the #GP, which got fixed by Quentin's patch.
After it, it triggered the rwsem.c bug above. If this patch fixes it,
what is the third problem?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig()
  2015-03-09 15:18           ` Borislav Petkov
@ 2015-03-09 16:24             ` Oleg Nesterov
  2015-03-09 16:53               ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-09 16:24 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/09, Borislav Petkov wrote:
>
> On Mon, Mar 09, 2015 at 03:34:36PM +0100, Oleg Nesterov wrote:
> > The patch only fixes the problem with irqs disabled, I tested this.
> >
> > The problem with fpu_init/XRSTORS is another thing...
>
> Yet another thing?! Oh boy.

Well, this is the same thing reported by Dave ;)

> So first Dave reported the #GP, which got fixed by Quentin's patch.

It is not fixed by Quentin's patch.

This patch "fixes" the problem in a sense that the kernel won't crash
after restore_fpu_checking() triggers #GP. Before this patch
do_general_protection()->fixup_exception() does not work in this case
and the kernel panics.

But restore_fpu_checking() should not trigger #GP (and fail).


And just in case... tip/x86/fpu still won't work even with the patch
from Quentin. Again, the kernel won't crash, but /sbin/init will be
killed by SIGSEGV I guess. Because restore_fpu_checking() will fail.

I'll change flush_thread() to rely on init_xstate_buf, I was going to
do this anyway. But this too doesn't fix the problem: fpu_finit() is
buggy on Dave's machine.

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig()
  2015-03-09 16:24             ` Oleg Nesterov
@ 2015-03-09 16:53               ` Borislav Petkov
  2015-03-09 17:05                 ` Oleg Nesterov
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-09 16:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Mon, Mar 09, 2015 at 05:24:04PM +0100, Oleg Nesterov wrote:
> Well, this is the same thing reported by Dave ;)

Am I the only one in thinking that this FPU thing went completely nuts?!
Just kill it all and let's start from scratch. :-P

> It is not fixed by Quentin's patch.
> 
> This patch "fixes" the problem in the sense that the kernel won't crash
> after restore_fpu_checking() triggers #GP. Before this patch
> do_general_protection()->fixup_exception() does not work in this case
> and the kernel panics.
> 
> But restore_fpu_checking() should not trigger #GP (and fail).

Right.

...

> But this too doesn't fix the problem: fpu_finit() is buggy on Dave's
> machine.

Talking about fpu_finit() - I still have two cleanup patches queued
from you:

http://git.kernel.org/cgit/linux/kernel/git/bp/bp.git/log/?h=tip-x86-fpu-2

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig()
  2015-03-09 16:53               ` Borislav Petkov
@ 2015-03-09 17:05                 ` Oleg Nesterov
  2015-03-09 17:23                   ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-09 17:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/09, Borislav Petkov wrote:
>
> On Mon, Mar 09, 2015 at 05:24:04PM +0100, Oleg Nesterov wrote:
> > Well, this is the same thing reported by Dave ;)
>
> Am I the only one in thinking that this FPU thing went completely nuts?!
> Just kill it all and let's start from scratch. :-P

And I just found another problem which needs the urgent/stable fix ;)
I'll send the simple patch in a minute...

> > But this too doesn't fix the problem: fpu_finit() is buggy on Dave's
> > machine.
>
> In talking about fpu_finit() - I still have two cleanup patches queued
> from you:
>
> http://git.kernel.org/cgit/linux/kernel/git/bp/bp.git/log/?h=tip-x86-fpu-2

Yes, thanks, but this is just a cosmetic cleanup.

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH] x86/fpu: drop_fpu() should not assume that tsk == current
  2015-03-04 18:30 Oops with tip/x86/fpu Dave Hansen
                   ` (2 preceding siblings ...)
  2015-03-05 20:35 ` [PATCH 0/1] x86/fpu: math_state_restore() should not blindly disable irqs Oleg Nesterov
@ 2015-03-09 17:10 ` Oleg Nesterov
  2015-03-09 17:36   ` Rik van Riel
                     ` (2 more replies)
  2015-03-11 17:33 ` [PATCH 0/4] x86/fpu: avoid math_state_restore() on kthread exec Oleg Nesterov
                   ` (2 subsequent siblings)
  6 siblings, 3 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-09 17:10 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

drop_fpu() does clear_used_math() and usually this is correct because
tsk == current. However switch_fpu_finish()->restore_fpu_checking() is
called before it updates the "current_task" variable. If it fails, we
will wrongly clear the PF_USED_MATH flag of the previous task.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
---
 arch/x86/include/asm/fpu-internal.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 99a2067..81c86fa 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -336,7 +336,7 @@ static inline void drop_fpu(struct task_struct *tsk)
 	preempt_disable();
 	tsk->thread.fpu_counter = 0;
 	__drop_fpu(tsk);
-	clear_used_math();
+	clear_stopped_child_used_math(tsk);
 	preempt_enable();
 }
 
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig()
  2015-03-09 17:05                 ` Oleg Nesterov
@ 2015-03-09 17:23                   ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-09 17:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Mon, Mar 09, 2015 at 06:05:43PM +0100, Oleg Nesterov wrote:
> And I just found another problem which needs the urgent/stable fix ;)
> I'll send the simple patch in a minute...

Ok, let me setup an fpu-urgent branch and collect everything there so
that we have some idea of what goes where...

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH] x86/fpu: drop_fpu() should not assume that tsk == current
  2015-03-09 17:10 ` [PATCH] x86/fpu: drop_fpu() should not assume that tsk == current Oleg Nesterov
@ 2015-03-09 17:36   ` Rik van Riel
  2015-03-09 17:48   ` Borislav Petkov
  2015-03-16 12:07   ` [tip:x86/urgent] x86/fpu: Drop_fpu() should not assume that tsk equals current tip-bot for Oleg Nesterov
  2 siblings, 0 replies; 126+ messages in thread
From: Rik van Riel @ 2015-03-09 17:36 UTC (permalink / raw)
  To: Oleg Nesterov, Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Suresh Siddha,
	LKML, Yu, Fenghua, Quentin Casasnovas

On 03/09/2015 01:10 PM, Oleg Nesterov wrote:
> drop_fpu() does clear_used_math() and usually this is correct because
> tsk == current. However switch_fpu_finish()->restore_fpu_checking() is
> called before it updates the "current_task" variable. If it fails, we
> will wrongly clear the PF_USED_MATH flag of the previous task.

Ouch. Good catch.

> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Cc: <stable@vger.kernel.org>

Reviewed-by: Rik van Riel <riel@redhat.com>


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH] x86/fpu: drop_fpu() should not assume that tsk == current
  2015-03-09 17:10 ` [PATCH] x86/fpu: drop_fpu() should not assume that tsk == current Oleg Nesterov
  2015-03-09 17:36   ` Rik van Riel
@ 2015-03-09 17:48   ` Borislav Petkov
  2015-03-09 18:06     ` Oleg Nesterov
  2015-03-16 12:07   ` [tip:x86/urgent] x86/fpu: Drop_fpu() should not assume that tsk equals current tip-bot for Oleg Nesterov
  2 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-09 17:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Mon, Mar 09, 2015 at 06:10:41PM +0100, Oleg Nesterov wrote:
> drop_fpu() does clear_used_math() and usually this is correct because
> tsk == current. However switch_fpu_finish()->restore_fpu_checking() is
> called before it updates the "current_task" variable. If it fails, we

You mean here "... before __switch_to() updates the current_task ... ",
I assume?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH] x86/fpu: drop_fpu() should not assume that tsk == current
  2015-03-09 17:48   ` Borislav Petkov
@ 2015-03-09 18:06     ` Oleg Nesterov
  2015-03-09 18:10       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-09 18:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/09, Borislav Petkov wrote:
>
> On Mon, Mar 09, 2015 at 06:10:41PM +0100, Oleg Nesterov wrote:
> > drop_fpu() does clear_used_math() and usually this is correct because
> > tsk == current. However switch_fpu_finish()->restore_fpu_checking() is
> > called before it updates the "current_task" variable. If it fails, we
>
> You mean here "... before __switch_to() updates the current_task ... ",
> I assume?

Yes... should I resend?

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH] x86/fpu: drop_fpu() should not assume that tsk == current
  2015-03-09 18:06     ` Oleg Nesterov
@ 2015-03-09 18:10       ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-09 18:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Mon, Mar 09, 2015 at 07:06:47PM +0100, Oleg Nesterov wrote:
> Yes... should I resend?

Nah, already fixed up and uploaded to

http://git.kernel.org/cgit/linux/kernel/git/bp/bp.git/log/?h=tip-x86-fpu-urgent

I'm going to collect the fpu urgent-only stuff there so that we don't
lose track of the patches flying back and forth.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH 0/4] x86/fpu: avoid math_state_restore() on kthread exec
  2015-03-04 18:30 Oops with tip/x86/fpu Dave Hansen
                   ` (3 preceding siblings ...)
  2015-03-09 17:10 ` [PATCH] x86/fpu: drop_fpu() should not assume that tsk == current Oleg Nesterov
@ 2015-03-11 17:33 ` Oleg Nesterov
  2015-03-11 17:34   ` [PATCH 1/4] x86/fpu: document user_fpu_begin() Oleg Nesterov
                     ` (3 more replies)
  2015-03-13 18:26 ` [PATCH 0/1] x86/cpu: don't allocate fpu->state for swapper/0 Oleg Nesterov
  2015-03-15 16:49 ` [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user Oleg Nesterov
  6 siblings, 4 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-11 17:33 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

Hello.

This should "fix" the kernel crash observed by Dave.

But let me repeat once again: the problem is that fpu_finit() is buggy
on Dave's machine. This series should "hide" this problem; we need to
fix it later anyway.

I was going to do these changes anyway; math_state_restore() was only
used because we did not have the necessary helpers. I was going to start
with init_fpu() cleanups, but since math_state_restore() makes this
fpu_finit() bug more visible, let's remove it first.

Note that init_fpu() + user_fpu_begin() is racy: used_math() is already
set, so a __switch_to() in between can do restore_fpu_checking() too and
trigger the same GPF. But this is fine (to some degree); the task won't
be killed. And this is just another proof that init_fpu() should not
set used_math(), and that it and its users need more cleanups.
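
To make the window concrete, roughly (just an illustration of the
sequence, not code from this series):

	/*
	 * exec path				preemption in between
	 * ---------				---------------------
	 * init_fpu(current)
	 *   fpu_alloc() + fpu_finit()
	 *   sets used_math()
	 *					__switch_to()
	 *					  switch_fpu_finish()
	 *					    restore_fpu_checking()
	 *					      (can hit the same #GP;
	 *					       handled by drop_init_fpu(),
	 *					       so the task is not killed)
	 * user_fpu_begin()
	 * restore_init_xstate()
	 */
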

More to come tomorrow.

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH 1/4] x86/fpu: document user_fpu_begin()
  2015-03-11 17:33 ` [PATCH 0/4] x86/fpu: avoid math_state_restore() on kthread exec Oleg Nesterov
@ 2015-03-11 17:34   ` Oleg Nesterov
  2015-03-13  9:47     ` Borislav Petkov
  2015-03-23 12:20     ` [tip:x86/fpu] x86/fpu: Document user_fpu_begin() tip-bot for Oleg Nesterov
  2015-03-11 17:34   ` [PATCH 2/4] x86/fpu: introduce restore_init_xstate() Oleg Nesterov
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-11 17:34 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

Currently user_fpu_begin() has a single caller and it is not clear that
why do we actually need it, and why we should not worry about preemption
right after preempt_enable().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/asm/fpu-internal.h |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 4bec98f..c615ae9 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -464,7 +464,9 @@ static inline int restore_xstate_sig(void __user *buf, int ia32_frame)
  * Need to be preemption-safe.
  *
  * NOTE! user_fpu_begin() must be used only immediately before restoring
- * it. This function does not do any save/restore on their own.
+ * it. This function does not do any save/restore on its own. In a lazy
+ * fpu mode this is just optimization to avoid a dna fault, the task can
+ * lose FPU right after preempt_enable().
  */
 static inline void user_fpu_begin(void)
 {
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH 2/4] x86/fpu: introduce restore_init_xstate()
  2015-03-11 17:33 ` [PATCH 0/4] x86/fpu: avoid math_state_restore() on kthread exec Oleg Nesterov
  2015-03-11 17:34   ` [PATCH 1/4] x86/fpu: document user_fpu_begin() Oleg Nesterov
@ 2015-03-11 17:34   ` Oleg Nesterov
  2015-03-13 10:34     ` Borislav Petkov
  2015-03-23 12:20     ` [tip:x86/fpu] x86/fpu: Introduce restore_init_xstate() tip-bot for Oleg Nesterov
  2015-03-11 17:34   ` [PATCH 3/4] x86/fpu: use restore_init_xstate() instead of math_state_restore() on kthread exec Oleg Nesterov
  2015-03-11 17:35   ` [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread() Oleg Nesterov
  3 siblings, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-11 17:34 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

Extract the "use_eager_fpu()" code from drop_init_fpu() into the new
simple helper, restore_init_xstate(). The next patch adds another user.

- It is not clear why we do not check use_fxsr() like fpu_restore_checking()
  does. eager_fpu_init_bp() calls setup_init_fpu_buf() too, and we have the
  "eagerfpu=on" kernel option.

- Ignoring the fact that init_xstate_buf is "struct xsave_struct *", not
  "union thread_xstate *", it is not clear why we can not simply use
  fpu_restore_checking() and avoid the code duplication.

- It is not clear why we can't call setup_init_fpu_buf() unconditionally
  to always create init_xstate_buf(). Then do_device_not_available() path
  (at least) could use restore_init_xstate() too. It doesn't need to init
  fpu->state, its content doesn't matter until unlazy_fpu/__switch_to/etc
  which overwrites this memory anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/asm/fpu-internal.h |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index c615ae9..d1f8472 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -340,16 +340,20 @@ static inline void drop_fpu(struct task_struct *tsk)
 	preempt_enable();
 }
 
+static inline void restore_init_xstate(void)
+{
+	if (use_xsave())
+		xrstor_state(init_xstate_buf, -1);
+	else
+		fxrstor_checking(&init_xstate_buf->i387);
+}
+
 static inline void drop_init_fpu(struct task_struct *tsk)
 {
 	if (!use_eager_fpu())
 		drop_fpu(tsk);
-	else {
-		if (use_xsave())
-			xrstor_state(init_xstate_buf, -1);
-		else
-			fxrstor_checking(&init_xstate_buf->i387);
-	}
+	else
+		restore_init_xstate();
 }
 
 /*
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH 3/4] x86/fpu: use restore_init_xstate() instead of math_state_restore() on kthread exec
  2015-03-11 17:33 ` [PATCH 0/4] x86/fpu: avoid math_state_restore() on kthread exec Oleg Nesterov
  2015-03-11 17:34   ` [PATCH 1/4] x86/fpu: document user_fpu_begin() Oleg Nesterov
  2015-03-11 17:34   ` [PATCH 2/4] x86/fpu: introduce restore_init_xstate() Oleg Nesterov
@ 2015-03-11 17:34   ` Oleg Nesterov
  2015-03-13 10:48     ` Borislav Petkov
  2015-03-23 12:21     ` [tip:x86/fpu] x86/fpu: Use " tip-bot for Oleg Nesterov
  2015-03-11 17:35   ` [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread() Oleg Nesterov
  3 siblings, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-11 17:34 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

Change flush_thread() to do user_fpu_begin() + restore_init_xstate()
and avoid math_state_restore().

Note: "TODO: cleanup this horror" is still valid. We do not need
init_fpu() at all, we only need fpu_alloc() + memset(0). But this needs
other changes, in particular user_fpu_begin() should set used_math().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/process.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index dd9a069..c396de2 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -142,7 +142,8 @@ void flush_thread(void)
 		/* kthread execs. TODO: cleanup this horror. */
 		if (WARN_ON(init_fpu(current)))
 			force_sig(SIGKILL, current);
-		math_state_restore();
+		user_fpu_begin();
+		restore_init_xstate();
 	}
 }
 
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-11 17:33 ` [PATCH 0/4] x86/fpu: avoid math_state_restore() on kthread exec Oleg Nesterov
                     ` (2 preceding siblings ...)
  2015-03-11 17:34   ` [PATCH 3/4] x86/fpu: use restore_init_xstate() instead of math_state_restore() on kthread exec Oleg Nesterov
@ 2015-03-11 17:35   ` Oleg Nesterov
  2015-03-13 10:52     ` Borislav Petkov
  2015-03-13 17:30     ` [PATCH v2 " Oleg Nesterov
  3 siblings, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-11 17:35 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

drop_init_fpu() makes no sense. We need drop_fpu() and only if
!use_eager_fpu().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/process.c |   11 ++++-------
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c396de2..2e71120 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -131,14 +131,11 @@ void flush_thread(void)
 	flush_ptrace_hw_breakpoint(tsk);
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
 
-	drop_init_fpu(tsk);
-	/*
-	 * Free the FPU state for non xsave platforms. They get reallocated
-	 * lazily at the first use.
-	 */
-	if (!use_eager_fpu())
+	if (!use_eager_fpu()) {
+		/* FPU state will be reallocated lazily at the first use. */
+		drop_fpu(tsk);
 		free_thread_xstate(tsk);
-	else if (!used_math()) {
+	} else if (!used_math()) {
 		/* kthread execs. TODO: cleanup this horror. */
 		if (WARN_ON(init_fpu(current)))
 			force_sig(SIGKILL, current);
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/4] x86/fpu: document user_fpu_begin()
  2015-03-11 17:34   ` [PATCH 1/4] x86/fpu: document user_fpu_begin() Oleg Nesterov
@ 2015-03-13  9:47     ` Borislav Petkov
  2015-03-13 14:34       ` Oleg Nesterov
  2015-03-23 12:20     ` [tip:x86/fpu] x86/fpu: Document user_fpu_begin() tip-bot for Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-13  9:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Wed, Mar 11, 2015 at 06:34:09PM +0100, Oleg Nesterov wrote:
> Currently user_fpu_begin() has a single caller and it is not clear that
> why do we actually need it, and why we should not worry about preemption
> right after preempt_enable().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  arch/x86/include/asm/fpu-internal.h |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
> index 4bec98f..c615ae9 100644
> --- a/arch/x86/include/asm/fpu-internal.h
> +++ b/arch/x86/include/asm/fpu-internal.h
> @@ -464,7 +464,9 @@ static inline int restore_xstate_sig(void __user *buf, int ia32_frame)
>   * Need to be preemption-safe.
>   *
>   * NOTE! user_fpu_begin() must be used only immediately before restoring
> - * it. This function does not do any save/restore on their own.
> + * it. This function does not do any save/restore on its own. In a lazy
> + * fpu mode this is just optimization to avoid a dna fault, the task can
> + * lose FPU right after preempt_enable().
>   */

I cleaned it up a bit more, if you don't mind:

---
From: Oleg Nesterov <oleg@redhat.com>
Date: Wed, 11 Mar 2015 18:34:09 +0100
Subject: [PATCH] x86/fpu: Document user_fpu_begin()

Currently, user_fpu_begin() has a single caller and it is not clear why
do we actually need it and why we should not worry about preemption
right after preempt_enable().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/20150311173409.GC5032@redhat.com
Signed-off-by: 
---
 arch/x86/include/asm/fpu-internal.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 810f20fd4e4e..e8ee3da3b924 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -508,10 +508,12 @@ static inline int restore_xstate_sig(void __user *buf, int ia32_frame)
 }
 
 /*
- * Need to be preemption-safe.
+ * Needs to be preemption-safe.
  *
  * NOTE! user_fpu_begin() must be used only immediately before restoring
- * it. This function does not do any save/restore on their own.
+ * the save state. It does not do any saving/restoring on its own. In
+ * lazy FPU mode, it is just an optimization to avoid a #NM exception,
+ * the task can lose the FPU right after preempt_enable().
  */
 static inline void user_fpu_begin(void)
 {
-- 

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 2/4] x86/fpu: introduce restore_init_xstate()
  2015-03-11 17:34   ` [PATCH 2/4] x86/fpu: introduce restore_init_xstate() Oleg Nesterov
@ 2015-03-13 10:34     ` Borislav Petkov
  2015-03-13 14:39       ` Oleg Nesterov
  2015-03-23 12:20     ` [tip:x86/fpu] x86/fpu: Introduce restore_init_xstate() tip-bot for Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-13 10:34 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Wed, Mar 11, 2015 at 06:34:29PM +0100, Oleg Nesterov wrote:
> Extract the "use_eager_fpu()" code from drop_init_fpu() into the new
> simple helper, restore_init_xstate(). The next patch adds another user.
> 
> - It is not clear why we do not check use_fxsr() like fpu_restore_checking()
>   does.

Tell me about it.

> - It is not clear why we can't call setup_init_fpu_buf() unconditionally
>   to always create init_xstate_buf().

I also don't understand what the thought behind xstate_enable_boot_cpu()
and eager_fpu_init_bp() is - we do call xstate_enable_boot_cpu() and alloc
init_xstate_buf, and then when we come to

eager_fpu_init
|-> eager_fpu_init_bp

we get to init it if not initted yet.

When can that ever happen?

> Then do_device_not_available() path
>   (at least) could use restore_init_xstate() too. It doesn't need to init
>   fpu->state, its content doesn't matter until unlazy_fpu/__switch_to/etc
>   which overwrites this memory anyway.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  arch/x86/include/asm/fpu-internal.h |   16 ++++++++++------
>  1 files changed, 10 insertions(+), 6 deletions(-)

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/4] x86/fpu: use restore_init_xstate() instead of math_state_restore() on kthread exec
  2015-03-11 17:34   ` [PATCH 3/4] x86/fpu: use restore_init_xstate() instead of math_state_restore() on kthread exec Oleg Nesterov
@ 2015-03-13 10:48     ` Borislav Petkov
  2015-03-13 14:45       ` Oleg Nesterov
  2015-03-23 12:21     ` [tip:x86/fpu] x86/fpu: Use " tip-bot for Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-13 10:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Wed, Mar 11, 2015 at 06:34:49PM +0100, Oleg Nesterov wrote:
> Change flush_thread() to do user_fpu_begin() + restore_init_xstate()
> and avoid math_state_restore().
> 
> Note: "TODO: cleanup this horror" is still valid. We do not need
> init_fpu() at all, we only need fpu_alloc() + memset(0). But this needs
> other changes, in particular user_fpu_begin() should set used_math().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  arch/x86/kernel/process.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index dd9a069..c396de2 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -142,7 +142,8 @@ void flush_thread(void)
>  		/* kthread execs. TODO: cleanup this horror. */
>  		if (WARN_ON(init_fpu(current)))
>  			force_sig(SIGKILL, current);
> -		math_state_restore();
> +		user_fpu_begin();
> +		restore_init_xstate();

Ok, question: so math_state_restore() does kernel_fpu_disable() before
doing those, why is it ok for flush_thread() to not do it?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-11 17:35   ` [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread() Oleg Nesterov
@ 2015-03-13 10:52     ` Borislav Petkov
  2015-03-13 14:55       ` Oleg Nesterov
  2015-03-13 17:30     ` [PATCH v2 " Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-13 10:52 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Wed, Mar 11, 2015 at 06:35:07PM +0100, Oleg Nesterov wrote:
> drop_init_fpu() makes no sense. We need drop_fpu() and only if

Oh, please explain why. I can try to rhyme it up as something like "we
don't need to restore FPU context when flushing the thread" but I'm not
sure...

> !use_eager_fpu().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  arch/x86/kernel/process.c |   11 ++++-------
>  1 files changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index c396de2..2e71120 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -131,14 +131,11 @@ void flush_thread(void)
>  	flush_ptrace_hw_breakpoint(tsk);
>  	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
>  
> -	drop_init_fpu(tsk);
> -	/*
> -	 * Free the FPU state for non xsave platforms. They get reallocated
> -	 * lazily at the first use.
> -	 */
> -	if (!use_eager_fpu())
> +	if (!use_eager_fpu()) {
> +		/* FPU state will be reallocated lazily at the first use. */
> +		drop_fpu(tsk);
>  		free_thread_xstate(tsk);
> -	else if (!used_math()) {
> +	} else if (!used_math()) {
>  		/* kthread execs. TODO: cleanup this horror. */
>  		if (WARN_ON(init_fpu(current)))
>  			force_sig(SIGKILL, current);

Also, can we clean up the tsk/current usage here?

We assign current to tsk and we work with it but then later use current
again. Needlessly confusing.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/4] x86/fpu: document user_fpu_begin()
  2015-03-13  9:47     ` Borislav Petkov
@ 2015-03-13 14:34       ` Oleg Nesterov
  0 siblings, 0 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-13 14:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/13, Borislav Petkov wrote:
>
> On Wed, Mar 11, 2015 at 06:34:09PM +0100, Oleg Nesterov wrote:
> > @@ -464,7 +464,9 @@ static inline int restore_xstate_sig(void __user *buf, int ia32_frame)
> >   * Need to be preemption-safe.
> >   *
> >   * NOTE! user_fpu_begin() must be used only immediately before restoring
> > - * it. This function does not do any save/restore on their own.
> > + * it. This function does not do any save/restore on its own. In a lazy
> > + * fpu mode this is just optimization to avoid a dna fault, the task can
> > + * lose FPU right after preempt_enable().
> >   */
>
> I cleaned it up a bit more, if you don't mind:
...

>  /*
> - * Need to be preemption-safe.
> + * Needs to be preemption-safe.
>   *
>   * NOTE! user_fpu_begin() must be used only immediately before restoring
> - * it. This function does not do any save/restore on their own.
> + * the save state. It does not do any saving/restoring on its own. In
> + * lazy FPU mode, it is just an optimization to avoid a #NM exception,
> + * the task can lose the FPU right after preempt_enable().
>   */

Thank!

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 2/4] x86/fpu: introduce restore_init_xstate()
  2015-03-13 10:34     ` Borislav Petkov
@ 2015-03-13 14:39       ` Oleg Nesterov
  2015-03-13 15:20         ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-13 14:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/13, Borislav Petkov wrote:
>
> > - It is not clear why we can't call setup_init_fpu_buf() unconditionally
> >   to always create init_xstate_buf().
>
> I also don't understand what the thought behind xstate_enable_boot_cpu()
> and eager_fpu_init_bp() is - we do call xstate_enable_boot_cpu() and alloc
> init_xstate_buf, and then when we come to
>
> eager_fpu_init
> |-> eager_fpu_init_bp
>
> we get to init it if not initted yet.
>
> When can that ever happen?

This too needs cleanups. But later ;)

Note that xstate_enable_boot_cpu is not called if !cpu_has_xsave, see the
check in xsave_init(). However, eagerfpu=on will force eager_fpu_init() which
calls eager_fpu_init_bp().

Btw, I was also going to kill eager_fpu_init_bp(). Probably I will send the
patch today.

> Applied, thanks.

Thanks!

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/4] x86/fpu: use restore_init_xstate() instead of math_state_restore() on kthread exec
  2015-03-13 10:48     ` Borislav Petkov
@ 2015-03-13 14:45       ` Oleg Nesterov
  2015-03-13 15:51         ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-13 14:45 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/13, Borislav Petkov wrote:
>
> On Wed, Mar 11, 2015 at 06:34:49PM +0100, Oleg Nesterov wrote:
> > Change flush_thread() to do user_fpu_begin() + restore_init_xstate()
> > and avoid math_state_restore().
> >
> > Note: "TODO: cleanup this horror" is still valid. We do not need
> > init_fpu() at all, we only need fpu_alloc() + memset(0). But this needs
> > other changes, in particular user_fpu_begin() should set used_math().
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> > ---
> >  arch/x86/kernel/process.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index dd9a069..c396de2 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -142,7 +142,8 @@ void flush_thread(void)
> >  		/* kthread execs. TODO: cleanup this horror. */
> >  		if (WARN_ON(init_fpu(current)))
> >  			force_sig(SIGKILL, current);
> > -		math_state_restore();
> > +		user_fpu_begin();
> > +		restore_init_xstate();
>
> Ok, question: so math_state_restore() does kernel_fpu_disable() before
> doing those, why is it ok for flush_thread() to not do it?

You mean, why restore_init_xstate() is safe?

Because in math_state_restore() case kernel_fpu_begin()->__save_init_fpu()
will overwrite (corrupt) the same fpu->state buffer we need to restore.
Without kernel_fpu_disable().

restore_init_xstate() obviously differs because it reads init_xstate_buf,
we do not care at all if kernel_fpu_begin() in between overwrites ->state.

And note! this is yet another proof that init_fpu()->fpu_finit() is
pointless. This (and almost all) users need fpu_alloc() only.
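
Schematically (an illustration, not code from the patches):

	/*
	 * math_state_restore():
	 *	about to XRSTOR tsk->thread.fpu.state
	 *	<irq> -> kernel_fpu_begin() -> __save_init_fpu()
	 *		 overwrites that very buffer before we restore from it
	 *
	 * restore_init_xstate() in flush_thread():
	 *	about to XRSTOR init_xstate_buf (never modified at runtime)
	 *	<irq> -> kernel_fpu_begin() -> __save_init_fpu()
	 *		 clobbers fpu->state, which nothing is going to read
	 */
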

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-13 10:52     ` Borislav Petkov
@ 2015-03-13 14:55       ` Oleg Nesterov
  2015-03-13 16:19         ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-13 14:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/13, Borislav Petkov wrote:
>
> On Wed, Mar 11, 2015 at 06:35:07PM +0100, Oleg Nesterov wrote:
> > drop_init_fpu() makes no sense. We need drop_fpu() and only if
>
> Oh, please explain why. I can try to rhyme it up as something like "we
> don't need to restore FPU context when flushing the thread" but I'm not
> sure...

Hmm. The changelog could be more clear. I'll send v2.

But please look at drop_init_fpu(). If eagerfpu == F it calls drop_fpu() and
this is what we need. flush_thread() already has the "if (!use_eager_fpu())",
we can shift drop_fpu() there.

Otherwise, if eagerfpu == T, drop_init_fpu() does restore_init_xstate() and
this just burns CPU. Until flush_thread() sets user_has_fpu/used_math, this
restore_init_xstate() is pointless; the state will be lost after preemption.

> > +	} else if (!used_math()) {
> >  		/* kthread execs. TODO: cleanup this horror. */
> >  		if (WARN_ON(init_fpu(current)))
> >  			force_sig(SIGKILL, current);
>
> Also, can we clean up the tsk/current usage here?
>
> We assign current to tsk and we work with it but then later use current
> again. Needlessly confusing.

Agreed, will do.

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 2/4] x86/fpu: introduce restore_init_xstate()
  2015-03-13 14:39       ` Oleg Nesterov
@ 2015-03-13 15:20         ` Borislav Petkov
  2015-03-16 19:05           ` Rik van Riel
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-13 15:20 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Fri, Mar 13, 2015 at 03:39:28PM +0100, Oleg Nesterov wrote:
> This too needs cleanups. But later ;)
> 
> Note that xstate_enable_boot_cpu is not called if !cpu_has_xsave, see the
> check in xsave_init(). Howver, eagerfpu=on will force eager_fpu_init() which
> calls eager_fpu_init_bp().

Yahaa.

This FPU cleanup fun will keep us busy until Christmas.

> Btw, I was also going to kill eager_fpu_init_bp(). Probably I will
> send the patch today.

Yap, and I'm wondering if we should kill those func ptrs games there. We
have BSP and AP CPU init paths so we can be much cleaner there.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 3/4] x86/fpu: use restore_init_xstate() instead of math_state_restore() on kthread exec
  2015-03-13 14:45       ` Oleg Nesterov
@ 2015-03-13 15:51         ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-13 15:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Fri, Mar 13, 2015 at 03:45:14PM +0100, Oleg Nesterov wrote:
> Because in math_state_restore() case kernel_fpu_begin()->__save_init_fpu()
> will overwrite (corrupt) the same fpu->state buffer we need to restore.
> Without kernel_fpu_disable().

Yes.

> restore_init_xstate() obviously differs because it reads init_xstate_buf,
> we do not care at all if kernel_fpu_begin() in between overwrites ->state.

Ah yes, so we're on the exec path and we restore the init xstate.
Sure, of course, that makes sense.

> And note! this is yet another proof that init_fpu()->fpu_finit() is
> pointless. This (and almost all) users need fpu_alloc() only.

Right, applying.

Thanks for explaining!

:-)

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-13 14:55       ` Oleg Nesterov
@ 2015-03-13 16:19         ` Borislav Petkov
  2015-03-13 16:26           ` Oleg Nesterov
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-13 16:19 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Fri, Mar 13, 2015 at 03:55:42PM +0100, Oleg Nesterov wrote:
> But please look at drop_init_fpu(). If eagerfpu == F it calls drop_fpu() and
> this is what we need. flush_thread() already has the "if (!use_eager_fpu())",
> we can shift drop_fpu() there.
> 
> Otherwise, if eagerfpu == T, drop_init_fpu() does restore_init_xstate() and
> this just burns CPU. Until flush_thread() sets user_has_fpu/used_math, this
> restore_init_xstate() is pointless; the state will be lost after preemption.

Yeah, I was wondering why that's there.

One example where drop_init_fpu() seems to make sense is
__kernel_fpu_end(): kernel is done with FPU and current was using the
FPU prior so let's restore it for the eagerfpu case.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-13 16:19         ` Borislav Petkov
@ 2015-03-13 16:26           ` Oleg Nesterov
  2015-03-13 19:27             ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-13 16:26 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/13, Borislav Petkov wrote:
>
> On Fri, Mar 13, 2015 at 03:55:42PM +0100, Oleg Nesterov wrote:
> > But please look at drop_init_fpu(). If eagerfpu == F it calls drop_fpu() and
> > this is what we need. flush_thread() already has the "if (!use_eager_fpu())",
> > we can shift drop_fpu() there.
> >
> > Otherwise, if eagerfpu == T, drop_init_fpu() does restore_init_xstate() and
> > this just burns CPU. Until flush_thread() sets user_has_fpu/used_math, this
> > restore_init_xstate() is pointless; the state will be lost after preemption.
>
> Yeah, I was wondering why that's there.
>
> One example where drop_init_fpu() seems to make sense is
> __kernel_fpu_end(): kernel is done with FPU and current was using the
> FPU prior so let's restore it for the eagerfpu case.

No, no, this is another case or I misunderstood you.

__kernel_fpu_end() needs to restore FPU from current's fpu->state exactly
because current used FPU prior. And that state was saved by __save_init_fpu()
in __kernel_fpu_begin().
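
Roughly, the pairing looks like this (a simplified sketch from memory,
not the exact tree code):

	void __kernel_fpu_begin(void)
	{
		if (__thread_has_fpu(current))
			__save_init_fpu(current);	/* stash user state in fpu->state */
		else if (!use_eager_fpu())
			clts();
	}

	void __kernel_fpu_end(void)
	{
		if (__thread_has_fpu(current))
			restore_fpu_checking(current);	/* bring that state back */
		else if (!use_eager_fpu())
			stts();
	}
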

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH v2 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-11 17:35   ` [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread() Oleg Nesterov
  2015-03-13 10:52     ` Borislav Petkov
@ 2015-03-13 17:30     ` Oleg Nesterov
  2015-03-14 10:55       ` Borislav Petkov
  2015-03-23 12:21       ` [tip:x86/fpu] x86/fpu: Don't abuse drop_init_fpu() in flush_thread() tip-bot for Oleg Nesterov
  1 sibling, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-13 17:30 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

flush_thread() -> drop_init_fpu() is suboptimal and confusing. It does
drop_fpu() or restore_init_xstate() depending on !use_eager_fpu(). But
flush_thread() too checks eagerfpu right after that, and if it is true
then restore_init_xstate() just burns CPU for no reason. We are going to
load init_xstate_buf again after we set used_math/user_has_fpu, until
then the FPU state can't survive after switch_to().

Remove it, and change the "if (!use_eager_fpu())" to call drop_fpu().
While at it, clean up the tsk/current usage.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/process.c |   15 ++++++---------
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index c396de2..c236306 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -131,17 +131,14 @@ void flush_thread(void)
 	flush_ptrace_hw_breakpoint(tsk);
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
 
-	drop_init_fpu(tsk);
-	/*
-	 * Free the FPU state for non xsave platforms. They get reallocated
-	 * lazily at the first use.
-	 */
-	if (!use_eager_fpu())
+	if (!use_eager_fpu()) {
+		/* FPU state will be reallocated lazily at the first use. */
+		drop_fpu(tsk);
 		free_thread_xstate(tsk);
-	else if (!used_math()) {
+	} else if (!used_math()) {
 		/* kthread execs. TODO: cleanup this horror. */
-		if (WARN_ON(init_fpu(current)))
-			force_sig(SIGKILL, current);
+		if (WARN_ON(init_fpu(tsk)))
+			force_sig(SIGKILL, tsk);
 		user_fpu_begin();
 		restore_init_xstate();
 	}
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH 0/1] x86/cpu: don't allocate fpu->state for swapper/0
  2015-03-04 18:30 Oops with tip/x86/fpu Dave Hansen
                   ` (4 preceding siblings ...)
  2015-03-11 17:33 ` [PATCH 0/4] x86/fpu: avoid math_state_restore() on kthread exec Oleg Nesterov
@ 2015-03-13 18:26 ` Oleg Nesterov
  2015-03-13 18:27   ` [PATCH 1/1] " Oleg Nesterov
  2015-03-14 11:16   ` [PATCH 0/1] x86/cpu: don't " Borislav Petkov
  2015-03-15 16:49 ` [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user Oleg Nesterov
  6 siblings, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-13 18:26 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

Hello.

This patch is "out of order" a bit, but since Borislav mentioned this
during review...

And I was going to send the 2nd one (below), but it turns out that
__init_refok is not discarded? So is there any way to do

	void __init init_function();

	void non_init_func()
	{
		if (can_only_be_true_before_free_initmem)
			init_function();
	}

and avoid the warning?


Fenghua, could you please explain the SYSTEM_BOOTING check in __save_fpu?
It was added by f41d830fa8900 "x86/xsaves: Save xstate to task's xsave area
in __save_fpu during booting time", the changelog says:

	__save_fpu() can be called during early booting time

how? from where? Do we expect math_error()->unlazy_fpu() at boot time, or what?

Oleg.


--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -532,7 +532,7 @@ void setup_xstate_comp(void)
 /*
  * setup the xstate image representing the init state
  */
-static void __init setup_init_fpu_buf(void)
+static noinline void __init_refok setup_init_fpu_buf(void)
 {
 	/*
 	 * Setup init_xstate_buf to represent the init state of
@@ -677,16 +677,8 @@ void xsave_init(void)
 	this_func();
 }
 
-static inline void __init eager_fpu_init_bp(void)
-{
-	if (!init_xstate_buf)
-		setup_init_fpu_buf();
-}
-
 void eager_fpu_init(void)
 {
-	static __refdata void (*boot_func)(void) = eager_fpu_init_bp;
-
 	WARN_ON(used_math());
 	current_thread_info()->status = 0;
 
@@ -698,10 +690,8 @@ void eager_fpu_init(void)
 		return;
 	}
 
-	if (boot_func) {
-		boot_func();
-		boot_func = NULL;
-	}
+	if (!init_xstate_buf)
+		setup_init_fpu_buf();
 }
 
 /*


^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH 1/1] x86/cpu: don't allocate fpu->state for swapper/0
  2015-03-13 18:26 ` [PATCH 0/1] x86/cpu: don't allocate fpu->state for swapper/0 Oleg Nesterov
@ 2015-03-13 18:27   ` Oleg Nesterov
  2015-03-16 10:18     ` Borislav Petkov
  2015-03-23 12:22     ` [tip:x86/fpu] x86/fpu: Don't " tip-bot for Oleg Nesterov
  2015-03-14 11:16   ` [PATCH 0/1] x86/cpu: don't " Borislav Petkov
  1 sibling, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-13 18:27 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas

Now that kthreads do not use the FPU until exec, swapper/0 doesn't need
to allocate fpu->state.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/xsave.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index dff0ec2..1cf5667 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -679,8 +679,6 @@ void xsave_init(void)
 
 static inline void __init eager_fpu_init_bp(void)
 {
-	current->thread.fpu.state =
-	    alloc_bootmem_align(xstate_size, __alignof__(struct xsave_struct));
 	if (!init_xstate_buf)
 		setup_init_fpu_buf();
 }
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-13 16:26           ` Oleg Nesterov
@ 2015-03-13 19:27             ` Borislav Petkov
  2015-03-14 14:48               ` Oleg Nesterov
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-13 19:27 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Fri, Mar 13, 2015 at 05:26:54PM +0100, Oleg Nesterov wrote:
> > One example where drop_init_fpu() seems to make sense is
> > __kernel_fpu_end(): kernel is done with FPU and current was using the
> > FPU prior so let's restore it for the eagerfpu case.
> 
> No, no, this is another case or I misunderstood you.
> 
> __kernel_fpu_end() needs to restore FPU from current's fpu->state exactly
> because current used FPU prior. And that state was saved by __save_init_fpu()
> in __kernel_fpu_begin().

That's exactly what I mean. See: "... kernel is done with FPU and current was
using the FPU prior..."

:-D

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH v2 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-13 17:30     ` [PATCH v2 " Oleg Nesterov
@ 2015-03-14 10:55       ` Borislav Petkov
  2015-03-14 10:57         ` [PATCH] x86/fpu: Fold __drop_fpu() into its sole user Borislav Petkov
  2015-03-23 12:21       ` [tip:x86/fpu] x86/fpu: Don't abuse drop_init_fpu() in flush_thread() tip-bot for Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-14 10:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Fri, Mar 13, 2015 at 06:30:30PM +0100, Oleg Nesterov wrote:
> flush_thread() -> drop_init_fpu() is suboptimal and confusing. It does
> drop_fpu() or restore_init_xstate() depending on !use_eager_fpu(). But
> flush_thread() too checks eagerfpu right after that, and if it is true
> then restore_init_xstate() just burns CPU for no reason. We are going to
> load init_xstate_buf again after we set used_math/user_has_fpu, until
> then the FPU state can't survive after switch_to().
> 
> Remove it, and change the "if (!use_eager_fpu())" to call drop_fpu().
> While at it, clean up the tsk/current usage.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Thanks, applied.

Did a trivial cleanup ontop, see reply to this message.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH] x86/fpu: Fold __drop_fpu() into its sole user
  2015-03-14 10:55       ` Borislav Petkov
@ 2015-03-14 10:57         ` Borislav Petkov
  2015-03-14 15:15           ` Oleg Nesterov
  2015-03-16 10:27           ` Ingo Molnar
  0 siblings, 2 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-14 10:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

Fold it into drop_fpu(). Phew, one less FPU function to pay attention
to.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fpu-internal.h | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 7d2f7fa6b2dd..2d4adff428ac 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -378,8 +378,14 @@ static inline void __thread_fpu_begin(struct task_struct *tsk)
 	__thread_set_has_fpu(tsk);
 }
 
-static inline void __drop_fpu(struct task_struct *tsk)
+static inline void drop_fpu(struct task_struct *tsk)
 {
+	/*
+	 * Forget coprocessor state..
+	 */
+	preempt_disable();
+	tsk->thread.fpu_counter = 0;
+
 	if (__thread_has_fpu(tsk)) {
 		/* Ignore delayed exceptions from user space */
 		asm volatile("1: fwait\n"
@@ -387,16 +393,7 @@ static inline void __drop_fpu(struct task_struct *tsk)
 			     _ASM_EXTABLE(1b, 2b));
 		__thread_fpu_end(tsk);
 	}
-}
 
-static inline void drop_fpu(struct task_struct *tsk)
-{
-	/*
-	 * Forget coprocessor state..
-	 */
-	preempt_disable();
-	tsk->thread.fpu_counter = 0;
-	__drop_fpu(tsk);
 	clear_stopped_child_used_math(tsk);
 	preempt_enable();
 }
-- 
2.3.3

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 0/1] x86/cpu: don't allocate fpu->state for swapper/0
  2015-03-13 18:26 ` [PATCH 0/1] x86/cpu: don't allocate fpu->state for swapper/0 Oleg Nesterov
  2015-03-13 18:27   ` [PATCH 1/1] " Oleg Nesterov
@ 2015-03-14 11:16   ` Borislav Petkov
  2015-03-14 15:13     ` [PATCH 0/1] x86/cpu: kill eager_fpu_init_bp() Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-14 11:16 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Fri, Mar 13, 2015 at 07:26:56PM +0100, Oleg Nesterov wrote:
> Hello.
> 
> This patch is "out of order" a bit, but since Borislav mentioned this
> during review...
> 
> And I was going to send the 2nd one (below), but it turns out that
> __init_refok is not discarded? So is there any way to do
> 
> 	void __init init_function();
> 
> 	void non_init_func()
> 	{
> 		if (can_only_be_true_before_free_initmem)
> 			init_function();
> 	}
> 
> and avoid the warning?

Actually, I was wondering if we could be even more radical and do
the boot cpu-specific stuff only in the BSP boot path. For example,
somewhere down that path:

start_kernel
|-> check_bugs
   |-> check_fpu		--- btw, we do FPU stuff here already too
   |-> xstate_enable_boot_cpu

instead of this state machine with function pointers hackery.

I'll try to play with this and see whether something breaks.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-13 19:27             ` Borislav Petkov
@ 2015-03-14 14:48               ` Oleg Nesterov
  2015-03-15 17:36                 ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-14 14:48 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/13, Borislav Petkov wrote:
>
> On Fri, Mar 13, 2015 at 05:26:54PM +0100, Oleg Nesterov wrote:
> > > One example where drop_init_fpu() seems to make sense is
> > > __kernel_fpu_end(): kernel is done with FPU and current was using the
> > > FPU prior so let's restore it for the eagerfpu case.
> >
> > No, no, this is another case or I misunderstood you.
> >
> > __kernel_fpu_end() needs to restore FPU from current's fpu->state exactly
> > because current used FPU prior. And that state was saved by __save_init_fpu()
> > in __kernel_fpu_begin().
>
> That's exactly what I mean. See: "... kernel is done with FPU and current was
> using the FPU prior..."

Yes, but my point was that this is why we can _not_ use drop_init_fpu() in
__kernel_fpu_end().

Nevermind, looks like I really misunderstood you.

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH 0/1] x86/cpu: kill eager_fpu_init_bp()
  2015-03-14 11:16   ` [PATCH 0/1] x86/cpu: don't " Borislav Petkov
@ 2015-03-14 15:13     ` Oleg Nesterov
  2015-03-14 15:13       ` [PATCH 1/1] " Oleg Nesterov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-14 15:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/14, Borislav Petkov wrote:
>
> On Fri, Mar 13, 2015 at 07:26:56PM +0100, Oleg Nesterov wrote:
> > Hello.
> >
> > This patch is "out of order" a bit, but since Borislav mentioned this
> > during review...
> >
> > And I was going to send the 2nd one (below), but it turns out that
> > __init_refok is not discarded? So is there any way to do
> >
> > 	void __init init_function();
> >
> > 	void non_init_func()
> > 	{
> > 		if (can_only_be_true_before_free_initmem)
> > 			init_function();
> > 	}
> >
> > and avoid the warning?

It turns out I _completely_ misunderstood __init_refok.

> Actually, I was wondering if we could be even more radical and do
> the boot cpu-specific stuff only in the BSP boot path.

Yes, yes, agreed. This needs more changes, but imo this would be a nice
cleanup.

Still, I think it makes sense to kill eager_fpu_init_bp() right now; this
won't complicate the mentioned cleanups.

On top of  "x86/fpu: don't allocate fpu->state for swapper/0".

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH 1/1] x86/cpu: kill eager_fpu_init_bp()
  2015-03-14 15:13     ` [PATCH 0/1] x86/cpu: kill eager_fpu_init_bp() Oleg Nesterov
@ 2015-03-14 15:13       ` Oleg Nesterov
  2015-03-16 12:44         ` Borislav Petkov
  2015-03-23 12:22         ` [tip:x86/fpu] x86/fpu: Kill eager_fpu_init_bp() tip-bot for Oleg Nesterov
  0 siblings, 2 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-14 15:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

Now that eager_fpu_init_bp() does setup_init_fpu_buf() and nothing else,
we can remove it and move this code into its "caller", eager_fpu_init().

This avoids the confusing games with "static __refdata void (*boot_func)".
init_xstate_buf can be NULL only during boot, so it is safe to call the
"__init" setup_init_fpu_buf() function; we just need to add the
"__init_refok" marker.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/kernel/xsave.c |   16 +++-------------
 1 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 1cf5667..f7e8e0c 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -677,16 +677,8 @@ void xsave_init(void)
 	this_func();
 }
 
-static inline void __init eager_fpu_init_bp(void)
+void __init_refok eager_fpu_init(void)
 {
-	if (!init_xstate_buf)
-		setup_init_fpu_buf();
-}
-
-void eager_fpu_init(void)
-{
-	static __refdata void (*boot_func)(void) = eager_fpu_init_bp;
-
 	WARN_ON(used_math());
 	current_thread_info()->status = 0;
 
@@ -698,10 +690,8 @@ void eager_fpu_init(void)
 		return;
 	}
 
-	if (boot_func) {
-		boot_func();
-		boot_func = NULL;
-	}
+	if (!init_xstate_buf)
+		setup_init_fpu_buf();
 }
 
 /*
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH] x86/fpu: Fold __drop_fpu() into its sole user
  2015-03-14 10:57         ` [PATCH] x86/fpu: Fold __drop_fpu() into its sole user Borislav Petkov
@ 2015-03-14 15:15           ` Oleg Nesterov
  2015-03-16 10:27           ` Ingo Molnar
  1 sibling, 0 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-14 15:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/14, Borislav Petkov wrote:
>
> Fold it into drop_fpu(). Phew, one less FPU function to pay attention
> to.
>
> No functionality change.

ACK!

and we can do more.

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-04 18:30 Oops with tip/x86/fpu Dave Hansen
                   ` (5 preceding siblings ...)
  2015-03-13 18:26 ` [PATCH 0/1] x86/cpu: don't allocate fpu->state for swapper/0 Oleg Nesterov
@ 2015-03-15 16:49 ` Oleg Nesterov
  2015-03-15 16:50   ` [PATCH RFC 1/2] x86: introduce __user_insn() and __check_insn() Oleg Nesterov
                     ` (3 more replies)
  6 siblings, 4 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-15 16:49 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas,
	H. Peter Anvin

Hello.

Another a bit off-topic change, but I'd like to finish the discussion
with Quentin.

And almost cosmetic. But I added the RFC tag to make it clear that this
needs a review from someone who understands gcc-asm better. In particular
I am worried if that dummy "=m" (*buf) is actually correct.


And I agree with Quentin, user_insn/check_insn can be improved to allow
clobbers, more flexible "output", etc. But imo they already can make this
code look a bit better, and "xstate_fault" must die eventually.

Quentin, could you review? I can't find your last email about this change,
and I can't recall if you agree or not.

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* [PATCH RFC 1/2] x86: introduce __user_insn() and __check_insn()
  2015-03-15 16:49 ` [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user Oleg Nesterov
@ 2015-03-15 16:50   ` Oleg Nesterov
  2015-03-15 16:50   ` [PATCH RFC 2/2] x86/fpu: change xsave_user() and xrestore_user() to use __user_insn() Oleg Nesterov
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-15 16:50 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas,
	H. Peter Anvin

1. Add __user_insn() and __check_insn() which accept the already
   stringified insn.

2. Move these helpers into uaccess.h to make them visible to xsave.h
   and other potential users.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/asm/fpu-internal.h |   31 ------------------------------
 arch/x86/include/asm/uaccess.h      |   36 ++++++++++++++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 4f1b7b6..d1f8472 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -120,37 +120,6 @@ static inline void sanitize_i387_state(struct task_struct *tsk)
 	__sanitize_i387_state(tsk);
 }
 
-#define user_insn(insn, output, input...)				\
-({									\
-	int err;							\
-	asm volatile(ASM_STAC "\n"					\
-		     "1:" #insn "\n\t"					\
-		     "2: " ASM_CLAC "\n"				\
-		     ".section .fixup,\"ax\"\n"				\
-		     "3:  movl $-1,%[err]\n"				\
-		     "    jmp  2b\n"					\
-		     ".previous\n"					\
-		     _ASM_EXTABLE(1b, 3b)				\
-		     : [err] "=r" (err), output				\
-		     : "0"(0), input);					\
-	err;								\
-})
-
-#define check_insn(insn, output, input...)				\
-({									\
-	int err;							\
-	asm volatile("1:" #insn "\n\t"					\
-		     "2:\n"						\
-		     ".section .fixup,\"ax\"\n"				\
-		     "3:  movl $-1,%[err]\n"				\
-		     "    jmp  2b\n"					\
-		     ".previous\n"					\
-		     _ASM_EXTABLE(1b, 3b)				\
-		     : [err] "=r" (err), output				\
-		     : "0"(0), input);					\
-	err;								\
-})
-
 static inline int fsave_user(struct i387_fsave_struct __user *fx)
 {
 	return user_insn(fnsave %[fx]; fwait,  [fx] "=m" (*fx), "m" (*fx));
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 0d592e0..ccfacd8 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -740,5 +740,39 @@ copy_to_user(void __user *to, const void *from, unsigned long n)
 #undef __copy_from_user_overflow
 #undef __copy_to_user_overflow
 
-#endif /* _ASM_X86_UACCESS_H */
+#define __user_insn(insn, output, input...)				\
+({									\
+	int err;							\
+	asm volatile(ASM_STAC "\n"					\
+		     "1:" insn "\n\t"					\
+		     "2: " ASM_CLAC "\n"				\
+		     ".section .fixup,\"ax\"\n"				\
+		     "3:  movl $-1,%[err]\n"				\
+		     "    jmp  2b\n"					\
+		     ".previous\n"					\
+		     _ASM_EXTABLE(1b, 3b)				\
+		     : [err] "=r" (err), output				\
+		     : "0"(0), input);					\
+	err;								\
+})
+
+#define user_insn(insn, ...)	__user_insn(#insn, ##__VA_ARGS__)
 
+#define __check_insn(insn, output, input...)				\
+({									\
+	int err;							\
+	asm volatile("1:" insn "\n\t"					\
+		     "2:\n"						\
+		     ".section .fixup,\"ax\"\n"				\
+		     "3:  movl $-1,%[err]\n"				\
+		     "    jmp  2b\n"					\
+		     ".previous\n"					\
+		     _ASM_EXTABLE(1b, 3b)				\
+		     : [err] "=r" (err), output				\
+		     : "0"(0), input);					\
+	err;								\
+})
+
+#define check_insn(insn, ...)	__check_insn(#insn, ##__VA_ARGS__)
+
+#endif /* _ASM_X86_UACCESS_H */
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [PATCH RFC 2/2] x86/fpu: change xsave_user() and xrestore_user() to use __user_insn()
  2015-03-15 16:49 ` [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user Oleg Nesterov
  2015-03-15 16:50   ` [PATCH RFC 1/2] x86: introduce __user_insn() and __check_insn() Oleg Nesterov
@ 2015-03-15 16:50   ` Oleg Nesterov
  2015-03-16 22:43     ` Quentin Casasnovas
  2015-03-16 14:36   ` [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user Borislav Petkov
  2015-03-16 22:37   ` Quentin Casasnovas
  3 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-15 16:50 UTC (permalink / raw)
  To: Dave Hansen, Borislav Petkov, Ingo Molnar
  Cc: Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, Quentin Casasnovas,
	H. Peter Anvin

Change xsave_user() and xrestore_user() to avoid the (imho) horrible
and should-die xstate_fault helper; they both can use __user_insn().

This also removes the "memory" clobber, but I think it was never needed.
xrestore_user() doesn't change the memory, it only changes the FPU regs.
xsave_user() does write to "*buf", but this memory is "__user" and we must
never access it directly.

This patch adds '"=m" (*buf)' in both cases, but this is only because
currently __user_insn() needs the non-empty "output" arg.

Note: I think we can change all other xstate_fault users too, including
alternative_input's.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/asm/xsave.h |   19 +++++--------------
 1 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index 5fa9770..441f171 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -229,12 +229,8 @@ static inline int xsave_user(struct xsave_struct __user *buf)
 	if (unlikely(err))
 		return -EFAULT;
 
-	__asm__ __volatile__(ASM_STAC "\n"
-			     "1:"XSAVE"\n"
-			     "2: " ASM_CLAC "\n"
-			     xstate_fault
-			     : "D" (buf), "a" (-1), "d" (-1), "0" (0)
-			     : "memory");
+	err = __user_insn(XSAVE, "=m" (*buf), /* unneeded */
+				"D" (buf), "a" (-1), "d" (-1));
 	return err;
 }
 
@@ -243,17 +239,12 @@ static inline int xsave_user(struct xsave_struct __user *buf)
  */
 static inline int xrestore_user(struct xsave_struct __user *buf, u64 mask)
 {
-	int err = 0;
-	struct xsave_struct *xstate = ((__force struct xsave_struct *)buf);
 	u32 lmask = mask;
 	u32 hmask = mask >> 32;
+	int err;
 
-	__asm__ __volatile__(ASM_STAC "\n"
-			     "1:"XRSTOR"\n"
-			     "2: " ASM_CLAC "\n"
-			     xstate_fault
-			     : "D" (xstate), "a" (lmask), "d" (hmask), "0" (0)
-			     : "memory");	/* memory required? */
+	err = __user_insn(XRSTOR, "=m" (*buf), /* unneeded */
+				"D" (buf), "a" (lmask), "d" (hmask));
 	return err;
 }
 
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-14 14:48               ` Oleg Nesterov
@ 2015-03-15 17:36                 ` Borislav Petkov
  2015-03-15 18:16                   ` Oleg Nesterov
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-15 17:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Sat, Mar 14, 2015 at 03:48:16PM +0100, Oleg Nesterov wrote:
> On 03/13, Borislav Petkov wrote:
> > On Fri, Mar 13, 2015 at 05:26:54PM +0100, Oleg Nesterov wrote:
> > > > One example where drop_init_fpu() seems to make sense is
> > > > __kernel_fpu_end(): kernel is done with FPU and current was using the
> > > > FPU prior so let's restore it for the eagerfpu case.
> > >
> > > No, no, this is another case or I misunderstood you.
> > >
> > > __kernel_fpu_end() needs to restore FPU from current's fpu->state exactly
> > > because current used FPU prior. And that state was saved by __save_init_fpu()
> > > in __kernel_fpu_begin().
> >
> > That's exactly what I mean. See: "... kernel is done with FPU and current was
> > using the FPU prior..."
> 
> Yes, but my point was that this is why we can _not_ use drop_init_fpu() in
> __kernel_fpu_end().

Hmm, now I'm confused. So __kernel_fpu_end() says kernel finished using
the FPU and we need to do the following:

* current has the FPU => let's restore it. If there was an error doing
that, we do drop_init, i.e. restore init_xstate in the eager case and
otherwise we just drop it. So that makes perfect sense to me.

* otherwise, current didn't have the FPU, we simply set CR0.TS in the
non-eager case so that we can fault on the next use of an FPU insn.

To address your comment from earlier:

> > > __kernel_fpu_end() needs to restore FPU from current's fpu->state exactly
> > > because current used FPU prior. And that state was saved by __save_init_fpu()
> > > in __kernel_fpu_begin().

And we do that:

void __kernel_fpu_end():

...

        if (__thread_has_fpu(me)) {
                if (WARN_ON(restore_fpu_checking(me)))

restore_fpu_checking(current) does try to restore fpu->state and it does
drop_init_fpu() only if it failed.
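
For reference, the whole flow being discussed looks roughly like this
(condensed from the hunks quoted elsewhere in this thread, plus the
trailing kernel_fpu_enable() that is not shown above):

	void __kernel_fpu_end(void)
	{
		struct task_struct *me = current;

		if (__thread_has_fpu(me)) {
			/*
			 * current was using the FPU before kernel_fpu_begin():
			 * restore its saved state, and fall back to the init
			 * state only if that restore faults.
			 */
			if (WARN_ON(restore_fpu_checking(me)))
				drop_init_fpu(me);
		} else if (!use_eager_fpu()) {
			/* lazy case: set CR0.TS so the next FPU insn faults */
			stts();
		}

		kernel_fpu_enable();
	}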

Ok, now you tell me what I'm missing :)

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-15 17:36                 ` Borislav Petkov
@ 2015-03-15 18:16                   ` Oleg Nesterov
  2015-03-15 18:50                     ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-15 18:16 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/15, Borislav Petkov wrote:
>
> On Sat, Mar 14, 2015 at 03:48:16PM +0100, Oleg Nesterov wrote:
> > > >
> > > > __kernel_fpu_end() needs to restore FPU from current's fpu->state exactly
> > > > because current used FPU prior. And that state was saved by __save_init_fpu()
> > > > in __kernel_fpu_begin().
> > >
> > > That's exactly what I mean. See: "... kernel is done with FPU and current was
> > > using the FPU prior..."
> >
> > Yes, but my point was that this is why we can _not_ use drop_init_fpu() in
> > __kernel_fpu_end().
>
> Hmm, now I'm confused.

Me too...

> void __kernel_fpu_end():
>
> ...
>
>         if (__thread_has_fpu(me)) {
>                 if (WARN_ON(restore_fpu_checking(me)))
>
> restore_fpu_checking(current) does try to restore fpu->state and it does
> drop_init_fpu() only if it failed.
>
> Ok, now you tell me what I'm missing :)

Of course, drop_init_fpu() is fine if restore_fpu_checking() fails.

Did you mean this from the very beginning? In this case I agree of course.

Because I misinterpreted your initial comment:

	One example where drop_init_fpu() seems to make sense is
	__kernel_fpu_end(): kernel is done with FPU and current was using the
	FPU prior so let's restore it for the eagerfpu case.
	
as if you suggest to use it _instead_ of restore_fpu_checking().

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-15 18:16                   ` Oleg Nesterov
@ 2015-03-15 18:50                     ` Borislav Petkov
  2015-03-15 20:04                       ` Oleg Nesterov
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-15 18:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Sun, Mar 15, 2015 at 07:16:43PM +0100, Oleg Nesterov wrote:
> Of course, drop_init_fpu() is fine if restore_fpu_checking() fails.
> 
> Did you mean this from the very beginning? In this case I agree of course.
> 
> Because I misinterpreted your initial comment:
> 
> 	One example where drop_init_fpu() seems to make sense is
> 	__kernel_fpu_end(): kernel is done with FPU and current was using the
> 	FPU prior so let's restore it for the eagerfpu case.
> 	
> as if you suggest to use it _instead_ of restore_fpu_checking().

Nah, not "instead" - I didn't express myself precisely enough. I was
trying to think out loud and look for an example where drop_init_fpu()
would make sense.

In most of the places it is used, it is in the error path of restoring
the FPU state, i.e. we were unable to restore for some reason, let's
reinit instead of just drop only, in the eager case.

And your patch correctly removed it from flush_thread() where it didn't
make any sense except to cause CPUs to get needlessly warmer.

Anyway, we're on the same page and that was a good exercise :-)

Thanks Oleg!

Btw, we probably should start documenting stuff like that so that we
don't have to re-fault all that info 6 months/a year from now when we
have to touch that code again. Hmm, how about something like this:

---
diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 2d4adff428ac..996f20a31f0a 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -406,6 +406,17 @@ static inline void restore_init_xstate(void)
 		fxrstor_checking(&init_xstate_buf->i387);
 }
 
+/*
+ * In addition to "forgetting" FPU state for @tsk, we restore the
+ * default FPU state in the eager case. Note, this is not needed in the
+ * non-eager case because there we will set CR0.TS and fault and setup
+ * an FPU state lazily.
+ *
+ * We restore the default FPU state in the eager case here as a means of
+ * addressing the failure of restoring the FPU state which @tsk points
+ * to and we still need some state to use so we use the default, clean
+ * one.
+ */
 static inline void drop_init_fpu(struct task_struct *tsk)
 {
 	if (!use_eager_fpu())
---

?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-15 18:50                     ` Borislav Petkov
@ 2015-03-15 20:04                       ` Oleg Nesterov
  2015-03-15 20:38                         ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-15 20:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/15, Borislav Petkov wrote:
>
> On Sun, Mar 15, 2015 at 07:16:43PM +0100, Oleg Nesterov wrote:
>
> Anyway, we're on the same page and that was a good exercise :-)

Yes, finally ;)

> +/*
> + * In addition to "forgetting" FPU state for @tsk, we restore the
> + * default FPU state in the eager case. Note, this is not needed in the
> + * non-eager case because there we will set CR0.TS and fault and setup
> + * an FPU state lazily.
> + *
> + * We restore the default FPU state in the eager case here as a means of
> + * addressing the failure of restoring the FPU state which @tsk points
> + * to and we still need some state to use so we use the default, clean
> + * one.
> + */
>  static inline void drop_init_fpu(struct task_struct *tsk)
>  {
>  	if (!use_eager_fpu())

But please note that it is not only used after the failure.
See handle_signal() and the first drop_init_fpu() in __restore_xstate_sig().

I think its name is a bit confusing...

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-15 20:04                       ` Oleg Nesterov
@ 2015-03-15 20:38                         ` Borislav Petkov
  2015-03-16  9:35                           ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-15 20:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Sun, Mar 15, 2015 at 09:04:36PM +0100, Oleg Nesterov wrote:
> But please note that it is not only used after the failure.
> See handle_signal() and the first drop_init_fpu() in
> __restore_xstate_sig().

Yeah, that's why I said "In the most places it is used..."

> I think its name is a bit confusing...

Yeah, the "init" aspect affects only the eager case...

How about we call this function fpu_reset_state() instead?

This way, what it does doesn't really need to be documented - it
simply resets the FPU state. And resetting is what we do in all
call sites, so the usage dictates the name, and then "drop" can be
differentiated from "reset", as "drop" is only a part of the "reset"
operation on an FPU state. And so on and so on...

Hmmm?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-15 20:38                         ` Borislav Petkov
@ 2015-03-16  9:35                           ` Borislav Petkov
  2015-03-16 10:28                             ` Ingo Molnar
                                               ` (2 more replies)
  0 siblings, 3 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-16  9:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Sun, Mar 15, 2015 at 09:38:16PM +0100, Borislav Petkov wrote:
> How about we call this function fpu_reset_state() instead?

IOW, something like this. Reading the usage sites actually makes much
more sense to me now. It could be just me though...

:-)

---
From: Borislav Petkov <bp@suse.de>
Date: Mon, 16 Mar 2015 10:21:55 +0100
Subject: [PATCH] x86/fpu: Rename drop_init_fpu() to fpu_reset_state()

Call it what it does and in accordance with the context where it is
used: we reset the FPU state either because we were unable to restore it
from the one saved in the task or because we simply want to reset it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fpu-internal.h | 8 ++++++--
 arch/x86/kernel/i387.c              | 2 +-
 arch/x86/kernel/signal.c            | 2 +-
 arch/x86/kernel/traps.c             | 2 +-
 arch/x86/kernel/xsave.c             | 4 ++--
 5 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 2d4adff428ac..da5e96756570 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -406,7 +406,11 @@ static inline void restore_init_xstate(void)
 		fxrstor_checking(&init_xstate_buf->i387);
 }
 
-static inline void drop_init_fpu(struct task_struct *tsk)
+/*
+ * Reset the FPU state in the eager case and drop it in the lazy case (later use
+ * will reinit it).
+ */
+static inline void fpu_reset_state(struct task_struct *tsk)
 {
 	if (!use_eager_fpu())
 		drop_fpu(tsk);
@@ -480,7 +484,7 @@ static inline void switch_fpu_finish(struct task_struct *new, fpu_switch_t fpu)
 {
 	if (fpu.preload) {
 		if (unlikely(restore_fpu_checking(new)))
-			drop_init_fpu(new);
+			fpu_reset_state(new);
 	}
 }
 
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 29e982ada854..41575b9b1021 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -108,7 +108,7 @@ void __kernel_fpu_end(void)
 
 	if (__thread_has_fpu(me)) {
 		if (WARN_ON(restore_fpu_checking(me)))
-			drop_init_fpu(me);
+			fpu_reset_state(me);
 	} else if (!use_eager_fpu()) {
 		stts();
 	}
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index e5042463c1bc..59eaae6185e2 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -679,7 +679,7 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
 		 * Ensure the signal handler starts with the new fpu state.
 		 */
 		if (used_math())
-			drop_init_fpu(current);
+			fpu_reset_state(current);
 	}
 	signal_setup_done(failed, ksig, test_thread_flag(TIF_SINGLESTEP));
 }
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 92b83e299ed3..156d75859466 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -863,7 +863,7 @@ void math_state_restore(void)
 	kernel_fpu_disable();
 	__thread_fpu_begin(tsk);
 	if (unlikely(restore_fpu_checking(tsk))) {
-		drop_init_fpu(tsk);
+		fpu_reset_state(tsk);
 		force_sig_info(SIGSEGV, SEND_SIG_PRIV, tsk);
 	} else {
 		tsk->thread.fpu_counter++;
diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 0bf82c5ac529..65c29b070e09 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -342,7 +342,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
 			 config_enabled(CONFIG_IA32_EMULATION));
 
 	if (!buf) {
-		drop_init_fpu(tsk);
+		fpu_reset_state(tsk);
 		return 0;
 	}
 
@@ -416,7 +416,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
 		 */
 		user_fpu_begin();
 		if (restore_user_xstate(buf_fx, xstate_bv, fx_only)) {
-			drop_init_fpu(tsk);
+			fpu_reset_state(tsk);
 			return -1;
 		}
 	}
-- 
2.3.3

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/cpu: don't allocate fpu->state for swapper/0
  2015-03-13 18:27   ` [PATCH 1/1] " Oleg Nesterov
@ 2015-03-16 10:18     ` Borislav Petkov
  2015-03-23 12:22     ` [tip:x86/fpu] x86/fpu: Don't " tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-16 10:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Fri, Mar 13, 2015 at 07:27:16PM +0100, Oleg Nesterov wrote:
> Now that kthreads do not use the FPU until exec, swapper/0 doesn't need
> to allocate fpu->state.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Applied, thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH] x86/fpu: Fold __drop_fpu() into its sole user
  2015-03-14 10:57         ` [PATCH] x86/fpu: Fold __drop_fpu() into its sole user Borislav Petkov
  2015-03-14 15:15           ` Oleg Nesterov
@ 2015-03-16 10:27           ` Ingo Molnar
  1 sibling, 0 replies; 126+ messages in thread
From: Ingo Molnar @ 2015-03-16 10:27 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Oleg Nesterov, Dave Hansen, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas


* Borislav Petkov <bp@suse.de> wrote:

> Fold it into drop_fpu(). Phew, one less FPU function to pay attention
> to.
> 
> No functionality change.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-16  9:35                           ` Borislav Petkov
@ 2015-03-16 10:28                             ` Ingo Molnar
  2015-03-16 14:39                             ` Oleg Nesterov
  2015-03-16 15:34                             ` Andy Lutomirski
  2 siblings, 0 replies; 126+ messages in thread
From: Ingo Molnar @ 2015-03-16 10:28 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Oleg Nesterov, Dave Hansen, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas


* Borislav Petkov <bp@suse.de> wrote:

> On Sun, Mar 15, 2015 at 09:38:16PM +0100, Borislav Petkov wrote:
> > How about we call this function fpu_reset_state() instead?
> 
> IOW, something like this. Reading the usage sites actually make much
> more sense to me now. It could be just me though...
> 
> :-)
> 
> ---
> From: Borislav Petkov <bp@suse.de>
> Date: Mon, 16 Mar 2015 10:21:55 +0100
> Subject: [PATCH] x86/fpu: Rename drop_init_fpu() to fpu_reset_state()
> 
> Call it what it does and in accordance with the context where it is
> used: we reset the FPU state either because we were unable to restore it
> from the one saved in the task or because we simply want to reset it.

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 126+ messages in thread

* [tip:x86/urgent] x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig()
  2015-03-07 15:38     ` [PATCH 1/1] " Oleg Nesterov
  2015-03-09 14:07       ` Borislav Petkov
@ 2015-03-16 12:07       ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-03-16 12:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: priikone, fenghua.yu, bp, stable, tglx, bp, dave.hansen,
	sbsiddha, torvalds, linux-kernel, luto, hpa, oleg,
	quentin.casasnovas, mingo, riel

Commit-ID:  a7c80ebcac3068b1c3cb27d538d29558c30010c8
Gitweb:     http://git.kernel.org/tip/a7c80ebcac3068b1c3cb27d538d29558c30010c8
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Fri, 13 Mar 2015 09:53:09 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 13 Mar 2015 12:44:28 +0100

x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig()

math_state_restore() assumes it is called with irqs disabled,
but this is not true if the caller is __restore_xstate_sig().

This means that if ia32_fxstate == T and __copy_from_user()
fails, __restore_xstate_sig() returns with irqs disabled too.

This triggers:

  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:41
   dump_stack
   ___might_sleep
   ? _raw_spin_unlock_irqrestore
   __might_sleep
   down_read
   ? _raw_spin_unlock_irqrestore
   print_vma_addr
   signal_fault
   sys32_rt_sigreturn

Change __restore_xstate_sig() to call set_used_math()
unconditionally. This avoids enabling and disabling interrupts
in math_state_restore(). If copy_from_user() fails, we can
simply do fpu_finit() by hand.

[ Note: this is only the first step. math_state_restore() should
        not check used_math(), it should set this flag. While
	init_fpu() should simply die. ]

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150307153844.GB25954@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/xsave.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 34f66e5..cdc6cf9 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -379,7 +379,7 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
 		 * thread's fpu state, reconstruct fxstate from the fsave
 		 * header. Sanitize the copied state etc.
 		 */
-		struct xsave_struct *xsave = &tsk->thread.fpu.state->xsave;
+		struct fpu *fpu = &tsk->thread.fpu;
 		struct user_i387_ia32_struct env;
 		int err = 0;
 
@@ -393,14 +393,15 @@ int __restore_xstate_sig(void __user *buf, void __user *buf_fx, int size)
 		 */
 		drop_fpu(tsk);
 
-		if (__copy_from_user(xsave, buf_fx, state_size) ||
+		if (__copy_from_user(&fpu->state->xsave, buf_fx, state_size) ||
 		    __copy_from_user(&env, buf, sizeof(env))) {
+			fpu_finit(fpu);
 			err = -1;
 		} else {
 			sanitize_restored_xstate(tsk, &env, xstate_bv, fx_only);
-			set_used_math();
 		}
 
+		set_used_math();
 		if (use_eager_fpu()) {
 			preempt_disable();
 			math_state_restore();

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip:x86/urgent] x86/fpu: Drop_fpu() should not assume that tsk equals current
  2015-03-09 17:10 ` [PATCH] x86/fpu: drop_fpu() should not assume that tsk == current Oleg Nesterov
  2015-03-09 17:36   ` Rik van Riel
  2015-03-09 17:48   ` Borislav Petkov
@ 2015-03-16 12:07   ` tip-bot for Oleg Nesterov
  2 siblings, 0 replies; 126+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-03-16 12:07 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: priikone, stable, tglx, torvalds, fenghua.yu, bp, luto,
	linux-kernel, dave.hansen, sbsiddha, hpa, quentin.casasnovas,
	mingo, riel, bp, oleg

Commit-ID:  f4c3686386393c120710dd34df2a74183ab805fd
Gitweb:     http://git.kernel.org/tip/f4c3686386393c120710dd34df2a74183ab805fd
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Fri, 13 Mar 2015 09:53:10 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 13 Mar 2015 12:44:29 +0100

x86/fpu: Drop_fpu() should not assume that tsk equals current

drop_fpu() does clear_used_math() and usually this is correct
because tsk == current.

However switch_fpu_finish()->restore_fpu_checking() is called before
__switch_to() updates the "current_task" variable. If it fails,
we will wrongly clear the PF_USED_MATH flag of the previous task.

So use clear_stopped_child_used_math() instead.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150309171041.GB11388@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fpu-internal.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 0dbc082..72ba21a 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -370,7 +370,7 @@ static inline void drop_fpu(struct task_struct *tsk)
 	preempt_disable();
 	tsk->thread.fpu_counter = 0;
 	__drop_fpu(tsk);
-	clear_used_math();
+	clear_stopped_child_used_math(tsk);
 	preempt_enable();
 }
 

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 1/1] x86/cpu: kill eager_fpu_init_bp()
  2015-03-14 15:13       ` [PATCH 1/1] " Oleg Nesterov
@ 2015-03-16 12:44         ` Borislav Petkov
  2015-03-23 12:22         ` [tip:x86/fpu] x86/fpu: Kill eager_fpu_init_bp() tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-16 12:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Sat, Mar 14, 2015 at 04:13:34PM +0100, Oleg Nesterov wrote:
> Now that eager_fpu_init_bp() does setup_init_fpu_buf() and nothing else,
> we can remove it and move this code into its "caller", eager_fpu_init().
> 
> This avoids the confusing games with "static __refdata void (*boot_func)".
> init_xstate_buf can be NULL only during boot, so it is safe to call the
> "__init" setup_init_fpu_buf() function; we just need to add the
> "__init_refok" marker.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Applied, thanks.

I added a note above eager_fpu_init() explaining why it is marked as __init_refok:

---
From: Oleg Nesterov <oleg@redhat.com>
Date: Sat, 14 Mar 2015 16:13:34 +0100
Subject: [PATCH] x86/fpu: Kill eager_fpu_init_bp()

Now that eager_fpu_init_bp() does setup_init_fpu_buf() only and
nothing else, we can remove it and move this code into its "caller",
eager_fpu_init().

This avoids the confusing games with "static __refdata void (*boot_func)":

init_xstate_buf can be NULL only during boot, so it is safe to call the
__init-annotated setup_init_fpu_buf() function in eager_fpu_init(), we
just need to mark it as __init_refok.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Link: http://lkml.kernel.org/r/20150314151334.GC13029@redhat.com
Signed-off-by: 
---
 arch/x86/kernel/xsave.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index ada8df7b89c0..87a815b85f3e 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -678,16 +678,12 @@ void xsave_init(void)
 	this_func();
 }
 
-static inline void __init eager_fpu_init_bp(void)
-{
-	if (!init_xstate_buf)
-		setup_init_fpu_buf();
-}
-
-void eager_fpu_init(void)
+/*
+ * setup_init_fpu_buf() is __init and it is OK to call it here because
+ * init_xstate_buf will be unset only once during boot.
+ */
+void __init_refok eager_fpu_init(void)
 {
-	static __refdata void (*boot_func)(void) = eager_fpu_init_bp;
-
 	WARN_ON(used_math());
 	current_thread_info()->status = 0;
 
@@ -699,10 +695,8 @@ void eager_fpu_init(void)
 		return;
 	}
 
-	if (boot_func) {
-		boot_func();
-		boot_func = NULL;
-	}
+	if (!init_xstate_buf)
+		setup_init_fpu_buf();
 }
 
 /*
-- 
2.3.3

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-15 16:49 ` [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user Oleg Nesterov
  2015-03-15 16:50   ` [PATCH RFC 1/2] x86: introduce __user_insn() and __check_insn() Oleg Nesterov
  2015-03-15 16:50   ` [PATCH RFC 2/2] x86/fpu: change xsave_user() and xrestore_user() to use __user_insn() Oleg Nesterov
@ 2015-03-16 14:36   ` Borislav Petkov
  2015-03-16 14:57     ` Oleg Nesterov
  2015-03-16 22:37   ` Quentin Casasnovas
  3 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-16 14:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas, H. Peter Anvin

On Sun, Mar 15, 2015 at 05:49:48PM +0100, Oleg Nesterov wrote:
> Hello.
> 
> Another a bit off-topic change, but I'd like to finish the discussion
> with Quentin.
> 
> And almost cosmetic. But I added the RFC tag to make it clear that this
> needs a review from someone who understands gcc-asm better. In particular
> I am worried if that dummy "=m" (*buf) is actually correct.
> 
> 
> And I agree with Quentin, user_insn/check_insn can be improved to allow
> clobbers, more flexible "output", etc. But imo they already can make this
> code look a bit better, and "xstate_fault" must die eventually.

FWIW, I did poke at that but there's still something wrong with my macros, will
take a look when I get a chance:

---
 arch/x86/include/asm/xsave.h | 130 +++++++++++++++++++++++++++----------------
 1 file changed, 82 insertions(+), 48 deletions(-)

diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index c9a6d68b8d62..a35e9d49843d 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -67,6 +67,51 @@ extern int init_fpu(struct task_struct *child);
 			_ASM_EXTABLE(1b, 3b)		\
 			: [err] "=r" (err)
 
+#define XSTATE_OP(op, st, lmask, hmask, err)				\
+	asm volatile("1:" op "\n\t"					\
+		     "2:\n\t"						\
+		     ".pushsection .fixup,\"ax\"\n\t"			\
+		     "3: movl $-1,%[err]\n\t"				\
+		     "jmp 2b\n\t"					\
+		     ".popsection\n\t"					\
+		     _ASM_EXTABLE(1b, 3b)				\
+		     : [err] "=r" (err)					\
+		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
+		     : "memory")
+
+/*
+ * 661 and alt_end_marker labels below are defined in ALTERNATIVE*
+ * and we're reusing  them here so as not to clutter this macro
+ * unnecessarily.
+ */
+#define XSTATE_XSAVE(st, lmask, hmask, err)				\
+	asm volatile(ALTERNATIVE_2(XSAVE,				\
+				   XSAVEOPT, X86_FEATURE_XSAVEOPT,	\
+				   XSAVES,   X86_FEATURE_XSAVES)	\
+		     "\n"						\
+		     ".pushsection .fixup,\"ax\"\n"			\
+		     "3: movl $-1, %[err]\n"				\
+		     "jmp " alt_end_marker "b\n"			\
+		     ".popsection\n"					\
+		     _ASM_EXTABLE(661b, 3b)				\
+		     : [err] "=r" (err)					\
+		     : "D" (st), "a" (lmask), "d" (hmask)		\
+		     : "memory")
+
+#define XSTATE_XRESTORE(st, lmask, hmask, err)				\
+	asm volatile(ALTERNATIVE(XRSTOR,				\
+				 XRSTORS, X86_FEATURE_XSAVES)		\
+		     "\n"						\
+		     ".pushsection .fixup,\"ax\"\n"			\
+		     "3: movl $-1, %[err]\n"				\
+		     "jmp 663b\n"					\
+		     ".popsection\n"					\
+		     _ASM_EXTABLE(661b, 3b)				\
+		     : [err] "=r" (err)					\
+		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
+		     : "memory")
+
+
 /*
  * This function is called only during boot time when x86 caps are not set
  * up and alternative can not be used yet.
@@ -77,20 +122,11 @@ static inline int xsave_state_booting(struct xsave_struct *fx, u64 mask)
 	u32 hmask = mask >> 32;
 	int err = 0;
 
-	WARN_ON(system_state != SYSTEM_BOOTING);
-
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		asm volatile("1:"XSAVES"\n\t"
-			"2:\n\t"
-			     xstate_fault
-			: "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
-			:   "memory");
+	if (static_cpu_has_safe(X86_FEATURE_XSAVES))
+		XSTATE_OP(XSAVES, fx, lmask, hmask, err);
 	else
-		asm volatile("1:"XSAVE"\n\t"
-			"2:\n\t"
-			     xstate_fault
-			: "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
-			:   "memory");
+		XSTATE_OP(XSAVE, fx, lmask, hmask, err);
+
 	return err;
 }
...

To be continued...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-16  9:35                           ` Borislav Petkov
  2015-03-16 10:28                             ` Ingo Molnar
@ 2015-03-16 14:39                             ` Oleg Nesterov
  2015-03-16 15:26                               ` Borislav Petkov
  2015-03-16 15:34                             ` Andy Lutomirski
  2 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-16 14:39 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/16, Borislav Petkov wrote:
>
> -static inline void drop_init_fpu(struct task_struct *tsk)
> +/*
> + * Reset the FPU state in the eager case and drop it in the lazy case (later use
> + * will reinit it).
> + */
> +static inline void fpu_reset_state(struct task_struct *tsk)

ACK!

Perhaps you can also find a better name for __save_init_fpu/etc ;) The
name clearly suggests that it does "save + init" while in fact it does
"save and maybe destroy FPU state". At least for the callers, the fact
that "destroy" is actually "init" doesn't really matter.

But let's not rename it right now. This can conflict with the fixes we
need to do first.

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-16 14:36   ` [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user Borislav Petkov
@ 2015-03-16 14:57     ` Oleg Nesterov
  2015-03-16 17:58       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-16 14:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas, H. Peter Anvin

On 03/16, Borislav Petkov wrote:
>
> On Sun, Mar 15, 2015 at 05:49:48PM +0100, Oleg Nesterov wrote:
> >
> > And I agree with Quentin, user_insn/check_insn can be improved to allow
> > clobbers, more flexible "output", etc. But imo they already can make this
> > code look a bit better, and "xstate_fault" must die eventually.
>
> FWIW, I did poke at that but there's still something wrong with my macros, will
> take a look when I get a chance:

Sure, I won't argue if we use the new macros instead. But we already have
check_insn/user_insn, why not use them?

For example,

> +#define XSTATE_XSAVE(st, lmask, hmask, err)				\
> +	asm volatile(ALTERNATIVE_2(XSAVE,				\
> +				   XSAVEOPT, X86_FEATURE_XSAVEOPT,	\
> +				   XSAVES,   X86_FEATURE_XSAVES)	\
> +		     "\n"						\
> +		     ".pushsection .fixup,\"ax\"\n"			\
> +		     "3: movl $-1, %[err]\n"				\
> +		     "jmp " alt_end_marker "b\n"			\
> +		     ".popsection\n"					\
> +		     _ASM_EXTABLE(661b, 3b)				\
> +		     : [err] "=r" (err)					\
> +		     : "D" (st), "a" (lmask), "d" (hmask)		\
> +		     : "memory")
> +

To me check_insn(ALTERNATIVE_2(...)) looks better, except that we need the
clobber. It is not easy to read the code like this; imo it would be better
to avoid the copy-and-paste and use the helpers we already have. We just
need to improve them.
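
For illustration only, a minimal sketch of what such a call could look
like -- a check_insn() variant that accepts an explicit clobber list is
hypothetical here, it does not exist yet:

	/* hypothetical clobber-aware check_insn(), sketch only */
	err = check_insn(ALTERNATIVE_2(XSAVE,
				       XSAVEOPT, X86_FEATURE_XSAVEOPT,
				       XSAVES,   X86_FEATURE_XSAVES),
			 /* outputs  */ "+m" (*fx),
			 /* inputs   */ "D" (fx), "a" (lmask), "d" (hmask),
			 /* clobbers */ "memory");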


But let me repeat, I leave this to you and others; I do not understand
asm well enough.

Oleg.


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-16 14:39                             ` Oleg Nesterov
@ 2015-03-16 15:26                               ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-16 15:26 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On Mon, Mar 16, 2015 at 03:39:44PM +0100, Oleg Nesterov wrote:
> ACK!

Thanks.

> Perhaps you can also find a better name for __save_init_fpu/etc ;) The
> name clearly suggests that it does "save + init" while in fact it does
> "save and maybe destroy FPU state". At least for the callers, the fact
> that "destroy" is actually "init" doesn't really matter.
> 
> But let's not rename it right now. This can conflict with the fixes we
> need to do first.

Right, so I think we should do fixes/cleanups first so that we can lose
all the fat/cruft this code has grown. I'll make looking at that code
easier later.

I'll push out everything I have collected so far for people to see after
I've finished bisecting another tip/master regression from today.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-16  9:35                           ` Borislav Petkov
  2015-03-16 10:28                             ` Ingo Molnar
  2015-03-16 14:39                             ` Oleg Nesterov
@ 2015-03-16 15:34                             ` Andy Lutomirski
  2015-03-16 15:35                               ` Borislav Petkov
  2 siblings, 1 reply; 126+ messages in thread
From: Andy Lutomirski @ 2015-03-16 15:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Fenghua Yu, Quentin Casasnovas, Suresh Siddha, Linus Torvalds,
	Dave Hansen, Oleg Nesterov, Rik van Riel, Pekka Riikonen, LKML,
	Ingo Molnar

On Mar 16, 2015 2:37 AM, "Borislav Petkov" <bp@suse.de> wrote:
>
> On Sun, Mar 15, 2015 at 09:38:16PM +0100, Borislav Petkov wrote:
> > How about we call this function fpu_reset_state() instead?
>
> IOW, something like this. Reading the usage sites actually makes much
> more sense to me now. It could be just me though...
>
> :-)
>
> ---
> From: Borislav Petkov <bp@suse.de>
> Date: Mon, 16 Mar 2015 10:21:55 +0100
> Subject: [PATCH] x86/fpu: Rename drop_init_fpu() to fpu_reset_state()
>
> Call it what it does and in accordance with the context where it is
> used: we reset the FPU state either because we were unable to restore it
> from the one saved in the task or because we simply want to reset it.

Nice!  This is the first time I've actually understood that :). I
still have no idea what "init" referred to...

--Andy

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread()
  2015-03-16 15:34                             ` Andy Lutomirski
@ 2015-03-16 15:35                               ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-16 15:35 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Fenghua Yu, Quentin Casasnovas, Suresh Siddha, Linus Torvalds,
	Dave Hansen, Oleg Nesterov, Rik van Riel, Pekka Riikonen, LKML,
	Ingo Molnar

On Mon, Mar 16, 2015 at 08:34:15AM -0700, Andy Lutomirski wrote:
> Nice!  This is the first time I've actually understood that :). I
> still have no idea what "init" referred to...

Haha, you're not the only one :-)

We're learning as we go.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-16 14:57     ` Oleg Nesterov
@ 2015-03-16 17:58       ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-16 17:58 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas, H. Peter Anvin

On Mon, Mar 16, 2015 at 03:57:42PM +0100, Oleg Nesterov wrote:
> Sure, I won't argue if we use the new macros instead. But we already have
> check_insn/user_insn, why not use them?

It certainly is worth a try.

> to me check_insn(ALTERNATIVE_2(...)) looks better. Except we need the
> clobber. It is not easy to read the code like this, imo it would be better
> to avoid copy-and-paste and use the helpers we already have. Just we need
> to improve them.

Yeah, I can't say I'm crazy about those check_insn macros either but
certainly worth a try. Apparently they're the way to go in the fpu code
so let's try to use them...

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH 2/4] x86/fpu: introduce restore_init_xstate()
  2015-03-13 15:20         ` Borislav Petkov
@ 2015-03-16 19:05           ` Rik van Riel
  0 siblings, 0 replies; 126+ messages in thread
From: Rik van Riel @ 2015-03-16 19:05 UTC (permalink / raw)
  To: Borislav Petkov, Oleg Nesterov
  Cc: Dave Hansen, Ingo Molnar, Andy Lutomirski, Linus Torvalds,
	Pekka Riikonen, Suresh Siddha, LKML, Yu, Fenghua,
	Quentin Casasnovas

On 03/13/2015 11:20 AM, Borislav Petkov wrote:
> On Fri, Mar 13, 2015 at 03:39:28PM +0100, Oleg Nesterov wrote:
>> This too needs cleanups. But later ;)
>>
>> Note that xstate_enable_boot_cpu is not called if !cpu_has_xsave, see the
>> check in xsave_init(). However, eagerfpu=on will force eager_fpu_init() which
>> calls eager_fpu_init_bp().
>
> Yahaa.
>
> This FPU cleanup fun will keep us busy until Christmas.
>
>> Btw, I was also going to kill eager_fpu_init_bp(). Probably I will
>> send the patch today.
>
> Yap, and I'm wondering if we should kill those func ptrs games there. We
> have BSP and AP CPU init paths so we can be much cleaner there.

I'll hold off on the optimization that defers FPU state
loading to the kernel -> user space boundary until this
current stuff has stabilized...


^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-15 16:49 ` [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user Oleg Nesterov
                     ` (2 preceding siblings ...)
  2015-03-16 14:36   ` [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user Borislav Petkov
@ 2015-03-16 22:37   ` Quentin Casasnovas
  2015-03-17  9:47     ` Borislav Petkov
  3 siblings, 1 reply; 126+ messages in thread
From: Quentin Casasnovas @ 2015-03-16 22:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Borislav Petkov, Ingo Molnar, Andy Lutomirski,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua, Quentin Casasnovas, H. Peter Anvin

On Sun, Mar 15, 2015 at 05:49:48PM +0100, Oleg Nesterov wrote:
> Hello.
> 
> Another slightly off-topic change, but I'd like to finish the discussion
> with Quentin.
>
> And almost cosmetic. But I added the RFC tag to make it clear that this
> needs a review from someone who understands gcc-asm better. In particular
> I am worried if that dummy "=m" (*buf) is actually correct.

Derp, I might have given the wrong impression but I'm certainly not that
guy who understands gcc-asm better!

> 
> And I agree with Quentin, user_insn/check_insn can be improved to allow
> clobbers, more flexible "output", etc. But imo they already can make this
> code look a bit better, and "xstate_fault" must die eventually.
>

So I really think we should have a clean user_insn() to which we could add
arbitrary outputs, inputs and clobbers, and which would follow the extended
assembly syntax more closely (so it's easier to parse for the elite who
actually understand GCC extended asm ;).  Would something like this be
okay?

/**
 * This is so multiple outputs/inputs/clobbers are interpreted as a
 * single macro argument.
 */
#define SINGLE_ARG(...) __VA_ARGS__

#define PASTE_1(...) , __VA_ARGS__
#define PASTE_0(...)
#define PASTE__(Empty, ...) PASTE_ ## Empty(__VA_ARGS__)
#define PASTE_(Empty, ...) PASTE__(Empty, __VA_ARGS__)

#define IS_EMPTY(...) 1

/**
 * Prints ", __VA_ARGS__" (note the comma as prefix) if there are any
 * argument and does not print anything otherwise.
 */
#define PASTE(...) PASTE_(IS_EMPTY(__VA_ARGS__), __VA_ARGS__)

#define __user_insn(stringified_instructions, outputs, inputs, clobbers) \
	({								\
		int err;						\
		asm volatile(ASM_STAC                   "\n\t"		\
			     "try:                       \n\t"		\
			     stringified_instructions			\
			     "finally:                   \n\t"		\
			     ASM_CLAC                   "\n\t"		\
			     ".section .fixup,\"ax\"     \n\t"		\
			     "catch:  movl $-1,%[err]    \n\t"		\
			     "    jmp  finally           \n\t"		\
			     ".previous                  \n\t"		\
			     _ASM_EXTABLE(try, catch)			\
			     : [err] "=r" (err) PASTE(outputs)		\
			     : "0" (0) PASTE(inputs)			\
			     : clobbers);				\
		err;							\
	})

Then the callers can use SINGLE_ARG() to pass multiple outputs, inputs or
clobbers as a single argument to __user_insn like this:

  __user_insn("btl [var2], %0		\n\t",
  	      , /* no outputs, no need for dummy arg */
	      SINGLE_ARG("r" (var1), [var2] "r" (var2)), /* two inputs */
	      "cc");

As an added bonus, we would not need any dummy operand when there are no
outputs, which I think is preferable.

The IS_EMPTY() macro could be implemented as in the following article:
  https://gustedt.wordpress.com/2010/06/08/detect-empty-macro-arguments/
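
For reference, one common GCC-only way to get a real argument-presence
test (just a sketch, not the exact construction from that article, and
it only handles up to four variadic arguments) would be something like:

	/*
	 * Relies on the GNU ", ##__VA_ARGS__" extension, which drops the
	 * comma when the argument list is empty.
	 */
	#define _PICK_6TH(_0, _1, _2, _3, _4, _5, ...) _5
	#define HAS_ARGS(...) _PICK_6TH(ignored, ##__VA_ARGS__, 1, 1, 1, 1, 0)

	/*
	 * HAS_ARGS()     expands to 0
	 * HAS_ARGS(a)    expands to 1
	 * HAS_ARGS(a, b) expands to 1
	 */

PASTE_() could then dispatch on HAS_ARGS() instead of the IS_EMPTY()
stub above.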

We could then write safe variants of the alternative_*() macros to call
check_insn() since I don't think it is possible as is to do
check_insn(alternative_2(...), ...) as you suggested.

>
> Quentin, could you review? I can't find your last email about this change,
> and I can't recall if you agree or not.
> 

I can give it a try, but I'm really not the best person to ack/nack this
kind of change...

Quentin

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 2/2] x86/fpu: change xsave_user() and xrestore_user() to use __user_insn()
  2015-03-15 16:50   ` [PATCH RFC 2/2] x86/fpu: change xsave_user() and xrestore_user() to use __user_insn() Oleg Nesterov
@ 2015-03-16 22:43     ` Quentin Casasnovas
  2015-03-17  9:35       ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Quentin Casasnovas @ 2015-03-16 22:43 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dave Hansen, Borislav Petkov, Ingo Molnar, Andy Lutomirski,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua, Quentin Casasnovas, H. Peter Anvin

On Sun, Mar 15, 2015 at 05:50:36PM +0100, Oleg Nesterov wrote:
> Change xsave_user() and xrestore_user() to avoid the (imho) horrible
> and should-die xstate_fault helper; they both can use __user_insn().
> 
> This also removes the "memory" clobber but I think it was never needed.
> xrestore_user() doesn't change the memory, it only changes the FPU regs.
> xsave_user() does write to "*buf" but this memory is "__user"; we must
> never access it directly.
> 

So I'm really not sure about all the callers, but it seems that this
instruction can be used to restore more than just the FPU state and I've no
idea how much can change underneath gcc when we do so.  It "feels" safe
when saving the CPU state, not sure for the restoring case.

> 
> This patch adds '"=m" (*buf)' in both cases, but this is only because
> currently __user_insn() needs the non-empty "output" arg.
>

See if my suggestion on your front e-mail works for you.

> 
> Note: I think we can change all other xstate_fault users too, including
> alternative_input's.
>

I'd agree but I think we'll need new safe versions of alternative_input_*()
macros as opposed to just using check_insn(alternative_input_2(...),...).

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 2/2] x86/fpu: change xsave_user() and xrestore_user() to use __user_insn()
  2015-03-16 22:43     ` Quentin Casasnovas
@ 2015-03-17  9:35       ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-17  9:35 UTC (permalink / raw)
  To: Quentin Casasnovas
  Cc: Oleg Nesterov, Dave Hansen, Ingo Molnar, Andy Lutomirski,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua, H. Peter Anvin

On Mon, Mar 16, 2015 at 11:43:01PM +0100, Quentin Casasnovas wrote:
> So I'm really not sure about all the callers, but it seems that this
> instruction can be used to restore more than just the FPU state and I've no
> idea how much can change underneath gcc when we do so.  It "feels" safe
> when saving the CPU state, not sure for the restoring case.

The clobber is to prevent gcc from optimizing accesses around the asm
volatile statement. And as Oleg said, this is user memory so if we want
to touch it, we will have a compiler barrier somewhere around that code.
I certainly hope we do...
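
As a toy userspace illustration of what the clobber buys (nothing
kernel-specific here, just the classic compiler barrier):

	#include <stdio.h>

	static int flag;

	int main(void)
	{
		flag = 1;
		/*
		 * The "memory" clobber tells gcc that the asm may read or
		 * write any memory, so the store above must be emitted
		 * before the asm and the read below must be redone after
		 * it; values cannot be cached in registers across it.
		 */
		asm volatile("" ::: "memory");
		printf("%d\n", flag);
		return 0;
	}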

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-16 22:37   ` Quentin Casasnovas
@ 2015-03-17  9:47     ` Borislav Petkov
  2015-03-17 10:00       ` Quentin Casasnovas
  2015-03-17 10:07       ` Quentin Casasnovas
  0 siblings, 2 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-17  9:47 UTC (permalink / raw)
  To: Quentin Casasnovas
  Cc: Oleg Nesterov, Dave Hansen, Ingo Molnar, Andy Lutomirski,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua, H. Peter Anvin

On Mon, Mar 16, 2015 at 11:37:44PM +0100, Quentin Casasnovas wrote:

...

>   __user_insn("btl [var2], %0		\n\t",
>   	      , /* no outputs, no need for dummy arg */
> 	      SINGLE_ARG("r" (var1), [var2] "r" (var2)), /* two inputs */
> 	      "cc");

So this becomes pretty unreadable IMO. And we shouldn't go nuts with
optimizing this and sacrifice readability a lot.

TBH, I'd much prefer:

	if (static_cpu_has_safe(X86_FEATURE_XSAVEOPT)) {
		check_insn(XSAVEOPT, ...);
		return;
	}

	if (static_cpu_has_safe(X86_FEATURE_XSAVES)) {
		check_insn(XSAVES);
		return;
	}

	check_insn(XSAVE, ...)

which is pretty clear.

We can even go a step further and add a static_cpu_has_safe thing which
checks two features instead of one. The penalty we'd get is a single
unconditional JMP which in the face of XSAVE* is nothing.
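
Purely to illustrate the intended semantics -- the helper name below is
made up, and a real version would presumably be built on the alternatives
machinery so that only the single unconditional JMP remains after
patching, which this naive sketch does not model:

	#define static_cpu_has_safe_2(f1, f2)				\
		(static_cpu_has_safe(f1) || static_cpu_has_safe(f2))

	if (static_cpu_has_safe_2(X86_FEATURE_XSAVEOPT, X86_FEATURE_XSAVES))
		pr_info("have an optimized XSAVE variant\n");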

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-17  9:47     ` Borislav Petkov
@ 2015-03-17 10:00       ` Quentin Casasnovas
  2015-03-17 11:20         ` Borislav Petkov
  2015-03-17 10:07       ` Quentin Casasnovas
  1 sibling, 1 reply; 126+ messages in thread
From: Quentin Casasnovas @ 2015-03-17 10:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Quentin Casasnovas, Oleg Nesterov, Dave Hansen, Ingo Molnar,
	Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, H. Peter Anvin

On Tue, Mar 17, 2015 at 10:47:50AM +0100, Borislav Petkov wrote:
> On Mon, Mar 16, 2015 at 11:37:44PM +0100, Quentin Casasnovas wrote:
> 
> ...
> 
> >   __user_insn("btl [var2], %0		\n\t",
> >   	      , /* no outputs, no need for dummy arg */
> > 	      SINGLE_ARG("r" (var1), [var2] "r" (var2)), /* two inputs */
> > 	      "cc");
> 
> So this becomes pretty unreadable IMO. And we shouldn't go nuts with
> optimizing this and sacrifice readability a lot.
> 
> TBH, I'd much prefer:
> 
> 	if (static_cpu_has_safe(X86_FEATURE_XSAVEOPT)) {
> 		check_insn(XSAVEOPT, ...);
                                     ^
> 		return;
> 	}
> 
> 	if (static_cpu_has_safe(X86_FEATURE_XSAVES)) {
> 		check_insn(XSAVES);
> 		return;
> 	}
> 
> 	check_insn(XSAVE, ...)
> 
> which is pretty clear.
>

Fair point, but AFAIUI we can't do check_insn(XSAVES) alone as of today,
and the "..." in your "check_insn(XSAVEOPT, ...)" code above would still
need to contain the output operands.

My suggestion was to rework (check|user)_insn() so it can accept zero to N
inputs, outputs or clobbers, making it generic enough that the snippet of
code you've written becomes valid, and maybe move those macros somewhere
they can be used by other sub-systems?

Am I missing something?

Quentin

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-17  9:47     ` Borislav Petkov
  2015-03-17 10:00       ` Quentin Casasnovas
@ 2015-03-17 10:07       ` Quentin Casasnovas
  1 sibling, 0 replies; 126+ messages in thread
From: Quentin Casasnovas @ 2015-03-17 10:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Quentin Casasnovas, Oleg Nesterov, Dave Hansen, Ingo Molnar,
	Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, H. Peter Anvin

On Tue, Mar 17, 2015 at 10:47:50AM +0100, Borislav Petkov wrote:
> 
> We can even go a step further and add a static_cpu_has_safe thing which
> checks two features instead of one. The penalty we'd get is a single
> unconditional JMP which in the face of XSAVE* is nothing.
> 

What was the argument against adding a check_alternative_input(...) so the
ex_table entries are managed inside the macro directly?  It leaves less room
for errors and would still be readable IMO:

err = check_alternative_input_2(XSAVE,
				XSAVEOPT, X86_FEATURE_XSAVEOPT,
				XSAVES, X86_FEATURE_XSAVES,
				<inputs>, <outputs>, <clobbers>);
if (err)
   do_something();

That hypothetical check_alternative_input_2() would call a rework of
check_insn() supporting an arbitrary number of inputs, outputs and
clobbers as drafted in my previous e-mail.

Quentin

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-17 10:00       ` Quentin Casasnovas
@ 2015-03-17 11:20         ` Borislav Petkov
  2015-03-17 11:36           ` Quentin Casasnovas
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-17 11:20 UTC (permalink / raw)
  To: Quentin Casasnovas
  Cc: Oleg Nesterov, Dave Hansen, Ingo Molnar, Andy Lutomirski,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua, H. Peter Anvin

On Tue, Mar 17, 2015 at 11:00:46AM +0100, Quentin Casasnovas wrote:
> Fair point, but AFAIUI we can't do check_insn(XSAVES) alone as of today,
> > and the "..." in your "check_insn(XSAVEOPT, ...)" code above would still
> > need to contain the output operands.

I think we can do this (see diff the end of this mail).

It still explodes my guest with:

[    2.940379] Freeing unused kernel memory: 2860K (ffffffff81a39000 - ffffffff81d04000)
[    2.980722] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    2.980722] 
[    2.984096] CPU: 1 PID: 1 Comm: init Not tainted 4.0.0-rc3+ #22
[    2.984096] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[    2.984096]  ffff88007bcf0000 ffff88007bcfbc58 ffffffff81675eb9 0000000000000001
[    2.984096]  ffffffff818a9ae8 ffff88007bcfbcd8 ffffffff816745be ffff88007bcf0000
[    2.984096]  0000000000000010 ffff88007bcfbce8 ffff88007bcfbc88 ffff88007bfcc0b0
[    2.984096] Call Trace:
[    2.984096]  [<ffffffff81675eb9>] dump_stack+0x4f/0x7b
[    2.984096]  [<ffffffff816745be>] panic+0xc0/0x1dc
[    2.984096]  [<ffffffff81056143>] do_exit+0xc13/0xc50
[    2.984096]  [<ffffffff810574a4>] do_group_exit+0x54/0xe0
[    2.984096]  [<ffffffff81065526>] get_signal+0x266/0xab0
[    2.984096]  [<ffffffff81002523>] do_signal+0x33/0xba0
[    2.984096]  [<ffffffff8109c88e>] ? put_lock_stats.isra.19+0xe/0x30
[    2.984096]  [<ffffffff8167dd81>] ? _raw_spin_unlock_irqrestore+0x41/0x80
[    2.984096]  [<ffffffff8167dd8b>] ? _raw_spin_unlock_irqrestore+0x4b/0x80
[    2.984096]  [<ffffffff8167f9f1>] ? retint_signal+0x11/0x90
[    2.984096]  [<ffffffff810030f5>] do_notify_resume+0x65/0x80
[    2.984096]  [<ffffffff8167fa26>] retint_signal+0x46/0x90
[    2.984096] Kernel Offset: disabled
[    2.984096] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

because, AFAICT and from debugging so far, we call xrstor_state() without a
previous xsave_state() in that path:

[    3.304551] Freeing unused kernel memory: 2860K (ffffffff81a39000 - ffffffff81d04000)
[    3.346556] traps: xrstor_state
[    3.350418] CPU: 1 PID: 1 Comm: init Not tainted 4.0.0-rc3+ #21
[    3.350418] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[    3.350418]  ffff88007b454000 ffff88007bcfbf18 ffffffff81676079 0000000000000000
[    3.350418]  ffff88007bcf0000 ffff88007bcfbf38 ffffffff810038b6 0000000000000000
[    3.350418]  00007f6b95ba91c8 ffff88007bcfbf48 ffffffff81004433 00007ffeecd31bb0
[    3.350418] Call Trace:
[    3.350418]  [<ffffffff81676079>] dump_stack+0x4f/0x7b
[    3.350418]  [<ffffffff810038b6>] math_state_restore+0xa6/0x220
[    3.350418]  [<ffffffff81004433>] do_device_not_available+0x23/0x30
[    3.350418]  [<ffffffff81680865>] device_not_available+0x15/0x20

so I need to sort that one out first.

But including the fault exception table in the macro is already an
improvement IMO.

Thanks.

---
diff --git a/arch/x86/include/asm/xsave.h b/arch/x86/include/asm/xsave.h
index c9a6d68b8d62..0d0cc053c7cc 100644
--- a/arch/x86/include/asm/xsave.h
+++ b/arch/x86/include/asm/xsave.h
@@ -67,6 +67,51 @@ extern int init_fpu(struct task_struct *child);
 			_ASM_EXTABLE(1b, 3b)		\
 			: [err] "=r" (err)
 
+#define XSTATE_OP(op, st, lmask, hmask, err)				\
+	asm volatile("1:" op "\n\t"					\
+		     "2:\n\t"						\
+		     ".pushsection .fixup,\"ax\"\n\t"			\
+		     "3: movl $-1,%[err]\n\t"				\
+		     "jmp 2b\n\t"					\
+		     ".popsection\n\t"					\
+		     _ASM_EXTABLE(1b, 3b)				\
+		     : [err] "=r" (err)					\
+		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
+		     : "memory")
+
+/*
+ * 661 and alt_end_marker labels below are defined in ALTERNATIVE*
+ * and we're reusing  them here so as not to clutter this macro
+ * unnecessarily.
+ */
+#define XSTATE_XSAVE(st, lmask, hmask, err)				\
+	asm volatile(ALTERNATIVE_2(XSAVE,				\
+				   XSAVEOPT, X86_FEATURE_XSAVEOPT,	\
+				   XSAVES,   X86_FEATURE_XSAVES)	\
+		     "\n"						\
+		     ".pushsection .fixup,\"ax\"\n"			\
+		     "3: movl $-1, %[err]\n"				\
+		     "jmp " alt_end_marker "b\n"			\
+		     ".popsection\n"					\
+		     _ASM_EXTABLE(661b, 3b)				\
+		     : [err] "=r" (err)					\
+		     : "D" (st), "a" (lmask), "d" (hmask)		\
+		     : "memory")
+
+#define XSTATE_XRESTORE(st, lmask, hmask, err)				\
+	asm volatile(ALTERNATIVE(XRSTOR,				\
+				 XRSTORS, X86_FEATURE_XSAVES)		\
+		     "\n"						\
+		     ".pushsection .fixup,\"ax\"\n"			\
+		     "3: movl $-1, %[err]\n"				\
+		     "jmp 663b\n"					\
+		     ".popsection\n"					\
+		     _ASM_EXTABLE(661b, 3b)				\
+		     : [err] "=r" (err)					\
+		     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask)	\
+		     : "memory")
+
+
 /*
  * This function is called only during boot time when x86 caps are not set
  * up and alternative can not be used yet.
@@ -77,20 +122,11 @@ static inline int xsave_state_booting(struct xsave_struct *fx, u64 mask)
 	u32 hmask = mask >> 32;
 	int err = 0;
 
-	WARN_ON(system_state != SYSTEM_BOOTING);
-
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		asm volatile("1:"XSAVES"\n\t"
-			"2:\n\t"
-			     xstate_fault
-			: "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
-			:   "memory");
+	if (static_cpu_has_safe(X86_FEATURE_XSAVES))
+		XSTATE_OP(XSAVES, fx, lmask, hmask, err);
 	else
-		asm volatile("1:"XSAVE"\n\t"
-			"2:\n\t"
-			     xstate_fault
-			: "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
-			:   "memory");
+		XSTATE_OP(XSAVE, fx, lmask, hmask, err);
+
 	return err;
 }
 
@@ -104,20 +140,12 @@ static inline int xrstor_state_booting(struct xsave_struct *fx, u64 mask)
 	u32 hmask = mask >> 32;
 	int err = 0;
 
-	WARN_ON(system_state != SYSTEM_BOOTING);
+       WARN_ON(system_state != SYSTEM_BOOTING);
 
-	if (boot_cpu_has(X86_FEATURE_XSAVES))
-		asm volatile("1:"XRSTORS"\n\t"
-			"2:\n\t"
-			     xstate_fault
-			: "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
-			:   "memory");
+	if (static_cpu_has_safe(X86_FEATURE_XSAVES))
+		XSTATE_OP(XRSTORS, fx, lmask, hmask, err);
 	else
-		asm volatile("1:"XRSTOR"\n\t"
-			"2:\n\t"
-			     xstate_fault
-			: "D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
-			:   "memory");
+		XSTATE_OP(XRSTOR, fx, lmask, hmask, err);
 	return err;
 }
 
@@ -141,18 +169,7 @@ static inline int xsave_state(struct xsave_struct *fx, u64 mask)
 	 *
 	 * If none of xsaves and xsaveopt is enabled, use xsave.
 	 */
-	alternative_input_2(
-		"1:"XSAVE,
-		XSAVEOPT,
-		X86_FEATURE_XSAVEOPT,
-		XSAVES,
-		X86_FEATURE_XSAVES,
-		[fx] "D" (fx), "a" (lmask), "d" (hmask) :
-		"memory");
-	asm volatile("2:\n\t"
-		     xstate_fault
-		     : "0" (0)
-		     : "memory");
+	XSTATE_XSAVE(fx, lmask, hmask, err);
 
 	return err;
 }
@@ -170,17 +187,7 @@ static inline int xrstor_state(struct xsave_struct *fx, u64 mask)
 	 * Use xrstors to restore context if it is enabled. xrstors supports
 	 * compacted format of xsave area which is not supported by xrstor.
 	 */
-	alternative_input(
-		"1: " XRSTOR,
-		XRSTORS,
-		X86_FEATURE_XSAVES,
-		"D" (fx), "m" (*fx), "a" (lmask), "d" (hmask)
-		: "memory");
-
-	asm volatile("2:\n"
-		     xstate_fault
-		     : "0" (0)
-		     : "memory");
+	XSTATE_XRESTORE(fx, lmask, hmask, err);
 
 	return err;
 }

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-17 11:20         ` Borislav Petkov
@ 2015-03-17 11:36           ` Quentin Casasnovas
  2015-03-17 12:07             ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Quentin Casasnovas @ 2015-03-17 11:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Quentin Casasnovas, Oleg Nesterov, Dave Hansen, Ingo Molnar,
	Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, H. Peter Anvin

On Tue, Mar 17, 2015 at 12:20:15PM +0100, Borislav Petkov wrote:
> On Tue, Mar 17, 2015 at 11:00:46AM +0100, Quentin Casasnovas wrote:
> > Fair point, but AFAIUI we can't do check_insn(XSAVES) alone as of today,
> > and the "..." in your "check_insn(XSAVEOPT, ...)" code above would still
> > need to contain the output operands.
> 
> I think we can do this (see diff the end of this mail).
>

Right, FWIW I think your approach is valid, but not very generic.  Re-using
the check_insn() and making it more generic so we can widen its use felt
like a better approach to me.

AIUI, you didn't like my earlier draft because it wasn't very readable, but
I think this was just due to the (bad) example I took and by reworking it a
bit more, we could end up with the code you previously envisioned:

  if (static_cpu_has_safe(X86_FEATURE_XSAVEOPT))
          return check_insn(XSAVEOPT, xsave_buf, ...);
  else if (static_cpu_has_safe(X86_FEATURE_XSAVES))
          return check_insn(XSAVES, xsave_buf, ...);
  else
	  return check_insn(XSAVE, xsave_buf, ...);

Or maybe you were saying the actual macros weren't readable?

> [...]
> 
> But including the fault exception table in the macro is already an
> improvement IMO.

Agreed, it already looks much nicer with your diff.

Quentin

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-17 11:36           ` Quentin Casasnovas
@ 2015-03-17 12:07             ` Borislav Petkov
  2015-03-18  9:06               ` Quentin Casasnovas
  0 siblings, 1 reply; 126+ messages in thread
From: Borislav Petkov @ 2015-03-17 12:07 UTC (permalink / raw)
  To: Quentin Casasnovas
  Cc: Oleg Nesterov, Dave Hansen, Ingo Molnar, Andy Lutomirski,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua, H. Peter Anvin

On Tue, Mar 17, 2015 at 12:36:58PM +0100, Quentin Casasnovas wrote:
> Right, FWIW I think your approach is valid, but not very generic.  Re-using
> the check_insn() and making it more generic so we can widen its use felt
> like a better approach to me.
> 
> AIUI, you didn't like my earlier draft because it wasn't very readable, but
> I think this was just due to the (bad) example I took and by reworking it a
> bit more, we could end up with the code you previously envisionned:
> 
>   if (static_cpu_has_safe(X86_FEATURE_XSAVEOPT))
>           return check_insn(XSAVEOPT, xsave_buf, ...);
>   else if (static_cpu_has_safe(X86_FEATURE_XSAVES)
>           return check_insn(XSAVES, xsave_buf, ...);
>   else
> 	  return check_insn(XSAVE, xsave_buf, ...)
> 
> Or maybe you were saying the actual macros weren't readable?

Well, TBH, I don't like check_insn() either:

* naming is generic but it is not really used in a generic way - only in
FPU code.

* having variable arguments makes it really really unreadable to me when
you start looking at how it is called:

	...
        if (config_enabled(CONFIG_X86_32))
                return check_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
	...

The only thing that lets me differentiate what is input and what is
output is the "=" in there and you have to know inline asm to know that.

* The arguments have the same syntax as inline asm() arguments but you
don't see "asm volatile" there so it looks like something half-arsed in
between.

* the first argument is the instruction string with the operands which
gets stringified, yuck!

Do I need to say more? :-)

So what I would like is for us to kill those half-arsed macros and
use either generic, clean macros like the alternatives or define
FPU-specific ones which do what the FPU code needs done. If the latter,
they should be self-contained, all in one place, so that you don't have
to grep like crazy to piece together what the macro does - nothing like
xstate_fault. Yuck.

Or even extend the generic macros to fit the FPU use case, if possible
and if it makes sense.

Oh, and we shouldn't leave readability somewhere on the road.

I hope you catch my drift here.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-17 12:07             ` Borislav Petkov
@ 2015-03-18  9:06               ` Quentin Casasnovas
  2015-03-18  9:53                 ` Borislav Petkov
  0 siblings, 1 reply; 126+ messages in thread
From: Quentin Casasnovas @ 2015-03-18  9:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Quentin Casasnovas, Oleg Nesterov, Dave Hansen, Ingo Molnar,
	Andy Lutomirski, Linus Torvalds, Pekka Riikonen, Rik van Riel,
	Suresh Siddha, LKML, Yu, Fenghua, H. Peter Anvin

On Tue, Mar 17, 2015 at 01:07:39PM +0100, Borislav Petkov wrote:
> On Tue, Mar 17, 2015 at 12:36:58PM +0100, Quentin Casasnovas wrote:
> > Right, FWIW I think your approach is valid, but not very generic.  Re-using
> > the check_insn() and making it more generic so we can widen its use felt
> > like a better approach to me.
> > 
> > AIUI, you didn't like my earlier draft because it wasn't very readable, but
> > I think this was just due to the (bad) example I took and by reworking it a
> > bit more, we could end up with the code you previously envisioned:
> > 
> >   if (static_cpu_has_safe(X86_FEATURE_XSAVEOPT))
> >           return check_insn(XSAVEOPT, xsave_buf, ...);
> >   else if (static_cpu_has_safe(X86_FEATURE_XSAVES))
> >           return check_insn(XSAVES, xsave_buf, ...);
> >   else
> > 	  return check_insn(XSAVE, xsave_buf, ...);
> > 
> > Or maybe you were saying the actual macros weren't readable?
> 
> Well, TBH, I don't like check_insn() either:
> 
> * naming is generic but it is not really used in a generic way - only in
> FPU code.

We could make it generic enough so it becomes useful elsewhere as well.

> 
> * having variable arguments makes it really really unreadable to me when
> you start looking at how it is called:
> 
> 	...
>         if (config_enabled(CONFIG_X86_32))
>                 return check_insn(fxrstor %[fx], "=m" (*fx), [fx] "m" (*fx));
> 	...
> 
> The only thing that lets me differentiate what is input and what is
> output is the "=" in there and you have to know inline asm to know that.
>

It gets even worse with the xstate_fault macro, which silently includes the
output operands...

> 
> * The arguments have the same syntax as inline asm() arguments but you
> don't see "asm volatile" there so it looks like something half-arsed in
> between.
> 
> * the first argument is the instruction string with the operands which
> gets stringified, yuck!
> 

What if we renamed it to check_asm()/check_user_asm() and have the first
argument be a string, like an asm statement?  So basically check_asm()
would be exactly like an asm() statement except that it'll use a comma to
separate the input, output and clobber operands instead of a colon, and
would protect the first instruction of the assembler template.

        if (config_enabled(CONFIG_X86_32))
                return check_user_asm("fxrstor %[fx]", [fx] "=m" (*fx),,);

Then we can move that macro up the headers so it can be used elsewhere.
It looks more readable to me than how we'd write that manually:

	if (config_enabled(CONFIG_X86_32)) {
		asm volatile(ASM_STAC                 "\n\t"
			     "1: fxrstor %[fx]        \n\t"
			     "2:                      \n\t"
			     ASM_CLAC                 "\n\t"
			     ".section .fixup,\"ax\"  \n\t"
			     "3: movl $-1, %0         \n\t"
			     "   jmp 2b               \n\t"
			     ".previous               \n\t"
			     _ASM_EXTABLE(1b, 3b)
			     : "=r" (err), [fx] "=m" (*fx)
			     : "0" (0));
		return err;
	}

> Do I need to say more? :-)
> 
> So what I would like is for us to kill those half-arsed macros and
> use either generic, clean macros like the alternatives or define
> FPU-specific ones which do what the FPU code needs done. If the latter,
> they should be self-contained, all in one place, so that you don't have
> to grep like crazy to piece together what the macro does - nothing like
> xstate_fault. Yuck.
> 
> Or even extend the generic macros to fit the FPU use case, if possible
> and if it makes sense.
> 
> Oh, and we shouldn't leave readability somewhere on the road.

Readability will be a tough one since gcc extended asm isn't readable (IMO)
and we need to deal with the input/output/clobber operands syntax.

> 
> I hope you catch my drift here.
>

I do agree with all your above points, which is why I drafted that proposed
rework of check_insn() in my first e-mail :)  AFAICT, you were giving
arguments against the current macros, not against my previous proposal.

Quentin

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user
  2015-03-18  9:06               ` Quentin Casasnovas
@ 2015-03-18  9:53                 ` Borislav Petkov
  0 siblings, 0 replies; 126+ messages in thread
From: Borislav Petkov @ 2015-03-18  9:53 UTC (permalink / raw)
  To: Quentin Casasnovas
  Cc: Oleg Nesterov, Dave Hansen, Ingo Molnar, Andy Lutomirski,
	Linus Torvalds, Pekka Riikonen, Rik van Riel, Suresh Siddha,
	LKML, Yu, Fenghua, H. Peter Anvin

On Wed, Mar 18, 2015 at 10:06:32AM +0100, Quentin Casasnovas wrote:
> What if we renamed it to check_asm()/check_user_asm() and have the first
> argument be a string, like an asm statement?  So basically check_asm()
> would be exactly like an asm() statement except that it'll use a comma to
> separate the input, output and clobber operands instead of a colon, and
> would protect the first instruction of the assembler template.
> 
>         if (config_enabled(CONFIG_X86_32))
>                 return check_user_asm("fxrstor %[fx]", [fx] "=m" (*fx),,);
> 
> Then we can move that macro up the headers so it can be used elsewhere.

Actually, I don't like the variable arguments thing and am not sure at
all that there's a wide need for a check* thing across the tree. Maybe
there is but I haven't seen it yet.

So I'd much prefer macros of the sort:

	fxsave()
	xsave()
	xsaves()
	xrstor()
	...

(no need for the "check" thing)

which are self-contained and get passed the needed operands. I.e.,

	fxsave(fx)

and fx is "struct i387_fxsave_struct __user *fx". We can wrap it in
inline functions for arguments checking too.

Also:

	xsave(state, lmask, hmask)

and the macro definition does the exception table thing. And we can have
a lower-level __save_state() macro which gets called by all of those
so that we avoid the code duplication.

This is much cleaner IMO than the check_insn() things.
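
A rough sketch of that shape, just to make it concrete -- the names
follow the proposal above and the body simply mirrors the XSTATE_OP
macro from the earlier diff, so none of this is an existing interface:

	#define __save_state(insn, st, lmask, hmask, err)		\
		asm volatile("1:" insn "\n\t"				\
			     "2:\n\t"					\
			     ".pushsection .fixup,\"ax\"\n\t"		\
			     "3: movl $-1, %[err]\n\t"			\
			     "jmp 2b\n\t"				\
			     ".popsection\n\t"				\
			     _ASM_EXTABLE(1b, 3b)			\
			     : [err] "=r" (err)				\
			     : "D" (st), "m" (*st), "a" (lmask), "d" (hmask) \
			     : "memory")

	static inline int xsave(struct xsave_struct *st, u32 lmask, u32 hmask)
	{
		int err = 0;

		__save_state(XSAVE, st, lmask, hmask, err);
		return err;
	}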

> Readability will be a tough one since gcc extended asm isn't readable
> (IMO) and we need to deal with the input/output/clobber operands
> syntax.

That's why I'm saying we should wrap all that inline asm syntax in macros
and not pass inline-asm-like-but-not-really arguments to our macros.

> I do agree with all your above points, which is why I drafted that
> proposed rework of check_insn() in my first e-mail :) AFAICT, you were
> giving arguments against the current macros, not against my previous
> proposal.

All I'm saying is, it should be done cleanly instead of improving an
already not so optimal design.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 126+ messages in thread

* [tip:x86/fpu] x86/fpu: Document user_fpu_begin()
  2015-03-11 17:34   ` [PATCH 1/4] x86/fpu: document user_fpu_begin() Oleg Nesterov
  2015-03-13  9:47     ` Borislav Petkov
@ 2015-03-23 12:20     ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-03-23 12:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, luto, linux-kernel, priikone, mingo, riel, bp, oleg, hpa,
	torvalds, quentin.casasnovas, fenghua.yu, dave.hansen, sbsiddha

Commit-ID:  fb14b4eadf73500d3b2104f031472a268562c047
Gitweb:     http://git.kernel.org/tip/fb14b4eadf73500d3b2104f031472a268562c047
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Wed, 11 Mar 2015 18:34:09 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 23 Mar 2015 10:13:58 +0100

x86/fpu: Document user_fpu_begin()

Currently, user_fpu_begin() has a single caller and it is not clear why
we actually need it and why we should not worry about preemption
right after preempt_enable().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150311173409.GC5032@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fpu-internal.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index 810f20f..c58c930 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -508,10 +508,12 @@ static inline int restore_xstate_sig(void __user *buf, int ia32_frame)
 }
 
 /*
- * Need to be preemption-safe.
+ * Needs to be preemption-safe.
  *
  * NOTE! user_fpu_begin() must be used only immediately before restoring
- * it. This function does not do any save/restore on their own.
+ * the save state. It does not do any saving/restoring on its own. In
+ * lazy FPU mode, it is just an optimization to avoid a #NM exception,
+ * the task can lose the FPU right after preempt_enable().
  */
 static inline void user_fpu_begin(void)
 {

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip:x86/fpu] x86/fpu: Introduce restore_init_xstate()
  2015-03-11 17:34   ` [PATCH 2/4] x86/fpu: introduce restore_init_xstate() Oleg Nesterov
  2015-03-13 10:34     ` Borislav Petkov
@ 2015-03-23 12:20     ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-03-23 12:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, sbsiddha, quentin.casasnovas, torvalds, bp,
	linux-kernel, hpa, riel, fenghua.yu, mingo, tglx, luto, oleg,
	priikone

Commit-ID:  8f4d81863ba4e8dfee93bd50840f1099a296251f
Gitweb:     http://git.kernel.org/tip/8f4d81863ba4e8dfee93bd50840f1099a296251f
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Wed, 11 Mar 2015 18:34:29 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 23 Mar 2015 10:13:58 +0100

x86/fpu: Introduce restore_init_xstate()

Extract the "use_eager_fpu()" code from drop_init_fpu() into a new,
simple helper restore_init_xstate(). The next patch adds another user.

- It is not clear why we do not check use_fxsr() like fpu_restore_checking()
  does. eager_fpu_init_bp() calls setup_init_fpu_buf() too, and we have the
  "eagerfpu=on" kernel option.

- Ignoring the fact that init_xstate_buf is "struct xsave_struct *", not
  "union thread_xstate *", it is not clear why we can not simply use
  fpu_restore_checking() and avoid the code duplication.

- It is not clear why we can't call setup_init_fpu_buf() unconditionally
  to always create init_xstate_buf(). Then do_device_not_available() path
  (at least) could use restore_init_xstate() too. It doesn't need to init
  fpu->state, its content doesn't matter until unlazy_fpu()/__switch_to()/etc
  which overwrites this memory anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150311173429.GD5032@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/fpu-internal.h | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/fpu-internal.h b/arch/x86/include/asm/fpu-internal.h
index c58c930..7d2f7fa 100644
--- a/arch/x86/include/asm/fpu-internal.h
+++ b/arch/x86/include/asm/fpu-internal.h
@@ -401,16 +401,20 @@ static inline void drop_fpu(struct task_struct *tsk)
 	preempt_enable();
 }
 
+static inline void restore_init_xstate(void)
+{
+	if (use_xsave())
+		xrstor_state(init_xstate_buf, -1);
+	else
+		fxrstor_checking(&init_xstate_buf->i387);
+}
+
 static inline void drop_init_fpu(struct task_struct *tsk)
 {
 	if (!use_eager_fpu())
 		drop_fpu(tsk);
-	else {
-		if (use_xsave())
-			xrstor_state(init_xstate_buf, -1);
-		else
-			fxrstor_checking(&init_xstate_buf->i387);
-	}
+	else
+		restore_init_xstate();
 }
 
 /*

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip:x86/fpu] x86/fpu: Use restore_init_xstate() instead of math_state_restore() on kthread exec
  2015-03-11 17:34   ` [PATCH 3/4] x86/fpu: use restore_init_xstate() instead of math_state_restore() on kthread exec Oleg Nesterov
  2015-03-13 10:48     ` Borislav Petkov
@ 2015-03-23 12:21     ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-03-23 12:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, sbsiddha, priikone, bp, linux-kernel, luto,
	quentin.casasnovas, riel, hpa, oleg, torvalds, dave.hansen,
	mingo, fenghua.yu

Commit-ID:  9cb6ce823bbd1adbe15e30bd1435c84c2e271767
Gitweb:     http://git.kernel.org/tip/9cb6ce823bbd1adbe15e30bd1435c84c2e271767
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Wed, 11 Mar 2015 18:34:49 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 23 Mar 2015 10:13:58 +0100

x86/fpu: Use restore_init_xstate() instead of math_state_restore() on kthread exec

Change flush_thread() to do user_fpu_begin() and restore_init_xstate()
instead of math_state_restore().

Note: "TODO: cleanup this horror" is still valid. We do not need
init_fpu() at all, we only need fpu_alloc() and memset(0). But this
needs other changes, in particular user_fpu_begin() should set
used_math().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150311173449.GE5032@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/process.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index dcaf4b0..6b05829 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -143,7 +143,8 @@ void flush_thread(void)
 		/* kthread execs. TODO: cleanup this horror. */
 		if (WARN_ON(init_fpu(current)))
 			force_sig(SIGKILL, current);
-		math_state_restore();
+		user_fpu_begin();
+		restore_init_xstate();
 	}
 }
 

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip:x86/fpu] x86/fpu: Don't abuse drop_init_fpu() in flush_thread()
  2015-03-13 17:30     ` [PATCH v2 " Oleg Nesterov
  2015-03-14 10:55       ` Borislav Petkov
@ 2015-03-23 12:21       ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-03-23 12:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: priikone, dave.hansen, sbsiddha, luto, tglx, quentin.casasnovas,
	oleg, linux-kernel, fenghua.yu, torvalds, bp, mingo, hpa, riel

Commit-ID:  f893959b0898bd876673adbeb6798bdf25c034d7
Gitweb:     http://git.kernel.org/tip/f893959b0898bd876673adbeb6798bdf25c034d7
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Fri, 13 Mar 2015 18:30:30 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 23 Mar 2015 10:13:58 +0100

x86/fpu: Don't abuse drop_init_fpu() in flush_thread()

flush_thread() -> drop_init_fpu() is suboptimal and confusing. It does
drop_fpu() or restore_init_xstate() depending on !use_eager_fpu(). But
flush_thread() too checks eagerfpu right after that, and if it is true
then restore_init_xstate() just burns CPU for no reason. We are going to
load init_xstate_buf again after we set used_math()/user_has_fpu(); until
then the FPU state can't survive after switch_to().

Remove it, and change the "if (!use_eager_fpu())" to call drop_fpu().
While at it, clean up the tsk/current usage.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150313173030.GA31217@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/process.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 6b05829..1d2ebad 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -132,17 +132,14 @@ void flush_thread(void)
 	flush_ptrace_hw_breakpoint(tsk);
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
 
-	drop_init_fpu(tsk);
-	/*
-	 * Free the FPU state for non xsave platforms. They get reallocated
-	 * lazily at the first use.
-	 */
-	if (!use_eager_fpu())
+	if (!use_eager_fpu()) {
+		/* FPU state will be reallocated lazily at the first use. */
+		drop_fpu(tsk);
 		free_thread_xstate(tsk);
-	else if (!used_math()) {
+	} else if (!used_math()) {
 		/* kthread execs. TODO: cleanup this horror. */
-		if (WARN_ON(init_fpu(current)))
-			force_sig(SIGKILL, current);
+		if (WARN_ON(init_fpu(tsk)))
+			force_sig(SIGKILL, tsk);
 		user_fpu_begin();
 		restore_init_xstate();
 	}

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip:x86/fpu] x86/fpu: Don't allocate fpu->state for swapper/0
  2015-03-13 18:27   ` [PATCH 1/1] " Oleg Nesterov
  2015-03-16 10:18     ` Borislav Petkov
@ 2015-03-23 12:22     ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-03-23 12:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: quentin.casasnovas, linux-kernel, riel, torvalds, priikone, oleg,
	bp, sbsiddha, luto, hpa, tglx, mingo, dave.hansen, fenghua.yu

Commit-ID:  4bd5bf8c85e6bca5be9e7c4b3d7ad1942ae323f3
Gitweb:     http://git.kernel.org/tip/4bd5bf8c85e6bca5be9e7c4b3d7ad1942ae323f3
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Fri, 13 Mar 2015 19:27:16 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 23 Mar 2015 10:13:59 +0100

x86/fpu: Don't allocate fpu->state for swapper/0

Now that kthreads do not use FPU until they get executed, swapper/0
doesn't need to allocate fpu->state.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150313182716.GB8249@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/xsave.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index 65c29b0..ada8df7b 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -680,8 +680,6 @@ void xsave_init(void)
 
 static inline void __init eager_fpu_init_bp(void)
 {
-	current->thread.fpu.state =
-	    alloc_bootmem_align(xstate_size, __alignof__(struct xsave_struct));
 	if (!init_xstate_buf)
 		setup_init_fpu_buf();
 }

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* [tip:x86/fpu] x86/fpu: Kill eager_fpu_init_bp()
  2015-03-14 15:13       ` [PATCH 1/1] " Oleg Nesterov
  2015-03-16 12:44         ` Borislav Petkov
@ 2015-03-23 12:22         ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 126+ messages in thread
From: tip-bot for Oleg Nesterov @ 2015-03-23 12:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: oleg, dave.hansen, torvalds, tglx, mingo, quentin.casasnovas,
	priikone, sbsiddha, hpa, fenghua.yu, linux-kernel, luto, riel,
	bp

Commit-ID:  7fc253e277ecf1ea57c2d670bdbcda3dffd19453
Gitweb:     http://git.kernel.org/tip/7fc253e277ecf1ea57c2d670bdbcda3dffd19453
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Sat, 14 Mar 2015 16:13:34 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 23 Mar 2015 10:14:00 +0100

x86/fpu: Kill eager_fpu_init_bp()

Now that eager_fpu_init_bp() does setup_init_fpu_buf() only and
nothing else, we can remove it and move this code into its "caller",
eager_fpu_init().

This avoids the confusing games with "static __refdata void (*boot_func)":

init_xstate_buf can be NULL only during boot, so it is safe to call the
__init-annotated setup_init_fpu_buf() function in eager_fpu_init(); we
just need to mark it as __init_refok.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/20150314151334.GC13029@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/xsave.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/xsave.c b/arch/x86/kernel/xsave.c
index ada8df7b..87a815b 100644
--- a/arch/x86/kernel/xsave.c
+++ b/arch/x86/kernel/xsave.c
@@ -678,16 +678,12 @@ void xsave_init(void)
 	this_func();
 }
 
-static inline void __init eager_fpu_init_bp(void)
-{
-	if (!init_xstate_buf)
-		setup_init_fpu_buf();
-}
-
-void eager_fpu_init(void)
+/*
+ * setup_init_fpu_buf() is __init and it is OK to call it here because
+ * init_xstate_buf will be unset only once during boot.
+ */
+void __init_refok eager_fpu_init(void)
 {
-	static __refdata void (*boot_func)(void) = eager_fpu_init_bp;
-
 	WARN_ON(used_math());
 	current_thread_info()->status = 0;
 
@@ -699,10 +695,8 @@ void eager_fpu_init(void)
 		return;
 	}
 
-	if (boot_func) {
-		boot_func();
-		boot_func = NULL;
-	}
+	if (!init_xstate_buf)
+		setup_init_fpu_buf();
 }
 
 /*

^ permalink raw reply related	[flat|nested] 126+ messages in thread

* RE: Oops with tip/x86/fpu
       [not found]       ` <20150305182203.GA4203@redhat.com>
  2015-03-05 18:34         ` Dave Hansen
  2015-03-05 18:41         ` Dave Hansen
@ 2015-03-26 22:37         ` Yu, Fenghua
  2015-03-26 22:43           ` Dave Hansen
  2015-03-27 19:06           ` Oleg Nesterov
  2 siblings, 2 replies; 126+ messages in thread
From: Yu, Fenghua @ 2015-03-26 22:37 UTC (permalink / raw)
  To: Oleg Nesterov, Borislav Petkov, Hansen, Dave
  Cc: Quentin Casasnovas, Andy Lutomirski, Ingo Molnar, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML

> From: Oleg Nesterov [mailto:oleg@redhat.com]
> Sent: Thursday, March 05, 2015 10:22 AM
> 
> On 03/05, Oleg Nesterov wrote:
> >
> 
> Does it trigger something else on your machine?
> 
> Oleg.
> 
> #include <stdio.h>
> #include <signal.h>
> #include <unistd.h>
> #include <ucontext.h>
> 
> void sighup(int sig, siginfo_t *info, void *ctxt) {
> 	struct ucontext *uctxt = ctxt;
> 	struct sigcontext *sctxt = (void*)&uctxt->uc_mcontext;
> 
> 	printf("SIGHUP! %p\n", sctxt->fpstate);
> 	sctxt->fpstate = (void *)1;

sctxt->fpstate=(void *)1 changes the fpstate pointer in the sigcontext. It will generate a segfault and a "bad frame" message in the kernel.

This is expected behavior, right? Is this still a valid test?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: Oops with tip/x86/fpu
  2015-03-26 22:37         ` Yu, Fenghua
@ 2015-03-26 22:43           ` Dave Hansen
  2015-03-26 22:48             ` Yu, Fenghua
  2015-03-27 19:06           ` Oleg Nesterov
  1 sibling, 1 reply; 126+ messages in thread
From: Dave Hansen @ 2015-03-26 22:43 UTC (permalink / raw)
  To: Yu, Fenghua, Oleg Nesterov, Borislav Petkov
  Cc: Quentin Casasnovas, Andy Lutomirski, Ingo Molnar, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML

On 03/26/2015 03:37 PM, Yu, Fenghua wrote:
>> > void sighup(int sig, siginfo_t *info, void *ctxt) {
>> > 	struct ucontext *uctxt = ctxt;
>> > 	struct sigcontext *sctxt = (void*)&uctxt->uc_mcontext;
>> > 
>> > 	printf("SIGHUP! %p\n", sctxt->fpstate);
>> > 	sctxt->fpstate = (void *)1;
> sctxt->fpstate=(void *)1 changes the fpstate pointer in the sigcontext. It will generate a segfault and a "bad frame" message in the kernel.
> 
> This is expected behavior, right? Is this still a valid test?

Just to be clear, I saw a full-on kernel panic induced from an
unprivileged application.

Are you seeing something different?

^ permalink raw reply	[flat|nested] 126+ messages in thread

* RE: Oops with tip/x86/fpu
  2015-03-26 22:43           ` Dave Hansen
@ 2015-03-26 22:48             ` Yu, Fenghua
  2015-03-27  7:30               ` Quentin Casasnovas
  0 siblings, 1 reply; 126+ messages in thread
From: Yu, Fenghua @ 2015-03-26 22:48 UTC (permalink / raw)
  To: Hansen, Dave, Oleg Nesterov, Borislav Petkov
  Cc: Quentin Casasnovas, Andy Lutomirski, Ingo Molnar, Linus Torvalds,
	Pekka Riikonen, Rik van Riel, Suresh Siddha, LKML

> From: Hansen, Dave
> Sent: Thursday, March 26, 2015 3:44 PM
> On 03/26/2015 03:37 PM, Yu, Fenghua wrote:
> >> > void sighup(int sig, siginfo_t *info, void *ctxt) {
> >> > 	struct ucontext *uctxt = ctxt;
> >> > 	struct sigcontext *sctxt = (void*)&uctxt->uc_mcontext;
> >> >
> >> > 	printf("SIGHUP! %p\n", sctxt->fpstate);
> >> > 	sctxt->fpstate = (void *)1;
> > sctxt->fpstate = (void *)1 changes the fpstate pointer in the sigcontext. It
> will generate a segfault and a "bad frame" message in the kernel.
> >
> > This is expected behavior, right? Is this still a valid test?
> 
> Just to be clear, I saw a full-on kernel panic induced from an unprivileged
> application.
> 
> Are you seeing something different?

I am using the latest tip tree, which may already have the fixes. I see "bad frame" reported by the kernel, so the issue seems to have been fixed in the tip tree.

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: Oops with tip/x86/fpu
  2015-03-26 22:48             ` Yu, Fenghua
@ 2015-03-27  7:30               ` Quentin Casasnovas
  0 siblings, 0 replies; 126+ messages in thread
From: Quentin Casasnovas @ 2015-03-27  7:30 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Hansen, Dave, Oleg Nesterov, Borislav Petkov, Quentin Casasnovas,
	Andy Lutomirski, Ingo Molnar, Linus Torvalds, Pekka Riikonen,
	Rik van Riel, Suresh Siddha, LKML

On Thu, Mar 26, 2015 at 10:48:18PM +0000, Yu, Fenghua wrote:
> > > sctxt->fpstate = (void *)1 changes the fpstate pointer in the
> > > sigcontext. It will generate a segfault and a "bad frame" message in the kernel.
> > >
> > > This is expected behavior, right? Is this still a valid test?
> > 
> > Just to be clear, I saw a full-on kernel panic induced from an unprivileged
> > application.
> > 
> > Are you seeing something different?
> 
> I am using the latest tip tree, which may already have the fixes. I see
> "bad frame" reported by the kernel, so the issue seems to have been fixed
> in the tip tree.
> 

Fenghua, if you're interested, the details are now public here:

 http://seclists.org/oss-sec/2015/q1/877

Quentin

^ permalink raw reply	[flat|nested] 126+ messages in thread

* Re: Oops with tip/x86/fpu
  2015-03-26 22:37         ` Yu, Fenghua
  2015-03-26 22:43           ` Dave Hansen
@ 2015-03-27 19:06           ` Oleg Nesterov
  1 sibling, 0 replies; 126+ messages in thread
From: Oleg Nesterov @ 2015-03-27 19:06 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Borislav Petkov, Hansen, Dave, Quentin Casasnovas,
	Andy Lutomirski, Ingo Molnar, Linus Torvalds, Pekka Riikonen,
	Rik van Riel, Suresh Siddha, LKML

On 03/26, Yu, Fenghua wrote:
>
> > On 03/05, Oleg Nesterov wrote:
> >
> > void sighup(int sig, siginfo_t *info, void *ctxt) {
> > 	struct ucontext *uctxt = ctxt;
> > 	struct sigcontext *sctxt = (void*)&uctxt->uc_mcontext;
> >
> > 	printf("SIGHUP! %p\n", sctxt->fpstate);
> > 	sctxt->fpstate = (void *)1;
>
> sctxt->fpstate = (void *)1 changes the fpstate pointer in the sigcontext.
> It will generate a segfault and a "bad frame" message in the kernel.

Yes, but it will also trigger math_state_restore() without used_math().

This triggers two problems:

	1. "BUG: sleeping function called from invalid context ...".

	   Fixed by a7c80ebcac3068b1c3cb27d538d29558c30010c8

	2. On some machines this can lead to a GPF. This is another FPU bug,
	   NOT FIXED yet, because we are all busy with other problems ;)

	   And this leads to a kernel crash.

	   Fixed by 06c8173eb92bbfc03a0fe8bb64315857d0badd06

Oleg.
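
A rough user-space model of the ordering problem being described here, purely illustrative; every name below is a stand-in invented for this sketch, and the real fixes are the two commits referenced above:

#include <stdbool.h>
#include <stdio.h>

static bool task_used_math;	/* stand-in for used_math() */

/* Stand-in for math_state_restore(): it only makes sense for a task that
 * already has FPU state to restore. */
static void math_state_restore_model(void)
{
	if (!task_used_math) {
		printf("bug: asked to restore FPU state that was never set up\n");
		return;
	}
	printf("restore the task's saved FPU state\n");
}

/* Stand-in for the sigreturn path: set up init state first and validate the
 * user frame instead of calling the restore helper blindly. */
static void restore_xstate_sig_model(bool frame_is_sane)
{
	if (!task_used_math) {
		printf("task never used the FPU: set up init state first\n");
		task_used_math = true;
	}
	if (!frame_is_sane) {
		printf("bad frame: kill the task with SIGSEGV, do not oops\n");
		return;
	}
	math_state_restore_model();
}

int main(void)
{
	restore_xstate_sig_model(false);	/* corrupted fpstate pointer */
	return 0;
}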


^ permalink raw reply	[flat|nested] 126+ messages in thread

end of thread, other threads:[~2015-03-27 19:08 UTC | newest]

Thread overview: 126+ messages
2015-03-04 18:30 Oops with tip/x86/fpu Dave Hansen
2015-03-04 19:06 ` Oleg Nesterov
2015-03-04 19:12   ` Dave Hansen
2015-03-04 20:06   ` Borislav Petkov
2015-03-05 15:14     ` Oleg Nesterov
     [not found]       ` <20150305182203.GA4203@redhat.com>
2015-03-05 18:34         ` Dave Hansen
2015-03-05 18:46           ` Oleg Nesterov
2015-03-05 18:41         ` Dave Hansen
2015-03-26 22:37         ` Yu, Fenghua
2015-03-26 22:43           ` Dave Hansen
2015-03-26 22:48             ` Yu, Fenghua
2015-03-27  7:30               ` Quentin Casasnovas
2015-03-27 19:06           ` Oleg Nesterov
2015-03-05  8:38   ` Quentin Casasnovas
2015-03-05 15:13     ` Oleg Nesterov
2015-03-05 18:42       ` Borislav Petkov
2015-03-05 22:16         ` Dave Hansen
2015-03-05 19:51 ` [PATCH 0/1] x86/fpu: math_state_restore() should not blindly disable irqs Oleg Nesterov
2015-03-05 19:51   ` [PATCH 1/1] " Oleg Nesterov
2015-03-05 20:11     ` Ingo Molnar
2015-03-05 21:25       ` Oleg Nesterov
2015-03-06  7:58         ` Ingo Molnar
2015-03-06 13:26           ` Oleg Nesterov
2015-03-06 13:39             ` Oleg Nesterov
2015-03-06 13:46             ` Ingo Molnar
2015-03-06 14:01               ` Oleg Nesterov
2015-03-06 14:17                 ` Oleg Nesterov
2015-03-06 15:00                 ` David Vrabel
2015-03-06 15:36                   ` Oleg Nesterov
2015-03-06 16:15                     ` David Vrabel
2015-03-06 16:31                       ` Oleg Nesterov
2015-03-06 17:33           ` Linus Torvalds
2015-03-06 18:15             ` Oleg Nesterov
2015-03-06 19:23             ` Andy Lutomirski
2015-03-06 22:00               ` Linus Torvalds
2015-03-06 22:28                 ` Andy Lutomirski
2015-03-07 10:36                   ` Ingo Molnar
2015-03-07 20:11                     ` Linus Torvalds
2015-03-08  8:55                       ` Ingo Molnar
2015-03-08 11:38                         ` Ingo Molnar
2015-03-08 13:59                         ` Andy Lutomirski
2015-03-08 14:38                           ` Andy Lutomirski
2015-03-07 10:32             ` Ingo Molnar
2015-03-07 15:38   ` [PATCH 0/1] x86/fpu: x86/fpu: avoid math_state_restore() without used_math() in __restore_xstate_sig() Oleg Nesterov
2015-03-07 15:38     ` [PATCH 1/1] " Oleg Nesterov
2015-03-09 14:07       ` Borislav Petkov
2015-03-09 14:34         ` Oleg Nesterov
2015-03-09 15:18           ` Borislav Petkov
2015-03-09 16:24             ` Oleg Nesterov
2015-03-09 16:53               ` Borislav Petkov
2015-03-09 17:05                 ` Oleg Nesterov
2015-03-09 17:23                   ` Borislav Petkov
2015-03-16 12:07       ` [tip:x86/urgent] x86/fpu: Avoid " tip-bot for Oleg Nesterov
2015-03-05 20:35 ` [PATCH 0/1] x86/fpu: math_state_restore() should not blindly disable irqs Oleg Nesterov
2015-03-09 17:10 ` [PATCH] x86/fpu: drop_fpu() should not assume that tsk == current Oleg Nesterov
2015-03-09 17:36   ` Rik van Riel
2015-03-09 17:48   ` Borislav Petkov
2015-03-09 18:06     ` Oleg Nesterov
2015-03-09 18:10       ` Borislav Petkov
2015-03-16 12:07   ` [tip:x86/urgent] x86/fpu: Drop_fpu() should not assume that tsk equals current tip-bot for Oleg Nesterov
2015-03-11 17:33 ` [PATCH 0/4] x86/fpu: avoid math_state_restore() on kthread exec Oleg Nesterov
2015-03-11 17:34   ` [PATCH 1/4] x86/fpu: document user_fpu_begin() Oleg Nesterov
2015-03-13  9:47     ` Borislav Petkov
2015-03-13 14:34       ` Oleg Nesterov
2015-03-23 12:20     ` [tip:x86/fpu] x86/fpu: Document user_fpu_begin() tip-bot for Oleg Nesterov
2015-03-11 17:34   ` [PATCH 2/4] x86/fpu: introduce restore_init_xstate() Oleg Nesterov
2015-03-13 10:34     ` Borislav Petkov
2015-03-13 14:39       ` Oleg Nesterov
2015-03-13 15:20         ` Borislav Petkov
2015-03-16 19:05           ` Rik van Riel
2015-03-23 12:20     ` [tip:x86/fpu] x86/fpu: Introduce restore_init_xstate() tip-bot for Oleg Nesterov
2015-03-11 17:34   ` [PATCH 3/4] x86/fpu: use restore_init_xstate() instead of math_state_restore() on kthread exec Oleg Nesterov
2015-03-13 10:48     ` Borislav Petkov
2015-03-13 14:45       ` Oleg Nesterov
2015-03-13 15:51         ` Borislav Petkov
2015-03-23 12:21     ` [tip:x86/fpu] x86/fpu: Use " tip-bot for Oleg Nesterov
2015-03-11 17:35   ` [PATCH 4/4] x86/fpu: don't abuse drop_init_fpu() in flush_thread() Oleg Nesterov
2015-03-13 10:52     ` Borislav Petkov
2015-03-13 14:55       ` Oleg Nesterov
2015-03-13 16:19         ` Borislav Petkov
2015-03-13 16:26           ` Oleg Nesterov
2015-03-13 19:27             ` Borislav Petkov
2015-03-14 14:48               ` Oleg Nesterov
2015-03-15 17:36                 ` Borislav Petkov
2015-03-15 18:16                   ` Oleg Nesterov
2015-03-15 18:50                     ` Borislav Petkov
2015-03-15 20:04                       ` Oleg Nesterov
2015-03-15 20:38                         ` Borislav Petkov
2015-03-16  9:35                           ` Borislav Petkov
2015-03-16 10:28                             ` Ingo Molnar
2015-03-16 14:39                             ` Oleg Nesterov
2015-03-16 15:26                               ` Borislav Petkov
2015-03-16 15:34                             ` Andy Lutomirski
2015-03-16 15:35                               ` Borislav Petkov
2015-03-13 17:30     ` [PATCH v2 " Oleg Nesterov
2015-03-14 10:55       ` Borislav Petkov
2015-03-14 10:57         ` [PATCH] x86/fpu: Fold __drop_fpu() into its sole user Borislav Petkov
2015-03-14 15:15           ` Oleg Nesterov
2015-03-16 10:27           ` Ingo Molnar
2015-03-23 12:21       ` [tip:x86/fpu] x86/fpu: Don't abuse drop_init_fpu() in flush_thread() tip-bot for Oleg Nesterov
2015-03-13 18:26 ` [PATCH 0/1] x86/cpu: don't allocate fpu->state for swapper/0 Oleg Nesterov
2015-03-13 18:27   ` [PATCH 1/1] " Oleg Nesterov
2015-03-16 10:18     ` Borislav Petkov
2015-03-23 12:22     ` [tip:x86/fpu] x86/fpu: Don't " tip-bot for Oleg Nesterov
2015-03-14 11:16   ` [PATCH 0/1] x86/cpu: don't " Borislav Petkov
2015-03-14 15:13     ` [PATCH 0/1] x86/cpu: kill eager_fpu_init_bp() Oleg Nesterov
2015-03-14 15:13       ` [PATCH 1/1] " Oleg Nesterov
2015-03-16 12:44         ` Borislav Petkov
2015-03-23 12:22         ` [tip:x86/fpu] x86/fpu: Kill eager_fpu_init_bp() tip-bot for Oleg Nesterov
2015-03-15 16:49 ` [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user Oleg Nesterov
2015-03-15 16:50   ` [PATCH RFC 1/2] x86: introduce __user_insn() and __check_insn() Oleg Nesterov
2015-03-15 16:50   ` [PATCH RFC 2/2] x86/fpu: change xsave_user() and xrestore_user() to use __user_insn() Oleg Nesterov
2015-03-16 22:43     ` Quentin Casasnovas
2015-03-17  9:35       ` Borislav Petkov
2015-03-16 14:36   ` [PATCH RFC 0/2] x86/fpu: avoid "xstate_fault" in xsave_user/xrestore_user Borislav Petkov
2015-03-16 14:57     ` Oleg Nesterov
2015-03-16 17:58       ` Borislav Petkov
2015-03-16 22:37   ` Quentin Casasnovas
2015-03-17  9:47     ` Borislav Petkov
2015-03-17 10:00       ` Quentin Casasnovas
2015-03-17 11:20         ` Borislav Petkov
2015-03-17 11:36           ` Quentin Casasnovas
2015-03-17 12:07             ` Borislav Petkov
2015-03-18  9:06               ` Quentin Casasnovas
2015-03-18  9:53                 ` Borislav Petkov
2015-03-17 10:07       ` Quentin Casasnovas
