All of lore.kernel.org
 help / color / mirror / Atom feed
* [PULL v8] KVM: arm64: Optimise FPSIMD context switching
@ 2018-05-16 10:49 ` Dave Martin
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Martin @ 2018-05-16 10:49 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Christoffer Dall, linux-arm-kernel, kvmarm

Hi Marc,

This is a trivial update to the previously posted v7 [1].  The only
changes are a couple of minor cosmetic changes requested by reviewers,
on-list and the addition of Acked-by/Reviewed-by tags received since the
series was posted.

Let me know if you need anything else on this.

Cheers
---Dave

[1] [PATCH v7 00/16] KVM: arm64: Optimise FPSIMD context switching
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/576595.html


The following changes since commit 6da6c0db5316275015e8cc2959f12a17584aeb64:

  Linux v4.17-rc3 (2018-04-29 14:17:42 -0700)

are available in the git repository at:

  git://linux-arm.org/linux-dm.git 

for you to fetch changes up to 90255dd95fe71a9bd1c4ce728abad4b0eec240d9:

  KVM: arm64: Invoke FPSIMD context switch trap from C (2018-05-16 10:44:00 +0100)

----------------------------------------------------------------
Christoffer Dall (1):
      KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change

Dave Martin (15):
      thread_info: Add update_thread_flag() helpers
      arm64: Use update{,_tsk}_thread_flag()
      KVM: arm64: Convert lazy FPSIMD context switch trap to C
      arm64: fpsimd: Generalise context saving for non-task contexts
      arm64/sve: Refactor user SVE trap maintenance for external use
      KVM: arm64: Repurpose vcpu_arch.debug_flags for general-purpose flags
      KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
      arm64/sve: Move read_zcr_features() out of cpufeature.h
      arm64/sve: Switch sve_pffr() argument from task to thread
      arm64/sve: Move sve_pffr() to fpsimd.h and make inline
      KVM: arm64: Save host SVE context as appropriate
      KVM: arm64: Remove eager host SVE state saving
      KVM: arm64: Remove redundant *exit_code changes in fpsimd_guest_exit()
      KVM: arm64: Fold redundant exit code checks out of fixup_guest_exit()
      KVM: arm64: Invoke FPSIMD context switch trap from C

 arch/arm/include/asm/kvm_host.h     |   9 ++-
 arch/arm64/Kconfig                  |   7 ++
 arch/arm64/include/asm/cpufeature.h |  29 -------
 arch/arm64/include/asm/fpsimd.h     |  21 ++++++
 arch/arm64/include/asm/kvm_asm.h    |   3 -
 arch/arm64/include/asm/kvm_host.h   |  32 +++++---
 arch/arm64/include/asm/processor.h  |   2 +
 arch/arm64/kernel/fpsimd.c          | 147 +++++++++++++++++++-----------------
 arch/arm64/kernel/ptrace.c          |   1 +
 arch/arm64/kvm/Kconfig              |   1 +
 arch/arm64/kvm/Makefile             |   2 +-
 arch/arm64/kvm/debug.c              |   8 +-
 arch/arm64/kvm/fpsimd.c             | 111 +++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/debug-sr.c       |   6 +-
 arch/arm64/kvm/hyp/entry.S          |  43 -----------
 arch/arm64/kvm/hyp/hyp-entry.S      |  19 -----
 arch/arm64/kvm/hyp/switch.c         | 124 ++++++++++++++++++++----------
 arch/arm64/kvm/hyp/sysreg-sr.c      |   4 +-
 arch/arm64/kvm/sys_regs.c           |   9 +--
 include/linux/kvm_host.h            |   9 +++
 include/linux/sched.h               |   6 ++
 include/linux/thread_info.h         |  11 +++
 virt/kvm/Kconfig                    |   3 +
 virt/kvm/arm/arm.c                  |  25 +++++-
 virt/kvm/kvm_main.c                 |   7 +-
 25 files changed, 406 insertions(+), 233 deletions(-)
 create mode 100644 arch/arm64/kvm/fpsimd.c

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PULL v8] KVM: arm64: Optimise FPSIMD context switching
@ 2018-05-16 10:49 ` Dave Martin
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Martin @ 2018-05-16 10:49 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

This is a trivial update to the previously posted v7 [1].  The only
changes are a couple of minor cosmetic changes requested by reviewers,
on-list and the addition of Acked-by/Reviewed-by tags received since the
series was posted.

Let me know if you need anything else on this.

Cheers
---Dave

[1] [PATCH v7 00/16] KVM: arm64: Optimise FPSIMD context switching
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/576595.html


The following changes since commit 6da6c0db5316275015e8cc2959f12a17584aeb64:

  Linux v4.17-rc3 (2018-04-29 14:17:42 -0700)

are available in the git repository at:

  git://linux-arm.org/linux-dm.git 

for you to fetch changes up to 90255dd95fe71a9bd1c4ce728abad4b0eec240d9:

  KVM: arm64: Invoke FPSIMD context switch trap from C (2018-05-16 10:44:00 +0100)

----------------------------------------------------------------
Christoffer Dall (1):
      KVM: arm/arm64: Introduce kvm_arch_vcpu_run_pid_change

Dave Martin (15):
      thread_info: Add update_thread_flag() helpers
      arm64: Use update{,_tsk}_thread_flag()
      KVM: arm64: Convert lazy FPSIMD context switch trap to C
      arm64: fpsimd: Generalise context saving for non-task contexts
      arm64/sve: Refactor user SVE trap maintenance for external use
      KVM: arm64: Repurpose vcpu_arch.debug_flags for general-purpose flags
      KVM: arm64: Optimise FPSIMD handling to reduce guest/host thrashing
      arm64/sve: Move read_zcr_features() out of cpufeature.h
      arm64/sve: Switch sve_pffr() argument from task to thread
      arm64/sve: Move sve_pffr() to fpsimd.h and make inline
      KVM: arm64: Save host SVE context as appropriate
      KVM: arm64: Remove eager host SVE state saving
      KVM: arm64: Remove redundant *exit_code changes in fpsimd_guest_exit()
      KVM: arm64: Fold redundant exit code checks out of fixup_guest_exit()
      KVM: arm64: Invoke FPSIMD context switch trap from C

 arch/arm/include/asm/kvm_host.h     |   9 ++-
 arch/arm64/Kconfig                  |   7 ++
 arch/arm64/include/asm/cpufeature.h |  29 -------
 arch/arm64/include/asm/fpsimd.h     |  21 ++++++
 arch/arm64/include/asm/kvm_asm.h    |   3 -
 arch/arm64/include/asm/kvm_host.h   |  32 +++++---
 arch/arm64/include/asm/processor.h  |   2 +
 arch/arm64/kernel/fpsimd.c          | 147 +++++++++++++++++++-----------------
 arch/arm64/kernel/ptrace.c          |   1 +
 arch/arm64/kvm/Kconfig              |   1 +
 arch/arm64/kvm/Makefile             |   2 +-
 arch/arm64/kvm/debug.c              |   8 +-
 arch/arm64/kvm/fpsimd.c             | 111 +++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/debug-sr.c       |   6 +-
 arch/arm64/kvm/hyp/entry.S          |  43 -----------
 arch/arm64/kvm/hyp/hyp-entry.S      |  19 -----
 arch/arm64/kvm/hyp/switch.c         | 124 ++++++++++++++++++++----------
 arch/arm64/kvm/hyp/sysreg-sr.c      |   4 +-
 arch/arm64/kvm/sys_regs.c           |   9 +--
 include/linux/kvm_host.h            |   9 +++
 include/linux/sched.h               |   6 ++
 include/linux/thread_info.h         |  11 +++
 virt/kvm/Kconfig                    |   3 +
 virt/kvm/arm/arm.c                  |  25 +++++-
 virt/kvm/kvm_main.c                 |   7 +-
 25 files changed, 406 insertions(+), 233 deletions(-)
 create mode 100644 arch/arm64/kvm/fpsimd.c

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PULL v8] KVM: arm64: Optimise FPSIMD context switching
  2018-05-16 10:49 ` Dave Martin
@ 2018-05-20 13:14   ` Marc Zyngier
  -1 siblings, 0 replies; 8+ messages in thread
From: Marc Zyngier @ 2018-05-20 13:14 UTC (permalink / raw)
  To: Dave Martin; +Cc: Christoffer Dall, linux-arm-kernel, kvmarm

On Wed, 16 May 2018 11:49:42 +0100
Dave Martin <Dave.Martin@arm.com> wrote:

Hi Dave,

> Hi Marc,
> 
> This is a trivial update to the previously posted v7 [1].  The only
> changes are a couple of minor cosmetic changes requested by reviewers,
> on-list and the addition of Acked-by/Reviewed-by tags received since the
> series was posted.
> 
> Let me know if you need anything else on this.

So I've taken this, merged in Linus' top of tree, started a guest on a
dual A53 board, and immediately hit the following:

root@sy-borg:~# [  287.226184] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  287.231672] Mem abort info:
[  287.234537]   ESR = 0x96000044
[  287.237674]   Exception class = DABT (current EL), IL = 32 bits
[  287.243765]   SET = 0, FnV = 0
[  287.246900]   EA = 0, S1PTW = 0
[  287.250126] Data abort info:
[  287.253083]   ISV = 0, ISS = 0x00000044
[  287.257025]   CM = 0, WnR = 1
[  287.260076] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000b8483f75
[  287.266882] [0000000000000000] pgd=0000000000000000
[  287.271903] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[  287.277636] Modules linked in:
[  287.280776] CPU: 1 PID: 3098 Comm: kworker/u4:3 Not tainted 4.17.0-rc5-00166-gd84e81cca249 #136
[  287.289730] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[  287.296364] pstate: 40000085 (nZcv daIf -PAN -UAO)
[  287.301301] pc : fpsimd_save_state+0x0/0x54
[  287.305595] lr : fpsimd_save+0x50/0x100
[  287.309531] sp : ffff00000dde3af0
[  287.312936] x29: ffff00000dde3af0 x28: ffff000008cd565c 
[  287.318401] x27: ffff800078ee9c80 x26: ffff80007b207628 
[  287.323867] x25: ffff0000093f9000 x24: 0000000000000001 
[  287.329333] x23: ffff0000093d4000 x22: ffff80007b207000 
[  287.334798] x21: ffff80007efd7d80 x20: ffff80007b207000 
[  287.340264] x19: 0000000000000000 x18: 0000000000040f0b 
[  287.345729] x17: 0000ffffb70752b8 x16: 0000ffffb708e008 
[  287.351195] x15: 0000000000000000 x14: 0000000000000400 
[  287.356661] x13: 0000000000000001 x12: 0000000000000001 
[  287.362127] x11: 0000000000000001 x10: 0000000000000000 
[  287.367592] x9 : 0000000000000253 x8 : ffff80007b207200 
[  287.373057] x7 : ffff80007b207100 x6 : ffff80007c378f18 
[  287.378523] x5 : 00000042c2094c00 x4 : 0000000000000000 
[  287.383990] x3 : 00000042e0033450 x2 : 0000000000000000 
[  287.389454] x1 : 0000800075bf6000 x0 : 0000000000000000 
[  287.394922] Process kworker/u4:3 (pid: 3098, stack limit = 0x00000000ca0dd8c6)
[  287.402358] Call trace:
[  287.404873]  fpsimd_save_state+0x0/0x54
[  287.408813]  fpsimd_thread_switch+0x28/0xa0
[  287.413114]  __switch_to+0x1c/0xd0
[  287.416609]  __schedule+0x1b8/0x730
[  287.420191]  preempt_schedule_common+0x24/0x48
[  287.424760]  preempt_schedule.part.23+0x1c/0x28
[  287.429419]  preempt_schedule+0x1c/0x28
[  287.433363]  _raw_spin_unlock+0x34/0x48
[  287.437308]  flush_old_exec+0x45c/0x6a0
[  287.441250]  load_elf_binary+0x324/0x1198
[  287.445372]  search_binary_handler+0xac/0x230
[  287.449851]  do_execveat_common.isra.14+0x508/0x6e0
[  287.454867]  do_execve+0x28/0x30
[  287.458185]  call_usermodehelper_exec_async+0xdc/0x140
[  287.463468]  ret_from_fork+0x10/0x18
[  287.467143] Code: a9425bf5 a8c37bfd d65f03c0 d65f03c0 (ad000400) 
[  287.473414] ---[ end trace c4346b99cc877f8e ]---

It happened just after having loaded the guest kernel, so I presume
we're missing some kind of initialization. I couldn't subsequently
reproduce it  on the same machine, and the same kernel is doing
absolutely fine on a Seattle box.

I can't immediately see how st would be NULL, unless we somehow are
missing some state tracking somewhere...

Any idea?

	M.
-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PULL v8] KVM: arm64: Optimise FPSIMD context switching
@ 2018-05-20 13:14   ` Marc Zyngier
  0 siblings, 0 replies; 8+ messages in thread
From: Marc Zyngier @ 2018-05-20 13:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 16 May 2018 11:49:42 +0100
Dave Martin <Dave.Martin@arm.com> wrote:

Hi Dave,

> Hi Marc,
> 
> This is a trivial update to the previously posted v7 [1].  The only
> changes are a couple of minor cosmetic changes requested by reviewers,
> on-list and the addition of Acked-by/Reviewed-by tags received since the
> series was posted.
> 
> Let me know if you need anything else on this.

So I've taken this, merged in Linus' top of tree, started a guest on a
dual A53 board, and immediately hit the following:

root at sy-borg:~# [  287.226184] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  287.231672] Mem abort info:
[  287.234537]   ESR = 0x96000044
[  287.237674]   Exception class = DABT (current EL), IL = 32 bits
[  287.243765]   SET = 0, FnV = 0
[  287.246900]   EA = 0, S1PTW = 0
[  287.250126] Data abort info:
[  287.253083]   ISV = 0, ISS = 0x00000044
[  287.257025]   CM = 0, WnR = 1
[  287.260076] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000b8483f75
[  287.266882] [0000000000000000] pgd=0000000000000000
[  287.271903] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[  287.277636] Modules linked in:
[  287.280776] CPU: 1 PID: 3098 Comm: kworker/u4:3 Not tainted 4.17.0-rc5-00166-gd84e81cca249 #136
[  287.289730] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
[  287.296364] pstate: 40000085 (nZcv daIf -PAN -UAO)
[  287.301301] pc : fpsimd_save_state+0x0/0x54
[  287.305595] lr : fpsimd_save+0x50/0x100
[  287.309531] sp : ffff00000dde3af0
[  287.312936] x29: ffff00000dde3af0 x28: ffff000008cd565c 
[  287.318401] x27: ffff800078ee9c80 x26: ffff80007b207628 
[  287.323867] x25: ffff0000093f9000 x24: 0000000000000001 
[  287.329333] x23: ffff0000093d4000 x22: ffff80007b207000 
[  287.334798] x21: ffff80007efd7d80 x20: ffff80007b207000 
[  287.340264] x19: 0000000000000000 x18: 0000000000040f0b 
[  287.345729] x17: 0000ffffb70752b8 x16: 0000ffffb708e008 
[  287.351195] x15: 0000000000000000 x14: 0000000000000400 
[  287.356661] x13: 0000000000000001 x12: 0000000000000001 
[  287.362127] x11: 0000000000000001 x10: 0000000000000000 
[  287.367592] x9 : 0000000000000253 x8 : ffff80007b207200 
[  287.373057] x7 : ffff80007b207100 x6 : ffff80007c378f18 
[  287.378523] x5 : 00000042c2094c00 x4 : 0000000000000000 
[  287.383990] x3 : 00000042e0033450 x2 : 0000000000000000 
[  287.389454] x1 : 0000800075bf6000 x0 : 0000000000000000 
[  287.394922] Process kworker/u4:3 (pid: 3098, stack limit = 0x00000000ca0dd8c6)
[  287.402358] Call trace:
[  287.404873]  fpsimd_save_state+0x0/0x54
[  287.408813]  fpsimd_thread_switch+0x28/0xa0
[  287.413114]  __switch_to+0x1c/0xd0
[  287.416609]  __schedule+0x1b8/0x730
[  287.420191]  preempt_schedule_common+0x24/0x48
[  287.424760]  preempt_schedule.part.23+0x1c/0x28
[  287.429419]  preempt_schedule+0x1c/0x28
[  287.433363]  _raw_spin_unlock+0x34/0x48
[  287.437308]  flush_old_exec+0x45c/0x6a0
[  287.441250]  load_elf_binary+0x324/0x1198
[  287.445372]  search_binary_handler+0xac/0x230
[  287.449851]  do_execveat_common.isra.14+0x508/0x6e0
[  287.454867]  do_execve+0x28/0x30
[  287.458185]  call_usermodehelper_exec_async+0xdc/0x140
[  287.463468]  ret_from_fork+0x10/0x18
[  287.467143] Code: a9425bf5 a8c37bfd d65f03c0 d65f03c0 (ad000400) 
[  287.473414] ---[ end trace c4346b99cc877f8e ]---

It happened just after having loaded the guest kernel, so I presume
we're missing some kind of initialization. I couldn't subsequently
reproduce it  on the same machine, and the same kernel is doing
absolutely fine on a Seattle box.

I can't immediately see how st would be NULL, unless we somehow are
missing some state tracking somewhere...

Any idea?

	M.
-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PULL v8] KVM: arm64: Optimise FPSIMD context switching
  2018-05-20 13:14   ` Marc Zyngier
@ 2018-05-21 10:02     ` Dave Martin
  -1 siblings, 0 replies; 8+ messages in thread
From: Dave Martin @ 2018-05-21 10:02 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Christoffer Dall, linux-arm-kernel, kvmarm

On Sun, May 20, 2018 at 02:14:41PM +0100, Marc Zyngier wrote:
> On Wed, 16 May 2018 11:49:42 +0100
> Dave Martin <Dave.Martin@arm.com> wrote:
> 
> Hi Dave,
> 
> > Hi Marc,
> > 
> > This is a trivial update to the previously posted v7 [1].  The only
> > changes are a couple of minor cosmetic changes requested by reviewers,
> > on-list and the addition of Acked-by/Reviewed-by tags received since the
> > series was posted.
> > 
> > Let me know if you need anything else on this.
> 
> So I've taken this, merged in Linus' top of tree, started a guest on a
> dual A53 board, and immediately hit the following:
> 
> root@sy-borg:~# [  287.226184] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [  287.231672] Mem abort info:
> [  287.234537]   ESR = 0x96000044
> [  287.237674]   Exception class = DABT (current EL), IL = 32 bits
> [  287.243765]   SET = 0, FnV = 0
> [  287.246900]   EA = 0, S1PTW = 0
> [  287.250126] Data abort info:
> [  287.253083]   ISV = 0, ISS = 0x00000044
> [  287.257025]   CM = 0, WnR = 1
> [  287.260076] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000b8483f75
> [  287.266882] [0000000000000000] pgd=0000000000000000
> [  287.271903] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [  287.277636] Modules linked in:
> [  287.280776] CPU: 1 PID: 3098 Comm: kworker/u4:3 Not tainted 4.17.0-rc5-00166-gd84e81cca249 #136
> [  287.289730] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
> [  287.296364] pstate: 40000085 (nZcv daIf -PAN -UAO)
> [  287.301301] pc : fpsimd_save_state+0x0/0x54
> [  287.305595] lr : fpsimd_save+0x50/0x100
> [  287.309531] sp : ffff00000dde3af0
> [  287.312936] x29: ffff00000dde3af0 x28: ffff000008cd565c 
> [  287.318401] x27: ffff800078ee9c80 x26: ffff80007b207628 
> [  287.323867] x25: ffff0000093f9000 x24: 0000000000000001 
> [  287.329333] x23: ffff0000093d4000 x22: ffff80007b207000 
> [  287.334798] x21: ffff80007efd7d80 x20: ffff80007b207000 
> [  287.340264] x19: 0000000000000000 x18: 0000000000040f0b 
> [  287.345729] x17: 0000ffffb70752b8 x16: 0000ffffb708e008 
> [  287.351195] x15: 0000000000000000 x14: 0000000000000400 
> [  287.356661] x13: 0000000000000001 x12: 0000000000000001 
> [  287.362127] x11: 0000000000000001 x10: 0000000000000000 
> [  287.367592] x9 : 0000000000000253 x8 : ffff80007b207200 
> [  287.373057] x7 : ffff80007b207100 x6 : ffff80007c378f18 
> [  287.378523] x5 : 00000042c2094c00 x4 : 0000000000000000 
> [  287.383990] x3 : 00000042e0033450 x2 : 0000000000000000 
> [  287.389454] x1 : 0000800075bf6000 x0 : 0000000000000000 
> [  287.394922] Process kworker/u4:3 (pid: 3098, stack limit = 0x00000000ca0dd8c6)
> [  287.402358] Call trace:
> [  287.404873]  fpsimd_save_state+0x0/0x54
> [  287.408813]  fpsimd_thread_switch+0x28/0xa0
> [  287.413114]  __switch_to+0x1c/0xd0
> [  287.416609]  __schedule+0x1b8/0x730
> [  287.420191]  preempt_schedule_common+0x24/0x48
> [  287.424760]  preempt_schedule.part.23+0x1c/0x28
> [  287.429419]  preempt_schedule+0x1c/0x28
> [  287.433363]  _raw_spin_unlock+0x34/0x48
> [  287.437308]  flush_old_exec+0x45c/0x6a0
> [  287.441250]  load_elf_binary+0x324/0x1198
> [  287.445372]  search_binary_handler+0xac/0x230
> [  287.449851]  do_execveat_common.isra.14+0x508/0x6e0
> [  287.454867]  do_execve+0x28/0x30
> [  287.458185]  call_usermodehelper_exec_async+0xdc/0x140
> [  287.463468]  ret_from_fork+0x10/0x18
> [  287.467143] Code: a9425bf5 a8c37bfd d65f03c0 d65f03c0 (ad000400) 
> [  287.473414] ---[ end trace c4346b99cc877f8e ]---
> 
> It happened just after having loaded the guest kernel, so I presume
> we're missing some kind of initialization. I couldn't subsequently
> reproduce it  on the same machine, and the same kernel is doing
> absolutely fine on a Seattle box.

Curious, this is a kernel thread, which shouldn't have any fpsimd state
of its own.  It would be interesting to know what ran before, but tricky
to find that out now unless we can find a way to repoduce...

Maybe a thread flag has ended up in the wrong state.

> I can't immediately see how st would be NULL, unless we somehow are
> missing some state tracking somewhere...

This might be a race, or a missing piece of logic somewhere... I'll
take a look.


Cheers
---Dave

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PULL v8] KVM: arm64: Optimise FPSIMD context switching
@ 2018-05-21 10:02     ` Dave Martin
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Martin @ 2018-05-21 10:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, May 20, 2018 at 02:14:41PM +0100, Marc Zyngier wrote:
> On Wed, 16 May 2018 11:49:42 +0100
> Dave Martin <Dave.Martin@arm.com> wrote:
> 
> Hi Dave,
> 
> > Hi Marc,
> > 
> > This is a trivial update to the previously posted v7 [1].  The only
> > changes are a couple of minor cosmetic changes requested by reviewers,
> > on-list and the addition of Acked-by/Reviewed-by tags received since the
> > series was posted.
> > 
> > Let me know if you need anything else on this.
> 
> So I've taken this, merged in Linus' top of tree, started a guest on a
> dual A53 board, and immediately hit the following:
> 
> root at sy-borg:~# [  287.226184] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [  287.231672] Mem abort info:
> [  287.234537]   ESR = 0x96000044
> [  287.237674]   Exception class = DABT (current EL), IL = 32 bits
> [  287.243765]   SET = 0, FnV = 0
> [  287.246900]   EA = 0, S1PTW = 0
> [  287.250126] Data abort info:
> [  287.253083]   ISV = 0, ISS = 0x00000044
> [  287.257025]   CM = 0, WnR = 1
> [  287.260076] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000b8483f75
> [  287.266882] [0000000000000000] pgd=0000000000000000
> [  287.271903] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [  287.277636] Modules linked in:
> [  287.280776] CPU: 1 PID: 3098 Comm: kworker/u4:3 Not tainted 4.17.0-rc5-00166-gd84e81cca249 #136
> [  287.289730] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
> [  287.296364] pstate: 40000085 (nZcv daIf -PAN -UAO)
> [  287.301301] pc : fpsimd_save_state+0x0/0x54
> [  287.305595] lr : fpsimd_save+0x50/0x100
> [  287.309531] sp : ffff00000dde3af0
> [  287.312936] x29: ffff00000dde3af0 x28: ffff000008cd565c 
> [  287.318401] x27: ffff800078ee9c80 x26: ffff80007b207628 
> [  287.323867] x25: ffff0000093f9000 x24: 0000000000000001 
> [  287.329333] x23: ffff0000093d4000 x22: ffff80007b207000 
> [  287.334798] x21: ffff80007efd7d80 x20: ffff80007b207000 
> [  287.340264] x19: 0000000000000000 x18: 0000000000040f0b 
> [  287.345729] x17: 0000ffffb70752b8 x16: 0000ffffb708e008 
> [  287.351195] x15: 0000000000000000 x14: 0000000000000400 
> [  287.356661] x13: 0000000000000001 x12: 0000000000000001 
> [  287.362127] x11: 0000000000000001 x10: 0000000000000000 
> [  287.367592] x9 : 0000000000000253 x8 : ffff80007b207200 
> [  287.373057] x7 : ffff80007b207100 x6 : ffff80007c378f18 
> [  287.378523] x5 : 00000042c2094c00 x4 : 0000000000000000 
> [  287.383990] x3 : 00000042e0033450 x2 : 0000000000000000 
> [  287.389454] x1 : 0000800075bf6000 x0 : 0000000000000000 
> [  287.394922] Process kworker/u4:3 (pid: 3098, stack limit = 0x00000000ca0dd8c6)
> [  287.402358] Call trace:
> [  287.404873]  fpsimd_save_state+0x0/0x54
> [  287.408813]  fpsimd_thread_switch+0x28/0xa0
> [  287.413114]  __switch_to+0x1c/0xd0
> [  287.416609]  __schedule+0x1b8/0x730
> [  287.420191]  preempt_schedule_common+0x24/0x48
> [  287.424760]  preempt_schedule.part.23+0x1c/0x28
> [  287.429419]  preempt_schedule+0x1c/0x28
> [  287.433363]  _raw_spin_unlock+0x34/0x48
> [  287.437308]  flush_old_exec+0x45c/0x6a0
> [  287.441250]  load_elf_binary+0x324/0x1198
> [  287.445372]  search_binary_handler+0xac/0x230
> [  287.449851]  do_execveat_common.isra.14+0x508/0x6e0
> [  287.454867]  do_execve+0x28/0x30
> [  287.458185]  call_usermodehelper_exec_async+0xdc/0x140
> [  287.463468]  ret_from_fork+0x10/0x18
> [  287.467143] Code: a9425bf5 a8c37bfd d65f03c0 d65f03c0 (ad000400) 
> [  287.473414] ---[ end trace c4346b99cc877f8e ]---
> 
> It happened just after having loaded the guest kernel, so I presume
> we're missing some kind of initialization. I couldn't subsequently
> reproduce it  on the same machine, and the same kernel is doing
> absolutely fine on a Seattle box.

Curious, this is a kernel thread, which shouldn't have any fpsimd state
of its own.  It would be interesting to know what ran before, but tricky
to find that out now unless we can find a way to repoduce...

Maybe a thread flag has ended up in the wrong state.

> I can't immediately see how st would be NULL, unless we somehow are
> missing some state tracking somewhere...

This might be a race, or a missing piece of logic somewhere... I'll
take a look.


Cheers
---Dave

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PULL v8] KVM: arm64: Optimise FPSIMD context switching
  2018-05-20 13:14   ` Marc Zyngier
@ 2018-05-22 12:04     ` Dave Martin
  -1 siblings, 0 replies; 8+ messages in thread
From: Dave Martin @ 2018-05-22 12:04 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Christoffer Dall, linux-arm-kernel, kvmarm

On Sun, May 20, 2018 at 02:14:41PM +0100, Marc Zyngier wrote:
> On Wed, 16 May 2018 11:49:42 +0100
> Dave Martin <Dave.Martin@arm.com> wrote:
> 
> Hi Dave,
> 
> > Hi Marc,
> > 
> > This is a trivial update to the previously posted v7 [1].  The only
> > changes are a couple of minor cosmetic changes requested by reviewers,
> > on-list and the addition of Acked-by/Reviewed-by tags received since the
> > series was posted.
> > 
> > Let me know if you need anything else on this.
> 
> So I've taken this, merged in Linus' top of tree, started a guest on a
> dual A53 board, and immediately hit the following:
> 
> root@sy-borg:~# [  287.226184] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [  287.231672] Mem abort info:
> [  287.234537]   ESR = 0x96000044
> [  287.237674]   Exception class = DABT (current EL), IL = 32 bits
> [  287.243765]   SET = 0, FnV = 0
> [  287.246900]   EA = 0, S1PTW = 0
> [  287.250126] Data abort info:
> [  287.253083]   ISV = 0, ISS = 0x00000044
> [  287.257025]   CM = 0, WnR = 1
> [  287.260076] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000b8483f75
> [  287.266882] [0000000000000000] pgd=0000000000000000
> [  287.271903] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [  287.277636] Modules linked in:
> [  287.280776] CPU: 1 PID: 3098 Comm: kworker/u4:3 Not tainted 4.17.0-rc5-00166-gd84e81cca249 #136
> [  287.289730] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
> [  287.296364] pstate: 40000085 (nZcv daIf -PAN -UAO)
> [  287.301301] pc : fpsimd_save_state+0x0/0x54
> [  287.305595] lr : fpsimd_save+0x50/0x100
> [  287.309531] sp : ffff00000dde3af0
> [  287.312936] x29: ffff00000dde3af0 x28: ffff000008cd565c 
> [  287.318401] x27: ffff800078ee9c80 x26: ffff80007b207628 
> [  287.323867] x25: ffff0000093f9000 x24: 0000000000000001 
> [  287.329333] x23: ffff0000093d4000 x22: ffff80007b207000 
> [  287.334798] x21: ffff80007efd7d80 x20: ffff80007b207000 
> [  287.340264] x19: 0000000000000000 x18: 0000000000040f0b 
> [  287.345729] x17: 0000ffffb70752b8 x16: 0000ffffb708e008 
> [  287.351195] x15: 0000000000000000 x14: 0000000000000400 
> [  287.356661] x13: 0000000000000001 x12: 0000000000000001 
> [  287.362127] x11: 0000000000000001 x10: 0000000000000000 
> [  287.367592] x9 : 0000000000000253 x8 : ffff80007b207200 
> [  287.373057] x7 : ffff80007b207100 x6 : ffff80007c378f18 
> [  287.378523] x5 : 00000042c2094c00 x4 : 0000000000000000 
> [  287.383990] x3 : 00000042e0033450 x2 : 0000000000000000 
> [  287.389454] x1 : 0000800075bf6000 x0 : 0000000000000000 
> [  287.394922] Process kworker/u4:3 (pid: 3098, stack limit = 0x00000000ca0dd8c6)
> [  287.402358] Call trace:
> [  287.404873]  fpsimd_save_state+0x0/0x54
> [  287.408813]  fpsimd_thread_switch+0x28/0xa0
> [  287.413114]  __switch_to+0x1c/0xd0
> [  287.416609]  __schedule+0x1b8/0x730
> [  287.420191]  preempt_schedule_common+0x24/0x48
> [  287.424760]  preempt_schedule.part.23+0x1c/0x28
> [  287.429419]  preempt_schedule+0x1c/0x28
> [  287.433363]  _raw_spin_unlock+0x34/0x48
> [  287.437308]  flush_old_exec+0x45c/0x6a0
> [  287.441250]  load_elf_binary+0x324/0x1198
> [  287.445372]  search_binary_handler+0xac/0x230
> [  287.449851]  do_execveat_common.isra.14+0x508/0x6e0
> [  287.454867]  do_execve+0x28/0x30
> [  287.458185]  call_usermodehelper_exec_async+0xdc/0x140
> [  287.463468]  ret_from_fork+0x10/0x18
> [  287.467143] Code: a9425bf5 a8c37bfd d65f03c0 d65f03c0 (ad000400) 
> [  287.473414] ---[ end trace c4346b99cc877f8e ]---
> 
> It happened just after having loaded the guest kernel, so I presume
> we're missing some kind of initialization. I couldn't subsequently
> reproduce it  on the same machine, and the same kernel is doing
> absolutely fine on a Seattle box.
> 
> I can't immediately see how st would be NULL, unless we somehow are
> missing some state tracking somewhere...
> 
> Any idea?

I have a reproducer now for something that resembles the above bug.
(Build test.c separately and run it from the target.)

In order for this particular bug to happen we need some kernel thread
or softirq in a kernel thread to use kernel-mode NEON (thus setting
fpsimd_last_state.st to NULL but leaving TIF_FOREIGN_FPSTATE clear),
then a kernel thread with no mm is scheduled in on the same cpu, then
that thread gans an mm and get scheduled out.

I don't force-affine anything in this test, but on an idle system it
seems to fire reliably.

There may be more plausible failure scenarios, but this is the
definite one that I've found.


There are two overlapping problems here: firstly the conditional
failure to set TIF_FOREIGN_FPSTATE in kernel_neon_begin() (which
was a latent bug prior to the VHE FPSIMD series), and secondly
the assumption that !TIF_FOREIGN_FPSTATE implies that current's
FPSIMD state is loaded (so fpsimd_last_state.st != NULL):
the latter is not true for kernel threads.

I will post patches shortly.  My current approach is to pull the
set_thread_flag(TIF_FOREIGN_FPSTATE) into fpsimd_flush_cpu_state()
(so that it can't be forgotten in this scenario), and to remove
the special-case handling of kernel threads in fpsimd.c: the only
requirement is to ensure TIF_FOREIGN_FPSTATE is set for the init task,
and remove the ->mm based "optimisation" that allows a wrong
TIF_FOREIGN_FPSTATE to be left behind on sched-in.

kernel threads never get any FPSIMD state loaded, but virtue of
never entering userspace.  They will now never save state either,
by virtue of TIF_FOREIGN_FPSTATE always being true for these
threads.

Cheers
---Dave


diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index b0d29b7..c85f4eb 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1314,3 +1314,10 @@ static int __init fpsimd_init(void)
 	return sve_sysctl_init();
 }
 core_initcall(fpsimd_init);
+
+void __fpsimd_print_state(char const *file, int line)
+{
+	pr_info("%s[%d]: %s:%d: current->mm = %p, fpsimd_last_state.st = %p, TIF_FOREIGN_FPSTATE = %d\n",
+		current->comm, task_pid_nr(current), file, line,
+		current->mm, this_cpu_read(fpsimd_last_state.st), test_thread_flag(TIF_FOREIGN_FPSTATE) ? 1 : 0);
+}
diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c
index 72981ba..745f5f0 100644
--- a/arch/arm64/kernel/sys.c
+++ b/arch/arm64/kernel/sys.c
@@ -59,7 +59,9 @@ asmlinkage long sys_rt_sigreturn_wrapper(void);
  * The sys_call_table array must be 4K aligned to be accessible from
  * kernel/entry.S.
  */
+extern long arm64_ni_syscall(long nr);
+
 void * const sys_call_table[__NR_syscalls] __aligned(4096) = {
-	[0 ... __NR_syscalls - 1] = sys_ni_syscall,
+	[0 ... __NR_syscalls - 1] = arm64_ni_syscall,
 #include <asm/unistd.h>
 };
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 8bbdc17..a9a8589 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -18,6 +18,7 @@
  */
 
 #include <linux/bug.h>
+#include <linux/completion.h>
 #include <linux/signal.h>
 #include <linux/personality.h>
 #include <linux/kallsyms.h>
@@ -27,10 +28,13 @@
 #include <linux/kdebug.h>
 #include <linux/module.h>
 #include <linux/kexec.h>
+#include <linux/kthread.h>
 #include <linux/delay.h>
 #include <linux/init.h>
+#include <linux/mmu_context.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/mm.h>
 #include <linux/sched/task_stack.h>
 #include <linux/sizes.h>
 #include <linux/syscalls.h>
@@ -43,6 +47,7 @@
 #include <asm/debug-monitors.h>
 #include <asm/esr.h>
 #include <asm/insn.h>
+#include <asm/neon.h>
 #include <asm/traps.h>
 #include <asm/smp.h>
 #include <asm/stack_pointer.h>
@@ -549,10 +554,92 @@ asmlinkage void __exception do_sysinstr(unsigned int esr, struct pt_regs *regs)
 
 long compat_arm_syscall(struct pt_regs *regs);
 
+struct grabber_data {
+	int err;
+	struct completion c, c2;
+	struct mm_struct *mm;
+};
+
+extern void __fpsimd_print_state(char const *, int);
+#define fpsimd_print_state() __fpsimd_print_state(__FILE__, __LINE__)
+
+static int grabber(void *data)
+{
+	struct grabber_data *g = data;
+
+	if (!mmget_not_zero(g->mm)) {
+		g->err = -ESRCH;
+		goto out;
+	}
+
+	fpsimd_print_state();
+
+	g->err = 0;
+
+	kernel_neon_begin();
+	kernel_neon_end();
+
+	fpsimd_print_state();
+
+	use_mm(g->mm);
+
+	fpsimd_print_state();
+
+	complete(&g->c);
+	wait_for_completion(&g->c2);
+
+	g->err = 0;
+out:
+	complete(&g->c);
+	return g->err;
+}
+
+static int do_trigger_bug(void)
+{
+	struct task_struct *kth;
+	struct grabber_data g;
+
+	pr_info("%s[%d]: do_trigger_bug() called\n",
+		current->comm, task_pid_nr(current));
+
+	init_completion(&g.c);
+	init_completion(&g.c2);
+	mmgrab(current->mm);
+	g.mm = current->mm;
+
+	kth = kthread_create(grabber, &g, "grabber[%d]", task_pid_nr(current));
+	if (IS_ERR(kth)) {
+		mmdrop(current->mm);
+		return PTR_ERR(kth);
+	}
+
+	wake_up_process(kth);
+	wait_for_completion(&g.c);
+	mmdrop(current->mm);
+	if (g.err) {
+		pr_info("%s: mm disappeared\n", __func__);
+		return g.err;
+	}
+
+	init_completion(&g.c);
+	complete(&g.c2);
+	wait_for_completion(&g.c);
+	return g.err;
+}
+
+long arm64_ni_syscall(long nr)
+{
+	if ((int)nr == 8888)
+		return do_trigger_bug();
+
+	return -ENOSYS;
+}
+
 asmlinkage long do_ni_syscall(struct pt_regs *regs)
 {
-#ifdef CONFIG_COMPAT
 	long ret;
+
+#ifdef CONFIG_COMPAT
 	if (is_compat_task()) {
 		ret = compat_arm_syscall(regs);
 		if (ret != -ENOSYS)
@@ -560,7 +647,7 @@ asmlinkage long do_ni_syscall(struct pt_regs *regs)
 	}
 #endif
 
-	return sys_ni_syscall();
+	return arm64_ni_syscall(regs->regs[8]);
 }
 
 static const char *esr_class_str[] = {
--- /dev/null	2017-09-27 16:25:00.179999999 +0100
+++ test.c	2018-05-22 11:59:58.671159660 +0100
@@ -0,0 +1,12 @@
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+
+int main(void)
+{
+	if (syscall(8888) == -1)
+		perror("8888");
+
+	return 0;
+}


yielding

[  229.246349] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  229.246423] Mem abort info:
[  229.246474]   ESR = 0x96000046
[  229.246529]   Exception class = DABT (current EL), IL = 32 bits
[  229.246598]   SET = 0, FnV = 0
[  229.246652]   EA = 0, S1PTW = 0
[  229.246700] Data abort info:
[  229.246751]   ISV = 0, ISS = 0x00000046
[  229.246809]   CM = 0, WnR = 1
[  229.246871] user pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
[  229.246944] [0000000000000000] pgd=00000008fa6be003, pud=00000008fa6b5003, pmd=0000000000000000
[  229.247070] Internal error: Oops: 96000046 [#1] PREEMPT SMP
[  229.247130] Modules linked in:
[  229.247214] CPU: 0 PID: 1348 Comm: grabber[1347] Not tainted 4.17.0-rc3-00016-g885df5d-dirty #25
[  229.247296] Hardware name: FVP Base (DT)
[  229.247368] pstate: 40000085 (nZcv daIf -PAN -UAO)
[  229.247452] pc : fpsimd_save_state+0x0/0x54
[  229.247530] lr : fpsimd_save+0x54/0xe8
[  229.247586] sp : ffff00000ad13c10
[  229.247643] x29: ffff00000ad13c10 x28: 0000000000000000
[  229.247734] x27: 0000000000000000 x26: ffff80087ac7cc20
[  229.247830] x25: ffff000008a554f0 x24: ffff000009269b88
[  229.247926] x23: ffff00000924f010 x22: ffff80087ac7c600
[  229.248022] x21: ffff000009272a80 x20: ffff80087ac7c600
[  229.248115] x19: 0000000000000000 x18: 0000000000000010
[  229.248209] x17: 00000000d12b3c86 x16: 000000000be55560
[  229.248302] x15: 0000000000000400 x14: 0000000000000400
[  229.248395] x13: 0000000000000400 x12: 0000000000000001
[  229.248488] x11: 00000000000001bd x10: 0000000000000000
[  229.248580] x9 : 0000000002e79c03 x8 : 0000000000000000
[  229.248671] x7 : 0000000000000000 x6 : ffff80087ac7c730
[  229.248767] x5 : ffff80087ac7c600 x4 : 0000000000000000
[  229.248858] x3 : 0000000000000002 x2 : 0000000000000000
[  229.248951] x1 : 0000800876d64000 x0 : 0000000000000000
[  229.249050] Process grabber[1347] (pid: 1348, stack limit = 0x        (ptrval))
[  229.249119] Call trace:
[  229.249193]  fpsimd_save_state+0x0/0x54
[  229.249272]  fpsimd_thread_switch+0x28/0xc0
[  229.249353]  __switch_to+0x1c/0xd0
[  229.249426]  __schedule+0x1c0/0x5d0
[  229.249502]  schedule+0x38/0xa0
[  229.249580]  schedule_timeout+0x23c/0x338
[  229.249661]  wait_for_common+0x140/0x168
[  229.249741]  wait_for_completion+0x14/0x20
[  229.249824]  grabber+0x108/0x128
[  229.249897]  kthread+0x124/0x128
[  229.249970]  ret_from_fork+0x10/0x18
[  229.250058] Code: 94273d7b b9403fe1 f9401be0 17ffffe8 (ad000400)
[  229.250133] ---[ end trace b5b40d96060ea368 ]---

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PULL v8] KVM: arm64: Optimise FPSIMD context switching
@ 2018-05-22 12:04     ` Dave Martin
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Martin @ 2018-05-22 12:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, May 20, 2018 at 02:14:41PM +0100, Marc Zyngier wrote:
> On Wed, 16 May 2018 11:49:42 +0100
> Dave Martin <Dave.Martin@arm.com> wrote:
> 
> Hi Dave,
> 
> > Hi Marc,
> > 
> > This is a trivial update to the previously posted v7 [1].  The only
> > changes are a couple of minor cosmetic changes requested by reviewers,
> > on-list and the addition of Acked-by/Reviewed-by tags received since the
> > series was posted.
> > 
> > Let me know if you need anything else on this.
> 
> So I've taken this, merged in Linus' top of tree, started a guest on a
> dual A53 board, and immediately hit the following:
> 
> root at sy-borg:~# [  287.226184] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [  287.231672] Mem abort info:
> [  287.234537]   ESR = 0x96000044
> [  287.237674]   Exception class = DABT (current EL), IL = 32 bits
> [  287.243765]   SET = 0, FnV = 0
> [  287.246900]   EA = 0, S1PTW = 0
> [  287.250126] Data abort info:
> [  287.253083]   ISV = 0, ISS = 0x00000044
> [  287.257025]   CM = 0, WnR = 1
> [  287.260076] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000b8483f75
> [  287.266882] [0000000000000000] pgd=0000000000000000
> [  287.271903] Internal error: Oops: 96000044 [#1] PREEMPT SMP
> [  287.277636] Modules linked in:
> [  287.280776] CPU: 1 PID: 3098 Comm: kworker/u4:3 Not tainted 4.17.0-rc5-00166-gd84e81cca249 #136
> [  287.289730] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
> [  287.296364] pstate: 40000085 (nZcv daIf -PAN -UAO)
> [  287.301301] pc : fpsimd_save_state+0x0/0x54
> [  287.305595] lr : fpsimd_save+0x50/0x100
> [  287.309531] sp : ffff00000dde3af0
> [  287.312936] x29: ffff00000dde3af0 x28: ffff000008cd565c 
> [  287.318401] x27: ffff800078ee9c80 x26: ffff80007b207628 
> [  287.323867] x25: ffff0000093f9000 x24: 0000000000000001 
> [  287.329333] x23: ffff0000093d4000 x22: ffff80007b207000 
> [  287.334798] x21: ffff80007efd7d80 x20: ffff80007b207000 
> [  287.340264] x19: 0000000000000000 x18: 0000000000040f0b 
> [  287.345729] x17: 0000ffffb70752b8 x16: 0000ffffb708e008 
> [  287.351195] x15: 0000000000000000 x14: 0000000000000400 
> [  287.356661] x13: 0000000000000001 x12: 0000000000000001 
> [  287.362127] x11: 0000000000000001 x10: 0000000000000000 
> [  287.367592] x9 : 0000000000000253 x8 : ffff80007b207200 
> [  287.373057] x7 : ffff80007b207100 x6 : ffff80007c378f18 
> [  287.378523] x5 : 00000042c2094c00 x4 : 0000000000000000 
> [  287.383990] x3 : 00000042e0033450 x2 : 0000000000000000 
> [  287.389454] x1 : 0000800075bf6000 x0 : 0000000000000000 
> [  287.394922] Process kworker/u4:3 (pid: 3098, stack limit = 0x00000000ca0dd8c6)
> [  287.402358] Call trace:
> [  287.404873]  fpsimd_save_state+0x0/0x54
> [  287.408813]  fpsimd_thread_switch+0x28/0xa0
> [  287.413114]  __switch_to+0x1c/0xd0
> [  287.416609]  __schedule+0x1b8/0x730
> [  287.420191]  preempt_schedule_common+0x24/0x48
> [  287.424760]  preempt_schedule.part.23+0x1c/0x28
> [  287.429419]  preempt_schedule+0x1c/0x28
> [  287.433363]  _raw_spin_unlock+0x34/0x48
> [  287.437308]  flush_old_exec+0x45c/0x6a0
> [  287.441250]  load_elf_binary+0x324/0x1198
> [  287.445372]  search_binary_handler+0xac/0x230
> [  287.449851]  do_execveat_common.isra.14+0x508/0x6e0
> [  287.454867]  do_execve+0x28/0x30
> [  287.458185]  call_usermodehelper_exec_async+0xdc/0x140
> [  287.463468]  ret_from_fork+0x10/0x18
> [  287.467143] Code: a9425bf5 a8c37bfd d65f03c0 d65f03c0 (ad000400) 
> [  287.473414] ---[ end trace c4346b99cc877f8e ]---
> 
> It happened just after having loaded the guest kernel, so I presume
> we're missing some kind of initialization. I couldn't subsequently
> reproduce it  on the same machine, and the same kernel is doing
> absolutely fine on a Seattle box.
> 
> I can't immediately see how st would be NULL, unless we somehow are
> missing some state tracking somewhere...
> 
> Any idea?

I have a reproducer now for something that resembles the above bug.
(Build test.c separately and run it from the target.)

In order for this particular bug to happen we need some kernel thread
or softirq in a kernel thread to use kernel-mode NEON (thus setting
fpsimd_last_state.st to NULL but leaving TIF_FOREIGN_FPSTATE clear),
then a kernel thread with no mm is scheduled in on the same cpu, then
that thread gans an mm and get scheduled out.

I don't force-affine anything in this test, but on an idle system it
seems to fire reliably.

There may be more plausible failure scenarios, but this is the
definite one that I've found.


There are two overlapping problems here: firstly the conditional
failure to set TIF_FOREIGN_FPSTATE in kernel_neon_begin() (which
was a latent bug prior to the VHE FPSIMD series), and secondly
the assumption that !TIF_FOREIGN_FPSTATE implies that current's
FPSIMD state is loaded (so fpsimd_last_state.st != NULL):
the latter is not true for kernel threads.

I will post patches shortly.  My current approach is to pull the
set_thread_flag(TIF_FOREIGN_FPSTATE) into fpsimd_flush_cpu_state()
(so that it can't be forgotten in this scenario), and to remove
the special-case handling of kernel threads in fpsimd.c: the only
requirement is to ensure TIF_FOREIGN_FPSTATE is set for the init task,
and remove the ->mm based "optimisation" that allows a wrong
TIF_FOREIGN_FPSTATE to be left behind on sched-in.

kernel threads never get any FPSIMD state loaded, but virtue of
never entering userspace.  They will now never save state either,
by virtue of TIF_FOREIGN_FPSTATE always being true for these
threads.

Cheers
---Dave


diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index b0d29b7..c85f4eb 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1314,3 +1314,10 @@ static int __init fpsimd_init(void)
 	return sve_sysctl_init();
 }
 core_initcall(fpsimd_init);
+
+void __fpsimd_print_state(char const *file, int line)
+{
+	pr_info("%s[%d]: %s:%d: current->mm = %p, fpsimd_last_state.st = %p, TIF_FOREIGN_FPSTATE = %d\n",
+		current->comm, task_pid_nr(current), file, line,
+		current->mm, this_cpu_read(fpsimd_last_state.st), test_thread_flag(TIF_FOREIGN_FPSTATE) ? 1 : 0);
+}
diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c
index 72981ba..745f5f0 100644
--- a/arch/arm64/kernel/sys.c
+++ b/arch/arm64/kernel/sys.c
@@ -59,7 +59,9 @@ asmlinkage long sys_rt_sigreturn_wrapper(void);
  * The sys_call_table array must be 4K aligned to be accessible from
  * kernel/entry.S.
  */
+extern long arm64_ni_syscall(long nr);
+
 void * const sys_call_table[__NR_syscalls] __aligned(4096) = {
-	[0 ... __NR_syscalls - 1] = sys_ni_syscall,
+	[0 ... __NR_syscalls - 1] = arm64_ni_syscall,
 #include <asm/unistd.h>
 };
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 8bbdc17..a9a8589 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -18,6 +18,7 @@
  */
 
 #include <linux/bug.h>
+#include <linux/completion.h>
 #include <linux/signal.h>
 #include <linux/personality.h>
 #include <linux/kallsyms.h>
@@ -27,10 +28,13 @@
 #include <linux/kdebug.h>
 #include <linux/module.h>
 #include <linux/kexec.h>
+#include <linux/kthread.h>
 #include <linux/delay.h>
 #include <linux/init.h>
+#include <linux/mmu_context.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/mm.h>
 #include <linux/sched/task_stack.h>
 #include <linux/sizes.h>
 #include <linux/syscalls.h>
@@ -43,6 +47,7 @@
 #include <asm/debug-monitors.h>
 #include <asm/esr.h>
 #include <asm/insn.h>
+#include <asm/neon.h>
 #include <asm/traps.h>
 #include <asm/smp.h>
 #include <asm/stack_pointer.h>
@@ -549,10 +554,92 @@ asmlinkage void __exception do_sysinstr(unsigned int esr, struct pt_regs *regs)
 
 long compat_arm_syscall(struct pt_regs *regs);
 
+struct grabber_data {
+	int err;
+	struct completion c, c2;
+	struct mm_struct *mm;
+};
+
+extern void __fpsimd_print_state(char const *, int);
+#define fpsimd_print_state() __fpsimd_print_state(__FILE__, __LINE__)
+
+static int grabber(void *data)
+{
+	struct grabber_data *g = data;
+
+	if (!mmget_not_zero(g->mm)) {
+		g->err = -ESRCH;
+		goto out;
+	}
+
+	fpsimd_print_state();
+
+	g->err = 0;
+
+	kernel_neon_begin();
+	kernel_neon_end();
+
+	fpsimd_print_state();
+
+	use_mm(g->mm);
+
+	fpsimd_print_state();
+
+	complete(&g->c);
+	wait_for_completion(&g->c2);
+
+	g->err = 0;
+out:
+	complete(&g->c);
+	return g->err;
+}
+
+static int do_trigger_bug(void)
+{
+	struct task_struct *kth;
+	struct grabber_data g;
+
+	pr_info("%s[%d]: do_trigger_bug() called\n",
+		current->comm, task_pid_nr(current));
+
+	init_completion(&g.c);
+	init_completion(&g.c2);
+	mmgrab(current->mm);
+	g.mm = current->mm;
+
+	kth = kthread_create(grabber, &g, "grabber[%d]", task_pid_nr(current));
+	if (IS_ERR(kth)) {
+		mmdrop(current->mm);
+		return PTR_ERR(kth);
+	}
+
+	wake_up_process(kth);
+	wait_for_completion(&g.c);
+	mmdrop(current->mm);
+	if (g.err) {
+		pr_info("%s: mm disappeared\n", __func__);
+		return g.err;
+	}
+
+	init_completion(&g.c);
+	complete(&g.c2);
+	wait_for_completion(&g.c);
+	return g.err;
+}
+
+long arm64_ni_syscall(long nr)
+{
+	if ((int)nr == 8888)
+		return do_trigger_bug();
+
+	return -ENOSYS;
+}
+
 asmlinkage long do_ni_syscall(struct pt_regs *regs)
 {
-#ifdef CONFIG_COMPAT
 	long ret;
+
+#ifdef CONFIG_COMPAT
 	if (is_compat_task()) {
 		ret = compat_arm_syscall(regs);
 		if (ret != -ENOSYS)
@@ -560,7 +647,7 @@ asmlinkage long do_ni_syscall(struct pt_regs *regs)
 	}
 #endif
 
-	return sys_ni_syscall();
+	return arm64_ni_syscall(regs->regs[8]);
 }
 
 static const char *esr_class_str[] = {
--- /dev/null	2017-09-27 16:25:00.179999999 +0100
+++ test.c	2018-05-22 11:59:58.671159660 +0100
@@ -0,0 +1,12 @@
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/syscall.h>
+
+int main(void)
+{
+	if (syscall(8888) == -1)
+		perror("8888");
+
+	return 0;
+}


yielding

[  229.246349] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  229.246423] Mem abort info:
[  229.246474]   ESR = 0x96000046
[  229.246529]   Exception class = DABT (current EL), IL = 32 bits
[  229.246598]   SET = 0, FnV = 0
[  229.246652]   EA = 0, S1PTW = 0
[  229.246700] Data abort info:
[  229.246751]   ISV = 0, ISS = 0x00000046
[  229.246809]   CM = 0, WnR = 1
[  229.246871] user pgtable: 4k pages, 48-bit VAs, pgdp =         (ptrval)
[  229.246944] [0000000000000000] pgd=00000008fa6be003, pud=00000008fa6b5003, pmd=0000000000000000
[  229.247070] Internal error: Oops: 96000046 [#1] PREEMPT SMP
[  229.247130] Modules linked in:
[  229.247214] CPU: 0 PID: 1348 Comm: grabber[1347] Not tainted 4.17.0-rc3-00016-g885df5d-dirty #25
[  229.247296] Hardware name: FVP Base (DT)
[  229.247368] pstate: 40000085 (nZcv daIf -PAN -UAO)
[  229.247452] pc : fpsimd_save_state+0x0/0x54
[  229.247530] lr : fpsimd_save+0x54/0xe8
[  229.247586] sp : ffff00000ad13c10
[  229.247643] x29: ffff00000ad13c10 x28: 0000000000000000
[  229.247734] x27: 0000000000000000 x26: ffff80087ac7cc20
[  229.247830] x25: ffff000008a554f0 x24: ffff000009269b88
[  229.247926] x23: ffff00000924f010 x22: ffff80087ac7c600
[  229.248022] x21: ffff000009272a80 x20: ffff80087ac7c600
[  229.248115] x19: 0000000000000000 x18: 0000000000000010
[  229.248209] x17: 00000000d12b3c86 x16: 000000000be55560
[  229.248302] x15: 0000000000000400 x14: 0000000000000400
[  229.248395] x13: 0000000000000400 x12: 0000000000000001
[  229.248488] x11: 00000000000001bd x10: 0000000000000000
[  229.248580] x9 : 0000000002e79c03 x8 : 0000000000000000
[  229.248671] x7 : 0000000000000000 x6 : ffff80087ac7c730
[  229.248767] x5 : ffff80087ac7c600 x4 : 0000000000000000
[  229.248858] x3 : 0000000000000002 x2 : 0000000000000000
[  229.248951] x1 : 0000800876d64000 x0 : 0000000000000000
[  229.249050] Process grabber[1347] (pid: 1348, stack limit = 0x        (ptrval))
[  229.249119] Call trace:
[  229.249193]  fpsimd_save_state+0x0/0x54
[  229.249272]  fpsimd_thread_switch+0x28/0xc0
[  229.249353]  __switch_to+0x1c/0xd0
[  229.249426]  __schedule+0x1c0/0x5d0
[  229.249502]  schedule+0x38/0xa0
[  229.249580]  schedule_timeout+0x23c/0x338
[  229.249661]  wait_for_common+0x140/0x168
[  229.249741]  wait_for_completion+0x14/0x20
[  229.249824]  grabber+0x108/0x128
[  229.249897]  kthread+0x124/0x128
[  229.249970]  ret_from_fork+0x10/0x18
[  229.250058] Code: 94273d7b b9403fe1 f9401be0 17ffffe8 (ad000400)
[  229.250133] ---[ end trace b5b40d96060ea368 ]---

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-05-22 12:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-16 10:49 [PULL v8] KVM: arm64: Optimise FPSIMD context switching Dave Martin
2018-05-16 10:49 ` Dave Martin
2018-05-20 13:14 ` Marc Zyngier
2018-05-20 13:14   ` Marc Zyngier
2018-05-21 10:02   ` Dave Martin
2018-05-21 10:02     ` Dave Martin
2018-05-22 12:04   ` Dave Martin
2018-05-22 12:04     ` Dave Martin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.