* [PATCH 0/2] sync_regs() TOCTOU issues
@ 2023-07-28  0:12 Michal Luczaj
  2023-07-28  0:12 ` [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's " Michal Luczaj
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Michal Luczaj @ 2023-07-28  0:12 UTC (permalink / raw)
  To: seanjc; +Cc: pbonzini, kvm, shuah, Michal Luczaj

Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
have exclusive rights to the structs they operate on. While this is true
when coming from an ioctl handler (the caller makes a local copy of the
user's data), sync_regs() breaks this contract: a pointer to
user-modifiable memory (vcpu->run->s.regs) is passed in. This can lead to
a situation where incoming data is checked and/or sanitized only to be
overwritten by a user thread running in parallel.
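
For illustration, a simplified sketch of the pre-patch sync_regs() path (the
real code is in patch 1's diff below); vcpu->run is mapped into userspace, so
a parallel thread can rewrite s.regs after KVM's checks:

	if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_SREGS) {
		/*
		 * __set_sregs() validates and then consumes memory that a
		 * user thread can still modify, e.g. flipping CR4.PAE
		 * between the sanity check and the actual load.
		 */
		if (__set_sregs(vcpu, &vcpu->run->s.regs.sregs))
			return -EINVAL;
		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
	}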

Selftest racing kvm_vcpu_ioctl_x86_set_vcpu_events() results in hitting
some WARN_ON()s. With [1] applied, racing __set_sregs() ends with
KVM_BUG_ON() during ioctl(KVM_TRANSLATE).

[1] KVM: x86/mmu: Bug the VM if a vCPU ends up in long mode without PAE enabled
    https://lore.kernel.org/kvm/20230721230006.2337941-6-seanjc@google.com/

Selftest-induced splats:

arch/x86/kvm/x86.c:kvm_check_and_inject_events():
	WARN_ON_ONCE(vcpu->arch.exception.injected &&
		     vcpu->arch.exception.pending)

[  188.598039] WARNING: CPU: 4 PID: 969 at arch/x86/kvm/x86.c:10095 kvm_check_and_inject_events+0x220/0x500 [kvm]
[  188.598141] Modules linked in: 9p fscache netfs qrtr sunrpc intel_rapl_msr intel_rapl_common kvm_intel kvm 9pnet_virtio pcspkr 9pnet rapl i2c_piix4 drm zram crct10dif_pclmul crc32_pclmul crc32c_intel virtio_console serio_raw virtio_blk ghash_clmulni_intel ata_generic pata_acpi fuse qemu_fw_cfg
[  188.598194] CPU: 4 PID: 969 Comm: sync_regs_test Tainted: G        W          6.5.0-rc3+ #50
[  188.598199] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.2-1-1 04/01/2014
[  188.598202] RIP: 0010:kvm_check_and_inject_events+0x220/0x500 [kvm]
[  188.598274] Code: 0d 80 bb b0 08 00 00 00 0f 84 49 02 00 00 85 c0 0f 89 db fe ff ff 83 f8 f0 0f 84 b9 fe ff ff 48 83 c4 08 5b 5d 41 5c 41 5d c3 <0f> 0b 85 c0 78 e6 66 83 bb b0 08 00 00 00 0f 85 b2 01 00 00 0f b6
[  188.598278] RSP: 0018:ffffc9000173fcb0 EFLAGS: 00010202
[  188.598284] RAX: 0000000000000000 RBX: ffff888122588000 RCX: 0000000000000000
[  188.598287] RDX: 0000000000000000 RSI: 0000000080000300 RDI: ffff888122588000
[  188.598290] RBP: ffffc9000173fd40 R08: 0000000000000001 R09: ffff888107adc000
[  188.598293] R10: ffffc9000173fd58 R11: 0000000000000001 R12: ffffc9000173fcef
[  188.598296] R13: 0000000000000000 R14: ffff888108988000 R15: ffff888122588000
[  188.598299] FS:  00007fdb3d656740(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
[  188.598303] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  188.598306] CR2: 0000000000000000 CR3: 000000011e617000 CR4: 0000000000752ee0
[  188.598316] PKRU: 55555554
[  188.598319] Call Trace:
[  188.598322]  <TASK>
[  188.598325]  ? kvm_check_and_inject_events+0x220/0x500 [kvm]
[  188.598389]  ? __warn+0x81/0x170
[  188.598397]  ? kvm_check_and_inject_events+0x220/0x500 [kvm]
[  188.598458]  ? report_bug+0x189/0x1c0
[  188.598466]  ? handle_bug+0x38/0x70
[  188.598472]  ? exc_invalid_op+0x13/0x60
[  188.598515]  ? asm_exc_invalid_op+0x16/0x20
[  188.598529]  ? kvm_check_and_inject_events+0x220/0x500 [kvm]
[  188.598592]  vcpu_run+0x5aa/0x1660 [kvm]
[  188.598658]  ? skip_emulated_instruction+0xa3/0x190 [kvm_intel]
[  188.598674]  ? complete_emulator_pio_in+0xab/0xc0 [kvm]
[  188.598740]  ? kvm_arch_vcpu_ioctl_run+0x1e4/0x740 [kvm]
[  188.598801]  kvm_arch_vcpu_ioctl_run+0x1e4/0x740 [kvm]
[  188.598862]  kvm_vcpu_ioctl+0x19d/0x680 [kvm]
[  188.598916]  ? lock_release+0x132/0x260
[  188.598927]  __x64_sys_ioctl+0x8c/0xc0
[  188.598935]  do_syscall_64+0x56/0x80
[  188.598939]  ? lockdep_hardirqs_on+0x7d/0x100
[  188.598944]  ? do_syscall_64+0x62/0x80
[  188.598948]  ? do_syscall_64+0x62/0x80
[  188.598952]  ? lockdep_hardirqs_on+0x7d/0x100
[  188.598956]  ? do_syscall_64+0x62/0x80
[  188.598960]  ? do_syscall_64+0x62/0x80
[  188.598964]  ? lockdep_hardirqs_on+0x7d/0x100
[  188.598968]  ? do_syscall_64+0x62/0x80
[  188.598974]  ? asm_exc_page_fault+0x22/0x30
[  188.598980]  ? lockdep_hardirqs_on+0x7d/0x100
[  188.598984]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  188.598991] RIP: 0033:0x7fdb3d759d6f
[  188.598995] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[  188.598999] RSP: 002b:00007ffe0c2f3b70 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  188.599004] RAX: ffffffffffffffda RBX: 0000000064c2f1b3 RCX: 00007fdb3d759d6f
[  188.599007] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
[  188.599010] RBP: 0000000000e1b2a0 R08: 0000000000417718 R09: 00000000004176e0
[  188.599012] R10: 00007ffe0c3ac258 R11: 0000000000000246 R12: 0000000000418be5
[  188.599015] R13: 0000000000e1b2a0 R14: 00007fdb3d842130 R15: 00007fdb3d87c000
[  188.599027]  </TASK>
[  188.599030] irq event stamp: 33643
[  188.599032] hardirqs last  enabled at (33649): [<ffffffff8118c45e>] __up_console_sem+0x5e/0x70
[  188.599039] hardirqs last disabled at (33654): [<ffffffff8118c443>] __up_console_sem+0x43/0x70
[  188.599043] softirqs last  enabled at (33446): [<ffffffff810f903d>] __irq_exit_rcu+0x9d/0x110
[  188.599050] softirqs last disabled at (33439): [<ffffffff810f903d>] __irq_exit_rcu+0x9d/0x110

arch/x86/kvm/x86.c:exception_type():
	WARN_ON(vector > 31 || vector == NMI_VECTOR)

[   47.224496] WARNING: CPU: 7 PID: 958 at arch/x86/kvm/x86.c:547 kvm_check_and_inject_events+0x4a0/0x500 [kvm]
[   47.224516] Modules linked in: 9p fscache netfs qrtr sunrpc intel_rapl_msr intel_rapl_common kvm_intel kvm 9pnet_virtio pcspkr 9pnet rapl i2c_piix4 drm zram crct10dif_pclmul crc32_pclmul crc32c_intel virtio_console serio_raw virtio_blk ghash_clmulni_intel ata_generic pata_acpi fuse qemu_fw_cfg
[   47.224532] CPU: 7 PID: 958 Comm: sync_regs_test Tainted: G        W          6.5.0-rc3+ #50
[   47.224534] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.2-1-1 04/01/2014
[   47.224534] RIP: 0010:kvm_check_and_inject_events+0x4a0/0x500 [kvm]
[   47.224553] Code: 31 ed c6 83 28 0b 00 00 01 e8 bc cd 0f 00 be 01 00 00 00 48 89 df e8 6f 68 10 00 85 c0 0f 89 0e fc ff ff 0f 0b e9 07 fc ff ff <0f> 0b e9 0e fe ff ff 0f 0b f6 43 42 10 0f 84 b9 fb ff ff 31 c0 e9
[   47.224554] RSP: 0018:ffffc90001573cc0 EFLAGS: 00010202
[   47.224556] RAX: 0000000000000001 RBX: ffff88810b768000 RCX: 00000000000000ff
[   47.224557] RDX: 0000000000000001 RSI: ffffc90001573cff RDI: ffff88810b768000
[   47.224557] RBP: 0000000000000001 R08: 00000000000005a4 R09: 0000000000000000
[   47.224558] R10: ffffc90001573d68 R11: 0000000000000001 R12: ffffc90001573cff
[   47.224559] R13: 0000000000000000 R14: ffff888110f73380 R15: ffff88810b768000
[   47.224560] FS:  00007f8fc0f80740(0000) GS:ffff88842fd80000(0000) knlGS:0000000000000000
[   47.224561] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.224562] CR2: 0000000000000000 CR3: 0000000151e24000 CR4: 0000000000752ee0
[   47.224564] PKRU: 55555554
[   47.224565] Call Trace:
[   47.224566]  <TASK>
[   47.224566]  ? kvm_check_and_inject_events+0x4a0/0x500 [kvm]
[   47.224585]  ? __warn+0x81/0x170
[   47.224587]  ? kvm_check_and_inject_events+0x4a0/0x500 [kvm]
[   47.224606]  ? report_bug+0x189/0x1c0
[   47.224608]  ? handle_bug+0x38/0x70
[   47.224610]  ? exc_invalid_op+0x13/0x60
[   47.224612]  ? asm_exc_invalid_op+0x16/0x20
[   47.224617]  ? kvm_check_and_inject_events+0x4a0/0x500 [kvm]
[   47.224636]  vcpu_run+0x5aa/0x1660 [kvm]
[   47.224656]  ? lock_acquire+0xd4/0x290
[   47.224658]  ? lockdep_hardirqs_on+0x7d/0x100
[   47.224660]  ? kvm_arch_vcpu_ioctl_run+0x1e4/0x740 [kvm]
[   47.224680]  kvm_arch_vcpu_ioctl_run+0x1e4/0x740 [kvm]
[   47.224700]  kvm_vcpu_ioctl+0x19d/0x680 [kvm]
[   47.224718]  ? lock_release+0x132/0x260
[   47.224722]  __x64_sys_ioctl+0x8c/0xc0
[   47.224724]  do_syscall_64+0x56/0x80
[   47.224726]  ? do_syscall_64+0x62/0x80
[   47.224728]  ? lockdep_hardirqs_on+0x7d/0x100
[   47.224729]  ? do_syscall_64+0x62/0x80
[   47.224730]  ? do_syscall_64+0x62/0x80
[   47.224732]  ? do_syscall_64+0x62/0x80
[   47.224733]  ? lockdep_hardirqs_on+0x7d/0x100
[   47.224734]  ? do_syscall_64+0x62/0x80
[   47.224735]  ? do_syscall_64+0x62/0x80
[   47.224737]  ? lockdep_hardirqs_on+0x7d/0x100
[   47.224738]  ? do_syscall_64+0x62/0x80
[   47.224739]  ? do_syscall_64+0x62/0x80
[   47.224741]  ? do_syscall_64+0x62/0x80
[   47.224742]  ? lockdep_hardirqs_on+0x7d/0x100
[   47.224744]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[   47.224746] RIP: 0033:0x7f8fc1083d6f
[   47.224747] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[   47.224748] RSP: 002b:00007ffe2e0a0ed0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   47.224749] RAX: ffffffffffffffda RBX: 0000000064c2f125 RCX: 00007f8fc1083d6f
[   47.224750] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
[   47.224751] RBP: 00000000015dc2a0 R08: 0000000000417718 R09: 00000000004176e0
[   47.224752] R10: 00007ffe2e0d1258 R11: 0000000000000246 R12: 0000000000418be5
[   47.224753] R13: 00000000015dc2a0 R14: 00007f8fc116c130 R15: 00007f8fc11a6000
[   47.224757]  </TASK>
[   47.224758] irq event stamp: 1494439
[   47.224759] hardirqs last  enabled at (1494445): [<ffffffff8118c45e>] __up_console_sem+0x5e/0x70
[   47.224760] hardirqs last disabled at (1494450): [<ffffffff8118c443>] __up_console_sem+0x43/0x70
[   47.224762] softirqs last  enabled at (1493904): [<ffffffff8103894d>] fpu_swap_kvm_fpstate+0x6d/0x120
[   47.224763] softirqs last disabled at (1493902): [<ffffffff810388e5>] fpu_swap_kvm_fpstate+0x5/0x120

arch/x86/kvm/mmu/paging_tmpl.h:
	KVM_BUG_ON(is_long_mode(vcpu) && !is_pae(vcpu), vcpu->kvm)

[   79.615678] WARNING: CPU: 1 PID: 944 at arch/x86/kvm/mmu/paging_tmpl.h:358 paging32_walk_addr_generic+0x431/0x8f0 [kvm]
[   79.615774] Modules linked in: 9p fscache netfs qrtr sunrpc intel_rapl_msr intel_rapl_common kvm_intel kvm 9pnet_virtio rapl 9pnet pcspkr i2c_piix4 drm zram crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel virtio_console virtio_blk serio_raw ata_generic pata_acpi fuse qemu_fw_cfg
[   79.615817] CPU: 1 PID: 944 Comm: sync_regs_test Not tainted 6.5.0-rc3+ #51
[   79.615821] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.2-1-1 04/01/2014
[   79.615824] RIP: 0010:paging32_walk_addr_generic+0x431/0x8f0 [kvm]
[   79.615899] Code: e9 09 fd ff ff 41 f6 86 70 02 00 00 20 0f 85 7d fc ff ff 4d 89 f5 4c 8b 1c 24 45 89 e6 49 8b 7d 00 80 bf 01 a2 00 00 00 75 1e <0f> 0b 41 b8 01 01 00 00 be 01 03 00 00 66 44 89 87 01 a2 00 00 e8
[   79.615903] RSP: 0018:ffffc9000161bcc0 EFLAGS: 00010246
[   79.615910] RAX: 0000000000000004 RBX: ffffc9000161bd48 RCX: 0000000000000000
[   79.615915] RDX: 0000000000000013 RSI: 000000000000000f RDI: ffffc9000166d000
[   79.615919] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000180000
[   79.615923] R10: 0000000000000001 R11: ffff888110770338 R12: 0000000000000000
[   79.615926] R13: ffff888110770000 R14: 0000000000000000 R15: ffff888110770628
[   79.615928] FS:  00007fde4879d740(0000) GS:ffff88842fa80000(0000) knlGS:0000000000000000
[   79.615932] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   79.615934] CR2: 0000000000000000 CR3: 0000000158e67000 CR4: 0000000000752ee0
[   79.615940] PKRU: 55555554
[   79.615942] Call Trace:
[   79.615945]  <TASK>
[   79.615948]  ? paging32_walk_addr_generic+0x431/0x8f0 [kvm]
[   79.616009]  ? __warn+0x81/0x170
[   79.616016]  ? paging32_walk_addr_generic+0x431/0x8f0 [kvm]
[   79.616076]  ? report_bug+0x189/0x1c0
[   79.616083]  ? handle_bug+0x38/0x70
[   79.616088]  ? exc_invalid_op+0x13/0x60
[   79.616093]  ? asm_exc_invalid_op+0x16/0x20
[   79.616105]  ? paging32_walk_addr_generic+0x431/0x8f0 [kvm]
[   79.616168]  ? lock_acquire+0xd4/0x290
[   79.616174]  paging32_gva_to_gpa+0x28/0x80 [kvm]
[   79.616235]  ? lock_acquire+0xd4/0x290
[   79.616240]  ? vmx_vcpu_load+0x27/0x40 [kvm_intel]
[   79.616257]  kvm_arch_vcpu_ioctl_translate+0x79/0xf0 [kvm]
[   79.616317]  ? kvm_arch_vcpu_ioctl_translate+0x5/0xf0 [kvm]
[   79.616374]  kvm_vcpu_ioctl+0x4f0/0x680 [kvm]
[   79.616469]  ? lock_release+0x132/0x260
[   79.616479]  __x64_sys_ioctl+0x8c/0xc0
[   79.616486]  do_syscall_64+0x56/0x80
[   79.616491]  ? do_syscall_64+0x62/0x80
[   79.616495]  ? do_syscall_64+0x62/0x80
[   79.616498]  ? lockdep_hardirqs_on+0x7d/0x100
[   79.616502]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[   79.616507] RIP: 0033:0x7fde488a0d6f
[   79.616512] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[   79.616514] RSP: 002b:00007fffc022e7d0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[   79.616519] RAX: ffffffffffffffda RBX: 0000000064c2f333 RCX: 00007fde488a0d6f
[   79.616522] RDX: 00007fffc022e850 RSI: 00000000c018ae85 RDI: 0000000000000005
[   79.616524] RBP: 00000000c018ae85 R08: 0000000000417718 R09: 00000000004176e0
[   79.616526] R10: 00007fffc03a5258 R11: 0000000000000246 R12: 0000000000418be5
[   79.616529] R13: 00000000021062a0 R14: 00007fde48989130 R15: 00007fde489c3000
[   79.616540]  </TASK>
[   79.616542] irq event stamp: 34521
[   79.616544] hardirqs last  enabled at (34527): [<ffffffff8118c45e>] __up_console_sem+0x5e/0x70
[   79.616574] hardirqs last disabled at (34532): [<ffffffff8118c443>] __up_console_sem+0x43/0x70
[   79.616577] softirqs last  enabled at (34418): [<ffffffff810f903d>] __irq_exit_rcu+0x9d/0x110
[   79.616582] softirqs last disabled at (34411): [<ffffffff810f903d>] __irq_exit_rcu+0x9d/0x110

Michal Luczaj (2):
  KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
  KVM: selftests: Extend x86's sync_regs_test to check for races

 arch/x86/kvm/x86.c                            |  13 +-
 .../selftests/kvm/x86_64/sync_regs_test.c     | 124 ++++++++++++++++++
 2 files changed, 134 insertions(+), 3 deletions(-)

-- 
2.41.0



* [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
  2023-07-28  0:12 [PATCH 0/2] sync_regs() TOCTOU issues Michal Luczaj
@ 2023-07-28  0:12 ` Michal Luczaj
  2023-07-31 23:49   ` Sean Christopherson
  2023-07-28  0:12 ` [PATCH 2/2] KVM: selftests: Extend x86's sync_regs_test to check for races Michal Luczaj
  2023-08-02 21:11 ` [PATCH 0/2] sync_regs() TOCTOU issues Sean Christopherson
  2 siblings, 1 reply; 24+ messages in thread
From: Michal Luczaj @ 2023-07-28  0:12 UTC (permalink / raw)
  To: seanjc; +Cc: pbonzini, kvm, shuah, Michal Luczaj

In the spirit of using a sledgehammer to crack a nut, make sync_regs() feed
__set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() with the kernel's own
copy of the data.

Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
have exclusive rights to the structs they operate on. While this is true
when coming from an ioctl handler (the caller makes a local copy of the
user's data), sync_regs() breaks this contract: a pointer to
user-modifiable memory (vcpu->run->s.regs) is passed in. This can lead to
a situation where incoming data is checked and/or sanitized only to be
overwritten by a user thread running in parallel.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
A note: when servicing kvm_run->kvm_dirty_regs, changes made by
__set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to on-stack copies of
vcpu->run.s.regs will not be reflected back in vcpu->run.s.regs. Is this
ok?

 arch/x86/kvm/x86.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7f4246e4255f..eb94081bd7e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11787,15 +11787,22 @@ static int sync_regs(struct kvm_vcpu *vcpu)
 		__set_regs(vcpu, &vcpu->run->s.regs.regs);
 		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_REGS;
 	}
+
 	if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_SREGS) {
-		if (__set_sregs(vcpu, &vcpu->run->s.regs.sregs))
+		struct kvm_sregs sregs = vcpu->run->s.regs.sregs;
+
+		if (__set_sregs(vcpu, &sregs))
 			return -EINVAL;
+
 		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_SREGS;
 	}
+
 	if (vcpu->run->kvm_dirty_regs & KVM_SYNC_X86_EVENTS) {
-		if (kvm_vcpu_ioctl_x86_set_vcpu_events(
-				vcpu, &vcpu->run->s.regs.events))
+		struct kvm_vcpu_events events = vcpu->run->s.regs.events;
+
+		if (kvm_vcpu_ioctl_x86_set_vcpu_events(vcpu, &events))
 			return -EINVAL;
+
 		vcpu->run->kvm_dirty_regs &= ~KVM_SYNC_X86_EVENTS;
 	}
 
-- 
2.41.0



* [PATCH 2/2] KVM: selftests: Extend x86's sync_regs_test to check for races
  2023-07-28  0:12 [PATCH 0/2] sync_regs() TOCTOU issues Michal Luczaj
  2023-07-28  0:12 ` [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's " Michal Luczaj
@ 2023-07-28  0:12 ` Michal Luczaj
  2023-08-02 21:07   ` Sean Christopherson
  2023-08-02 21:11 ` [PATCH 0/2] sync_regs() TOCTOU issues Sean Christopherson
  2 siblings, 1 reply; 24+ messages in thread
From: Michal Luczaj @ 2023-07-28  0:12 UTC (permalink / raw)
  To: seanjc; +Cc: pbonzini, kvm, shuah, Michal Luczaj

Attempt to modify vcpu->run->s.regs _after_ the sanity checks performed by
KVM_CAP_SYNC_REGS's arch/x86/kvm/x86.c:sync_regs(). This could lead to some
nonsensical vCPU states accompanied by kernel splats.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
---
 .../selftests/kvm/x86_64/sync_regs_test.c     | 124 ++++++++++++++++++
 1 file changed, 124 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
index 2da89fdc2471..feebc7d44c17 100644
--- a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
@@ -15,12 +15,14 @@
 #include <stdlib.h>
 #include <string.h>
 #include <sys/ioctl.h>
+#include <pthread.h>
 
 #include "test_util.h"
 #include "kvm_util.h"
 #include "processor.h"
 
 #define UCALL_PIO_PORT ((uint16_t)0x1000)
+#define TIMEOUT	2	/* seconds, roughly */
 
 struct ucall uc_none = {
 	.cmd = UCALL_NONE,
@@ -80,6 +82,124 @@ static void compare_vcpu_events(struct kvm_vcpu_events *left,
 #define TEST_SYNC_FIELDS   (KVM_SYNC_X86_REGS|KVM_SYNC_X86_SREGS|KVM_SYNC_X86_EVENTS)
 #define INVALID_SYNC_FIELD 0x80000000
 
+/*
+ * WARNING: CPU: 0 PID: 1115 at arch/x86/kvm/x86.c:10095 kvm_check_and_inject_events+0x220/0x500 [kvm]
+ *
+ * arch/x86/kvm/x86.c:kvm_check_and_inject_events():
+ * WARN_ON_ONCE(vcpu->arch.exception.injected &&
+ *		vcpu->arch.exception.pending);
+ */
+static void *race_events_inj_pen(void *arg)
+{
+	struct kvm_run *run = (struct kvm_run *)arg;
+	struct kvm_vcpu_events *events = &run->s.regs.events;
+
+	for (;;) {
+		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
+		WRITE_ONCE(events->flags, 0);
+		WRITE_ONCE(events->exception.injected, 1);
+		WRITE_ONCE(events->exception.pending, 1);
+
+		pthread_testcancel();
+	}
+
+	return NULL;
+}
+
+/*
+ * WARNING: CPU: 0 PID: 1107 at arch/x86/kvm/x86.c:547 kvm_check_and_inject_events+0x4a0/0x500 [kvm]
+ *
+ * arch/x86/kvm/x86.c:exception_type():
+ * WARN_ON(vector > 31 || vector == NMI_VECTOR)
+ */
+static void *race_events_exc(void *arg)
+{
+	struct kvm_run *run = (struct kvm_run *)arg;
+	struct kvm_vcpu_events *events = &run->s.regs.events;
+
+	for (;;) {
+		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
+		WRITE_ONCE(events->flags, 0);
+		WRITE_ONCE(events->exception.pending, 1);
+		WRITE_ONCE(events->exception.nr, 255);
+
+		pthread_testcancel();
+	}
+
+	return NULL;
+}
+
+/*
+ * WARNING: CPU: 0 PID: 1142 at arch/x86/kvm/mmu/paging_tmpl.h:358 paging32_walk_addr_generic+0x431/0x8f0 [kvm]
+ *
+ * arch/x86/kvm/mmu/paging_tmpl.h:
+ * KVM_BUG_ON(is_long_mode(vcpu) && !is_pae(vcpu), vcpu->kvm)
+ */
+static void *race_sregs_cr4(void *arg)
+{
+	struct kvm_run *run = (struct kvm_run *)arg;
+	__u64 *cr4 = &run->s.regs.sregs.cr4;
+	__u64 pae_enabled = *cr4;
+	__u64 pae_disabled = *cr4 & ~X86_CR4_PAE;
+
+	for (;;) {
+		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_SREGS);
+		WRITE_ONCE(*cr4, pae_enabled);
+		asm volatile(".rept 512\n\t"
+			     "nop\n\t"
+			     ".endr");
+		WRITE_ONCE(*cr4, pae_disabled);
+
+		pthread_testcancel();
+	}
+
+	return NULL;
+}
+
+static void race_sync_regs(void *racer, bool poke_mmu)
+{
+	struct kvm_translation tr;
+	struct kvm_vcpu *vcpu;
+	struct kvm_run *run;
+	struct kvm_vm *vm;
+	pthread_t thread;
+	time_t t;
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+	run = vcpu->run;
+
+	run->kvm_valid_regs = KVM_SYNC_X86_SREGS;
+	vcpu_run(vcpu);
+	TEST_REQUIRE(run->s.regs.sregs.cr4 & X86_CR4_PAE);
+	run->kvm_valid_regs = 0;
+
+	ASSERT_EQ(pthread_create(&thread, NULL, racer, (void *)run), 0);
+
+	for (t = time(NULL) + TIMEOUT; time(NULL) < t;) {
+		__vcpu_run(vcpu);
+
+		if (poke_mmu) {
+			tr = (struct kvm_translation) { .linear_address = 0 };
+			__vcpu_ioctl(vcpu, KVM_TRANSLATE, &tr);
+		}
+	}
+
+	ASSERT_EQ(pthread_cancel(thread), 0);
+	ASSERT_EQ(pthread_join(thread, NULL), 0);
+
+	/*
+	 * If kvm->bugged then we won't survive TEST_ASSERT(). Leak.
+	 *
+	 * kvm_vm_free()
+	 *   __vm_mem_region_delete()
+	 *     vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region)
+	 *       _vm_ioctl(vm, cmd, #cmd, arg)
+	 *         TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret))
+	 */
+	if (!poke_mmu)
+		kvm_vm_free(vm);
+}
+
 int main(int argc, char *argv[])
 {
 	struct kvm_vcpu *vcpu;
@@ -218,5 +338,9 @@ int main(int argc, char *argv[])
 
 	kvm_vm_free(vm);
 
+	race_sync_regs(race_sregs_cr4, true);
+	race_sync_regs(race_events_exc, false);
+	race_sync_regs(race_events_inj_pen, false);
+
 	return 0;
 }
-- 
2.41.0



* Re: [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
  2023-07-28  0:12 ` [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's " Michal Luczaj
@ 2023-07-31 23:49   ` Sean Christopherson
  2023-08-01 12:37     ` Michal Luczaj
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2023-07-31 23:49 UTC (permalink / raw)
  To: Michal Luczaj; +Cc: pbonzini, kvm, shuah

On Fri, Jul 28, 2023, Michal Luczaj wrote:
> In a spirit of using a sledgehammer to crack a nut, make sync_regs() feed
> __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() with kernel's own
> copy of data.
> 
> Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
> have exclusive rights to structs they operate on. While this is true when
> coming from an ioctl handler (caller makes a local copy of user's data),
> sync_regs() breaks this contract; a pointer to a user-modifiable memory
> (vcpu->run->s.regs) is provided. This can lead to a situation when incoming
> data is checked and/or sanitized only to be re-set by a user thread running
> in parallel.

LOL, the really hilarious part is that the guilty,

  Fixes: 01643c51bfcf ("KVM: x86: KVM_CAP_SYNC_REGS")

also added this comment...

  /* kvm_sync_regs struct included by kvm_run struct */
  struct kvm_sync_regs {
	/* Members of this structure are potentially malicious.
	 * Care must be taken by code reading, esp. interpreting,
	 * data fields from them inside KVM to prevent TOCTOU and
	 * double-fetch types of vulnerabilities.
	 */
	struct kvm_regs regs;
	struct kvm_sregs sregs;
	struct kvm_vcpu_events events;
  };

though Radim did remove something so maybe the comment isn't as ironic as it looks.

    [Removed wrapper around check for reserved kvm_valid_regs. - Radim]
    Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

Anyways...

> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
> A note: when servicing kvm_run->kvm_dirty_regs, changes made by
> __set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to on-stack copies of
> vcpu->run.s.regs will not be reflected back in vcpu->run.s.regs. Is this
> ok?

I would be amazed if anyone cares.  Given the justification and the author,

    This reduces ioctl overhead which is particularly important when userspace
    is making synchronous guest state modifications (e.g. when emulating and/or
    intercepting instructions).
    
    Signed-off-by: Ken Hofsass <hofsass@google.com>

I am pretty sure this was added to optimize a now-abandoned Google effort to do
emulation in userspace.  I bring that up because I was going to suggest that we
might be able to get away with a straight revert, as QEMU doesn't use the flag
and AFAICT neither does our VMM, but there are a non-zero number of hits in e.g.
github, so sadly I think we're stuck with the feature :-(


* Re: [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
  2023-07-31 23:49   ` Sean Christopherson
@ 2023-08-01 12:37     ` Michal Luczaj
  2023-08-02 19:18       ` Sean Christopherson
  0 siblings, 1 reply; 24+ messages in thread
From: Michal Luczaj @ 2023-08-01 12:37 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: pbonzini, kvm, shuah

On 8/1/23 01:49, Sean Christopherson wrote:
> On Fri, Jul 28, 2023, Michal Luczaj wrote:
>> Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
>> have exclusive rights to structs they operate on. While this is true when
>> coming from an ioctl handler (caller makes a local copy of user's data),
>> sync_regs() breaks this contract; a pointer to a user-modifiable memory
>> (vcpu->run->s.regs) is provided. This can lead to a situation when incoming
>> data is checked and/or sanitized only to be re-set by a user thread running
>> in parallel.
> 
> LOL, the really hilarious part is that the guilty,
> 
>   Fixes: 01643c51bfcf ("KVM: x86: KVM_CAP_SYNC_REGS")
> 
> also added this comment...
> 
>   /* kvm_sync_regs struct included by kvm_run struct */
>   struct kvm_sync_regs {
> 	/* Members of this structure are potentially malicious.
> 	 * Care must be taken by code reading, esp. interpreting,
> 	 * data fields from them inside KVM to prevent TOCTOU and
> 	 * double-fetch types of vulnerabilities.
> 	 */
> 	struct kvm_regs regs;
> 	struct kvm_sregs sregs;
> 	struct kvm_vcpu_events events;
>   };
> 
> though Radim did remove something so maybe the comment isn't as ironic as it looks.
> 
>     [Removed wrapper around check for reserved kvm_valid_regs. - Radim]
>     Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> 
> Anyways...

Nah, from what I can see, it wasn't Radim's tweak that introduced the
TOCTOUs[1].

[1] https://lore.kernel.org/kvm/20180202210434.GC27896@flask/

>> A note: when servicing kvm_run->kvm_dirty_regs, changes made by
>> __set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to on-stack copies of
>> vcpu->run.s.regs will not be reflected back in vcpu->run.s.regs. Is this
>> ok?
> 
> I would be amazed if anyone cares.  Given the justification and the author,
> 
>     This reduces ioctl overhead which is particularly important when userspace
>     is making synchronous guest state modifications (e.g. when emulating and/or
>     intercepting instructions).
>     
>     Signed-off-by: Ken Hofsass <hofsass@google.com>
> 
> I am pretty sure this was added to optimize a now-abandoned Google effort to do
> emulation in userspace.  I bring that up because I was going to suggest that we
> might be able to get away with a straight revert, as QEMU doesn't use the flag
> and AFAICT neither does our VMM, but there are a non-zero number of hits in e.g.
> github, so sadly I think we're stuck with the feature :-(

All right, so assuming the revert is not happening and the API is not misused
(i.e. unless vcpu->run->kvm_valid_regs is set, no one is expecting up to date
values in vcpu->run->s.regs), is assignment copying

	struct kvm_vcpu_events events = vcpu->run->s.regs.events;

the right approach or should it be a memcpy(), like in ioctl handlers?

thanks,
Michal



* Re: [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
  2023-08-01 12:37     ` Michal Luczaj
@ 2023-08-02 19:18       ` Sean Christopherson
  2023-08-03  0:13         ` Michal Luczaj
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2023-08-02 19:18 UTC (permalink / raw)
  To: Michal Luczaj; +Cc: pbonzini, kvm, shuah

On Tue, Aug 01, 2023, Michal Luczaj wrote:
> On 8/1/23 01:49, Sean Christopherson wrote:
> >> A note: when servicing kvm_run->kvm_dirty_regs, changes made by
> >> __set_sregs()/kvm_vcpu_ioctl_x86_set_vcpu_events() to on-stack copies of
> >> vcpu->run.s.regs will not be reflected back in vcpu->run.s.regs. Is this
> >> ok?
> > 
> > I would be amazed if anyone cares.  Given the justification and the author,
> > 
> >     This reduces ioctl overhead which is particularly important when userspace
> >     is making synchronous guest state modifications (e.g. when emulating and/or
> >     intercepting instructions).
> >     
> >     Signed-off-by: Ken Hofsass <hofsass@google.com>
> > 
> > I am pretty sure this was added to optimize a now-abandoned Google effort to do
> > emulation in userspace.  I bring that up because I was going to suggest that we
> > might be able to get away with a straight revert, as QEMU doesn't use the flag
> > and AFAICT neither does our VMM, but there are a non-zero number of hits in e.g.
> > github, so sadly I think we're stuck with the feature :-(
> 
> All right, so assuming the revert is not happening and the API is not misused
> (i.e. unless vcpu->run->kvm_valid_regs is set, no one is expecting up to date
> values in vcpu->run->s.regs), is assignment copying
> 
> 	struct kvm_vcpu_events events = vcpu->run->s.regs.events;
> 
> the right approach or should it be a memcpy(), like in ioctl handlers?

Both approaches are fine, though I am gaining a preference for the copy-by-value
method.  With gcc-12 and probably most compilers, the code generation is identical
for both as the compiler generates a call to memcpy() to handle the struct
assignment.

The advantage of copy-by-value for structs, and why I think I now prefer it, is
that it provides type safety.  E.g. this compiles without complaint

	memcpy(&events, &vcpu->run->s.regs.sregs, sizeof(events));

whereas this

	struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;

yields

  arch/x86/kvm/x86.c: In function ‘sync_regs’:
  arch/x86/kvm/x86.c:11793:49: error: invalid initializer
  11793 |                 struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;
        |                                                 ^~~~

The downside is that it's less obvious when reading the code that there is a
large-ish memcpy happening, but IMO it's worth gaining the type safety.


* Re: [PATCH 2/2] KVM: selftests: Extend x86's sync_regs_test to check for races
  2023-07-28  0:12 ` [PATCH 2/2] KVM: selftests: Extend x86's sync_regs_test to check for races Michal Luczaj
@ 2023-08-02 21:07   ` Sean Christopherson
  2023-08-03  0:44     ` Michal Luczaj
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2023-08-02 21:07 UTC (permalink / raw)
  To: Michal Luczaj; +Cc: pbonzini, kvm, shuah

On Fri, Jul 28, 2023, Michal Luczaj wrote:
> Attempt to modify vcpu->run->s.regs _after_ the sanity checks performed by
> KVM_CAP_SYNC_REGS's arch/x86/kvm/x86.c:sync_regs(). This could lead to some
> nonsensical vCPU states accompanied by kernel splats.
> 
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
>  .../selftests/kvm/x86_64/sync_regs_test.c     | 124 ++++++++++++++++++
>  1 file changed, 124 insertions(+)
> 
> diff --git a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
> index 2da89fdc2471..feebc7d44c17 100644
> --- a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
> +++ b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
> @@ -15,12 +15,14 @@
>  #include <stdlib.h>
>  #include <string.h>
>  #include <sys/ioctl.h>
> +#include <pthread.h>
>  
>  #include "test_util.h"
>  #include "kvm_util.h"
>  #include "processor.h"
>  
>  #define UCALL_PIO_PORT ((uint16_t)0x1000)
> +#define TIMEOUT	2	/* seconds, roughly */

I think it makes sense to make this a const in race_sync_regs(), that way its
usage is a bit more obvious.

>  struct ucall uc_none = {
>  	.cmd = UCALL_NONE,
> @@ -80,6 +82,124 @@ static void compare_vcpu_events(struct kvm_vcpu_events *left,
>  #define TEST_SYNC_FIELDS   (KVM_SYNC_X86_REGS|KVM_SYNC_X86_SREGS|KVM_SYNC_X86_EVENTS)
>  #define INVALID_SYNC_FIELD 0x80000000
>  
> +/*
> + * WARNING: CPU: 0 PID: 1115 at arch/x86/kvm/x86.c:10095 kvm_check_and_inject_events+0x220/0x500 [kvm]
> + *
> + * arch/x86/kvm/x86.c:kvm_check_and_inject_events():
> + * WARN_ON_ONCE(vcpu->arch.exception.injected &&
> + *		vcpu->arch.exception.pending);
> + */

For comments in selftests, describe what's happening without referencing KVM code,
things like this in particular will become stale sooner than later.  It's a-ok
(and encouraged) to put the WARNs and function references in changelogs though,
as those are explicitly tied to a specific time in history.

> +static void race_sync_regs(void *racer, bool poke_mmu)
> +{
> +	struct kvm_translation tr;
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_run *run;
> +	struct kvm_vm *vm;
> +	pthread_t thread;
> +	time_t t;
> +
> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
> +	run = vcpu->run;
> +
> +	run->kvm_valid_regs = KVM_SYNC_X86_SREGS;
> +	vcpu_run(vcpu);
> +	TEST_REQUIRE(run->s.regs.sregs.cr4 & X86_CR4_PAE);

This can be an assert, and should also check EFER.LME.  Jump-starting in long mode
is a property of selftests, i.e. not something that should ever randomly "fail".

> +	run->kvm_valid_regs = 0;
> +
> +	ASSERT_EQ(pthread_create(&thread, NULL, racer, (void *)run), 0);
> +
> +	for (t = time(NULL) + TIMEOUT; time(NULL) < t;) {
> +		__vcpu_run(vcpu);
> +
> +		if (poke_mmu) {

Rather than pass a boolean, I think it makes sense to do

		if (racer == race_sregs_cr4)

It's arguably just trading ugliness for subtlety, but IMO it's worth avoiding
the boolean.

> +			tr = (struct kvm_translation) { .linear_address = 0 };
> +			__vcpu_ioctl(vcpu, KVM_TRANSLATE, &tr);
> +		}
> +	}
> +
> +	ASSERT_EQ(pthread_cancel(thread), 0);
> +	ASSERT_EQ(pthread_join(thread, NULL), 0);
> +
> +	/*
> +	 * If kvm->bugged then we won't survive TEST_ASSERT(). Leak.
> +	 *
> +	 * kvm_vm_free()
> +	 *   __vm_mem_region_delete()
> +	 *     vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region)
> +	 *       _vm_ioctl(vm, cmd, #cmd, arg)
> +	 *         TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret))
> +	 */

We want the assert, it makes failures explicit.  The signature is a bit unfortunate,
but the WARN in the kernel log should provide a big clue.

> +	if (!poke_mmu)
> +		kvm_vm_free(vm);
> +}
> +
>  int main(int argc, char *argv[])
>  {
>  	struct kvm_vcpu *vcpu;
> @@ -218,5 +338,9 @@ int main(int argc, char *argv[])
>  
>  	kvm_vm_free(vm);
>  
> +	race_sync_regs(race_sregs_cr4, true);
> +	race_sync_regs(race_events_exc, false);
> +	race_sync_regs(race_events_inj_pen, false);

I'll fix up all of the above when applying, and will also split this into three
patches, mostly so that each splat can be covered in a changelog, i.e. is tied
to its testcase.


* Re: [PATCH 0/2] sync_regs() TOCTOU issues
  2023-07-28  0:12 [PATCH 0/2] sync_regs() TOCTOU issues Michal Luczaj
  2023-07-28  0:12 ` [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's " Michal Luczaj
  2023-07-28  0:12 ` [PATCH 2/2] KVM: selftests: Extend x86's sync_regs_test to check for races Michal Luczaj
@ 2023-08-02 21:11 ` Sean Christopherson
  2023-08-15  0:48   ` Sean Christopherson
  2 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2023-08-02 21:11 UTC (permalink / raw)
  To: Sean Christopherson, Michal Luczaj; +Cc: pbonzini, kvm, shuah

On Fri, 28 Jul 2023 02:12:56 +0200, Michal Luczaj wrote:
> Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
> have exclusive rights to structs they operate on. While this is true when
> coming from an ioctl handler (caller makes a local copy of user's data),
> sync_regs() breaks this contract; a pointer to a user-modifiable memory
> (vcpu->run->s.regs) is provided. This can lead to a situation when incoming
> data is checked and/or sanitized only to be re-set by a user thread running
> in parallel.
> 
> [...]

Applied to kvm-x86 selftests (there are in-flight reworks for selftests
that will conflict, and I didn't want to split the testcases from the fix).

As mentioned in my reply to patch 2, I split up the selftests patch and
massaged things a bit.  Please holler if you disagree with any of the
changes.

Thanks much!

[1/4] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
      https://github.com/kvm-x86/linux/commit/0d033770d43a
[2/4] KVM: selftests: Extend x86's sync_regs_test to check for CR4 races
      https://github.com/kvm-x86/linux/commit/ae895cbe613a
[3/4] KVM: selftests: Extend x86's sync_regs_test to check for event vector races
      https://github.com/kvm-x86/linux/commit/60c4063b4752
[4/4] KVM: selftests: Extend x86's sync_regs_test to check for exception races
      https://github.com/kvm-x86/linux/commit/0de704d2d6c8

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes


* Re: [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
  2023-08-02 19:18       ` Sean Christopherson
@ 2023-08-03  0:13         ` Michal Luczaj
  2023-08-03 17:48           ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Michal Luczaj @ 2023-08-03  0:13 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: pbonzini, kvm, shuah

On 8/2/23 21:18, Sean Christopherson wrote:
> On Tue, Aug 01, 2023, Michal Luczaj wrote:
>> All right, so assuming the revert is not happening and the API is not misused
>> (i.e. unless vcpu->run->kvm_valid_regs is set, no one is expecting up to date
>> values in vcpu->run->s.regs), is assignment copying
>>
>> 	struct kvm_vcpu_events events = vcpu->run->s.regs.events;
>>
>> the right approach or should it be a memcpy(), like in ioctl handlers?
> 
> Both approaches are fine, though I am gaining a preference for the copy-by-value
> method.  With gcc-12 and probably most compilers, the code generation is identical
> for both as the compiler generates a call to memcpy() to handle the struct
> assignment.
> 
> The advantage of copy-by-value for structs, and why I think I now prefer it, is
> that it provides type safety.  E.g. this compiles without complaint
> 
> 	memcpy(&events, &vcpu->run->s.regs.sregs, sizeof(events));
> 
> whereas this
> 
> 	struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;
> 
> yields
> 
>   arch/x86/kvm/x86.c: In function ‘sync_regs’:
>   arch/x86/kvm/x86.c:11793:49: error: invalid initializer
>   11793 |                 struct kvm_vcpu_events events = vcpu->run->s.regs.sregs;
>         |                                                 ^~~~
> 
> The downside is that it's less obvious when reading the code that there is a
> large-ish memcpy happening, but IMO it's worth gaining the type safety.

Sure, that makes sense. I was a bit concerned about how padding within a struct
might affect the performance of such a copy-by-value, but (obviously?) there's
no padding in kvm_sregs, nor in kvm_vcpu_events...

Anyway, while there, could you take a look at __set_sregs_common()?

	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
	vcpu->arch.cr0 = sregs->cr0;

That last assignment seems redundant as both vmx_set_cr0() and svm_set_cr0()
take care of it, but I may be missing something (even if selftests pass with
that line removed).

thanks,
Michal



* Re: [PATCH 2/2] KVM: selftests: Extend x86's sync_regs_test to check for races
  2023-08-02 21:07   ` Sean Christopherson
@ 2023-08-03  0:44     ` Michal Luczaj
  2023-08-03 16:41       ` Sean Christopherson
  0 siblings, 1 reply; 24+ messages in thread
From: Michal Luczaj @ 2023-08-03  0:44 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: pbonzini, kvm, shuah

On 8/2/23 23:07, Sean Christopherson wrote:
> On Fri, Jul 28, 2023, Michal Luczaj wrote:
>> +#define TIMEOUT	2	/* seconds, roughly */
> 
> I think it makes sense to make this a const in race_sync_regs(), that way its
> usage is a bit more obvious.

Yeah, sure.

>> +/*
>> + * WARNING: CPU: 0 PID: 1115 at arch/x86/kvm/x86.c:10095 kvm_check_and_inject_events+0x220/0x500 [kvm]
>> + *
>> + * arch/x86/kvm/x86.c:kvm_check_and_inject_events():
>> + * WARN_ON_ONCE(vcpu->arch.exception.injected &&
>> + *		vcpu->arch.exception.pending);
>> + */
> 
> For comments in selftests, describe what's happening without referencing KVM code,
> things like this in particular will become stale sooner than later.  It's a-ok
> (and encouraged) to put the WARNs and function references in changelogs though,
> as those are explicitly tied to a specific time in history.

Right, I'll try to remember. Actually, those comments were notes to myself, and
I just left them in thinking they couldn't hurt. But I agree that wasn't the
best idea.

>> +static void race_sync_regs(void *racer, bool poke_mmu)
>> +{
>> +	struct kvm_translation tr;
>> +	struct kvm_vcpu *vcpu;
>> +	struct kvm_run *run;
>> +	struct kvm_vm *vm;
>> +	pthread_t thread;
>> +	time_t t;
>> +
>> +	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
>> +	run = vcpu->run;
>> +
>> +	run->kvm_valid_regs = KVM_SYNC_X86_SREGS;
>> +	vcpu_run(vcpu);
>> +	TEST_REQUIRE(run->s.regs.sregs.cr4 & X86_CR4_PAE);
> 
> This can be an assert, and should also check EFER.LME.  Jump-starting in long mode
> is a property of selftests, i.e. not something that should ever randomly "fail".

Right, sorry for the misuse.

>> +	run->kvm_valid_regs = 0;
>> +
>> +	ASSERT_EQ(pthread_create(&thread, NULL, racer, (void *)run), 0);
>> +
>> +	for (t = time(NULL) + TIMEOUT; time(NULL) < t;) {
>> +		__vcpu_run(vcpu);
>> +
>> +		if (poke_mmu) {
> 
> Rather than pass a boolean, I think it makes sense to do
> 
> 		if (racer == race_sregs_cr4)
> 
> It's arguably just trading ugliness for subtlety, but IMO it's worth avoiding
> the boolean.

Ah, ok.

>> +	/*
>> +	 * If kvm->bugged then we won't survive TEST_ASSERT(). Leak.
>> +	 *
>> +	 * kvm_vm_free()
>> +	 *   __vm_mem_region_delete()
>> +	 *     vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region)
>> +	 *       _vm_ioctl(vm, cmd, #cmd, arg)
>> +	 *         TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret))
>> +	 */
> 
> We want the assert, it makes failures explicit.  The signature is a bit unfortunate,
> but the WARN in the kernel log should provide a big clue.

Sure, I get it. And there's no way to check if the VM is bugged/dead?

> I'll fix up all of the above when applying, and will also split this into three
> patches, mostly so that each splat can be covered in a changelog, i.e. is tied
> to its testcase.

Great, thank you for all the comments and fixes!

Michal



* Re: [PATCH 2/2] KVM: selftests: Extend x86's sync_regs_test to check for races
  2023-08-03  0:44     ` Michal Luczaj
@ 2023-08-03 16:41       ` Sean Christopherson
  2023-08-03 21:14         ` Michal Luczaj
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2023-08-03 16:41 UTC (permalink / raw)
  To: Michal Luczaj; +Cc: pbonzini, kvm, shuah

On Thu, Aug 03, 2023, Michal Luczaj wrote:
> On 8/2/23 23:07, Sean Christopherson wrote:
> > On Fri, Jul 28, 2023, Michal Luczaj wrote:
> >> +	/*
> >> +	 * If kvm->bugged then we won't survive TEST_ASSERT(). Leak.
> >> +	 *
> >> +	 * kvm_vm_free()
> >> +	 *   __vm_mem_region_delete()
> >> +	 *     vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region->region)
> >> +	 *       _vm_ioctl(vm, cmd, #cmd, arg)
> >> +	 *         TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret))
> >> +	 */
> > 
> > We want the assert, it makes failures explicit.  The signature is a bit unfortunate,
> > but the WARN in the kernel log should provide a big clue.
> 
> Sure, I get it. And there's no way to check if the VM is bugged/dead?

KVM doesn't expose the bugged/dead information, though I suppose userspace could
probe that information by doing an ioctl() that is guaranteed to succeed and
looking for -EIO, e.g. KVM_CHECK_EXTENSION on the VM.

I was going to say that it's not worth trying to detect a bugged/dead VM in
selftests, because it requires having the pointer to the VM, and that's not
typically available when an assert fails, but the obvious solution is to tap
into the VM and vCPU ioctl() helpers.  That's also good motivation to add helpers
and consolidate asserts for ioctls() that return fds, i.e. for which a positive
return is considered success.

With the below (partial conversion), the failing testcase yields this.  Using a
heuristic isn't ideal, but practically speaking I can't see a way for the -EIO
check to go awry, and anything to make debugging errors easier is definitely worth
doing IMO.

==== Test Assertion Failure ====
  lib/kvm_util.c:689: false
  pid=80347 tid=80347 errno=5 - Input/output error
     1	0x00000000004039ab: __vm_mem_region_delete at kvm_util.c:689 (discriminator 5)
     2	0x0000000000404660: kvm_vm_free at kvm_util.c:724 (discriminator 12)
     3	0x0000000000402ac9: race_sync_regs at sync_regs_test.c:193
     4	0x0000000000401cb7: main at sync_regs_test.c:334 (discriminator 6)
     5	0x0000000000418263: __libc_start_call_main at libc-start.o:?
     6	0x00000000004198af: __libc_start_main_impl at ??:?
     7	0x0000000000401d90: _start at ??:?
  KVM killed/bugged the VM, check kernel log for clues


diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 07732a157ccd..e48ac57be13a 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -258,17 +258,42 @@ static __always_inline void static_assert_is_vm(struct kvm_vm *vm) { }
        kvm_do_ioctl((vm)->fd, cmd, arg);                       \
 })
 
+/*
+ * Assert that a VM or vCPU ioctl() succeeded (obviously), with extra magic to
+ * detect if the ioctl() failed because KVM killed/bugged the VM.  To detect a
+ * dead VM, probe KVM_CAP_USER_MEMORY, which (a) has been supported by KVM
+ * since before selftests existed and (b) should never outright fail, i.e. is
+ * supposed to return 0 or 1.  If KVM kills a VM, KVM returns -EIO for all
+ * ioctl()s for the VM and its vCPUs, including KVM_CHECK_EXTENSION.
+ */
+#define TEST_ASSERT_VM_VCPU_IOCTL_SUCCESS(name, ret, vm)                               \
+do {                                                                                   \
+       int __errno = errno;                                                            \
+                                                                                       \
+       static_assert_is_vm(vm);                                                        \
+                                                                                       \
+       if (!ret)                                                                       \
+               break;                                                                  \
+                                                                                       \
+       if (errno == EIO &&                                                             \
+           __vm_ioctl(vm, KVM_CHECK_EXTENSION, (void *)KVM_CAP_USER_MEMORY) < 0) {     \
+               TEST_ASSERT(errno == EIO, "KVM killed the VM, should return -EIO");     \
+               TEST_FAIL("KVM killed/bugged the VM, check kernel log for clues");      \
+       }                                                                               \
+       errno = __errno;                                                                \
+       TEST_FAIL(__KVM_IOCTL_ERROR(name, ret));                                        \
+} while (0)
+
 #define _vm_ioctl(vm, cmd, name, arg)                          \
 ({                                                             \
        int ret = __vm_ioctl(vm, cmd, arg);                     \
                                                                \
-       TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret));        \
+       TEST_ASSERT_VM_VCPU_IOCTL_SUCCESS(name, ret, vm);       \
 })
 
 #define vm_ioctl(vm, cmd, arg)                                 \
        _vm_ioctl(vm, cmd, #cmd, arg)
 
-
 static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { }
 
 #define __vcpu_ioctl(vcpu, cmd, arg)                           \
@@ -281,7 +306,7 @@ static __always_inline void static_assert_is_vcpu(struct kvm_vcpu *vcpu) { }
 ({                                                             \
        int ret = __vcpu_ioctl(vcpu, cmd, arg);                 \
                                                                \
-       TEST_ASSERT(!ret, __KVM_IOCTL_ERROR(name, ret));        \
+       TEST_ASSERT_VM_VCPU_IOCTL_SUCCESS(name, ret, vcpu->vm); \
 })
 
 #define vcpu_ioctl(vcpu, cmd, arg)                             \



* Re: [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
  2023-08-03  0:13         ` Michal Luczaj
@ 2023-08-03 17:48           ` Paolo Bonzini
  2023-08-03 21:15             ` Michal Luczaj
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2023-08-03 17:48 UTC (permalink / raw)
  To: Michal Luczaj, Sean Christopherson; +Cc: kvm, shuah

On 8/3/23 02:13, Michal Luczaj wrote:
> Anyway, while there, could you take a look at __set_sregs_common()?
> 
> 	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
> 	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
> 	vcpu->arch.cr0 = sregs->cr0;
> 
> That last assignment seems redundant as both vmx_set_cr0() and svm_set_cr0()
> take care of it, but I may be missing something (even if selftests pass with
> that line removed).

kvm_set_cr0 assumes that the static call sets vcpu->arch.cr0, so indeed 
it can be removed:

         static_call(kvm_x86_set_cr0)(vcpu, cr0);
         kvm_post_set_cr0(vcpu, old_cr0, cr0);
         return 0;

Neither __set_sregs_common nor its callers calls 
kvm_post_set_cr0...  Not great, even though most uses of KVM_SET_SREGS 
are probably limited to reset in most "usual" VMMs.  It's probably 
enough to replace this line:

         *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;

with a call to the function just before __set_sregs_common returns.
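
E.g., roughly (untested, just to sketch the idea):

	/* in __set_sregs_common(), snapshot CR0 before the static call ... */
	unsigned long old_cr0 = kvm_read_cr0(vcpu);

	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
	...
	/*
	 * ... and, just before returning, let kvm_post_set_cr0() handle the
	 * MMU/TLB bookkeeping for the CR0 change, as kvm_set_cr0() does.
	 */
	kvm_post_set_cr0(vcpu, old_cr0, sregs->cr0);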

Paolo



* Re: [PATCH 2/2] KVM: selftests: Extend x86's sync_regs_test to check for races
  2023-08-03 16:41       ` Sean Christopherson
@ 2023-08-03 21:14         ` Michal Luczaj
  2023-08-08 23:11           ` Sean Christopherson
  0 siblings, 1 reply; 24+ messages in thread
From: Michal Luczaj @ 2023-08-03 21:14 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: pbonzini, kvm, shuah

On 8/3/23 18:41, Sean Christopherson wrote:
> KVM doesn't expose the bugged/dead information, though I suppose userspace could
> probe that information by doing an ioctl() that is guaranteed to succeed and
> looking for -EIO, e.g. KVM_CHECK_EXTENSION on the VM.
> 
> I was going to say that it's not worth trying to detect a bugged/dead VM in
> selftests, because it requires having the pointer to the VM, and that's not
> typically available when an assert fails, but the obvious solution is to tap
> into the VM and vCPU ioctl() helpers.  That's also good motivation to add helpers
> and consolidate asserts for ioctls() that return fds, i.e. for which a positive
> return is considered success.
> 
> With the below (partial conversion), the failing testcase yields this.  Using a
> heuristic isn't ideal, but practically speaking I can't see a way for the -EIO
> check to go awry, and anything to make debugging errors easier is definitely worth
> doing IMO.
> 
> ==== Test Assertion Failure ====
>   lib/kvm_util.c:689: false
>   pid=80347 tid=80347 errno=5 - Input/output error
>      1	0x00000000004039ab: __vm_mem_region_delete at kvm_util.c:689 (discriminator 5)
>      2	0x0000000000404660: kvm_vm_free at kvm_util.c:724 (discriminator 12)
>      3	0x0000000000402ac9: race_sync_regs at sync_regs_test.c:193
>      4	0x0000000000401cb7: main at sync_regs_test.c:334 (discriminator 6)
>      5	0x0000000000418263: __libc_start_call_main at libc-start.o:?
>      6	0x00000000004198af: __libc_start_main_impl at ??:?
>      7	0x0000000000401d90: _start at ??:?
>   KVM killed/bugged the VM, check kernel log for clues

Yes, such automatic reporting of dead VMs is a really nice feature.

> diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
> index 07732a157ccd..e48ac57be13a 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> @@ -258,17 +258,42 @@ static __always_inline void static_assert_is_vm(struct kvm_vm *vm) { }
>         kvm_do_ioctl((vm)->fd, cmd, arg);                       \
>  })
>  
> +/*
> + * Assert that a VM or vCPU ioctl() succeeded (obviously), with extra magic to
> + * detect if the ioctl() failed because KVM killed/bugged the VM.  To detect a
> + * dead VM, probe KVM_CAP_USER_MEMORY, which (a) has been supported by KVM
> + * since before selftests existed and (b) should never outright fail, i.e. is
> + * supposed to return 0 or 1.  If KVM kills a VM, KVM returns -EIO for all
> + * ioctl()s for the VM and its vCPUs, including KVM_CHECK_EXTENSION.
> + */

Do you think it's worth mentioning the ioctl() always returning -EIO in case of
kvm->mm != current->mm? I suppose that's something purely hypothetical in this
context.

thanks,
Michal



* Re: [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
  2023-08-03 17:48           ` Paolo Bonzini
@ 2023-08-03 21:15             ` Michal Luczaj
  2023-08-04  9:53               ` Paolo Bonzini
  0 siblings, 1 reply; 24+ messages in thread
From: Michal Luczaj @ 2023-08-03 21:15 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, shuah

On 8/3/23 19:48, Paolo Bonzini wrote:
> On 8/3/23 02:13, Michal Luczaj wrote:
>> Anyway, while there, could you take a look at __set_sregs_common()?
>>
>> 	*mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>> 	static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0);
>> 	vcpu->arch.cr0 = sregs->cr0;
>>
>> That last assignment seems redundant as both vmx_set_cr0() and svm_set_cr0()
>> take care of it, but I may be missing something (even if selftests pass with
>> that line removed).
> 
> kvm_set_cr0 assumes that the static call sets vcpu->arch.cr0, so indeed 
> it can be removed:

I guess the same can be done in enter_smm()?

	cr0 = vcpu->arch.cr0 & ~(X86_CR0_PE | X86_CR0_EM | X86_CR0_TS | X86_CR0_PG);
	static_call(kvm_x86_set_cr0)(vcpu, cr0);
	vcpu->arch.cr0 = cr0;

> Neither __set_sregs_common nor its callers calls 
> kvm_post_set_cr0...  Not great, even though most uses of KVM_SET_SREGS 
> are probably limited to reset in most "usual" VMMs.  It's probably 
> enough to replace this line:
> 
>          *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
> 
> with a call to the function just before __set_sregs_common returns.

What about kvm_post_set_cr4() then? Should it be introduced to
__set_sregs_common() as well?

thanks,
Michal



* Re: [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
  2023-08-03 21:15             ` Michal Luczaj
@ 2023-08-04  9:53               ` Paolo Bonzini
  2023-08-04 17:50                 ` Michal Luczaj
  0 siblings, 1 reply; 24+ messages in thread
From: Paolo Bonzini @ 2023-08-04  9:53 UTC (permalink / raw)
  To: Michal Luczaj, Sean Christopherson; +Cc: kvm, shuah

On 8/3/23 23:15, Michal Luczaj wrote:
>>           *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>>
>> with a call to the function just before __set_sregs_common returns.
> What about kvm_post_set_cr4() then? Should it be introduced to
> __set_sregs_common() as well?

Yes, indeed, but it starts getting a bit unwieldy.

If we decide not to particularly optimize KVM_SYNC_X86_SREGS, however, 
we can just chuck a KVM_REQ_TLB_FLUSH_GUEST request after __set_sregs 
and __set_sregs2 call kvm_mmu_reset_context().

Paolo


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
  2023-08-04  9:53               ` Paolo Bonzini
@ 2023-08-04 17:50                 ` Michal Luczaj
  2023-08-14 22:29                   ` Michal Luczaj
  0 siblings, 1 reply; 24+ messages in thread
From: Michal Luczaj @ 2023-08-04 17:50 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, shuah

On 8/4/23 11:53, Paolo Bonzini wrote:
> On 8/3/23 23:15, Michal Luczaj wrote:
>>>           *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>>>
>>> with a call to the function just before __set_sregs_common returns.
>> What about kvm_post_set_cr4() then? Should it be introduced to
>> __set_sregs_common() as well?
> 
> Yes, indeed, but it starts getting a bit unwieldy.
> 
> If we decide not to particularly optimize KVM_SYNC_X86_SREGS, however, 
> we can just chuck a KVM_REQ_TLB_FLUSH_GUEST request after __set_sregs 
> and __set_sregs2 call kvm_mmu_reset_context().

Something like this?

@@ -11562,8 +11562,10 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
        if (ret)
                return ret;

-       if (mmu_reset_needed)
+       if (mmu_reset_needed) {
                kvm_mmu_reset_context(vcpu);
+               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+       }

        max_bits = KVM_NR_INTERRUPTS;
        pending_vec = find_first_bit(
@@ -11604,8 +11606,10 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
                mmu_reset_needed = 1;
                vcpu->arch.pdptrs_from_userspace = true;
        }
-       if (mmu_reset_needed)
+       if (mmu_reset_needed) {
                kvm_mmu_reset_context(vcpu);
+               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
+       }
        return 0;
 }


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/2] KVM: selftests: Extend x86's sync_regs_test to check for races
  2023-08-03 21:14         ` Michal Luczaj
@ 2023-08-08 23:11           ` Sean Christopherson
  0 siblings, 0 replies; 24+ messages in thread
From: Sean Christopherson @ 2023-08-08 23:11 UTC (permalink / raw)
  To: Michal Luczaj; +Cc: pbonzini, kvm, shuah

On Thu, Aug 03, 2023, Michal Luczaj wrote:
> On 8/3/23 18:41, Sean Christopherson wrote:
> > +/*
> > + * Assert that a VM or vCPU ioctl() succeeded (obviously), with extra magic to
> > + * detect if the ioctl() failed because KVM killed/bugged the VM.  To detect a
> > + * dead VM, probe KVM_CAP_USER_MEMORY, which (a) has been supported by KVM
> > + * since before selftests existed and (b) should never outright fail, i.e. is
> > + * supposed to return 0 or 1.  If KVM kills a VM, KVM returns -EIO for all
> > + * ioctl()s for the VM and its vCPUs, including KVM_CHECK_EXTENSION.
> > + */
> 
> Do you think it's worth mentioning the ioctl() always returning -EIO in case of
> kvm->mm != current->mm? I suppose that's something purely hypothetical in this
> context.

Hmm, probably not?  Practically speaking, that scenario should really only ever
happen when someone is developing a new selftest.  Though I suppose a blurb in
the comment wouldn't hurt.
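
E.g. something along these lines appended to the comment (wording very much
TBD):

	 * Note, KVM also returns -EIO for all VM/vCPU ioctl()s if the ioctl()
	 * is issued from an mm other than the one the VM was created with,
	 * but selftests should never trip that in practice.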

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
  2023-08-04 17:50                 ` Michal Luczaj
@ 2023-08-14 22:29                   ` Michal Luczaj
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Luczaj @ 2023-08-14 22:29 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, shuah

On 8/4/23 19:50, Michal Luczaj wrote:
> On 8/4/23 11:53, Paolo Bonzini wrote:
>> On 8/3/23 23:15, Michal Luczaj wrote:
>>>>           *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0;
>>>>
>>>> with a call to the function just before __set_sregs_common returns.
>>> What about kvm_post_set_cr4() then? Should it be introduced to
>>> __set_sregs_common() as well?
>>
>> Yes, indeed, but it starts getting a bit unwieldy.
>>
>> If we decide not to particularly optimize KVM_SYNC_X86_SREGS, however, 
>> we can just chuck a KVM_REQ_TLB_FLUSH_GUEST request after __set_sregs 
>> and __set_sregs2 call kvm_mmu_reset_context().
> 
> Something like this?
> 
> @@ -11562,8 +11562,10 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>         if (ret)
>                 return ret;
> 
> -       if (mmu_reset_needed)
> +       if (mmu_reset_needed) {
>                 kvm_mmu_reset_context(vcpu);
> +               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +       }
> 
>         max_bits = KVM_NR_INTERRUPTS;
>         pending_vec = find_first_bit(
> @@ -11604,8 +11606,10 @@ static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2)
>                 mmu_reset_needed = 1;
>                 vcpu->arch.pdptrs_from_userspace = true;
>         }
> -       if (mmu_reset_needed)
> +       if (mmu_reset_needed) {
>                 kvm_mmu_reset_context(vcpu);
> +               kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
> +       }
>         return 0;
>  }

I guess I'll just post a patch then. Here it is:
https://lore.kernel.org/kvm/20230814222358.707877-1-mhal@rbox.co/

thanks,
Michal


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] sync_regs() TOCTOU issues
  2023-08-02 21:11 ` [PATCH 0/2] sync_regs() TOCTOU issues Sean Christopherson
@ 2023-08-15  0:48   ` Sean Christopherson
  2023-08-15  7:37     ` Michal Luczaj
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2023-08-15  0:48 UTC (permalink / raw)
  To: Michal Luczaj; +Cc: pbonzini, kvm, shuah

On Wed, Aug 02, 2023, Sean Christopherson wrote:
> On Fri, 28 Jul 2023 02:12:56 +0200, Michal Luczaj wrote:
> > Both __set_sregs() and kvm_vcpu_ioctl_x86_set_vcpu_events() assume they
> > have exclusive rights to structs they operate on. While this is true when
> > coming from an ioctl handler (caller makes a local copy of user's data),
> > sync_regs() breaks this contract; a pointer to a user-modifiable memory
> > (vcpu->run->s.regs) is provided. This can lead to a situation when incoming
> > data is checked and/or sanitized only to be re-set by a user thread running
> > in parallel.
> > 
> > [...]
> 
> Applied to kvm-x86 selftests (there are in-flight reworks for selftests
> that will conflict, and I didn't want to split the testcases from the fix).
> 
> As mentioned in my reply to patch 2, I split up the selftests patch and
> massaged things a bit.  Please holler if you disagree with any of the
> changes.
> 
> Thanks much!
> 
> [1/4] KVM: x86: Fix KVM_CAP_SYNC_REGS's sync_regs() TOCTOU issues
>       https://github.com/kvm-x86/linux/commit/0d033770d43a
> [2/4] KVM: selftests: Extend x86's sync_regs_test to check for CR4 races
>       https://github.com/kvm-x86/linux/commit/ae895cbe613a
> [3/4] KVM: selftests: Extend x86's sync_regs_test to check for event vector races
>       https://github.com/kvm-x86/linux/commit/60c4063b4752
> [4/4] KVM: selftests: Extend x86's sync_regs_test to check for exception races
>       https://github.com/kvm-x86/linux/commit/0de704d2d6c8

Argh, apparently I didn't run these on AMD.  The exception injection test hangs
because the vCPU hits triple fault shutdown, and because the VMCB is technically
undefined on shutdown, KVM synthesizes INIT.  That starts the vCPU at the reset
vector and it happily fetches zeroes until being killed.

This fixes the issue, and I confirmed all three testcases repro the KVM bug with
it.  I'll post formally tomorrow.

---
 .../testing/selftests/kvm/x86_64/sync_regs_test.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
index 93fac74ca0a7..55e9b68e6947 100644
--- a/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
+++ b/tools/testing/selftests/kvm/x86_64/sync_regs_test.c
@@ -94,6 +94,7 @@ static void *race_events_inj_pen(void *arg)
 	for (;;) {
 		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
 		WRITE_ONCE(events->flags, 0);
+		WRITE_ONCE(events->exception.nr, GP_VECTOR);
 		WRITE_ONCE(events->exception.injected, 1);
 		WRITE_ONCE(events->exception.pending, 1);
 
@@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
 	for (;;) {
 		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
 		WRITE_ONCE(events->flags, 0);
+		WRITE_ONCE(events->exception.nr, GP_VECTOR);
 		WRITE_ONCE(events->exception.pending, 1);
 		WRITE_ONCE(events->exception.nr, 255);
 
@@ -152,6 +154,7 @@ static noinline void *race_sregs_cr4(void *arg)
 static void race_sync_regs(void *racer)
 {
 	const time_t TIMEOUT = 2; /* seconds, roughly */
+	struct kvm_x86_state *state;
 	struct kvm_translation tr;
 	struct kvm_vcpu *vcpu;
 	struct kvm_run *run;
@@ -178,8 +181,17 @@ static void race_sync_regs(void *racer)
 
 	TEST_ASSERT_EQ(pthread_create(&thread, NULL, racer, (void *)run), 0);
 
+	state = vcpu_save_state(vcpu);
+
 	for (t = time(NULL) + TIMEOUT; time(NULL) < t;) {
-		__vcpu_run(vcpu);
+		/*
+		 * Reload known good state if the vCPU triple faults, e.g. due
+		 * to the unhandled #GPs being injected.  VMX preserves state
+		 * on shutdown, but SVM synthesizes an INIT as the VMCB state
+		 * is architecturally undefined on triple fault.
+		 */
+		if (!__vcpu_run(vcpu) && run->exit_reason == KVM_EXIT_SHUTDOWN)
+			vcpu_load_state(vcpu, state);
 
 		if (racer == race_sregs_cr4) {
 			tr = (struct kvm_translation) { .linear_address = 0 };
@@ -190,6 +202,7 @@ static void race_sync_regs(void *racer)
 	TEST_ASSERT_EQ(pthread_cancel(thread), 0);
 	TEST_ASSERT_EQ(pthread_join(thread, NULL), 0);
 
+	kvm_x86_state_cleanup(state);
 	kvm_vm_free(vm);
 }
 

base-commit: 722b2afc50abbfaa74accbc52911f9b5e8719c95
-- 


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] sync_regs() TOCTOU issues
  2023-08-15  0:48   ` Sean Christopherson
@ 2023-08-15  7:37     ` Michal Luczaj
  2023-08-15 15:40       ` Sean Christopherson
  0 siblings, 1 reply; 24+ messages in thread
From: Michal Luczaj @ 2023-08-15  7:37 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: pbonzini, kvm, shuah

On 8/15/23 02:48, Sean Christopherson wrote:
> ...
> Argh, apparently I didn't run these on AMD.  The exception injection test hangs
> because the vCPU hits triple fault shutdown, and because the VMCB is technically
> undefined on shutdown, KVM synthesizes INIT.  That starts the vCPU at the reset
> vector and it happily fetches zeroes until being killed.

Thank you for catching this. I should have mentioned that, due to lack of
access to AMD hardware, I've only tested on Intel.

> @@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
>  	for (;;) {
>  		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
>  		WRITE_ONCE(events->flags, 0);
> +		WRITE_ONCE(events->exception.nr, GP_VECTOR);
>  		WRITE_ONCE(events->exception.pending, 1);
>  		WRITE_ONCE(events->exception.nr, 255);

Here you're setting events->exception.nr twice. Is it deliberate?

Thanks again,
Michal


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] sync_regs() TOCTOU issues
  2023-08-15  7:37     ` Michal Luczaj
@ 2023-08-15 15:40       ` Sean Christopherson
  2023-08-15 17:49         ` Michal Luczaj
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2023-08-15 15:40 UTC (permalink / raw)
  To: Michal Luczaj; +Cc: pbonzini, kvm, shuah

On Tue, Aug 15, 2023, Michal Luczaj wrote:
> On 8/15/23 02:48, Sean Christopherson wrote:
> > ...
> > Argh, apparently I didn't run these on AMD.  The exception injection test hangs
> > because the vCPU hits triple fault shutdown, and because the VMCB is technically
> > undefined on shutdown, KVM synthesizes INIT.  That starts the vCPU at the reset
> > vector and it happily fetches zeroes until being killed.
> 
> Thank you for getting this. I should have mentioned, due to lack of access to
> AMD hardware, I've only tested on Intel.
> 
> > @@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
> >  	for (;;) {
> >  		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
> >  		WRITE_ONCE(events->flags, 0);
> > +		WRITE_ONCE(events->exception.nr, GP_VECTOR);
> >  		WRITE_ONCE(events->exception.pending, 1);
> >  		WRITE_ONCE(events->exception.nr, 255);
> 
> Here you're setting events->exception.nr twice. Is it deliberate?

Heh, yes and no.  It's partly leftover from a brief attempt to gracefully eat the
fault in the guest.

However, unless there's magic I'm missing, race_events_exc() needs to set a "good"
vector in every iteration, otherwise only the first iteration will be able to hit
the "check good, consume bad" scenario.

For race_events_inj_pen(), it should be sufficient to set the vector just once,
outside of the loop.  I do think it should be explicitly set, as subtly relying
on '0' being a valid exception is a bit mean (though it does work).
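
I.e., relative to the hack-a-fix above, something like this (untested):

@@ static void *race_events_inj_pen(void *arg)
+	/* A valid vector only needs to be set once for this variant. */
+	WRITE_ONCE(events->exception.nr, GP_VECTOR);
+
 	for (;;) {
 		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
 		WRITE_ONCE(events->flags, 0);
-		WRITE_ONCE(events->exception.nr, GP_VECTOR);
 		WRITE_ONCE(events->exception.injected, 1);
 		WRITE_ONCE(events->exception.pending, 1);

with race_events_exc() keeping the per-iteration GP_VECTOR write so that every
pass can hit the "check good, consume bad" window.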

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] sync_regs() TOCTOU issues
  2023-08-15 15:40       ` Sean Christopherson
@ 2023-08-15 17:49         ` Michal Luczaj
  2023-08-15 18:15           ` Sean Christopherson
  0 siblings, 1 reply; 24+ messages in thread
From: Michal Luczaj @ 2023-08-15 17:49 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: pbonzini, kvm, shuah

On 8/15/23 17:40, Sean Christopherson wrote:
> On Tue, Aug 15, 2023, Michal Luczaj wrote:
>>> @@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
>>>  	for (;;) {
>>>  		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
>>>  		WRITE_ONCE(events->flags, 0);
>>> +		WRITE_ONCE(events->exception.nr, GP_VECTOR);
>>>  		WRITE_ONCE(events->exception.pending, 1);
>>>  		WRITE_ONCE(events->exception.nr, 255);
>>
>> Here you're setting events->exception.nr twice. Is it deliberate?
> 
> Heh, yes and no.  It's partly leftover from a brief attempt to gracefully eat the
> fault in the guest.
> 
> However, unless there's magic I'm missing, race_events_exc() needs to set a "good"
> vector in every iteration, otherwise only the first iteration will be able to hit
> the "check good, consume bad" scenario.

I think I understand what you mean. I see things slightly differently: because

	if (events->flags & KVM_VCPUEVENT_VALID_PAYLOAD) {
		...
	} else {
		events->exception.pending = 0;
		events->exception_has_payload = 0;
	}

zeroes exception.pending on every iteration, even though exception.nr may
already be > 31, KVM does not necessarily return -EINVAL at

	if ((events->exception.injected || events->exception.pending) &&
	    (events->exception.nr > 31 || events->exception.nr == NMI_VECTOR))
		return -EINVAL;

It would if the racer set exception.pending before this check, but if it does it
after the check, then KVM goes

	vcpu->arch.exception.pending = events->exception.pending;
	vcpu->arch.exception.vector = events->exception.nr;

which later triggers the WARN. That said, if you think setting and re-setting
exception.nr is more efficient (as in: racy), I'm all for it.
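
To spell out the interleaving I have in mind (a pseudo-timeline of one racy
KVM_RUN with KVM_SYNC_X86_EVENTS dirty, not code):

	racer:	events->exception.nr = 255
	KVM:	flags lack KVM_VCPUEVENT_VALID_PAYLOAD -> events->exception.pending = 0
	KVM:	(injected || pending) is false -> the nr > 31 check is skipped
	racer:	events->exception.pending = 1
	KVM:	vcpu->arch.exception.pending = 1, vcpu->arch.exception.vector = 255
	KVM:	...and the WARN fires later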

> For race_events_inj_pen(), it should be sufficient to set the vector just once,
> outside of the loop.  I do think it should be explicitly set, as subtly relying
> on '0' being a valid exception is a bit mean (though it does work).

Sure, I get it.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] sync_regs() TOCTOU issues
  2023-08-15 17:49         ` Michal Luczaj
@ 2023-08-15 18:15           ` Sean Christopherson
  2023-08-15 18:38             ` Michal Luczaj
  0 siblings, 1 reply; 24+ messages in thread
From: Sean Christopherson @ 2023-08-15 18:15 UTC (permalink / raw)
  To: Michal Luczaj; +Cc: pbonzini, kvm, shuah

On Tue, Aug 15, 2023, Michal Luczaj wrote:
> On 8/15/23 17:40, Sean Christopherson wrote:
> > On Tue, Aug 15, 2023, Michal Luczaj wrote:
> >>> @@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
> >>>  	for (;;) {
> >>>  		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
> >>>  		WRITE_ONCE(events->flags, 0);
> >>> +		WRITE_ONCE(events->exception.nr, GP_VECTOR);
> >>>  		WRITE_ONCE(events->exception.pending, 1);
> >>>  		WRITE_ONCE(events->exception.nr, 255);
> >>
> >> Here you're setting events->exception.nr twice. Is it deliberate?
> > 
> > Heh, yes and no.  It's partly leftover from a brief attempt to gracefully eat the
> > fault in the guest.
> > 
> > However, unless there's magic I'm missing, race_events_exc() needs to set a "good"
> > vector in every iteration, otherwise only the first iteration will be able to hit
> > the "check good, consume bad" scenario.
> 
> I think I understand what you mean. I see things slightly differently: because
> 
> 	if (events->flags & KVM_VCPUEVENT_VALID_PAYLOAD) {
> 		...
> 	} else {
> 		events->exception.pending = 0;
> 		events->exception_has_payload = 0;
> 	}
> 
> zeroes exception.pending on every iteration, even though exception.nr may
> already be > 31, KVM does not necessarily return -EINVAL at
> 
> 	if ((events->exception.injected || events->exception.pending) &&
> 	    (events->exception.nr > 31 || events->exception.nr == NMI_VECTOR))
> 		return -EINVAL;
> 
> It would if the racer set exception.pending before this check, but if it does it
> after the check, then KVM goes
> 
> 	vcpu->arch.exception.pending = events->exception.pending;
> 	vcpu->arch.exception.vector = events->exception.nr;
> 
> which later triggers the WARN. That said, if you think setting and re-setting
> exception.nr is more efficient (as in: racy), I'm all for it.

My goal isn't to make it easier to hit the *known* TOCTOU, it's to make the test
more valuable after that known bug has been fixed.  I.e. I don't want to rely on
KVM to update kvm_run (which was arguably a bug even if there weren't a TOCTOU
issue).  It's kinda silly, because realistically this test is likely only ever
going to find TOCTOU bugs, but so long as the test can consistently hit the known bug,
my preference is to make it as "generic" as possible from a coverage perspective.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/2] sync_regs() TOCTOU issues
  2023-08-15 18:15           ` Sean Christopherson
@ 2023-08-15 18:38             ` Michal Luczaj
  0 siblings, 0 replies; 24+ messages in thread
From: Michal Luczaj @ 2023-08-15 18:38 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: pbonzini, kvm, shuah

On 8/15/23 20:15, Sean Christopherson wrote:
> On Tue, Aug 15, 2023, Michal Luczaj wrote:
>> On 8/15/23 17:40, Sean Christopherson wrote:
>>> On Tue, Aug 15, 2023, Michal Luczaj wrote:
>>>>> @@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
>>>>>  	for (;;) {
>>>>>  		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
>>>>>  		WRITE_ONCE(events->flags, 0);
>>>>> +		WRITE_ONCE(events->exception.nr, GP_VECTOR);
>>>>>  		WRITE_ONCE(events->exception.pending, 1);
>>>>>  		WRITE_ONCE(events->exception.nr, 255);
>>>>
>>>> Here you're setting events->exception.nr twice. Is it deliberate?
>>>
>>> Heh, yes and no.  It's partly leftover from a brief attempt to gracefully eat the
>>> fault in the guest.
>>>
>>> However, unless there's magic I'm missing, race_events_exc() needs to set a "good"
>>> vector in every iteration, otherwise only the first iteration will be able to hit
>>> the "check good, consume bad" scenario.
>>
>> I think I understand what you mean. I see things slightly differently: because
>>
>> 	if (events->flags & KVM_VCPUEVENT_VALID_PAYLOAD) {
>> 		...
>> 	} else {
>> 		events->exception.pending = 0;
>> 		events->exception_has_payload = 0;
>> 	}
>>
>> zeroes exception.pending on every iteration, even though exception.nr may
>> already be > 31, KVM does not necessarily return -EINVAL at
>>
>> 	if ((events->exception.injected || events->exception.pending) &&
>> 	    (events->exception.nr > 31 || events->exception.nr == NMI_VECTOR))
>> 		return -EINVAL;
>>
>> It would if the racer set exception.pending before this check, but if it does it
>> after the check, then KVM goes
>>
>> 	vcpu->arch.exception.pending = events->exception.pending;
>> 	vcpu->arch.exception.vector = events->exception.nr;
>>
>> which later triggers the WARN. That said, if you think setting and re-setting
>> exception.nr is more efficient (as in: racy), I'm all for it.
> 
> My goal isn't to make it easier to hit the *known* TOCTOU, it's to make the test
> more valuable after that known bug has been fixed.

Aha! Yup, turns out I did not understand what you meant after all :) Sorry.

> I.e. I don't want to rely on
> KVM to update kvm_run (which was arguably a bug even if there weren't a TOCTOU
> issue).  It's kinda silly, because realistically this test is likely only ever
> going to find TOCTOU bugs, but so long as the test can consistently hit the known bug,
> my preference is to make it as "generic" as possible from a coverage perspective.

Sure, that makes sense.


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-08-15 18:40 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-28  0:12 [PATCH 0/2] sync_regs() TOCTOU issues Michal Luczaj
2023-07-28  0:12 ` [PATCH 1/2] KVM: x86: Fix KVM_CAP_SYNC_REGS's " Michal Luczaj
2023-07-31 23:49   ` Sean Christopherson
2023-08-01 12:37     ` Michal Luczaj
2023-08-02 19:18       ` Sean Christopherson
2023-08-03  0:13         ` Michal Luczaj
2023-08-03 17:48           ` Paolo Bonzini
2023-08-03 21:15             ` Michal Luczaj
2023-08-04  9:53               ` Paolo Bonzini
2023-08-04 17:50                 ` Michal Luczaj
2023-08-14 22:29                   ` Michal Luczaj
2023-07-28  0:12 ` [PATCH 2/2] KVM: selftests: Extend x86's sync_regs_test to check for races Michal Luczaj
2023-08-02 21:07   ` Sean Christopherson
2023-08-03  0:44     ` Michal Luczaj
2023-08-03 16:41       ` Sean Christopherson
2023-08-03 21:14         ` Michal Luczaj
2023-08-08 23:11           ` Sean Christopherson
2023-08-02 21:11 ` [PATCH 0/2] sync_regs() TOCTOU issues Sean Christopherson
2023-08-15  0:48   ` Sean Christopherson
2023-08-15  7:37     ` Michal Luczaj
2023-08-15 15:40       ` Sean Christopherson
2023-08-15 17:49         ` Michal Luczaj
2023-08-15 18:15           ` Sean Christopherson
2023-08-15 18:38             ` Michal Luczaj
