* KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page
@ 2018-11-07  1:38 syzbot
  2018-11-07 12:10 ` Alexander Potapenko
  0 siblings, 1 reply; 16+ messages in thread
From: syzbot @ 2018-11-07  1:38 UTC (permalink / raw)
  To: kvm, linux-kernel, pbonzini, rkrcmar, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    88b95ef4c780 kmsan: use MSan assembly instrumentation
git tree:       https://github.com/google/kmsan.git/master
console output: https://syzkaller.appspot.com/x/log.txt?x=12505e33400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=8df5fc509a1b351b
dashboard link: https://syzkaller.appspot.com/bug?extid=ded1696f6b50b615b630
compiler:       clang version 8.0.0 (trunk 343298)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15ce62f5400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=174efca3400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+ded1696f6b50b615b630@syzkaller.appspotmail.com

==================================================================
BUG: KMSAN: kernel-infoleak in __copy_to_user include/linux/uaccess.h:121 [inline]
BUG: KMSAN: kernel-infoleak in __kvm_write_guest_page arch/x86/kvm/../../../virt/kvm/kvm_main.c:1849 [inline]
BUG: KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page+0x39a/0x510 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1870
CPU: 0 PID: 7918 Comm: syz-executor542 Not tainted 4.19.0+ #77
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x32d/0x480 lib/dump_stack.c:113
  kmsan_report+0x1a2/0x2e0 mm/kmsan/kmsan.c:911
  kmsan_internal_check_memory+0x34c/0x430 mm/kmsan/kmsan.c:991
  kmsan_copy_to_user+0x85/0xe0 mm/kmsan/kmsan_hooks.c:552
  __copy_to_user include/linux/uaccess.h:121 [inline]
  __kvm_write_guest_page arch/x86/kvm/../../../virt/kvm/kvm_main.c:1849 [inline]
  kvm_vcpu_write_guest_page+0x39a/0x510 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1870
  nested_release_vmcs12 arch/x86/kvm/vmx.c:8441 [inline]
  handle_vmptrld+0x2384/0x26b0 arch/x86/kvm/vmx.c:8907
  vmx_handle_exit+0x1e81/0xbac0 arch/x86/kvm/vmx.c:10128
  vcpu_enter_guest arch/x86/kvm/x86.c:7667 [inline]
  vcpu_run arch/x86/kvm/x86.c:7730 [inline]
  kvm_arch_vcpu_ioctl_run+0xac32/0x11d80 arch/x86/kvm/x86.c:7930
  kvm_vcpu_ioctl+0xfb1/0x1f90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2590
  do_vfs_ioctl+0xf77/0x2d30 fs/ioctl.c:46
  ksys_ioctl fs/ioctl.c:702 [inline]
  __do_sys_ioctl fs/ioctl.c:709 [inline]
  __se_sys_ioctl+0x1da/0x270 fs/ioctl.c:707
  __x64_sys_ioctl+0x4a/0x70 fs/ioctl.c:707
  do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x44b6e9
Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 2b ff fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f096b292ce8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000006e3c48 RCX: 000000000044b6e9
RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000005
RBP: 00000000006e3c40 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 00000000006e3c4c
R13: 00007ffd978aeb2f R14: 00007f096b2939c0 R15: 00000000006e3d4c

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:252 [inline]
  kmsan_internal_poison_shadow+0xc8/0x1e0 mm/kmsan/kmsan.c:177
  kmsan_kmalloc+0x98/0x110 mm/kmsan/kmsan_hooks.c:104
  __kmalloc+0x14c/0x4d0 mm/slub.c:3789
  kmalloc include/linux/slab.h:518 [inline]
  enter_vmx_operation+0x980/0x1a90 arch/x86/kvm/vmx.c:8278
  vmx_set_nested_state+0xc3a/0x1530 arch/x86/kvm/vmx.c:14045
  kvm_arch_vcpu_ioctl+0x4fc9/0x73a0 arch/x86/kvm/x86.c:4057
  kvm_vcpu_ioctl+0xca3/0x1f90 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2742
  do_vfs_ioctl+0xf77/0x2d30 fs/ioctl.c:46
  ksys_ioctl fs/ioctl.c:702 [inline]
  __do_sys_ioctl fs/ioctl.c:709 [inline]
  __se_sys_ioctl+0x1da/0x270 fs/ioctl.c:707
  __x64_sys_ioctl+0x4a/0x70 fs/ioctl.c:707
  do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7

Bytes 1000-4095 of 4096 are uninitialized
Memory access of size 4096 starts at ffff8801b5157000
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches


* Re: KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page
  2018-11-07  1:38 KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page syzbot
@ 2018-11-07 12:10 ` Alexander Potapenko
  2018-11-07 12:47   ` Paolo Bonzini
  2018-11-07 12:52   ` KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page Liran Alon
  0 siblings, 2 replies; 16+ messages in thread
From: Alexander Potapenko @ 2018-11-07 12:10 UTC (permalink / raw)
  To: syzbot+ded1696f6b50b615b630; +Cc: kvm, LKML, pbonzini, rkrcmar, syzkaller-bugs

[-- Attachment #1: Type: text/plain, Size: 5671 bytes --]

On Wed, Nov 7, 2018 at 2:38 AM syzbot
<syzbot+ded1696f6b50b615b630@syzkaller.appspotmail.com> wrote:
>
> [... full syzbot report quoted; see above ...]
>
This appears to be a real bug in KVM.
Please see a simplified reproducer attached.

>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
> syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches
>



-- 
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

[-- Attachment #2: kvm-infoleak.c --]
[-- Type: text/x-csrc, Size: 27803 bytes --]

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#include <linux/futex.h>
#include <linux/kvm.h>


const char kvm_asm64_init_vm[] =
    "\x0f\x20\xc0\x0d\x00\x00\x00\x80\x0f\x22\xc0\xea\xde\xc0\xad\x0b\x50\x00"
    "\x48\xc7\xc0\xd8\x00\x00\x00\x0f\x00\xd8\x48\xc7\xc1\x3a\x00\x00\x00\x0f"
    "\x32\x48\x83\xc8\x05\x0f\x30\x0f\x20\xe0\x48\x0d\x00\x20\x00\x00\x0f\x22"
    "\xe0\x48\xc7\xc1\x80\x04\x00\x00\x0f\x32\x48\xc7\xc2\x00\x60\x00\x00\x89"
    "\x02\x48\xc7\xc2\x00\x70\x00\x00\x89\x02\x48\xc7\xc0\x00\x5f\x00\x00\xf3"
    "\x0f\xc7\x30\x48\xc7\xc0\x08\x5f\x00\x00\x66\x0f\xc7\x30\x0f\xc7\x30\x48"
    "\xc7\xc1\x81\x04\x00\x00\x0f\x32\x48\x83\xc8\x3f\x48\x21\xd0\x48\xc7\xc2"
    "\x00\x40\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x02\x40\x00\x00\x48\xb8\x84\x9e"
    "\x99\xf3\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x1e\x40\x00\x00\x48\xc7"
    "\xc0\x81\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc1\x83\x04\x00\x00\x0f\x32\x48"
    "\x0d\xff\x6f\x03\x00\x48\x21\xd0\x48\xc7\xc2\x0c\x40\x00\x00\x0f\x79\xd0"
    "\x48\xc7\xc1\x84\x04\x00\x00\x0f\x32\x48\x0d\xff\x17\x00\x00\x48\x21\xd0"
    "\x48\xc7\xc2\x12\x40\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x04\x2c\x00\x00\x48"
    "\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x00\x28\x00\x00\x48\xc7"
    "\xc0\xff\xff\xff\xff\x0f\x79\xd0\x48\xc7\xc2\x02\x0c\x00\x00\x48\xc7\xc0"
    "\x50\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc0\x58\x00\x00\x00\x48\xc7\xc2\x00"
    "\x0c\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x04\x0c\x00\x00\x0f\x79\xd0\x48\xc7"
    "\xc2\x06\x0c\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x08\x0c\x00\x00\x0f\x79\xd0"
    "\x48\xc7\xc2\x0a\x0c\x00\x00\x0f\x79\xd0\x48\xc7\xc0\xd8\x00\x00\x00\x48"
    "\xc7\xc2\x0c\x0c\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x02\x2c\x00\x00\x48\xc7"
    "\xc0\x00\x05\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x00\x4c\x00\x00\x48\xc7\xc0"
    "\x50\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x10\x6c\x00\x00\x48\xc7\xc0\x00"
    "\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x12\x6c\x00\x00\x48\xc7\xc0\x00\x00"
    "\x00\x00\x0f\x79\xd0\x0f\x20\xc0\x48\xc7\xc2\x00\x6c\x00\x00\x48\x89\xc0"
    "\x0f\x79\xd0\x0f\x20\xd8\x48\xc7\xc2\x02\x6c\x00\x00\x48\x89\xc0\x0f\x79"
    "\xd0\x0f\x20\xe0\x48\xc7\xc2\x04\x6c\x00\x00\x48\x89\xc0\x0f\x79\xd0\x48"
    "\xc7\xc2\x06\x6c\x00\x00\x48\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7"
    "\xc2\x08\x6c\x00\x00\x48\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2"
    "\x0a\x6c\x00\x00\x48\xc7\xc0\x00\x3a\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x0c"
    "\x6c\x00\x00\x48\xc7\xc0\x00\x10\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x0e\x6c"
    "\x00\x00\x48\xc7\xc0\x00\x38\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x14\x6c\x00"
    "\x00\x48\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x16\x6c\x00\x00"
    "\x48\x8b\x04\x25\x10\x5f\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x00\x00\x00\x00"
    "\x48\xc7\xc0\x01\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x02\x00\x00\x00\x48"
    "\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x00\x20\x00\x00\x48\xc7"
    "\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x02\x20\x00\x00\x48\xc7\xc0"
    "\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x04\x20\x00\x00\x48\xc7\xc0\x00"
    "\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x06\x20\x00\x00\x48\xc7\xc0\x00\x00"
    "\x00\x00\x0f\x79\xd0\x48\xc7\xc1\x77\x02\x00\x00\x0f\x32\x48\xc1\xe2\x20"
    "\x48\x09\xd0\x48\xc7\xc2\x00\x2c\x00\x00\x48\x89\xc0\x0f\x79\xd0\x48\xc7"
    "\xc2\x04\x40\x00\x00\x48\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2"
    "\x0a\x40\x00\x00\x48\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x0e"
    "\x40\x00\x00\x48\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x10\x40"
    "\x00\x00\x48\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x16\x40\x00"
    "\x00\x48\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x14\x40\x00\x00"
    "\x48\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x00\x60\x00\x00\x48"
    "\xc7\xc0\xff\xff\xff\xff\x0f\x79\xd0\x48\xc7\xc2\x02\x60\x00\x00\x48\xc7"
    "\xc0\xff\xff\xff\xff\x0f\x79\xd0\x48\xc7\xc2\x1c\x20\x00\x00\x48\xc7\xc0"
    "\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x1e\x20\x00\x00\x48\xc7\xc0\x00"
    "\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x20\x20\x00\x00\x48\xc7\xc0\x00\x00"
    "\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x22\x20\x00\x00\x48\xc7\xc0\x00\x00\x00"
    "\x00\x0f\x79\xd0\x48\xc7\xc2\x00\x08\x00\x00\x48\xc7\xc0\x58\x00\x00\x00"
    "\x0f\x79\xd0\x48\xc7\xc2\x02\x08\x00\x00\x48\xc7\xc0\x50\x00\x00\x00\x0f"
    "\x79\xd0\x48\xc7\xc2\x04\x08\x00\x00\x48\xc7\xc0\x58\x00\x00\x00\x0f\x79"
    "\xd0\x48\xc7\xc2\x06\x08\x00\x00\x48\xc7\xc0\x58\x00\x00\x00\x0f\x79\xd0"
    "\x48\xc7\xc2\x08\x08\x00\x00\x48\xc7\xc0\x58\x00\x00\x00\x0f\x79\xd0\x48"
    "\xc7\xc2\x0a\x08\x00\x00\x48\xc7\xc0\x58\x00\x00\x00\x0f\x79\xd0\x48\xc7"
    "\xc2\x0c\x08\x00\x00\x48\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2"
    "\x0e\x08\x00\x00\x48\xc7\xc0\xd8\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x12"
    "\x68\x00\x00\x48\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x14\x68"
    "\x00\x00\x48\xc7\xc0\x00\x3a\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x16\x68\x00"
    "\x00\x48\xc7\xc0\x00\x10\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x18\x68\x00\x00"
    "\x48\xc7\xc0\x00\x38\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x00\x48\x00\x00\x48"
    "\xc7\xc0\xff\xff\x0f\x00\x0f\x79\xd0\x48\xc7\xc2\x02\x48\x00\x00\x48\xc7"
    "\xc0\xff\xff\x0f\x00\x0f\x79\xd0\x48\xc7\xc2\x04\x48\x00\x00\x48\xc7\xc0"
    "\xff\xff\x0f\x00\x0f\x79\xd0\x48\xc7\xc2\x06\x48\x00\x00\x48\xc7\xc0\xff"
    "\xff\x0f\x00\x0f\x79\xd0\x48\xc7\xc2\x08\x48\x00\x00\x48\xc7\xc0\xff\xff"
    "\x0f\x00\x0f\x79\xd0\x48\xc7\xc2\x0a\x48\x00\x00\x48\xc7\xc0\xff\xff\x0f"
    "\x00\x0f\x79\xd0\x48\xc7\xc2\x0c\x48\x00\x00\x48\xc7\xc0\x00\x00\x00\x00"
    "\x0f\x79\xd0\x48\xc7\xc2\x0e\x48\x00\x00\x48\xc7\xc0\xff\x1f\x00\x00\x0f"
    "\x79\xd0\x48\xc7\xc2\x10\x48\x00\x00\x48\xc7\xc0\xff\x1f\x00\x00\x0f\x79"
    "\xd0\x48\xc7\xc2\x12\x48\x00\x00\x48\xc7\xc0\xff\x1f\x00\x00\x0f\x79\xd0"
    "\x48\xc7\xc2\x14\x48\x00\x00\x48\xc7\xc0\x93\x40\x00\x00\x0f\x79\xd0\x48"
    "\xc7\xc2\x16\x48\x00\x00\x48\xc7\xc0\x9b\x20\x00\x00\x0f\x79\xd0\x48\xc7"
    "\xc2\x18\x48\x00\x00\x48\xc7\xc0\x93\x40\x00\x00\x0f\x79\xd0\x48\xc7\xc2"
    "\x1a\x48\x00\x00\x48\xc7\xc0\x93\x40\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x1c"
    "\x48\x00\x00\x48\xc7\xc0\x93\x40\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x1e\x48"
    "\x00\x00\x48\xc7\xc0\x93\x40\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x20\x48\x00"
    "\x00\x48\xc7\xc0\x82\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x22\x48\x00\x00"
    "\x48\xc7\xc0\x8b\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x1c\x68\x00\x00\x48"
    "\xc7\xc0\x00\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x1e\x68\x00\x00\x48\xc7"
    "\xc0\x00\x91\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x20\x68\x00\x00\x48\xc7\xc0"
    "\x02\x00\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x06\x28\x00\x00\x48\xc7\xc0\x00"
    "\x05\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x0a\x28\x00\x00\x48\xc7\xc0\x00\x00"
    "\x00\x00\x0f\x79\xd0\x48\xc7\xc2\x0c\x28\x00\x00\x48\xc7\xc0\x00\x00\x00"
    "\x00\x0f\x79\xd0\x48\xc7\xc2\x0e\x28\x00\x00\x48\xc7\xc0\x00\x00\x00\x00"
    "\x0f\x79\xd0\x48\xc7\xc2\x10\x28\x00\x00\x48\xc7\xc0\x00\x00\x00\x00\x0f"
    "\x79\xd0\x0f\x20\xc0\x48\xc7\xc2\x00\x68\x00\x00\x48\x89\xc0\x0f\x79\xd0"
    "\x0f\x20\xd8\x48\xc7\xc2\x02\x68\x00\x00\x48\x89\xc0\x0f\x79\xd0\x0f\x20"
    "\xe0\x48\xc7\xc2\x04\x68\x00\x00\x48\x89\xc0\x0f\x79\xd0\x48\xc7\xc0\x18"
    "\x5f\x00\x00\x48\x8b\x10\x48\xc7\xc0\x20\x5f\x00\x00\x48\x8b\x08\x48\x31"
    "\xc0\x0f\x78\xd0\x48\x31\xc8\x0f\x79\xd0\x0f\x01\xc2\x48\xc7\xc2\x00\x44"
    "\x00\x00\x0f\x78\xd0\xf4";
const char kvm_asm64_vm_exit[] = "\x48\xc7\xc3\x00\x44\x00\x00\x0f\x78\xda\x48"
                                 "\xc7\xc3\x02\x44\x00\x00\x0f\x78\xd9\x48\xc7"
                                 "\xc0\x00\x64\x00\x00\x0f\x78\xc0\x48\xc7\xc3"
                                 "\x1e\x68\x00\x00\x0f\x78\xdb\xf4";

#define ADDR_TEXT 0x0000
#define ADDR_GDT 0x1000
#define ADDR_LDT 0x1800
#define ADDR_PML4 0x2000
#define ADDR_PDP 0x3000
#define ADDR_PD 0x4000
#define ADDR_STACK0 0x0f80
#define ADDR_VAR_HLT 0x2800
#define ADDR_VAR_SYSRET 0x2808
#define ADDR_VAR_SYSEXIT 0x2810
#define ADDR_VAR_IDT 0x3800
#define ADDR_VAR_TSS64 0x3a00
#define ADDR_VAR_TSS64_CPL3 0x3c00
#define ADDR_VAR_TSS16 0x3d00
#define ADDR_VAR_TSS16_2 0x3e00
#define ADDR_VAR_TSS16_CPL3 0x3f00
#define ADDR_VAR_TSS32 0x4800
#define ADDR_VAR_TSS32_2 0x4a00
#define ADDR_VAR_TSS32_CPL3 0x4c00
#define ADDR_VAR_TSS32_VM86 0x4e00
#define ADDR_VAR_VMXON_PTR 0x5f00
#define ADDR_VAR_VMCS_PTR 0x5f08
#define ADDR_VAR_VMEXIT_PTR 0x5f10
#define ADDR_VAR_VMWRITE_FLD 0x5f18
#define ADDR_VAR_VMWRITE_VAL 0x5f20
#define ADDR_VAR_VMXON 0x6000
#define ADDR_VAR_VMCS 0x7000
#define ADDR_VAR_VMEXIT_CODE 0x9000
#define ADDR_VAR_USER_CODE 0x9100
#define ADDR_VAR_USER_CODE2 0x9120

#define SEL_LDT (1 << 3)
#define SEL_CS16 (2 << 3)
#define SEL_DS16 (3 << 3)
#define SEL_CS16_CPL3 ((4 << 3) + 3)
#define SEL_DS16_CPL3 ((5 << 3) + 3)
#define SEL_CS32 (6 << 3)
#define SEL_DS32 (7 << 3)
#define SEL_CS32_CPL3 ((8 << 3) + 3)
#define SEL_DS32_CPL3 ((9 << 3) + 3)
#define SEL_CS64 (10 << 3)
#define SEL_DS64 (11 << 3)
#define SEL_CS64_CPL3 ((12 << 3) + 3)
#define SEL_DS64_CPL3 ((13 << 3) + 3)
#define SEL_CGATE16 (14 << 3)
#define SEL_TGATE16 (15 << 3)
#define SEL_CGATE32 (16 << 3)
#define SEL_TGATE32 (17 << 3)
#define SEL_CGATE64 (18 << 3)
#define SEL_CGATE64_HI (19 << 3)
#define SEL_TSS16 (20 << 3)
#define SEL_TSS16_2 (21 << 3)
#define SEL_TSS16_CPL3 ((22 << 3) + 3)
#define SEL_TSS32 (23 << 3)
#define SEL_TSS32_2 (24 << 3)
#define SEL_TSS32_CPL3 ((25 << 3) + 3)
#define SEL_TSS32_VM86 (26 << 3)
#define SEL_TSS64 (27 << 3)
#define SEL_TSS64_HI (28 << 3)
#define SEL_TSS64_CPL3 ((29 << 3) + 3)
#define SEL_TSS64_CPL3_HI (30 << 3)

#define MSR_IA32_SYSENTER_CS 0x174
#define MSR_IA32_SYSENTER_ESP 0x175
#define MSR_IA32_SYSENTER_EIP 0x176
#define MSR_IA32_STAR 0xC0000081
#define MSR_IA32_LSTAR 0xC0000082

#define PREFIX_SIZE 0xba1d

#define CR0_PE 1
#define CR0_MP (1 << 1)
#define CR0_EM (1 << 2)
#define CR0_TS (1 << 3)
#define CR0_ET (1 << 4)
#define CR0_NE (1 << 5)

#define CR4_PAE (1 << 5)

#define EFER_SCE 1
#define EFER_LME (1 << 8)
#define PDE64_PRESENT 1
#define PDE64_RW (1 << 1)
#define PDE64_USER (1 << 2)
#define PDE64_PS (1 << 7)

#define MIN_STRLEN 1
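// Scan a buffer for printable strings and dump them to stderr, blanking
// out unprintable bytes. Used at the end of main() to show whatever
// kernel data leaked into the guest page.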
void dump_buf(unsigned char *buf, int len) {
  int i, nz = 0;
  for (i = 0; i < len; i++) {
    if (buf[i]) {
      nz = 1;
      break;
    }
  }
  // Nothing to print if the buffer is all zeroes.
  if (!nz)
    return;

  for (i = 0; i < len; i++) {
    if (buf[i]) {
      int str_len = strlen((char *)&buf[i]);
      // Short string pieces are too boring.
      if (str_len >= MIN_STRLEN) {
        unsigned char *c;
        // Blank out unprintable bytes so the string prints cleanly.
        for (c = &buf[i]; c < &buf[i + str_len]; c++) {
          if ((*c > 127) || ((*c < 32) && (*c != 10) && (*c != 13)))
            *c = ' ';
        }
        // Dump the buffer.
        fprintf(stderr, "%s\n", (char *)&buf[i]);
      }
      i += str_len;
    }
  }
}


struct tss16 {
  uint16_t prev;
  uint16_t sp0;
  uint16_t ss0;
  uint16_t sp1;
  uint16_t ss1;
  uint16_t sp2;
  uint16_t ss2;
  uint16_t ip;
  uint16_t flags;
  uint16_t ax;
  uint16_t cx;
  uint16_t dx;
  uint16_t bx;
  uint16_t sp;
  uint16_t bp;
  uint16_t si;
  uint16_t di;
  uint16_t es;
  uint16_t cs;
  uint16_t ss;
  uint16_t ds;
  uint16_t ldt;
} __attribute__((packed));

struct tss32 {
  uint16_t prev, prevh;
  uint32_t sp0;
  uint16_t ss0, ss0h;
  uint32_t sp1;
  uint16_t ss1, ss1h;
  uint32_t sp2;
  uint16_t ss2, ss2h;
  uint32_t cr3;
  uint32_t ip;
  uint32_t flags;
  uint32_t ax;
  uint32_t cx;
  uint32_t dx;
  uint32_t bx;
  uint32_t sp;
  uint32_t bp;
  uint32_t si;
  uint32_t di;
  uint16_t es, esh;
  uint16_t cs, csh;
  uint16_t ss, ssh;
  uint16_t ds, dsh;
  uint16_t fs, fsh;
  uint16_t gs, gsh;
  uint16_t ldt, ldth;
  uint16_t trace;
  uint16_t io_bitmap;
} __attribute__((packed));

struct tss64 {
  uint32_t reserved0;
  uint64_t rsp[3];
  uint64_t reserved1;
  uint64_t ist[7];
  uint64_t reserved2;
  uint32_t reserved3;
  uint32_t io_bitmap;
} __attribute__((packed));

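// Pack a kvm_segment into the 8-byte x86 descriptor format and store
// the same entry into both the GDT and the LDT.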
static void fill_segment_descriptor(uint64_t* dt, uint64_t* lt,
                                    struct kvm_segment* seg)
{
  uint16_t index = seg->selector >> 3;
  uint64_t limit = seg->g ? seg->limit >> 12 : seg->limit;
  uint64_t sd = (limit & 0xffff) | (seg->base & 0xffffff) << 16 |
                (uint64_t)seg->type << 40 | (uint64_t)seg->s << 44 |
                (uint64_t)seg->dpl << 45 | (uint64_t)seg->present << 47 |
                (limit & 0xf0000ULL) << 48 | (uint64_t)seg->avl << 52 |
                (uint64_t)seg->l << 53 | (uint64_t)seg->db << 54 |
                (uint64_t)seg->g << 55 | (seg->base & 0xff000000ULL) << 56;
  dt[index] = sd;
  lt[index] = sd;
}

static void fill_segment_descriptor_dword(uint64_t* dt, uint64_t* lt,
                                          struct kvm_segment* seg)
{
  fill_segment_descriptor(dt, lt, seg);
  uint16_t index = seg->selector >> 3;
  dt[index + 1] = 0;
  lt[index + 1] = 0;
}

static void setup_syscall_msrs(int cpufd, uint16_t sel_cs, uint16_t sel_cs_cpl3)
{
  char buf[sizeof(struct kvm_msrs) + 5 * sizeof(struct kvm_msr_entry)];
  memset(buf, 0, sizeof(buf));
  struct kvm_msrs* msrs = (struct kvm_msrs*)buf;
  struct kvm_msr_entry* entries = msrs->entries;
  msrs->nmsrs = 5;
  entries[0].index = MSR_IA32_SYSENTER_CS;
  entries[0].data = sel_cs;
  entries[1].index = MSR_IA32_SYSENTER_ESP;
  entries[1].data = ADDR_STACK0;
  entries[2].index = MSR_IA32_SYSENTER_EIP;
  entries[2].data = ADDR_VAR_SYSEXIT;
  entries[3].index = MSR_IA32_STAR;
  entries[3].data = ((uint64_t)sel_cs << 32) | ((uint64_t)sel_cs_cpl3 << 48);
  entries[4].index = MSR_IA32_LSTAR;
  entries[4].data = ADDR_VAR_SYSRET;
  ioctl(cpufd, KVM_SET_MSRS, msrs);
}


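// Build a minimal 64-bit IDT with 32 gates. fill_segment_descriptor_dword()
// is reused for gate packing, so the kvm_segment fields are repurposed:
// 'base' carries the target code selector and 'limit' the handler address.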
static void setup_64bit_idt(struct kvm_sregs* sregs, char* host_mem,
                            uintptr_t guest_mem)
{
  sregs->idt.base = guest_mem + ADDR_VAR_IDT;
  sregs->idt.limit = 0x1ff;
  uint64_t* idt = (uint64_t*)(host_mem + sregs->idt.base);
  int i;
  for (i = 0; i < 32; i++) {
    struct kvm_segment gate;
    gate.selector = (i * 2) << 3;
    gate.type = (i & 1) ? 14 : 15;
    gate.base = SEL_CS64;
    gate.limit = guest_mem + ADDR_VAR_USER_CODE2;
    gate.present = 1;
    gate.dpl = 0;
    gate.s = 0;
    gate.g = 0;
    gate.db = 0;
    gate.l = 0;
    gate.avl = 0;
    fill_segment_descriptor_dword(idt, idt, &gate);
  }
}

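// Map 24 pages of host memory as guest-physical pages 0..23 (page 10 is
// redirected to the IOAPIC window at 0xfec00000), plus a 64 KiB alias of
// host_mem at GPA 0x30000.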
void syz_kvm_setup_vmfd(int vmfd, char *host_mem) {
  const uintptr_t page_size = 4 << 10;
  const uintptr_t ioapic_page = 10;
  const uintptr_t guest_mem_size = 24 * page_size;
  const uintptr_t guest_mem = 0;
  uintptr_t i;
  for (i = 0; i < guest_mem_size / page_size; i++) {
    struct kvm_userspace_memory_region memreg;
    memreg.slot = i;
    memreg.flags = 0;
    memreg.guest_phys_addr = guest_mem + i * page_size;
    if (i == ioapic_page)
      memreg.guest_phys_addr = 0xfec00000;
    memreg.memory_size = page_size;
    memreg.userspace_addr = (uintptr_t)host_mem + i * page_size;
    if (vmfd != -1)
      ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
  }
  struct kvm_userspace_memory_region memreg;
  memreg.slot = 1 + (1 << 16);
  memreg.flags = 0;
  memreg.guest_phys_addr = 0x30000;
  memreg.memory_size = 64 << 10;
  memreg.userspace_addr = (uintptr_t)host_mem;
  if (vmfd != -1)
    ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
}

static uintptr_t syz_kvm_setup_cpu(int kvmfd, uintptr_t cpufd, char *host_mem)
{
  const uintptr_t guest_mem = 0;
  uintptr_t i;

  struct kvm_sregs sregs;
  if (ioctl(cpufd, KVM_GET_SREGS, &sregs))
    return -1;
  struct kvm_regs regs;
  memset(&regs, 0, sizeof(regs));
  regs.rip = guest_mem + ADDR_TEXT;
  regs.rsp = ADDR_STACK0;
  sregs.gdt.base = guest_mem + ADDR_GDT;
  sregs.gdt.limit = 256 * sizeof(uint64_t) - 1;
  uint64_t* gdt = (uint64_t*)(host_mem + sregs.gdt.base);
  struct kvm_segment seg_ldt;
  seg_ldt.selector = SEL_LDT;
  seg_ldt.type = 2;
  seg_ldt.base = guest_mem + ADDR_LDT;
  seg_ldt.limit = 256 * sizeof(uint64_t) - 1;
  seg_ldt.present = 1;
  seg_ldt.dpl = 0;
  seg_ldt.s = 0;
  seg_ldt.g = 0;
  seg_ldt.db = 1;
  seg_ldt.l = 0;
  sregs.ldt = seg_ldt;
  uint64_t* ldt = (uint64_t*)(host_mem + sregs.ldt.base);
  struct kvm_segment seg_cs16;
  seg_cs16.selector = SEL_CS16;
  seg_cs16.type = 11;
  seg_cs16.base = 0;
  seg_cs16.limit = 0xfffff;
  seg_cs16.present = 1;
  seg_cs16.dpl = 0;
  seg_cs16.s = 1;
  seg_cs16.g = 0;
  seg_cs16.db = 0;
  seg_cs16.l = 0;
  struct kvm_segment seg_ds16 = seg_cs16;
  seg_ds16.selector = SEL_DS16;
  seg_ds16.type = 3;
  struct kvm_segment seg_cs16_cpl3 = seg_cs16;
  seg_cs16_cpl3.selector = SEL_CS16_CPL3;
  seg_cs16_cpl3.dpl = 3;
  struct kvm_segment seg_ds16_cpl3 = seg_ds16;
  seg_ds16_cpl3.selector = SEL_DS16_CPL3;
  seg_ds16_cpl3.dpl = 3;
  struct kvm_segment seg_cs32 = seg_cs16;
  seg_cs32.selector = SEL_CS32;
  seg_cs32.db = 1;
  struct kvm_segment seg_ds32 = seg_ds16;
  seg_ds32.selector = SEL_DS32;
  seg_ds32.db = 1;
  struct kvm_segment seg_cs32_cpl3 = seg_cs32;
  seg_cs32_cpl3.selector = SEL_CS32_CPL3;
  seg_cs32_cpl3.dpl = 3;
  struct kvm_segment seg_ds32_cpl3 = seg_ds32;
  seg_ds32_cpl3.selector = SEL_DS32_CPL3;
  seg_ds32_cpl3.dpl = 3;
  struct kvm_segment seg_cs64 = seg_cs16;
  seg_cs64.selector = SEL_CS64;
  seg_cs64.l = 1;
  struct kvm_segment seg_ds64 = seg_ds32;
  seg_ds64.selector = SEL_DS64;
  struct kvm_segment seg_cs64_cpl3 = seg_cs64;
  seg_cs64_cpl3.selector = SEL_CS64_CPL3;
  seg_cs64_cpl3.dpl = 3;
  struct kvm_segment seg_ds64_cpl3 = seg_ds64;
  seg_ds64_cpl3.selector = SEL_DS64_CPL3;
  seg_ds64_cpl3.dpl = 3;
  struct kvm_segment seg_tss32;
  seg_tss32.selector = SEL_TSS32;
  seg_tss32.type = 9;
  seg_tss32.base = ADDR_VAR_TSS32;
  seg_tss32.limit = 0x1ff;
  seg_tss32.present = 1;
  seg_tss32.dpl = 0;
  seg_tss32.s = 0;
  seg_tss32.g = 0;
  seg_tss32.db = 0;
  seg_tss32.l = 0;
  struct kvm_segment seg_tss32_2 = seg_tss32;
  seg_tss32_2.selector = SEL_TSS32_2;
  seg_tss32_2.base = ADDR_VAR_TSS32_2;
  struct kvm_segment seg_tss32_cpl3 = seg_tss32;
  seg_tss32_cpl3.selector = SEL_TSS32_CPL3;
  seg_tss32_cpl3.base = ADDR_VAR_TSS32_CPL3;
  struct kvm_segment seg_tss32_vm86 = seg_tss32;
  seg_tss32_vm86.selector = SEL_TSS32_VM86;
  seg_tss32_vm86.base = ADDR_VAR_TSS32_VM86;
  struct kvm_segment seg_tss16 = seg_tss32;
  seg_tss16.selector = SEL_TSS16;
  seg_tss16.base = ADDR_VAR_TSS16;
  seg_tss16.limit = 0xff;
  seg_tss16.type = 1;
  struct kvm_segment seg_tss16_2 = seg_tss16;
  seg_tss16_2.selector = SEL_TSS16_2;
  seg_tss16_2.base = ADDR_VAR_TSS16_2;
  seg_tss16_2.dpl = 0;
  struct kvm_segment seg_tss16_cpl3 = seg_tss16;
  seg_tss16_cpl3.selector = SEL_TSS16_CPL3;
  seg_tss16_cpl3.base = ADDR_VAR_TSS16_CPL3;
  seg_tss16_cpl3.dpl = 3;
  struct kvm_segment seg_tss64 = seg_tss32;
  seg_tss64.selector = SEL_TSS64;
  seg_tss64.base = ADDR_VAR_TSS64;
  seg_tss64.limit = 0x1ff;
  struct kvm_segment seg_tss64_cpl3 = seg_tss64;
  seg_tss64_cpl3.selector = SEL_TSS64_CPL3;
  seg_tss64_cpl3.base = ADDR_VAR_TSS64_CPL3;
  seg_tss64_cpl3.dpl = 3;
  struct kvm_segment seg_cgate16;
  seg_cgate16.selector = SEL_CGATE16;
  seg_cgate16.type = 4;
  seg_cgate16.base = SEL_CS16 | (2 << 16);
  seg_cgate16.limit = ADDR_VAR_USER_CODE2;
  seg_cgate16.present = 1;
  seg_cgate16.dpl = 0;
  seg_cgate16.s = 0;
  seg_cgate16.g = 0;
  seg_cgate16.db = 0;
  seg_cgate16.l = 0;
  seg_cgate16.avl = 0;
  struct kvm_segment seg_tgate16 = seg_cgate16;
  seg_tgate16.selector = SEL_TGATE16;
  seg_tgate16.type = 3;
  seg_cgate16.base = SEL_TSS16_2;
  seg_tgate16.limit = 0;
  struct kvm_segment seg_cgate32 = seg_cgate16;
  seg_cgate32.selector = SEL_CGATE32;
  seg_cgate32.type = 12;
  seg_cgate32.base = SEL_CS32 | (2 << 16);
  struct kvm_segment seg_tgate32 = seg_cgate32;
  seg_tgate32.selector = SEL_TGATE32;
  seg_tgate32.type = 11;
  seg_tgate32.base = SEL_TSS32_2;
  seg_tgate32.limit = 0;
  struct kvm_segment seg_cgate64 = seg_cgate16;
  seg_cgate64.selector = SEL_CGATE64;
  seg_cgate64.type = 12;
  seg_cgate64.base = SEL_CS64;
  char buf[sizeof(struct kvm_cpuid2) + 128 * sizeof(struct kvm_cpuid_entry2)];
  memset(buf, 0, sizeof(buf));
  struct kvm_cpuid2* cpuid = (struct kvm_cpuid2*)buf;
  cpuid->nent = 128;
  ioctl(kvmfd, KVM_GET_SUPPORTED_CPUID, cpuid);
  ioctl(cpufd, KVM_SET_CPUID2, cpuid);
  const char* text_prefix = 0;
  int text_prefix_size = 0;
  char* host_text = host_mem + ADDR_TEXT;
  if (1) {
    sregs.efer |= EFER_LME | EFER_SCE;
    sregs.cr0 |= CR0_PE;
    setup_syscall_msrs(cpufd, SEL_CS64, SEL_CS64_CPL3);
    setup_64bit_idt(&sregs, host_mem, guest_mem);
    sregs.cs = seg_cs32;
    sregs.ds = sregs.es = sregs.fs = sregs.gs = sregs.ss = seg_ds32;
    uint64_t pml4_addr = guest_mem + ADDR_PML4;
    uint64_t* pml4 = (uint64_t*)(host_mem + ADDR_PML4);
    uint64_t pdpt_addr = guest_mem + ADDR_PDP;
    uint64_t* pdpt = (uint64_t*)(host_mem + ADDR_PDP);
    uint64_t pd_addr = guest_mem + ADDR_PD;
    uint64_t* pd = (uint64_t*)(host_mem + ADDR_PD);
    pml4[0] = PDE64_PRESENT | PDE64_RW | PDE64_USER | pdpt_addr;
    pdpt[0] = PDE64_PRESENT | PDE64_RW | PDE64_USER | pd_addr;
    pd[0] = PDE64_PRESENT | PDE64_RW | PDE64_USER | PDE64_PS;
    sregs.cr3 = pml4_addr;
    sregs.cr4 |= CR4_PAE;
    if (1) {
      sregs.cr0 |= CR0_NE;
      *((uint64_t*)(host_mem + ADDR_VAR_VMXON_PTR)) = ADDR_VAR_VMXON;
      *((uint64_t*)(host_mem + ADDR_VAR_VMCS_PTR)) = ADDR_VAR_VMCS;
      memcpy(host_mem + ADDR_VAR_VMEXIT_CODE, kvm_asm64_vm_exit,
             sizeof(kvm_asm64_vm_exit) - 1);
      *((uint64_t*)(host_mem + ADDR_VAR_VMEXIT_PTR)) = ADDR_VAR_VMEXIT_CODE;
      text_prefix = kvm_asm64_init_vm;
      text_prefix_size = sizeof(kvm_asm64_init_vm) - 1;
    }
  }
  struct tss16 tss16;
  memset(&tss16, 0, sizeof(tss16));
  tss16.ss0 = tss16.ss1 = tss16.ss2 = SEL_DS16;
  tss16.sp0 = tss16.sp1 = tss16.sp2 = ADDR_STACK0;
  tss16.ip = ADDR_VAR_USER_CODE2;
  tss16.flags = (1 << 1);
  tss16.cs = SEL_CS16;
  tss16.es = tss16.ds = tss16.ss = SEL_DS16;
  tss16.ldt = SEL_LDT;
  struct tss16* tss16_addr = (struct tss16*)(host_mem + seg_tss16_2.base);
  memcpy(tss16_addr, &tss16, sizeof(tss16));
  memset(&tss16, 0, sizeof(tss16));
  tss16.ss0 = tss16.ss1 = tss16.ss2 = SEL_DS16;
  tss16.sp0 = tss16.sp1 = tss16.sp2 = ADDR_STACK0;
  tss16.ip = ADDR_VAR_USER_CODE2;
  tss16.flags = (1 << 1);
  tss16.cs = SEL_CS16_CPL3;
  tss16.es = tss16.ds = tss16.ss = SEL_DS16_CPL3;
  tss16.ldt = SEL_LDT;
  struct tss16* tss16_cpl3_addr =
      (struct tss16*)(host_mem + seg_tss16_cpl3.base);
  memcpy(tss16_cpl3_addr, &tss16, sizeof(tss16));
  struct tss32 tss32;
  memset(&tss32, 0, sizeof(tss32));
  tss32.ss0 = tss32.ss1 = tss32.ss2 = SEL_DS32;
  tss32.sp0 = tss32.sp1 = tss32.sp2 = ADDR_STACK0;
  tss32.ip = ADDR_VAR_USER_CODE;
  tss32.flags = (1 << 1) | (1 << 17);
  tss32.ldt = SEL_LDT;
  tss32.cr3 = sregs.cr3;
  tss32.io_bitmap = offsetof(struct tss32, io_bitmap);
  struct tss32* tss32_addr = (struct tss32*)(host_mem + seg_tss32_vm86.base);
  memcpy(tss32_addr, &tss32, sizeof(tss32));
  memset(&tss32, 0, sizeof(tss32));
  tss32.ss0 = tss32.ss1 = tss32.ss2 = SEL_DS32;
  tss32.sp0 = tss32.sp1 = tss32.sp2 = ADDR_STACK0;
  tss32.ip = ADDR_VAR_USER_CODE;
  tss32.flags = (1 << 1);
  tss32.cr3 = sregs.cr3;
  tss32.es = tss32.ds = tss32.ss = tss32.gs = tss32.fs = SEL_DS32;
  tss32.cs = SEL_CS32;
  tss32.ldt = SEL_LDT;
  tss32.cr3 = sregs.cr3;
  tss32.io_bitmap = offsetof(struct tss32, io_bitmap);
  struct tss32* tss32_cpl3_addr = (struct tss32*)(host_mem + seg_tss32_2.base);
  memcpy(tss32_cpl3_addr, &tss32, sizeof(tss32));
  struct tss64 tss64;
  memset(&tss64, 0, sizeof(tss64));
  tss64.rsp[0] = ADDR_STACK0;
  tss64.rsp[1] = ADDR_STACK0;
  tss64.rsp[2] = ADDR_STACK0;
  tss64.io_bitmap = offsetof(struct tss64, io_bitmap);
  struct tss64* tss64_addr = (struct tss64*)(host_mem + seg_tss64.base);
  memcpy(tss64_addr, &tss64, sizeof(tss64));
  memset(&tss64, 0, sizeof(tss64));
  tss64.rsp[0] = ADDR_STACK0;
  tss64.rsp[1] = ADDR_STACK0;
  tss64.rsp[2] = ADDR_STACK0;
  tss64.io_bitmap = offsetof(struct tss64, io_bitmap);
  struct tss64* tss64_cpl3_addr =
      (struct tss64*)(host_mem + seg_tss64_cpl3.base);
  memcpy(tss64_cpl3_addr, &tss64, sizeof(tss64));
  if (text_prefix) {
    memcpy(host_text, text_prefix, text_prefix_size);
    // Patch the 0x0badc0de placeholder in the init blob with the
    // guest-physical address of the instruction that follows it.
    void* patch = memmem(host_text, text_prefix_size, "\xde\xc0\xad\x0b", 4);
    if (patch)
      *((uint32_t*)patch) = guest_mem + ADDR_TEXT + ((char*)patch - host_text) + 6;
    // Patch the PREFIX_SIZE magic with the address of the code that
    // follows the prefix.
    uint16_t magic = PREFIX_SIZE;
    patch = memmem(host_text, text_prefix_size, &magic, sizeof(magic));
    if (patch)
      *((uint16_t*)patch) = guest_mem + ADDR_TEXT + text_prefix_size;
  }
  *(host_text + text_prefix_size) = 0xf4;
  *(host_mem + ADDR_VAR_USER_CODE) = 0xf4;
  *(host_mem + ADDR_VAR_HLT) = 0xf4;
  memcpy(host_mem + ADDR_VAR_SYSRET, "\x0f\x07\xf4", 3);
  memcpy(host_mem + ADDR_VAR_SYSEXIT, "\x0f\x35\xf4", 3);
  *(uint64_t*)(host_mem + ADDR_VAR_VMWRITE_FLD) = 0;
  *(uint64_t*)(host_mem + ADDR_VAR_VMWRITE_VAL) = 0;
  regs.rflags |= 2;
  fill_segment_descriptor(gdt, ldt, &seg_ldt);
  fill_segment_descriptor(gdt, ldt, &seg_cs16);
  fill_segment_descriptor(gdt, ldt, &seg_ds16);
  fill_segment_descriptor(gdt, ldt, &seg_cs16_cpl3);
  fill_segment_descriptor(gdt, ldt, &seg_ds16_cpl3);
  fill_segment_descriptor(gdt, ldt, &seg_cs32);
  fill_segment_descriptor(gdt, ldt, &seg_ds32);
  fill_segment_descriptor(gdt, ldt, &seg_cs32_cpl3);
  fill_segment_descriptor(gdt, ldt, &seg_ds32_cpl3);
  fill_segment_descriptor(gdt, ldt, &seg_cs64);
  fill_segment_descriptor(gdt, ldt, &seg_ds64);
  fill_segment_descriptor(gdt, ldt, &seg_cs64_cpl3);
  fill_segment_descriptor(gdt, ldt, &seg_ds64_cpl3);
  fill_segment_descriptor(gdt, ldt, &seg_tss32);
  fill_segment_descriptor(gdt, ldt, &seg_tss32_2);
  fill_segment_descriptor(gdt, ldt, &seg_tss32_cpl3);
  fill_segment_descriptor(gdt, ldt, &seg_tss32_vm86);
  fill_segment_descriptor(gdt, ldt, &seg_tss16);
  fill_segment_descriptor(gdt, ldt, &seg_tss16_2);
  fill_segment_descriptor(gdt, ldt, &seg_tss16_cpl3);
  fill_segment_descriptor_dword(gdt, ldt, &seg_tss64);
  fill_segment_descriptor_dword(gdt, ldt, &seg_tss64_cpl3);
  fill_segment_descriptor(gdt, ldt, &seg_cgate16);
  fill_segment_descriptor(gdt, ldt, &seg_tgate16);
  fill_segment_descriptor(gdt, ldt, &seg_cgate32);
  fill_segment_descriptor(gdt, ldt, &seg_tgate32);
  fill_segment_descriptor_dword(gdt, ldt, &seg_cgate64);
  if (ioctl(cpufd, KVM_SET_SREGS, &sregs))
    return -1;
  if (ioctl(cpufd, KVM_SET_REGS, &regs))
    return -1;
  return 0;
}

int kvmfd, vmfd, vcpufd;

int main(void)
{
  // Map 16 MiB of anonymous memory at a fixed address to back the
  // guest physical memory.
  syscall(__NR_mmap, 0x20000000, 0x1000000, 3 /*PROT_READ|PROT_WRITE*/,
          0x32 /*MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS*/, -1, 0);
  kvmfd = open("/dev/kvm", O_RDWR);
  vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
  vcpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
  syz_kvm_setup_vmfd(vmfd, (char *)0x20000000);
  syz_kvm_setup_cpu(kvmfd, vcpufd, /*host_mem*/ (char *)0x20000000);

  // struct kvm_nested_state for KVM_SET_NESTED_STATE (ioctl 0x4080aebf).
  // size = 1040 is smaller than the full 4096-byte vmcs12 cache, which
  // is what leaves most of cached_vmcs12 uninitialized; vmcs_pa = 0xe000
  // is the guest page that the stale cache is later written back to.
  *(uint16_t *)0x20000580 = 0;      // flags
  *(uint16_t *)0x20000582 = 0;      // format
  *(uint32_t *)0x20000584 = 1040;   // size
  *(uint64_t *)0x20000588 = 0;      // vmx.vmxon_pa
  *(uint64_t *)0x20000590 = 0xe000; // vmx.vmcs_pa (host_mem output offset)
  *(uint16_t *)0x20000598 = 0;      // vmx.smm.flags
  ioctl(vcpufd, 0x4080aebf, 0x20000580);

  // Run the guest; its VMPTRLD triggers the write-back of cached_vmcs12.
  ioctl(vcpufd, KVM_RUN, 0);
  // Print whatever kernel data leaked into the guest page at 0xe000.
  dump_buf((unsigned char *)0x2000e000, 0x1000);
  return 0;
}
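
/*
 * Rough usage, not part of the original attachment (assumptions: gcc,
 * an x86-64 host with /dev/kvm and nested VMX enabled):
 *
 *   gcc -O2 -o kvm-infoleak kvm-infoleak.c
 *   ./kvm-infoleak
 *
 * Any kernel bytes leaked into the guest page at 0xe000 are printed to
 * stderr by dump_buf().
 */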


* Re: KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page
  2018-11-07 12:10 ` Alexander Potapenko
@ 2018-11-07 12:47   ` Paolo Bonzini
  2018-11-07 12:58     ` Liran Alon
  2018-11-07 12:52   ` KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page Liran Alon
  1 sibling, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2018-11-07 12:47 UTC (permalink / raw)
  To: Alexander Potapenko, syzbot+ded1696f6b50b615b630
  Cc: kvm, LKML, rkrcmar, syzkaller-bugs

On 07/11/2018 13:10, Alexander Potapenko wrote:
> This appears to be a real bug in KVM.
> Please see a simplified reproducer attached.

Thanks, I agree it's a real bug.  The basic issue is that the
kvm_state->size member is too small (1040) in the KVM_SET_NESTED_STATE
ioctl, aka 0x4080aebf.

One way to fix it would be to just change kmalloc to kzalloc when
allocating cached_vmcs12 and cached_shadow_vmcs12, but really the ioctl
is wrong and should be rejected.  And the case where a shadow VMCS has
to be loaded is even more wrong, and we have to fix it anyway, so I
don't really like the idea of papering over the bug in the allocation.
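
Roughly, the pre-patch flow is (a simplified sketch of the logic, not
the literal vmx.c source):

	/* enter_vmx_operation(): a 4K buffer, kmalloc'ed and hence
	 * uninitialized */
	vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);

	/* vmx_set_nested_state(): the vmptr is made current, but only
	 * sizeof(*vmcs12) bytes of the buffer are ever filled in */
	set_current_vmptr(vmx, kvm_state->vmx.vmcs_pa);
	copy_from_user(vmcs12, user_kvm_nested_state->data, sizeof(*vmcs12));

	/* later, on the guest's next VMPTRLD, nested_release_vmcs12()
	 * writes the whole buffer back to guest memory -- hence
	 * "Bytes 1000-4095 of 4096 are uninitialized" in the report */
	kvm_vcpu_write_guest_page(vcpu, gfn, cached_vmcs12, 0, VMCS12_SIZE);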

I'll test this patch and submit it formally:

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c645f777b425..c546f0b1f3e0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -14888,10 +14888,13 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 	if (ret)
 		return ret;

-	/* Empty 'VMXON' state is permitted */
-	if (kvm_state->size < sizeof(kvm_state) + sizeof(*vmcs12))
+	/* Empty 'VMXON' state is permitted.  A partial VMCS12 is not.  */
+	if (kvm_state->size == sizeof(kvm_state))
 		return 0;

+	if (kvm_state->size < sizeof(kvm_state) + VMCS12_SIZE)
+		return -EINVAL;
+
 	if (kvm_state->vmx.vmcs_pa != -1ull) {
 		if (kvm_state->vmx.vmcs_pa == kvm_state->vmx.vmxon_pa ||
 		    !page_address_valid(vcpu, kvm_state->vmx.vmcs_pa))
@@ -14917,6 +14920,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 	}

 	vmcs12 = get_vmcs12(vcpu);
+	BUILD_BUG_ON(sizeof(*vmcs12) > VMCS12_SIZE);
 	if (copy_from_user(vmcs12, user_kvm_nested_state->data, sizeof(*vmcs12)))
 		return -EFAULT;

@@ -14932,7 +14936,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
 	if (nested_cpu_has_shadow_vmcs(vmcs12) &&
 	    vmcs12->vmcs_link_pointer != -1ull) {
 		struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu);
-		if (kvm_state->size < sizeof(kvm_state) + 2 * sizeof(*vmcs12))
+		if (kvm_state->size < sizeof(kvm_state) + 2 * VMCS12_SIZE)
 			return -EINVAL;

 		if (copy_from_user(shadow_vmcs12,

Paolo


* Re: KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page
  2018-11-07 12:10 ` Alexander Potapenko
  2018-11-07 12:47   ` Paolo Bonzini
@ 2018-11-07 12:52   ` Liran Alon
  1 sibling, 0 replies; 16+ messages in thread
From: Liran Alon @ 2018-11-07 12:52 UTC (permalink / raw)
  To: Alexander Potapenko
  Cc: syzbot+ded1696f6b50b615b630, kvm, LKML, pbonzini, rkrcmar,
	syzkaller-bugs



> On 7 Nov 2018, at 14:10, Alexander Potapenko <glider@google.com> wrote:
> 
> On Wed, Nov 7, 2018 at 2:38 AM syzbot
> <syzbot+ded1696f6b50b615b630@syzkaller.appspotmail.com> wrote:
>> 
>> [... full syzbot report quoted; see above ...]
>> 
> This appears to be a real bug in KVM.
> Please see a simplified reproducer attached.

If I understand the above trace correctly:
1) When the guest enters VMX operation, cached_vmcs12 is allocated in enter_vmx_operation() by kmalloc().
2) When the guest executes VMPTRLD, handle_vmptrld() first releases the current vmcs12 via nested_release_vmcs12()
   and only then copies the newly loaded vmcs12 into cached_vmcs12 by memcpy().
3) The theoretical bug is that nested_release_vmcs12() copies the entire cached_vmcs12 to guest memory, which is
   not initialised if this is the very first VMPTRLD executed by the guest.
   But this case should not happen, because nested_release_vmcs12() starts by checking whether current_vmptr is equal to -1,
   and vmx_create_vcpu() sets current_vmptr to -1ull.

However, one case where (3) could happen is via vmx_set_nested_state().
In case kvm_state->vmx.vmcs_pa is valid, set_current_vmptr() is called, which sets current_vmptr.
However, it is possible that the copy_from_user() which copies into cached_vmcs12 will fail.
This leaves us with current_vmptr != -1 but an uninitialised cached_vmcs12,
so the next VMPTRLD will indeed leak the uninitialised content of cached_vmcs12 into the guest.
In pseudo-code, the problematic ordering is (a sketch, not the exact vmx.c source):
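	set_current_vmptr(vmx, kvm_state->vmx.vmcs_pa); /* current_vmptr != -1 from here on */
	if (copy_from_user(vmcs12, user_kvm_nested_state->data, sizeof(*vmcs12)))
		return -EFAULT; /* fails, but current_vmptr stays set and
				 * cached_vmcs12 stays uninitialised */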

What I described seems to be a bug to me (correct me if I’m wrong), but it doesn’t seem to be the
same issue described here, as we can see that the attached reproducer doesn’t use the KVM_SET_NESTED_STATE ioctl.

-Liran



* Re: KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page
  2018-11-07 12:47   ` Paolo Bonzini
@ 2018-11-07 12:58     ` Liran Alon
  2018-11-07 13:37       ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Liran Alon @ 2018-11-07 12:58 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alexander Potapenko, syzbot+ded1696f6b50b615b630, kvm, LKML,
	rkrcmar, syzkaller-bugs



> On 7 Nov 2018, at 14:47, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 07/11/2018 13:10, Alexander Potapenko wrote:
>> This appears to be a real bug in KVM.
>> Please see a simplified reproducer attached.
> 
> Thanks, I agree it's a real bug.  The basic issue is that the
> kvm_state->size member is too small (1040) in the KVM_SET_NESTED_STATE
> ioctl, aka 0x4080aebf.
> 
> One way to fix it would be to just change kmalloc to kzalloc when
> allocating cached_vmcs12 and cached_shadow_vmcs12, but really the ioctl
> is wrong and should be rejected.  And the case where a shadow VMCS has
> to be loaded is even more wrong, and we have to fix it anyway, so I
> don't really like the idea of papering over the bug in the allocation.
> 
> I'll test this patch and submit it formally:
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c645f777b425..c546f0b1f3e0 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -14888,10 +14888,13 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
> 	if (ret)
> 		return ret;
> 
> -	/* Empty 'VMXON' state is permitted */
> -	if (kvm_state->size < sizeof(kvm_state) + sizeof(*vmcs12))
> +	/* Empty 'VMXON' state is permitted.  A partial VMCS12 is not.  */
> +	if (kvm_state->size == sizeof(kvm_state))
> 		return 0;
> 
> +	if (kvm_state->size < sizeof(kvm_state) + VMCS12_SIZE)
> +		return -EINVAL;
> +

I don’t think that this test is sufficient to fully resolve the issue.
What if malicious userspace supplies a valid size but the pages containing nested_state->vmcs12 are unmapped?
This will result in vmx_set_nested_state() still calling set_current_vmptr() but failing on copy_from_user(),
which still leaks cached_vmcs12 on the guest's next VMPTRLD.

Therefore, I think that the correct patch should be to change vmx_set_nested_state() to
first gather all relevant information from userspace and validate it,
and only then start applying it to KVM’s internal vCPU state.

> 	if (kvm_state->vmx.vmcs_pa != -1ull) {
> 		if (kvm_state->vmx.vmcs_pa == kvm_state->vmx.vmxon_pa ||
> 		    !page_address_valid(vcpu, kvm_state->vmx.vmcs_pa))
> @@ -14917,6 +14920,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
> 	}
> 
> 	vmcs12 = get_vmcs12(vcpu);
> +	BUILD_BUG_ON(sizeof(*vmcs12) > VMCS12_SIZE);

Why put this BUILD_BUG_ON() specifically here?
There are many places which assumes cached_vmcs12 is of size VMCS12_SIZE.
(Such as nested_release_vmcs12() and handle_vmptrld()).

> 	if (copy_from_user(vmcs12, user_kvm_nested_state->data, sizeof(*vmcs12)))
> 		return -EFAULT;
> 
> @@ -14932,7 +14936,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
> 	if (nested_cpu_has_shadow_vmcs(vmcs12) &&
> 	    vmcs12->vmcs_link_pointer != -1ull) {
> 		struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu);
> -		if (kvm_state->size < sizeof(kvm_state) + 2 * sizeof(*vmcs12))
> +		if (kvm_state->size < sizeof(kvm_state) + 2 * VMCS12_SIZE)
> 			return -EINVAL;
> 
> 		if (copy_from_user(shadow_vmcs12,
> 
> Paolo

-Liran




* Re: KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page
  2018-11-07 12:58     ` Liran Alon
@ 2018-11-07 13:37       ` Paolo Bonzini
  2019-01-14 23:47         ` [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12 Tom Roeder
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2018-11-07 13:37 UTC (permalink / raw)
  To: Liran Alon
  Cc: Alexander Potapenko, syzbot+ded1696f6b50b615b630, kvm, LKML,
	rkrcmar, syzkaller-bugs

On 07/11/2018 13:58, Liran Alon wrote:
> 
> 
>> On 7 Nov 2018, at 14:47, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 07/11/2018 13:10, Alexander Potapenko wrote:
>>> This appears to be a real bug in KVM.
>>> Please see a simplified reproducer attached.
>>
>> Thanks, I agree it's a real bug.  The basic issue is that the
>> kvm_state->size member is too small (1040) in the KVM_SET_NESTED_STATE
>> ioctl, aka 0x4080aebf.
>>
>> One way to fix it would be to just change kmalloc to kzalloc when
>> allocating cached_vmcs12 and cached_shadow_vmcs12, but really the ioctl
>> is wrong and should be rejected.  And the case where a shadow VMCS has
>> to be loaded is even more wrong, and we have to fix it anyway, so I
>> don't really like the idea of papering over the bug in the allocation.
>>
>> I'll test this patch and submit it formally:
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index c645f777b425..c546f0b1f3e0 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -14888,10 +14888,13 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
>> 	if (ret)
>> 		return ret;
>>
>> -	/* Empty 'VMXON' state is permitted */
>> -	if (kvm_state->size < sizeof(kvm_state) + sizeof(*vmcs12))
>> +	/* Empty 'VMXON' state is permitted.  A partial VMCS12 is not.  */
>> +	if (kvm_state->size == sizeof(kvm_state))
>> 		return 0;
>>
>> +	if (kvm_state->size < sizeof(kvm_state) + VMCS12_SIZE)
>> +		return -EINVAL;
>> +
> 
> I don’t think that this test is sufficient to fully resolve issue.
> What if malicious userspace supplies valid size but pages containing nested_state->vmcs12 is unmapped?
> This will result in vmx_set_nested_state() still calling set_current_vmptr() but failing on copy_from_user()
> which still leaks cached_vmcs12 on next VMPTRLD of guest.

Makes sense; since SET_NESTED_STATE is not a fast path, we can just
memdup_user and pass a kernel pointer to vmx_set_nested_state.
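
Roughly (an untested sketch; it assumes we change the ->set_nested_state
callback to take a kernel pointer to the payload):

	struct kvm_nested_state kvm_state;
	void *data;
	int ret;

	if (copy_from_user(&kvm_state, user_kvm_nested_state, sizeof(kvm_state)))
		return -EFAULT;
	if (kvm_state.size < sizeof(kvm_state))
		return -EINVAL;
	/* Copy the whole blob into the kernel before touching vCPU state. */
	data = memdup_user(user_kvm_nested_state->data,
			   kvm_state.size - sizeof(kvm_state));
	if (IS_ERR(data))
		return PTR_ERR(data);
	ret = kvm_x86_ops->set_nested_state(vcpu, &kvm_state, data);
	kfree(data);
	return ret;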

> Therefore, I think that the correct patch should be to change vmx_set_nested_state() to
> first gather all relevant information from userspace and validate it,
> and only then start applying it to KVM’s internal vCPU state.
> 
>> 	if (kvm_state->vmx.vmcs_pa != -1ull) {
>> 		if (kvm_state->vmx.vmcs_pa == kvm_state->vmx.vmxon_pa ||
>> 		    !page_address_valid(vcpu, kvm_state->vmx.vmcs_pa))
>> @@ -14917,6 +14920,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
>> 	}
>>
>> 	vmcs12 = get_vmcs12(vcpu);
>> +	BUILD_BUG_ON(sizeof(*vmcs12) > VMCS12_SIZE);
> 
> Why put this BUILD_BUG_ON() specifically here?
> There are many places which assumes cached_vmcs12 is of size VMCS12_SIZE.
> (Such as nested_release_vmcs12() and handle_vmptrld()).

Unlike those places, here the copy has sizeof(*vmcs12) bytes and an
overflow would cause a userspace write to kernel memory.  Though, that
means there is still a possibility of leaking kernel data when
nested_release_vmcs12 is called.  So it also makes sense to use
VMCS12_SIZE for the memory copies, and kzalloc.

Thanks,

Paolo

>> 	if (copy_from_user(vmcs12, user_kvm_nested_state->data, sizeof(*vmcs12)))
>> 		return -EFAULT;
>>
>> @@ -14932,7 +14936,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
>> 	if (nested_cpu_has_shadow_vmcs(vmcs12) &&
>> 	    vmcs12->vmcs_link_pointer != -1ull) {
>> 		struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu);
>> -		if (kvm_state->size < sizeof(kvm_state) + 2 * sizeof(*vmcs12))
>> +		if (kvm_state->size < sizeof(kvm_state) + 2 * VMCS12_SIZE)
>> 			return -EINVAL;
>>
>> 		if (copy_from_user(shadow_vmcs12,
>>
>> Paolo
> 
> -Liran
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12
  2018-11-07 13:37       ` Paolo Bonzini
@ 2019-01-14 23:47         ` Tom Roeder
  2019-01-15  0:03           ` Jim Mattson
  2019-01-15  2:43           ` Sean Christopherson
  0 siblings, 2 replies; 16+ messages in thread
From: Tom Roeder @ 2019-01-14 23:47 UTC (permalink / raw)
  To: Paolo Bonzini, Radim Krčmář
  Cc: Liran Alon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, kvm, linux-kernel, Tom Roeder,
	syzbot+ded1696f6b50b615b630

This changes the allocation of cached_vmcs12 to use kzalloc instead of
kmalloc. This removes the information leak found by Syzkaller (see
Reported-by) in this case and prevents similar leaks from happening
based on cached_vmcs12.

The email from Syzkaller led to a discussion about a patch in early
November on the KVM list (I've made this a reply to that thread), but
the current upstream kernel still has kmalloc instead of kzalloc for
cached_vmcs12 and cached_shadow_vmcs12. This RFC proposes changing to
kzalloc for defense in depth.

Tested: rebuilt but not tested, since this is an RFC

Reported-by: syzbot+ded1696f6b50b615b630@syzkaller.appspotmail.com
Signed-off-by: Tom Roeder <tmroeder@google.com>
---
 arch/x86/kvm/vmx/nested.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2616bd2c7f2c7..ad46667042c7a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4140,11 +4140,11 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
 	if (r < 0)
 		goto out_vmcs02;
 
-	vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);
+	vmx->nested.cached_vmcs12 = kzalloc(VMCS12_SIZE, GFP_KERNEL);
 	if (!vmx->nested.cached_vmcs12)
 		goto out_cached_vmcs12;
 
-	vmx->nested.cached_shadow_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);
+	vmx->nested.cached_shadow_vmcs12 = kzalloc(VMCS12_SIZE, GFP_KERNEL);
 	if (!vmx->nested.cached_shadow_vmcs12)
 		goto out_cached_shadow_vmcs12;
 
-- 
2.20.1.97.g81188d93c3-goog


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12
  2019-01-14 23:47         ` [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12 Tom Roeder
@ 2019-01-15  0:03           ` Jim Mattson
  2019-01-15  2:43           ` Sean Christopherson
  1 sibling, 0 replies; 16+ messages in thread
From: Jim Mattson @ 2019-01-15  0:03 UTC (permalink / raw)
  To: Tom Roeder
  Cc: Paolo Bonzini, Radim Krčmář,
	Liran Alon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, the arch/x86 maintainers, kvm list, LKML,
	syzbot+ded1696f6b50b615b630

On Mon, Jan 14, 2019 at 3:48 PM Tom Roeder <tmroeder@google.com> wrote:
>
> This changes the allocation of cached_vmcs12 to use kzalloc instead of
> kmalloc. This removes the information leak found by Syzkaller (see
> Reported-by) in this case and prevents similar leaks from happening
> based on cached_vmcs12.
>
> The email from Syzkaller led to a discussion about a patch in early
> November on the KVM list (I've made this a reply to that thread), but
> the current upstream kernel still has kmalloc instead of kzalloc for
> cached_vmcs12 and cached_shadow_vmcs12. This RFC proposes changing to
> kzalloc for defense in depth.
>
> Tested: rebuilt but not tested, since this is an RFC
>
> Reported-by: syzbot+ded1696f6b50b615b630@syzkaller.appspotmail.com
> Signed-off-by: Tom Roeder <tmroeder@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12
  2019-01-14 23:47         ` [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12 Tom Roeder
  2019-01-15  0:03           ` Jim Mattson
@ 2019-01-15  2:43           ` Sean Christopherson
  2019-01-15 10:15             ` Paolo Bonzini
  2019-01-15 17:51             ` Tom Roeder
  1 sibling, 2 replies; 16+ messages in thread
From: Sean Christopherson @ 2019-01-15  2:43 UTC (permalink / raw)
  To: Tom Roeder
  Cc: Paolo Bonzini, Radim Krčmář,
	Liran Alon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, kvm, linux-kernel,
	syzbot+ded1696f6b50b615b630

On Mon, Jan 14, 2019 at 03:47:28PM -0800, Tom Roeder wrote:
> This changes the allocation of cached_vmcs12 to use kzalloc instead of
> kmalloc. This removes the information leak found by Syzkaller (see
> Reported-by) in this case and prevents similar leaks from happening
> based on cached_vmcs12.

Is the leak specific to vmx_set_nested_state(), e.g. can we zero out
the memory if copy_from_user() fails instead of taking the hit on every
allocation?
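
Something like this at the failure site, for instance (illustrative
only):

	if (copy_from_user(vmcs12, user_kvm_nested_state->data,
			   sizeof(*vmcs12))) {
		/* scrub the whole buffer, stale tail included */
		memset(vmcs12, 0, VMCS12_SIZE);
		return -EFAULT;
	}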

> The email from Syzkaller led to a discussion about a patch in early
> November on the KVM list (I've made this a reply to that thread), but
> the current upstream kernel still has kmalloc instead of kzalloc for
> cached_vmcs12 and cached_shadow_vmcs12. This RFC proposes changing to
> kzalloc for defense in depth.
> 
> Tested: rebuilt but not tested, since this is an RFC
> 
> Reported-by: syzbot+ded1696f6b50b615b630@syzkaller.appspotmail.com
> Signed-off-by: Tom Roeder <tmroeder@google.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 2616bd2c7f2c7..ad46667042c7a 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4140,11 +4140,11 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
>  	if (r < 0)
>  		goto out_vmcs02;
>  
> -	vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);
> +	vmx->nested.cached_vmcs12 = kzalloc(VMCS12_SIZE, GFP_KERNEL);
>  	if (!vmx->nested.cached_vmcs12)
>  		goto out_cached_vmcs12;

Obviously not your code, but why do we allocate VMCS12_SIZE instead of
sizeof(struct vmcs12)?  I get why we require userspace to reserve the
full 4k, but I don't understand why KVM needs to allocate the reserved
bytes internally.
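
(For context, and assuming the tree at the time: VMCS12_SIZE is the
full page,

	#define VMCS12_SIZE	0x1000

while struct vmcs12 is smaller, with the remainder of the 4k reserved
for future fields.)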

> -	vmx->nested.cached_shadow_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);
> +	vmx->nested.cached_shadow_vmcs12 = kzalloc(VMCS12_SIZE, GFP_KERNEL);
>  	if (!vmx->nested.cached_shadow_vmcs12)
>  		goto out_cached_shadow_vmcs12;
>  
> -- 
> 2.20.1.97.g81188d93c3-goog
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12
  2019-01-15  2:43           ` Sean Christopherson
@ 2019-01-15 10:15             ` Paolo Bonzini
  2019-01-23 18:25               ` Tom Roeder
  2019-01-15 17:51             ` Tom Roeder
  1 sibling, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2019-01-15 10:15 UTC (permalink / raw)
  To: Sean Christopherson, Tom Roeder
  Cc: Radim Krčmář,
	Liran Alon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, kvm, linux-kernel,
	syzbot+ded1696f6b50b615b630

On 15/01/19 03:43, Sean Christopherson wrote:
>> -	vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);
>> +	vmx->nested.cached_vmcs12 = kzalloc(VMCS12_SIZE, GFP_KERNEL);
>>  	if (!vmx->nested.cached_vmcs12)
>>  		goto out_cached_vmcs12;
> Obviously not your code, but why do we allocate VMCS12_SIZE instead of
> sizeof(struct vmcs12)?  I get why we require userspace to reserve the
> full 4k, but I don't understand why KVM needs to allocate the reserved
> bytes internally.

It's just cleaner and shorter code to copy everything in and out,
instead of having to explicitly zero the slack.

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12
  2019-01-15  2:43           ` Sean Christopherson
  2019-01-15 10:15             ` Paolo Bonzini
@ 2019-01-15 17:51             ` Tom Roeder
  2019-01-23 18:33               ` Tom Roeder
  1 sibling, 1 reply; 16+ messages in thread
From: Tom Roeder @ 2019-01-15 17:51 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Radim Krčmář,
	Liran Alon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, kvm, linux-kernel,
	syzbot+ded1696f6b50b615b630

On Mon, Jan 14, 2019 at 06:43:04PM -0800, Sean Christopherson wrote:
> On Mon, Jan 14, 2019 at 03:47:28PM -0800, Tom Roeder wrote:
> > This changes the allocation of cached_vmcs12 to use kzalloc instead of
> > kmalloc. This removes the information leak found by Syzkaller (see
> > Reported-by) in this case and prevents similar leaks from happening
> > based on cached_vmcs12.
> 
> Is the leak specific to vmx_set_nested_state(), e.g. can we zero out
> the memory if copy_from_user() fails instead of taking the hit on every
> allocation?

I don't know if the leak is specific to vmx_set_nested_state.

This question (and the rest of the thread from November) goes to the
heart of what I wanted to get feedback about; hence the "RFC" part of
the subject line.

I'm new to the kernel, and I don't know all the idioms and expectations,
so the following analysis is an outsider's view of the issue at hand.

What I see in this case is a field that is intended to be copied to the
guest but is initially allocated with uninitialized kernel heap data. I'm sure we
could figure out all the current paths and error cases that we need to
handle to make sure that this data never leaks. Future reviewers then
also need to make sure that changes to the nested VMX code don't leak
data from this field.

Why not instead make sure that there isn't any data to leak in the first
place?

I understand that there's a cost to kzalloc vs. kmalloc, but I don't
know what it is in practice; slab.c shows that the extra code for the
__GFP_ZERO flag is a memset of 0 over the allocated memory. The
allocation looks very infrequent for the two call sites in this patch,
since they are in enter_vmx_operation. That sounds to me like the
allocation only happens when the guest enables nested virtualization
(i.e., on VMXON).
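
(For reference, kzalloc is literally just that flag, per
include/linux/slab.h:

	static inline void *kzalloc(size_t size, gfp_t flags)
	{
		return kmalloc(size, flags | __GFP_ZERO);
	}

so the only extra cost is the memset.)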

Given how infrequent the allocation is and the security benefit of not
having to worry about leaking the data, I'd advocate for changing it
here.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12
  2019-01-15 10:15             ` Paolo Bonzini
@ 2019-01-23 18:25               ` Tom Roeder
  2019-01-24  1:17                 ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Roeder @ 2019-01-23 18:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Radim Krčmář,
	Liran Alon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, kvm, linux-kernel,
	syzbot+ded1696f6b50b615b630

On Tue, Jan 15, 2019 at 11:15:51AM +0100, Paolo Bonzini wrote:
> On 15/01/19 03:43, Sean Christopherson wrote:
> >> -	vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);
> >> +	vmx->nested.cached_vmcs12 = kzalloc(VMCS12_SIZE, GFP_KERNEL);
> >>  	if (!vmx->nested.cached_vmcs12)
> >>  		goto out_cached_vmcs12;
> > Obviously not your code, but why do we allocate VMCS12_SIZE instead of
> > sizeof(struct vmcs12)?  I get why we require userspace to reserve the
> > full 4k, but I don't understand why KVM needs to allocate the reserved
> > bytes internally.
> 
> It's just cleaner and shorter code to copy everything in and out,
> instead of having to explicitly zero the slack.

Could you please clarify? I don't see code that copies everything in and
out, but it depends on what you mean by "everything". In the context of
this email exchange, I assumed that "everything" was "all 4k
(VMCS12_SIZE)".

But it looks to me like the code doesn't copy 4k in and out, but rather
only ever copies sizeof(struct vmcs12) in and out. The copy_from_user
and copy_to_user cases in nested.c use sizeof(*vmcs12), which is
sizeof(struct vmcs12).

So maybe we can switch to allocating sizeof(struct vmcs12). Is this
correct, or is there some other reason to allocate the larger size?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12
  2019-01-15 17:51             ` Tom Roeder
@ 2019-01-23 18:33               ` Tom Roeder
  2019-01-24  1:18                 ` Paolo Bonzini
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Roeder @ 2019-01-23 18:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Radim Krčmář,
	Liran Alon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, kvm, linux-kernel,
	syzbot+ded1696f6b50b615b630

On Tue, Jan 15, 2019 at 09:51:11AM -0800, Tom Roeder wrote:
> On Mon, Jan 14, 2019 at 06:43:04PM -0800, Sean Christopherson wrote:
> > On Mon, Jan 14, 2019 at 03:47:28PM -0800, Tom Roeder wrote:
> > > This changes the allocation of cached_vmcs12 to use kzalloc instead of
> > > kmalloc. This removes the information leak found by Syzkaller (see
> > > Reported-by) in this case and prevents similar leaks from happening
> > > based on cached_vmcs12.
> > 
> > Is the leak specific to vmx_set_nested_state(), e.g. can we zero out
> > the memory if copy_from_user() fails instead of taking the hit on every
> > allocation?
> 
> I don't know if the leak is specific to vmx_set_nested_state.

I've looked at the code more now, and it looks to me like there might be
other cases where the memory could leak. But I don't know enough of the
flows to be sure. The enter_vmx_operation function is called in
handle_vmon, and no data is copied from the guest immediately after
that. So, it depends on what happens after VMXON.

Even in vmx_set_nested_state, there are about 30 lines of code in
between enter_vmx_operation and copy_from_user, and there are a couple
of cases that cause vmx_set_nested_state to return with an error. So if
we want to fix this by handling all the error paths, I think it might be
cleanest to convert vmx_set_nested_state to use goto error handling,
since that would allow us to clear the allocated memory in one place.
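
Very roughly, the shape I have in mind (not real code; the checks are
stand-ins for the actual validation in that function):

	ret = -EINVAL;
	if (!page_address_valid(vcpu, kvm_state->vmx.vmcs_pa))
		goto err;	/* ...and likewise for the other checks */

	ret = -EFAULT;
	if (copy_from_user(vmcs12, user_kvm_nested_state->data,
			   sizeof(*vmcs12)))
		goto err;

	return 0;

err:
	/* one place to scrub whatever was never overwritten */
	memset(vmx->nested.cached_vmcs12, 0, VMCS12_SIZE);
	return ret;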

What do you think?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12
  2019-01-23 18:25               ` Tom Roeder
@ 2019-01-24  1:17                 ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2019-01-24  1:17 UTC (permalink / raw)
  To: Tom Roeder
  Cc: Sean Christopherson, Radim Krčmář,
	Liran Alon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, kvm, linux-kernel,
	syzbot+ded1696f6b50b615b630

On 23/01/19 19:25, Tom Roeder wrote:
> On Tue, Jan 15, 2019 at 11:15:51AM +0100, Paolo Bonzini wrote:
>> On 15/01/19 03:43, Sean Christopherson wrote:
>>>> -	vmx->nested.cached_vmcs12 = kmalloc(VMCS12_SIZE, GFP_KERNEL);
>>>> +	vmx->nested.cached_vmcs12 = kzalloc(VMCS12_SIZE, GFP_KERNEL);
>>>>  	if (!vmx->nested.cached_vmcs12)
>>>>  		goto out_cached_vmcs12;
>>> Obviously not your code, but why do we allocate VMCS12_SIZE instead of
>>> sizeof(struct vmcs12)?  I get why we require userspace to reserve the
>>> full 4k, but I don't understand why KVM needs to allocate the reserved
>>> bytes internally.
>>
>> It's just cleaner and shorter code to copy everything in and out,
>> instead of having to explicitly zero the slack.
> 
> Could you please clarify? I don't see code that copies everything in and
> out, but it depends on what you mean by "everything". In the context of
> this email exchange, I assumed that "everything" was "all 4k
> (VMCS12_SIZE)".

I was thinking of vmx_get_nested_state, but actually it only copies
sizeof(*vmcs12).  However, that is the place where we should copy 4k out
of it, including the zeroes.  Otherwise, our userspace clients (which
don't know sizeof(*vmcs12)) could leak uninitialized data of their own.
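
In other words, something like this on the way out (sketch):

	if (copy_to_user(user_kvm_nested_state->data,
			 vmx->nested.cached_vmcs12, VMCS12_SIZE))
		return -EFAULT;

which is safe precisely because kzalloc guarantees the pad bytes are
zero, and it spares userspace from needing to know sizeof(*vmcs12).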

Paolo

> But it looks to me like the code doesn't copy 4k in and out, but rather
> only ever copies sizeof(struct vmcs12) in and out. The copy_from_user
> and copy_to_user cases in nested.c use sizeof(*vmcs12), which is
> sizeof(struct vmcs12).
> 
> So maybe we can switch to allocating sizeof(struct vmcs12). Is this
> correct, or is there some other reason to allocate the larger size?
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12
  2019-01-23 18:33               ` Tom Roeder
@ 2019-01-24  1:18                 ` Paolo Bonzini
  2019-01-24 21:46                   ` Tom Roeder
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2019-01-24  1:18 UTC (permalink / raw)
  To: Tom Roeder, Sean Christopherson
  Cc: Radim Krčmář,
	Liran Alon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, kvm, linux-kernel,
	syzbot+ded1696f6b50b615b630

On 23/01/19 19:33, Tom Roeder wrote:
> 
> Even in vmx_set_nested_state, there are about 30 lines of code in
> between enter_vmx_operation and copy_from_user, and there are a couple
> of cases that cause vmx_set_nested_state to return with an error. So if
> we want to fix this by handling all the error paths, I think it might be
> cleanest to convert vmx_set_nested_state to use goto error handling,
> since that would allow us to clear the allocated memory in one place.

I prefer to have kzalloc, and change vmx_get_nested_state to copy the
whole page including the trailing zeroes.

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12
  2019-01-24  1:18                 ` Paolo Bonzini
@ 2019-01-24 21:46                   ` Tom Roeder
  0 siblings, 0 replies; 16+ messages in thread
From: Tom Roeder @ 2019-01-24 21:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Radim Krčmář,
	Liran Alon, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, x86, kvm, linux-kernel,
	syzbot+ded1696f6b50b615b630

On Thu, Jan 24, 2019 at 02:18:26AM +0100, Paolo Bonzini wrote:
> On 23/01/19 19:33, Tom Roeder wrote:
> > 
> > Even in vmx_set_nested_state, there are about 30 lines of code in
> > between enter_vmx_operation and copy_from_user, and there are a couple
> > of cases that cause vmx_set_nested_state to return with an error. So if
> > we want to fix this by handling all the error paths, I think it might be
> > cleanest to convert vmx_set_nested_state to use goto error handling,
> > since that would allow us to clear the allocated memory in one place.
> 
> I prefer to have kzalloc, and change vmx_get_nested_state to copy the
> whole page including the trailing zeroes.

OK, sending as a v2 patch in a new thread.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-01-24 21:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-07  1:38 KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page syzbot
2018-11-07 12:10 ` Alexander Potapenko
2018-11-07 12:47   ` Paolo Bonzini
2018-11-07 12:58     ` Liran Alon
2018-11-07 13:37       ` Paolo Bonzini
2019-01-14 23:47         ` [RFC PATCH] kvm: x86/vmx: Use kzalloc for cached_vmcs12 Tom Roeder
2019-01-15  0:03           ` Jim Mattson
2019-01-15  2:43           ` Sean Christopherson
2019-01-15 10:15             ` Paolo Bonzini
2019-01-23 18:25               ` Tom Roeder
2019-01-24  1:17                 ` Paolo Bonzini
2019-01-15 17:51             ` Tom Roeder
2019-01-23 18:33               ` Tom Roeder
2019-01-24  1:18                 ` Paolo Bonzini
2019-01-24 21:46                   ` Tom Roeder
2018-11-07 12:52   ` KMSAN: kernel-infoleak in kvm_vcpu_write_guest_page Liran Alon
