* There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c
@ 2021-10-18 17:14 butt3rflyh4ck
  2021-10-21 20:08 ` Paolo Bonzini
  0 siblings, 1 reply; 27+ messages in thread

From: butt3rflyh4ck @ 2021-10-18 17:14 UTC (permalink / raw)
  To: pbonzini; +Cc: kvm, LKML

Hi, there is a null-ptr-deref bug in kvm_dirty_ring_get() in
virt/kvm/dirty_ring.c, and I reproduced it on 5.15.0-rc5+.

### Analysis

Calling the KVM_XEN_HVM_SET_ATTR ioctl invokes kvm_xen_hvm_set_attr(),
which eventually calls mark_page_dirty_in_slot():

```
void mark_page_dirty_in_slot(struct kvm *kvm,
                             struct kvm_memory_slot *memslot,
                             gfn_t gfn)
{
    if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
        unsigned long rel_gfn = gfn - memslot->base_gfn;
        u32 slot = (memslot->as_id << 16) | memslot->id;

        if (kvm->dirty_ring_size)
            kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
                                slot, rel_gfn);
        else
            set_bit_le(rel_gfn, memslot->dirty_bitmap);
    }
}
```

mark_page_dirty_in_slot() calls kvm_dirty_ring_get() to fetch
vcpu->dirty_ring:

```
struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
{
    struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke kvm_get_running_vcpu() to get a vcpu.

    WARN_ON_ONCE(vcpu->kvm != kvm);   [1]

    return &vcpu->dirty_ring;
}
```

But if the KVM_CREATE_VCPU ioctl has never been called, no kvm_vcpu
exists and kvm_get_running_vcpu() returns NULL, so the vcpu->kvm access
at [1] is a NULL pointer dereference.
### Crash log

root@syzkaller:/home/user# ./kvm_dirty_ring_get
[ 2608.490187][ T6513] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2608.491652][ T6513] #PF: supervisor read access in kernel mode
[ 2608.492713][ T6513] #PF: error_code(0x0000) - not-present page
[ 2608.493770][ T6513] PGD 15944067 P4D 15944067 PUD 1589d067 PMD 0
[ 2608.495568][ T6513] Oops: 0000 [#1] PREEMPT SMP
[ 2608.496355][ T6513] CPU: 1 PID: 6513 Comm: kvm_dirty_ring_ Not tainted 5.15.0-rc5+ #14
[ 2608.497755][ T6513] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
[ 2608.499451][ T6513] RIP: 0010:kvm_dirty_ring_get+0x9/0x20
[ 2608.500480][ T6513] Code: 90 e8 5b bb 04 00 83 c0 40 c3 0f 1f 80 00 00 00 00 8b 07 8b 57 04 29 d0 39 47 0c 0f 96 c0 c3 66 90 cc 48 89 fb e8 17 06 ff ff <48> b
[ 2608.503997][ T6513] RSP: 0018:ffffc90000ab3c08 EFLAGS: 00010286
[ 2608.505054][ T6513] RAX: 0000000000000000 RBX: ffffc90000abd000 RCX: 0000000000000000
[ 2608.506346][ T6513] RDX: 0000000000000001 RSI: ffffffff84fc5baf RDI: 00000000ffffffff
[ 2608.507705][ T6513] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000050198
[ 2608.509119][ T6513] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 2608.510527][ T6513] R13: 0000000020fff000 R14: 0000000000000000 R15: 0000000000000004
[ 2608.512259][ T6513] FS:  0000000001cb0880(0000) GS:ffff88807ec00000(0000) knlGS:0000000000000000
[ 2608.513848][ T6513] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2608.515061][ T6513] CR2: 0000000000000000 CR3: 000000001583c000 CR4: 00000000000006e0
[ 2608.516506][ T6513] Call Trace:
[ 2608.517110][ T6513]  mark_page_dirty_in_slot.part.0+0x21/0x50
[ 2608.518163][ T6513]  __kvm_write_guest_page+0xa1/0xc0
[ 2608.519078][ T6513]  kvm_write_guest+0x42/0x80
[ 2608.519901][ T6513]  kvm_write_wall_clock+0x7f/0x140
[ 2608.520835][ T6513]  kvm_xen_hvm_set_attr+0x13d/0x190
[ 2608.521775][ T6513]  kvm_arch_vm_ioctl+0xa8b/0xc50
[ 2608.522762][ T6513]  ? tomoyo_path_number_perm+0xee/0x290
[ 2608.523771][ T6513]  kvm_vm_ioctl+0x716/0xe10
[ 2608.524545][ T6513]  __x64_sys_ioctl+0x7b/0xb0
[ 2608.525362][ T6513]  do_syscall_64+0x35/0xb0
[ 2608.530275][ T6513]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2608.531327][ T6513] RIP: 0033:0x44953d
[ 2608.532096][ T6513] Code: 28 c3 e8 36 29 00 00 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 8
[ 2608.535565][ T6513] RSP: 002b:00007ffeb22c2238 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[ 2608.537028][ T6513] RAX: ffffffffffffffda RBX: 0000000000400518 RCX: 000000000044953d
[ 2608.538436][ T6513] RDX: 0000000020001080 RSI: 000000004048aec9 RDI: 0000000000000004
[ 2608.539851][ T6513] RBP: 00007ffeb22c2250 R08: 0000000000000000 R09: 0000000000000000
[ 2608.541273][ T6513] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000402fb0
[ 2608.542845][ T6513] R13: 0000000000000000 R14: 00000000004c0018 R15: 0000000000000000
[ 2608.544260][ T6513] Modules linked in:
[ 2608.544965][ T6513] CR2: 0000000000000000
[ 2608.547791][ T6513] ---[ end trace 69dbdf44c6028ede ]---
[ 2608.548674][ T6513] RIP: 0010:kvm_dirty_ring_get+0x9/0x20
[ 2608.549513][ T6513] Code: 90 e8 5b bb 04 00 83 c0 40 c3 0f 1f 80 00 00 00 00 8b 07 8b 57 04 29 d0 39 47 0c 0f 96 c0 c3 66 90 cc 48 89 fb e8 17 06 ff ff <48> b
[ 2608.552808][ T6513] RSP: 0018:ffffc90000ab3c08 EFLAGS: 00010286
[ 2608.553702][ T6513] RAX: 0000000000000000 RBX: ffffc90000abd000 RCX: 0000000000000000
[ 2608.556308][ T6513] RDX: 0000000000000001 RSI: ffffffff84fc5baf RDI: 00000000ffffffff
[ 2608.557778][ T6513] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000050198
[ 2608.559314][ T6513] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[ 2608.560877][ T6513] R13: 0000000020fff000 R14: 0000000000000000 R15: 0000000000000004
[ 2608.562799][ T6513] FS:  0000000001cb0880(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[ 2608.564529][ T6513] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2608.565864][ T6513] CR2: 0000000020000001 CR3: 000000001583c000 CR4: 00000000000006f0
[ 2608.567378][ T6513] Kernel panic - not syncing: Fatal exception
[ 2608.568551][ T6513] Kernel Offset: disabled
[ 2608.574584][ T6513] Rebooting in 86400 seconds..

Regards,
butt3rflyh4ck.

--
Active Defense Lab of Venustech

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c
  2021-10-18 17:14 There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c butt3rflyh4ck
@ 2021-10-21 20:08 ` Paolo Bonzini
  2021-10-28  7:42   ` butt3rflyh4ck
                      ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread

From: Paolo Bonzini @ 2021-10-21 20:08 UTC (permalink / raw)
  To: butt3rflyh4ck, Woodhouse, David; +Cc: kvm, LKML

On 18/10/21 19:14, butt3rflyh4ck wrote:
> {
>      struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke
> kvm_get_running_vcpu() to get a vcpu.
>
>      WARN_ON_ONCE(vcpu->kvm != kvm);   [1]
>
>      return &vcpu->dirty_ring;
> }
> ```
> but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so
> vcpu is NULL.

It's not just because there was no call to KVM_CREATE_VCPU; in general
kvm->dirty_ring_size only works if all writes are associated to a
specific vCPU, which is not the case for the one of
kvm_xen_shared_info_init.

David, what do you think?  Making dirty-page ring buffer incompatible
with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is
not an option because, as the reporter said, you might not have even
created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR.  The remaining
option would be just "do not mark the page as dirty if the ring buffer
is active".  This is feasible because userspace itself has passed the
shared info gfn; but again, it's ugly...

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c 2021-10-21 20:08 ` Paolo Bonzini @ 2021-10-28 7:42 ` butt3rflyh4ck 2021-11-08 5:11 ` butt3rflyh4ck ` (4 subsequent siblings) 5 siblings, 0 replies; 27+ messages in thread From: butt3rflyh4ck @ 2021-10-28 7:42 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Woodhouse, David, kvm, LKML I agree with you. But I don’t have a good idea how to fix it Regards, butt3rflyh4ck. On Fri, Oct 22, 2021 at 4:08 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 18/10/21 19:14, butt3rflyh4ck wrote: > > { > > struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke > > kvm_get_running_vcpu() to get a vcpu. > > > > WARN_ON_ONCE(vcpu->kvm != kvm); [1] > > > > return &vcpu->dirty_ring; > > } > > ``` > > but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so > > vcpu is NULL. > > It's not just because there was no call to KVM_CREATE_VCPU; in general > kvm->dirty_ring_size only works if all writes are associated to a > specific vCPU, which is not the case for the one of > kvm_xen_shared_info_init. > > David, what do you think? Making dirty-page ring buffer incompatible > with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is > not an option because, as the reporter said, you might not have even > created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR. The remaining > option would be just "do not mark the page as dirty if the ring buffer > is active". This is feasible because userspace itself has passed the > shared info gfn; but again, it's ugly... > > Paolo > -- Active Defense Lab of Venustech ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c 2021-10-21 20:08 ` Paolo Bonzini 2021-10-28 7:42 ` butt3rflyh4ck @ 2021-11-08 5:11 ` butt3rflyh4ck 2021-11-16 15:41 ` butt3rflyh4ck ` (3 subsequent siblings) 5 siblings, 0 replies; 27+ messages in thread From: butt3rflyh4ck @ 2021-11-08 5:11 UTC (permalink / raw) To: Paolo Bonzini; +Cc: Woodhouse, David, kvm, LKML Hi, No one pays attention to this issue? Regards, butt3rflyh4ck. On Fri, Oct 22, 2021 at 4:08 AM Paolo Bonzini <pbonzini@redhat.com> wrote: > > On 18/10/21 19:14, butt3rflyh4ck wrote: > > { > > struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke > > kvm_get_running_vcpu() to get a vcpu. > > > > WARN_ON_ONCE(vcpu->kvm != kvm); [1] > > > > return &vcpu->dirty_ring; > > } > > ``` > > but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so > > vcpu is NULL. > > It's not just because there was no call to KVM_CREATE_VCPU; in general > kvm->dirty_ring_size only works if all writes are associated to a > specific vCPU, which is not the case for the one of > kvm_xen_shared_info_init. > > David, what do you think? Making dirty-page ring buffer incompatible > with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is > not an option because, as the reporter said, you might not have even > created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR. The remaining > option would be just "do not mark the page as dirty if the ring buffer > is active". This is feasible because userspace itself has passed the > shared info gfn; but again, it's ugly... > > Paolo > -- Active Defense Lab of Venustech ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c
  2021-10-21 20:08 ` Paolo Bonzini
  2021-10-28  7:42   ` butt3rflyh4ck
  2021-11-08  5:11   ` butt3rflyh4ck
@ 2021-11-16 15:41   ` butt3rflyh4ck
  2021-11-16 16:22   ` [EXTERNAL] " David Woodhouse
                      ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread

From: butt3rflyh4ck @ 2021-11-16 15:41 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Woodhouse, David, kvm, LKML

For this issue, I have reviewed the implementation of VM creation, vCPU
creation and dirty-ring creation, and I have some ideas. Checking only
kvm->dirty_ring_size to decide whether to call kvm_dirty_ring_push() is
not a sufficient condition. Could we additionally check that
kvm->created_vcpus is non-zero? After all, there is a situation where no
vCPU has been created yet but kvm->dirty_ring_size already has a value.

Regards,
butt3rflyh4ck.

On Fri, Oct 22, 2021 at 4:08 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 18/10/21 19:14, butt3rflyh4ck wrote:
> > {
> >      struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke
> > kvm_get_running_vcpu() to get a vcpu.
> >
> >      WARN_ON_ONCE(vcpu->kvm != kvm);   [1]
> >
> >      return &vcpu->dirty_ring;
> > }
> > ```
> > but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so
> > vcpu is NULL.
>
> It's not just because there was no call to KVM_CREATE_VCPU; in general
> kvm->dirty_ring_size only works if all writes are associated to a
> specific vCPU, which is not the case for the one of
> kvm_xen_shared_info_init.
>
> David, what do you think?  Making dirty-page ring buffer incompatible
> with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is
> not an option because, as the reporter said, you might not have even
> created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR.  The remaining
> option would be just "do not mark the page as dirty if the ring buffer
> is active".  This is feasible because userspace itself has passed the
> shared info gfn; but again, it's ugly...
>
> Paolo
>

--
Active Defense Lab of Venustech

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [EXTERNAL] There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c 2021-10-21 20:08 ` Paolo Bonzini ` (2 preceding siblings ...) 2021-11-16 15:41 ` butt3rflyh4ck @ 2021-11-16 16:22 ` David Woodhouse 2021-11-16 17:07 ` David Woodhouse 2021-11-17 9:46 ` Woodhouse, David 2021-11-20 10:16 ` KVM: Warn if mark_page_dirty() is called without an active vCPU David Woodhouse 5 siblings, 1 reply; 27+ messages in thread From: David Woodhouse @ 2021-11-16 16:22 UTC (permalink / raw) To: Paolo Bonzini, butt3rflyh4ck; +Cc: kvm, LKML [-- Attachment #1: Type: text/plain, Size: 2131 bytes --] On Thu, 2021-10-21 at 22:08 +0200, Paolo Bonzini wrote: > On 18/10/21 19:14, butt3rflyh4ck wrote: > > { > > struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke > > kvm_get_running_vcpu() to get a vcpu. > > > > WARN_ON_ONCE(vcpu->kvm != kvm); [1] > > > > return &vcpu->dirty_ring; > > } > > ``` > > but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so > > vcpu is NULL. > > It's not just because there was no call to KVM_CREATE_VCPU; in general > kvm->dirty_ring_size only works if all writes are associated to a > specific vCPU, which is not the case for the one of > kvm_xen_shared_info_init. > > David, what do you think? Making dirty-page ring buffer incompatible > with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is > not an option because, as the reporter said, you might not have even > created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR. The remaining > option would be just "do not mark the page as dirty if the ring buffer > is active". This is feasible because userspace itself has passed the > shared info gfn; but again, it's ugly... Sorry for delayed response. So, from the point of view of Xen guests setting a shinfo page at runtime, there *is* a vCPU running, and it's made the XENMAPSPACE_shared_info hypercall to request that the shinfo page be mapped. 
So from KVM's point of view, that vCPU will have exited with KVM_EXIT_XEN / KVM_EXIT_XEN_HCALL. The VMM is then invoking the KVM_XEN_HVM_SET_ATTR ioctl from that vCPU's userspace thread, while the KVM vCPU is idle. I suppose we could make it a KVM_XEN_VCPU_SET_ATTR instead, and thus associate it with a particular CPU at least for the initial wallclock write? Interrupts get delivered to the shinfo page too, but those will also have a vCPU associated with them. On live migration / live update the solution isn't quite so natural; right now it *is* restored before any vCPUs are created. But if the kernel made it a KVM_XEN_VCPU_SET_ATTR then the VMM will just have to restore it as part of restoring the BSP or something. That can work? [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5174 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [EXTERNAL] There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c 2021-11-16 16:22 ` [EXTERNAL] " David Woodhouse @ 2021-11-16 17:07 ` David Woodhouse 0 siblings, 0 replies; 27+ messages in thread From: David Woodhouse @ 2021-11-16 17:07 UTC (permalink / raw) To: Paolo Bonzini, butt3rflyh4ck; +Cc: kvm, LKML [-- Attachment #1: Type: text/plain, Size: 13582 bytes --] On Tue, 2021-11-16 at 16:22 +0000, David Woodhouse wrote: > > I suppose we could make it a KVM_XEN_VCPU_SET_ATTR instead, and thus > associate it with a particular CPU at least for the initial wallclock > write? > We'd end up needing to do all this too, to plumb that 'vcpu' through to the actual mark_page_dirty_in_slot(). And I might end up wanting to kill kvm_write_guest() et al completely. If we're never supposed to be writing without a vCPU associated with the write, then we should always use kvm_vcpu_write_guest(), shouldn't we? Paolo, what do you think? Want me to finish and test this and submit it, along with changing the shinfo address to a per-vCPU thing? 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 326cdfec74a1..d8411ce4db4b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1143,7 +1143,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	/* Mark the page dirty only if the fault is handled successfully */
 	if (writable && !ret) {
 		kvm_set_pfn_dirty(pfn);
-		mark_page_dirty_in_slot(kvm, memslot, gfn);
+		mark_page_dirty_in_slot(kvm, vcpu, memslot, gfn);
 	}
 
 out_unlock:
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 33794379949e..4fd2ad5327b6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3090,7 +3090,7 @@ fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		return false;
 
 	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
-		mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);
+		mark_page_dirty_in_slot(vcpu->kvm, vcpu, fault->slot, fault->gfn);
 
 	return true;
 }
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 0c76c45fdb68..0598515f3ae2 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -184,7 +184,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
 		/* Enforced by kvm_mmu_hugepage_adjust. */
 		WARN_ON(level > PG_LEVEL_4K);
-		mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
+		mark_page_dirty_in_slot(vcpu->kvm, vcpu, slot, gfn);
 	}
 
 	*new_spte = spte;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index a54c3491af42..c5669c9918a4 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -247,7 +247,7 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 	if ((!is_writable_pte(old_spte) || pfn_changed) &&
 	    is_writable_pte(new_spte)) {
 		slot = __gfn_to_memslot(__kvm_memslots(kvm, as_id), gfn);
-		mark_page_dirty_in_slot(kvm, slot, gfn);
+		mark_page_dirty_in_slot(kvm, NULL, slot, gfn);
 	}
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a879e4d08758..c14ce545fae9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2118,7 +2118,7 @@ static s64 get_kvmclock_base_ns(void)
 }
 #endif
 
-void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
+void kvm_write_wall_clock(struct kvm_vcpu *vcpu, gpa_t wall_clock, int sec_hi_ofs)
 {
 	int version;
 	int r;
@@ -2129,7 +2129,7 @@ void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
 	if (!wall_clock)
 		return;
 
-	r = kvm_read_guest(kvm, wall_clock, &version, sizeof(version));
+	r = kvm_vcpu_read_guest(vcpu, wall_clock, &version, sizeof(version));
 	if (r)
 		return;
 
@@ -2138,7 +2138,7 @@ void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
 
 	++version;
 
-	if (kvm_write_guest(kvm, wall_clock, &version, sizeof(version)))
+	if (kvm_vcpu_write_guest(vcpu, wall_clock, &version, sizeof(version)))
 		return;
 
 	/*
@@ -2146,22 +2146,22 @@ void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs)
 	 * system time (updated by kvm_guest_time_update below) to the
 	 * wall clock specified here.  We do the reverse here.
 	 */
-	wall_nsec = ktime_get_real_ns() - get_kvmclock_ns(kvm);
+	wall_nsec = ktime_get_real_ns() - get_kvmclock_ns(vcpu->kvm);
 
 	wc.nsec = do_div(wall_nsec, 1000000000);
 	wc.sec = (u32)wall_nsec; /* overflow in 2106 guest time */
 	wc.version = version;
 
-	kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc));
+	kvm_vcpu_write_guest(vcpu, wall_clock, &wc, sizeof(wc));
 
 	if (sec_hi_ofs) {
 		wc_sec_hi = wall_nsec >> 32;
-		kvm_write_guest(kvm, wall_clock + sec_hi_ofs,
-				&wc_sec_hi, sizeof(wc_sec_hi));
+		kvm_vcpu_write_guest(vcpu, wall_clock + sec_hi_ofs,
+				     &wc_sec_hi, sizeof(wc_sec_hi));
 	}
 
 	version++;
-	kvm_write_guest(kvm, wall_clock, &version, sizeof(version));
+	kvm_vcpu_write_guest(vcpu, wall_clock, &version, sizeof(version));
 }
 
 static void kvm_write_system_time(struct kvm_vcpu *vcpu, gpa_t system_time,
@@ -3353,7 +3353,7 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 
  out:
 	user_access_end();
 
  dirty:
-	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
+	mark_page_dirty_in_slot(vcpu->kvm, vcpu, ghc->memslot, gpa_to_gfn(ghc->gpa));
 }
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -3494,14 +3494,14 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 
 		vcpu->kvm->arch.wall_clock = data;
-		kvm_write_wall_clock(vcpu->kvm, data, 0);
+		kvm_write_wall_clock(vcpu, data, 0);
 		break;
 	case MSR_KVM_WALL_CLOCK:
 		if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE))
 			return 1;
 
 		vcpu->kvm->arch.wall_clock = data;
-		kvm_write_wall_clock(vcpu->kvm, data, 0);
+		kvm_write_wall_clock(vcpu, data, 0);
 		break;
 	case MSR_KVM_SYSTEM_TIME_NEW:
 		if (!guest_pv_has(vcpu, KVM_FEATURE_CLOCKSOURCE2))
@@ -4421,7 +4421,7 @@ static void kvm_steal_time_set_preempted(struct kvm_vcpu *vcpu)
 	if (!copy_to_user_nofault(&st->preempted, &preempted, sizeof(preempted)))
 		vcpu->arch.st.preempted = KVM_VCPU_PREEMPTED;
 
-	mark_page_dirty_in_slot(vcpu->kvm, ghc->memslot, gpa_to_gfn(ghc->gpa));
+	mark_page_dirty_in_slot(vcpu->kvm, vcpu, ghc->memslot, gpa_to_gfn(ghc->gpa));
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index ea264c4502e4..f1dab3413fc8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -294,7 +294,7 @@ static inline bool kvm_vcpu_latch_init(struct kvm_vcpu *vcpu)
 	return is_smm(vcpu) || static_call(kvm_x86_apic_init_signal_blocked)(vcpu);
 }
 
-void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock, int sec_hi_ofs);
+void kvm_write_wall_clock(struct kvm_vcpu *vcpu, gpa_t wall_clock, int sec_hi_ofs);
 void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
 u64 get_kvmclock_ns(struct kvm *kvm);
diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index 7a58df25c9b2..e7b0c0af807d 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -37,7 +37,7 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
 
 	ret = kvm_gfn_to_pfn_cache_init(kvm, gpc, NULL, true, gpa,
 					PAGE_SIZE, true);
-	if (ret)
+	if (ret && !kvm->vcpus[0])
 		goto out;
 
 	/* Paranoia checks on the 32-bit struct layout */
@@ -60,7 +60,7 @@ static int kvm_xen_shared_info_init(struct kvm *kvm, gfn_t gfn)
 	}
 #endif
 
-	kvm_write_wall_clock(kvm, gpa + wc_ofs, sec_hi_ofs - wc_ofs);
+	kvm_write_wall_clock(kvm->vcpus[0], gpa + wc_ofs, sec_hi_ofs - wc_ofs);
 	kvm_make_all_cpus_request(kvm, KVM_REQ_MASTERCLOCK_UPDATE);
 
 out:
@@ -316,7 +316,7 @@ int __kvm_xen_has_interrupt(struct kvm_vcpu *v)
 err:
 			user_access_end();
 
-			mark_page_dirty_in_slot(v->kvm, ghc->memslot, ghc->gpa >> PAGE_SHIFT);
+			mark_page_dirty_in_slot(v->kvm, v, ghc->memslot, ghc->gpa >> PAGE_SHIFT);
 		} else {
 			__get_user(rc, (u8 __user *)ghc->hva + offset);
 		}
diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 4da8d4a4140b..f3be974f9c5a 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -43,7 +43,8 @@ static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
 	return 0;
 }
 
-static inline struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
+static inline struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm,
+							struct kvm_vcpu *vcpu)
 {
 	return NULL;
 }
@@ -78,7 +79,8 @@ static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)
 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
 
-struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm);
+struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm,
+					  struct kvm_vcpu *vcpu);
 
 /*
  * called with kvm->slots_lock held, returns the number of
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f145a58e37b0..59a92a82e3a3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -954,7 +954,8 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
 bool kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
 bool kvm_vcpu_is_visible_gfn(struct kvm_vcpu *vcpu, gfn_t gfn);
 unsigned long kvm_host_page_size(struct kvm_vcpu *vcpu, gfn_t gfn);
-void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, gfn_t gfn);
+void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu,
+			     struct kvm_memory_slot *memslot, gfn_t gfn);
 void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
 
 struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 2b4474387895..879d454eef71 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -36,12 +36,16 @@ static bool kvm_dirty_ring_full(struct kvm_dirty_ring *ring)
 	return kvm_dirty_ring_used(ring) >= ring->size;
 }
 
-struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
+struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm, struct kvm_vcpu *vcpu)
 {
-	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+	struct kvm_vcpu *running_vcpu = kvm_get_running_vcpu();
 
-	WARN_ON_ONCE(vcpu->kvm != kvm);
+	WARN_ON_ONCE(vcpu && vcpu != running_vcpu);
+	if (!vcpu)
+		vcpu = running_vcpu;
+	WARN_ON_ONCE(vcpu->kvm != kvm);
 
 	return &vcpu->dirty_ring;
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4065fd32271a..24f300e5fa96 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2787,7 +2787,7 @@ int kvm_vcpu_read_guest_atomic(struct kvm_vcpu *vcpu, gpa_t gpa,
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_read_guest_atomic);
 
-static int __kvm_write_guest_page(struct kvm *kvm,
+static int __kvm_write_guest_page(struct kvm *kvm, struct kvm_vcpu *vcpu,
 				  struct kvm_memory_slot *memslot, gfn_t gfn,
 				  const void *data, int offset, int len)
 {
@@ -2800,7 +2800,7 @@ static int __kvm_write_guest_page(struct kvm *kvm,
 	r = __copy_to_user((void __user *)addr + offset, data, len);
 	if (r)
 		return -EFAULT;
-	mark_page_dirty_in_slot(kvm, memslot, gfn);
+	mark_page_dirty_in_slot(kvm, vcpu, memslot, gfn);
 	return 0;
 }
 
@@ -2809,7 +2809,7 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn,
 {
 	struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
 
-	return __kvm_write_guest_page(kvm, slot, gfn, data, offset, len);
+	return __kvm_write_guest_page(kvm, NULL, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_write_guest_page);
 
@@ -2818,7 +2818,7 @@ int kvm_vcpu_write_guest_page(struct kvm_vcpu *vcpu, gfn_t gfn,
 {
 	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 
-	return __kvm_write_guest_page(vcpu->kvm, slot, gfn, data, offset, len);
+	return __kvm_write_guest_page(vcpu->kvm, vcpu, slot, gfn, data, offset, len);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_write_guest_page);
 
@@ -2937,7 +2937,7 @@ int kvm_write_guest_offset_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
 	r = __copy_to_user((void __user *)ghc->hva + offset, data, len);
 	if (r)
 		return -EFAULT;
-	mark_page_dirty_in_slot(kvm, ghc->memslot, gpa >> PAGE_SHIFT);
+	mark_page_dirty_in_slot(kvm, NULL, ghc->memslot, gpa >> PAGE_SHIFT);
 
 	return 0;
 }
@@ -3006,7 +3006,7 @@ int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len)
 }
 EXPORT_SYMBOL_GPL(kvm_clear_guest);
 
-void mark_page_dirty_in_slot(struct kvm *kvm,
+void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_vcpu *vcpu,
 			     struct kvm_memory_slot *memslot,
 			     gfn_t gfn)
 {
@@ -3015,7 +3015,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 		u32 slot = (memslot->as_id << 16) | memslot->id;
 
 		if (kvm->dirty_ring_size)
-			kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
+			kvm_dirty_ring_push(kvm_dirty_ring_get(kvm, vcpu),
 					    slot, rel_gfn);
 		else
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
@@ -3028,7 +3028,7 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn)
 	struct kvm_memory_slot *memslot;
 
 	memslot = gfn_to_memslot(kvm, gfn);
-	mark_page_dirty_in_slot(kvm, memslot, gfn);
+	mark_page_dirty_in_slot(kvm, NULL, memslot, gfn);
 }
 EXPORT_SYMBOL_GPL(mark_page_dirty);
 
@@ -3037,7 +3037,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn)
 	struct kvm_memory_slot *memslot;
 
 	memslot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-	mark_page_dirty_in_slot(vcpu->kvm, memslot, gfn);
+	mark_page_dirty_in_slot(vcpu->kvm, vcpu, memslot, gfn);
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_mark_page_dirty);

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c 2021-10-21 20:08 ` Paolo Bonzini ` (3 preceding siblings ...) 2021-11-16 16:22 ` [EXTERNAL] " David Woodhouse @ 2021-11-17 9:46 ` Woodhouse, David 2021-11-17 16:49 ` Paolo Bonzini 2021-11-20 10:16 ` KVM: Warn if mark_page_dirty() is called without an active vCPU David Woodhouse 5 siblings, 1 reply; 27+ messages in thread From: Woodhouse, David @ 2021-11-17 9:46 UTC (permalink / raw) To: pbonzini, butterflyhuangxx; +Cc: kvm, linux-kernel On Thu, 2021-10-21 at 22:08 +0200, Paolo Bonzini wrote: > On 18/10/21 19:14, butt3rflyh4ck wrote: > > { > > struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); //-------> invoke > > kvm_get_running_vcpu() to get a vcpu. > > > > WARN_ON_ONCE(vcpu->kvm != kvm); [1] > > > > return &vcpu->dirty_ring; > > } > > ``` > > but we had not called KVM_CREATE_VCPU ioctl to create a kvm_vcpu so > > vcpu is NULL. > > It's not just because there was no call to KVM_CREATE_VCPU; in general > kvm->dirty_ring_size only works if all writes are associated to a > specific vCPU, which is not the case for the one of > kvm_xen_shared_info_init. > > David, what do you think? Making dirty-page ring buffer incompatible > with Xen is ugly and I'd rather avoid it; taking the mutex for vcpu 0 is > not an option because, as the reporter said, you might not have even > created a vCPU yet when you call KVM_XEN_HVM_SET_ATTR. The remaining > option would be just "do not mark the page as dirty if the ring buffer > is active". This is feasible because userspace itself has passed the > shared info gfn; but again, it's ugly... I think I am coming to quite like that 'remaining option' as long as we rephrase it as follows: KVM does not mark the shared_info page as dirty, and userspace is expected to *assume* that it is dirty at all times. It's used for delivering event channel interrupts and the overhead of marking it dirty each time is just pointless. 
I've merged the patch I sent yesterday into my pfncache series because I realised we needed the same there, but I'll look at making the Xen code *not* mark the shinfo page dirty when it writes the clock data there. Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c 2021-11-17 9:46 ` Woodhouse, David @ 2021-11-17 16:49 ` Paolo Bonzini 2021-11-17 18:10 ` Woodhouse, David 0 siblings, 1 reply; 27+ messages in thread From: Paolo Bonzini @ 2021-11-17 16:49 UTC (permalink / raw) To: Woodhouse, David, butterflyhuangxx; +Cc: kvm, linux-kernel On 11/17/21 10:46, Woodhouse, David wrote: >> The remaining >> option would be just "do not mark the page as dirty if the ring buffer >> is active". This is feasible because userspace itself has passed the >> shared info gfn; but again, it's ugly... > I think I am coming to quite like that 'remaining option' as long as we > rephrase it as follows: > > KVM does not mark the shared_info page as dirty, and userspace is > expected to*assume* that it is dirty at all times. It's used for > delivering event channel interrupts and the overhead of marking it > dirty each time is just pointless. For the case of dirty-bitmap, one solution could be to only set a bool and actually mark the page dirty lazily, at the time of KVM_GET_DIRTY_LOG. For dirty-ring, I agree that it's easiest if userspace just "knows" the page is dirty. Paolo ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c
  2021-11-17 16:49   ` Paolo Bonzini
@ 2021-11-17 18:10     ` Woodhouse, David
  0 siblings, 0 replies; 27+ messages in thread
From: Woodhouse, David @ 2021-11-17 18:10 UTC (permalink / raw)
  To: pbonzini, butterflyhuangxx; +Cc: kvm, linux-kernel

On Wed, 2021-11-17 at 17:49 +0100, Paolo Bonzini wrote:
> On 11/17/21 10:46, Woodhouse, David wrote:
> > > The remaining
> > > option would be just "do not mark the page as dirty if the ring buffer
> > > is active".  This is feasible because userspace itself has passed the
> > > shared info gfn; but again, it's ugly...
> > I think I am coming to quite like that 'remaining option' as long as we
> > rephrase it as follows:
> >
> > KVM does not mark the shared_info page as dirty, and userspace is
> > expected to *assume* that it is dirty at all times. It's used for
> > delivering event channel interrupts and the overhead of marking it
> > dirty each time is just pointless.
>
> For the case of dirty-bitmap, one solution could be to only set a bool
> and actually mark the page dirty lazily, at the time of
> KVM_GET_DIRTY_LOG.  For dirty-ring, I agree that it's easiest if
> userspace just "knows" the page is dirty.

TBH we get that former behaviour for free if we just do the access via
the shiny new gfn_to_pfn_cache. The page is marked dirty once, when the
cache is invalidated. I was actually tempted to avoid even setting the
dirty bit even when we write to the shinfo page.

None of which *immediately* solves it for the wall clock part because
we just call the same kvm_write_wall_clock() that the KVM MSR version
invokes, and that uses kvm_write_guest().

I think I'll just reimplement the interesting part of
kvm_write_wall_clock() directly in kvm_xen_shared_info_init(). I don't
much like the duplication but it isn't much and it's the simplest
option I see. And it actually simplifies kvm_write_wall_clock() which
no longer needs the 'sec_hi_ofs' argument. And the Xen version is also
simpler because it can just access the kernel mapping directly.

Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* KVM: Warn if mark_page_dirty() is called without an active vCPU
  2021-10-21 20:08 ` Paolo Bonzini
                     ` (4 preceding siblings ...)
  2021-11-17  9:46 ` Woodhouse, David
@ 2021-11-20 10:16 ` David Woodhouse
  2021-11-22 17:01   ` Sean Christopherson
  2022-01-13 12:06   ` Christian Borntraeger
  5 siblings, 2 replies; 27+ messages in thread
From: David Woodhouse @ 2021-11-20 10:16 UTC (permalink / raw)
  To: Paolo Bonzini, butt3rflyh4ck; +Cc: kvm, LKML, Sean Christopherson

[-- Attachment #1: Type: text/plain, Size: 3163 bytes --]

From: David Woodhouse <dwmw@amazon.co.uk>

The various kvm_write_guest() and mark_page_dirty() functions must only
ever be called in the context of an active vCPU, because if dirty ring
tracking is enabled it may simply oops when kvm_get_running_vcpu()
returns NULL for the vcpu and then kvm_dirty_ring_get() dereferences it.

This oops was reported by "butt3rflyh4ck" <butterflyhuangxx@gmail.com> in
https://lore.kernel.org/kvm/CAFcO6XOmoS7EacN_n6v4Txk7xL7iqRa2gABg3F7E3Naf5uG94g@mail.gmail.com/

That actual bug will be fixed under separate cover but this warning
should help to prevent new ones from being added.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 include/linux/kvm_dirty_ring.h | 6 ------
 virt/kvm/dirty_ring.c          | 9 ---------
 virt/kvm/kvm_main.c            | 7 ++++++-
 3 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/include/linux/kvm_dirty_ring.h b/include/linux/kvm_dirty_ring.h
index 120e5e90fa1d..fb0fa18878e2 100644
--- a/include/linux/kvm_dirty_ring.h
+++ b/include/linux/kvm_dirty_ring.h
@@ -43,11 +43,6 @@ static inline int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring,
 	return 0;
 }

-static inline struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
-{
-	return NULL;
-}
-
 static inline int kvm_dirty_ring_reset(struct kvm *kvm,
 				       struct kvm_dirty_ring *ring)
 {
@@ -78,7 +73,6 @@ static inline bool kvm_dirty_ring_soft_full(struct kvm_dirty_ring *ring)

 u32 kvm_dirty_ring_get_rsvd_entries(void);
 int kvm_dirty_ring_alloc(struct kvm_dirty_ring *ring, int index, u32 size);
-struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm);

 /*
  * called with kvm->slots_lock held, returns the number of
diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 88f4683198ea..8e9874760fb3 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -36,15 +36,6 @@ static bool kvm_dirty_ring_full(struct kvm_dirty_ring *ring)
 	return kvm_dirty_ring_used(ring) >= ring->size;
 }

-struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
-{
-	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
-
-	WARN_ON_ONCE(vcpu->kvm != kvm);
-
-	return &vcpu->dirty_ring;
-}
-
 static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
 {
 	struct kvm_memory_slot *memslot;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6c5083f2eb50..72c6453bcef4 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3020,12 +3020,17 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 			     struct kvm_memory_slot *memslot,
 			     gfn_t gfn)
 {
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+
+	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
+		return;
+
 	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 		u32 slot = (memslot->as_id << 16) | memslot->id;

 		if (kvm->dirty_ring_size)
-			kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
+			kvm_dirty_ring_push(&vcpu->dirty_ring,
 					    slot, rel_gfn);
 		else
 			set_bit_le(rel_gfn, memslot->dirty_bitmap);
--
2.31.1

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: KVM: Warn if mark_page_dirty() is called without an active vCPU
  2021-11-20 10:16 ` KVM: Warn if mark_page_dirty() is called without an active vCPU David Woodhouse
@ 2021-11-22 17:01   ` Sean Christopherson
  2021-11-22 17:52     ` David Woodhouse
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2021-11-22 17:01 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Paolo Bonzini, butt3rflyh4ck, kvm, LKML

On Sat, Nov 20, 2021, David Woodhouse wrote:
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6c5083f2eb50..72c6453bcef4 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3020,12 +3020,17 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>  			     struct kvm_memory_slot *memslot,
>  			     gfn_t gfn)
>  {
> +	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> +
> +	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))

Maybe use KVM_BUG_ON?  And two separate WARNs are probably overkill.

	if (KVM_BUG_ON(!vcpu || vcpu->kvm != kvm, kvm))

I'd also prefer to not retrieve the vCPU in the dirty_bitmap path, at
least not until it's necessary (for the proposed dirty quota throttling),
though that's not a strong preference.

> +		return;
> +
>  	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
>  		unsigned long rel_gfn = gfn - memslot->base_gfn;
>  		u32 slot = (memslot->as_id << 16) | memslot->id;
>
>  		if (kvm->dirty_ring_size)
> -			kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
> +			kvm_dirty_ring_push(&vcpu->dirty_ring,
>  					    slot, rel_gfn);

This can now squeeze on a single line.

			kvm_dirty_ring_push(&vcpu->dirty_ring, slot, rel_gfn);

>  		else
>  			set_bit_le(rel_gfn, memslot->dirty_bitmap);
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: KVM: Warn if mark_page_dirty() is called without an active vCPU
  2021-11-22 17:01   ` Sean Christopherson
@ 2021-11-22 17:52     ` David Woodhouse
  2021-11-22 18:49       ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: David Woodhouse @ 2021-11-22 17:52 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, butt3rflyh4ck, kvm, LKML

[-- Attachment #1: Type: text/plain, Size: 1663 bytes --]

On Mon, 2021-11-22 at 17:01 +0000, Sean Christopherson wrote:
> On Sat, Nov 20, 2021, David Woodhouse wrote:
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 6c5083f2eb50..72c6453bcef4 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -3020,12 +3020,17 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
> >  			     struct kvm_memory_slot *memslot,
> >  			     gfn_t gfn)
> >  {
> > +	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
> > +
> > +	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
>
> Maybe use KVM_BUG_ON?  And two separate WARNs are probably overkill.
>
> 	if (KVM_BUG_ON(!vcpu || vcpu->kvm != kvm, kvm))
>
> I'd also prefer to not retrieve the vCPU in the dirty_bitmap path, at
> least not until it's necessary (for the proposed dirty quota
> throttling), though that's not a strong preference.

I don't think that would achieve my objective. This was my reaction to
learning that I was never supposed to have called kvm_write_guest()
when I didn't have an active vCPU context¹. I wanted there to have been
a *warning* about that, right there and then when I first did it
instead of waiting for syzkaller to find it.

I didn't want to wait for the actual circumstances to arise that made
it *crash*; I wanted an early warning. And that's also why it was a
warning not a BUG(), but I suppose KVM_BUG_ON() would be OK.

--
   dwmw2

¹ My other reaction was wanting to remove kvm_write_guest() entirely
and let people use kvm_vcpu_write_guest() instead. That's the path I
was going down with the original patch to propagate the vcpu down.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5174 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: KVM: Warn if mark_page_dirty() is called without an active vCPU 2021-11-22 17:52 ` David Woodhouse @ 2021-11-22 18:49 ` Sean Christopherson 0 siblings, 0 replies; 27+ messages in thread From: Sean Christopherson @ 2021-11-22 18:49 UTC (permalink / raw) To: David Woodhouse; +Cc: Paolo Bonzini, butt3rflyh4ck, kvm, LKML On Mon, Nov 22, 2021, David Woodhouse wrote: > On Mon, 2021-11-22 at 17:01 +0000, Sean Christopherson wrote: > > On Sat, Nov 20, 2021, David Woodhouse wrote: > > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > > index 6c5083f2eb50..72c6453bcef4 100644 > > > --- a/virt/kvm/kvm_main.c > > > +++ b/virt/kvm/kvm_main.c > > > @@ -3020,12 +3020,17 @@ void mark_page_dirty_in_slot(struct kvm *kvm, > > > struct kvm_memory_slot *memslot, > > > gfn_t gfn) > > > { > > > + struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); > > > + > > > + if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm)) > > > > Maybe use KVM_BUG_ON? And two separate WARNs are probably overkill. > > > > if (KVM_BUG_ON(!vcpu || vcpu->kvm != kvm, kvm)) > > > > > > I'd also prefer to not retrieve the vCPU in the dirty_bitmap path, at least not > > until it's necessary (for the proposed dirty quota throttling), though that's not > > a strong preference. > > I don't think that would achieve my objective. This was my reaction to > learning that I was never supposed to have called kvm_write_guest() > when I didn't have an active vCPU context¹. I wanted there to have been > a *warning* about that, right there and then when I first did it > instead of waiting for syzkaller to find it. Fair enough. And probably a moot point since Paolo hasn't vehemently objected to the dirty quota idea. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: KVM: Warn if mark_page_dirty() is called without an active vCPU
  2021-11-20 10:16 ` KVM: Warn if mark_page_dirty() is called without an active vCPU David Woodhouse
  2021-11-22 17:01   ` Sean Christopherson
@ 2022-01-13 12:06   ` Christian Borntraeger
  2022-01-13 12:14     ` Paolo Bonzini
  1 sibling, 1 reply; 27+ messages in thread
From: Christian Borntraeger @ 2022-01-13 12:06 UTC (permalink / raw)
  To: dwmw2
  Cc: butterflyhuangxx, kvm, linux-kernel, pbonzini, seanjc,
	Cornelia Huck, Christian Borntraeger, Janosch Frank,
	David Hildenbrand, linux-s390, Thomas Huth, Claudio Imbrenda

From: Christian Borntraeger <borntraeger@de.ibm.com>

Quick heads-up.
The new warnon triggers on s390. Here we write to the guest from an
irqfd worker. Since we do not use dirty_ring yet this might be an
over-indication. Still have to look into that.

[ 1801.980777] WARNING: CPU: 12 PID: 117600 at arch/s390/kvm/../../../virt/kvm/kvm_main.c:3166 mark_page_dirty_in_slot+0xa0/0xb0 [kvm]
[ 1801.980839] Modules linked in: xt_CHECKSUM(E) xt_MASQUERADE(E) xt_conntrack(E) ipt_REJECT(E) xt_tcpudp(E) nft_compat(E) nf_nat_tftp(E) nft_objref(E) vhost_vsock(E) vmw_vsock_virtio_transport_common(E) vsock(E) vhost(E) vhost_iotlb(E) nf_conntrack_tftp(E) crc32_generic(E) algif_hash(E) af_alg(E) paes_s390(E) dm_crypt(E) encrypted_keys(E) loop(E) lcs(E) ctcm(E) fsm(E) kvm(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) ip_set(E) nf_tables(E) nfnetlink(E) sunrpc(E) dm_service_time(E) dm_multipath(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) zfcp(E) scsi_transport_fc(E) ism(E) smc(E) ib_core(E) eadm_sch(E) vfio_ccw(E) mdev(E) vfio_iommu_type1(E) vfio(E) sch_fq_codel(E) configfs(E) ip_tables(E) x_tables(E) ghash_s39 [...truncated...]
[ 1801.980915]  sha1_s390(E) sha_common(E) pkey(E) zcrypt(E) rng_core(E) autofs4(E) [last unloaded: vfio_ap]
[ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G            E     5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1
[ 1801.980935] Hardware name: IBM 2964 NC9 712 (LPAR)
[ 1801.980938] Workqueue: events irqfd_inject [kvm]
[ 1801.980959] Krnl PSW : 0704e00180000000 000003ff805f0f5c (mark_page_dirty_in_slot+0xa4/0xb0 [kvm])
[ 1801.980981]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 1801.980985] Krnl GPRS: 000003ff298e9040 000000017754a660 0000000000000000 0000000000000000
[ 1801.980988]            000000003fefcc36 ffffffffffffff68 0000000000000000 0000000177871500
[ 1801.980990]            00000001d1918000 000000003fefcc36 00000001d1918000 0000000000000000
[ 1801.980993]            00000001375b0000 00000001d191a838 000003ff805f0ee6 0000038000babb48
[ 1801.981003] Krnl Code: 000003ff805f0f4c: eb9ff0a00004	lmg	%r9,%r15,160(%r15)
                          000003ff805f0f52: c0f400018c61	brcl	15,000003ff80622814
                         #000003ff805f0f58: af000000		mc	0,0
                         >000003ff805f0f5c: eb9ff0a00004	lmg	%r9,%r15,160(%r15)
                          000003ff805f0f62: c0f400018c59	brcl	15,000003ff80622814
                          000003ff805f0f68: c004ffe37b10	brcl	0,000003ff80260588
                          000003ff805f0f6e: ec360033007c	cgij	%r3,0,6,000003ff805f0fd4
                          000003ff805f0f74: e31020100012	lt	%r1,16(%r2)
[ 1801.981057] Call Trace:
[ 1801.981060]  [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm]
[ 1801.981083]  [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm]
[ 1801.981104]  [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm]
[ 1801.981124]  [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm]
[ 1801.981144]  [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm]
[ 1801.981164]  [<0000000175e56906>] process_one_work+0x1fe/0x470
[ 1801.981173]  [<0000000175e570a4>] worker_thread+0x64/0x498
[ 1801.981176]  [<0000000175e5ef2c>] kthread+0x10c/0x110
[ 1801.981180]  [<0000000175de73c8>] __ret_from_fork+0x40/0x58
[ 1801.981185]  [<000000017698440a>] ret_from_fork+0xa/0x40

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: KVM: Warn if mark_page_dirty() is called without an active vCPU
  2022-01-13 12:06   ` Christian Borntraeger
@ 2022-01-13 12:14     ` Paolo Bonzini
  2022-01-13 12:29       ` [PATCH] KVM: avoid warning on s390 in mark_page_dirty Christian Borntraeger
  2022-01-13 12:30       ` KVM: Warn if mark_page_dirty() is called without an active vCPU David Woodhouse
  0 siblings, 2 replies; 27+ messages in thread
From: Paolo Bonzini @ 2022-01-13 12:14 UTC (permalink / raw)
  To: Christian Borntraeger, dwmw2
  Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck,
	Janosch Frank, David Hildenbrand, linux-s390, Thomas Huth,
	Claudio Imbrenda

On 1/13/22 13:06, Christian Borntraeger wrote:
> From: Christian Borntraeger <borntraeger@de.ibm.com>
>
> Quick heads-up.
> The new warnon triggers on s390. Here we write to the guest from an
> irqfd worker. Since we do not use dirty_ring yet this might be an
> over-indication.
> Still have to look into that.

Yes, it's okay to add an #ifdef around the warning.

Paolo

^ permalink raw reply	[flat|nested] 27+ messages in thread
* [PATCH] KVM: avoid warning on s390 in mark_page_dirty
  2022-01-13 12:14     ` Paolo Bonzini
@ 2022-01-13 12:29       ` Christian Borntraeger
  2022-01-13 12:31         ` David Woodhouse
  2022-01-18  8:37         ` Christian Borntraeger
  1 sibling, 2 replies; 27+ messages in thread
From: Christian Borntraeger @ 2022-01-13 12:29 UTC (permalink / raw)
  To: dwmw2, pbonzini
  Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck,
	Christian Borntraeger, Janosch Frank, David Hildenbrand,
	linux-s390, Thomas Huth, Claudio Imbrenda

Avoid warnings on s390 like

[ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G            E     5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1
[ 1801.980938] Workqueue: events irqfd_inject [kvm]
[...]
[ 1801.981057] Call Trace:
[ 1801.981060]  [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm]
[ 1801.981083]  [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm]
[ 1801.981104]  [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm]
[ 1801.981124]  [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm]
[ 1801.981144]  [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm]
[ 1801.981164]  [<0000000175e56906>] process_one_work+0x1fe/0x470
[ 1801.981173]  [<0000000175e570a4>] worker_thread+0x64/0x498
[ 1801.981176]  [<0000000175e5ef2c>] kthread+0x10c/0x110
[ 1801.981180]  [<0000000175de73c8>] __ret_from_fork+0x40/0x58
[ 1801.981185]  [<000000017698440a>] ret_from_fork+0xa/0x40

when writing to a guest from an irqfd worker as long as we do not have
the dirty ring.
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
---
 virt/kvm/kvm_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 504158f0e131..1a682d3e106d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3163,8 +3163,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
 {
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

+#ifdef CONFIG_HAVE_KVM_DIRTY_RING
 	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
 		return;
+#endif

 	if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
--
2.33.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [PATCH] KVM: avoid warning on s390 in mark_page_dirty 2022-01-13 12:29 ` [PATCH] KVM: avoid warning on s390 in mark_page_dirty Christian Borntraeger @ 2022-01-13 12:31 ` David Woodhouse 2022-01-18 8:37 ` Christian Borntraeger 1 sibling, 0 replies; 27+ messages in thread From: David Woodhouse @ 2022-01-13 12:31 UTC (permalink / raw) To: Christian Borntraeger, pbonzini Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck, Janosch Frank, David Hildenbrand, linux-s390, Thomas Huth, Claudio Imbrenda [-- Attachment #1: Type: text/plain, Size: 1301 bytes --] On Thu, 2022-01-13 at 13:29 +0100, Christian Borntraeger wrote: > Avoid warnings on s390 like > [ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G E 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1 > [ 1801.980938] Workqueue: events irqfd_inject [kvm] > [...] > [ 1801.981057] Call Trace: > [ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm] > [ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm] > [ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm] > [ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm] > [ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm] > [ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470 > [ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498 > [ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110 > [ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58 > [ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40 > > when writing to a guest from an irqfd worker as long as we do not have > the dirty ring. > > Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reluctantly-acked-by: David Woodhouse <dwmw@amazon.co.uk> ... with a bucket nearby just in case. [-- Attachment #2: smime.p7s --] [-- Type: application/pkcs7-signature, Size: 5965 bytes --] ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] KVM: avoid warning on s390 in mark_page_dirty 2022-01-13 12:29 ` [PATCH] KVM: avoid warning on s390 in mark_page_dirty Christian Borntraeger 2022-01-13 12:31 ` David Woodhouse @ 2022-01-18 8:37 ` Christian Borntraeger 2022-01-18 8:44 ` Paolo Bonzini 1 sibling, 1 reply; 27+ messages in thread From: Christian Borntraeger @ 2022-01-18 8:37 UTC (permalink / raw) To: dwmw2, pbonzini Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck, Janosch Frank, David Hildenbrand, linux-s390, Thomas Huth, Claudio Imbrenda Am 13.01.22 um 13:29 schrieb Christian Borntraeger: > Avoid warnings on s390 like > [ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G E 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1 > [ 1801.980938] Workqueue: events irqfd_inject [kvm] > [...] > [ 1801.981057] Call Trace: > [ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm] > [ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm] > [ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm] > [ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm] > [ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm] > [ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470 > [ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498 > [ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110 > [ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58 > [ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40 > > when writing to a guest from an irqfd worker as long as we do not have > the dirty ring. 
> > Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> > --- > virt/kvm/kvm_main.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 504158f0e131..1a682d3e106d 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -3163,8 +3163,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm, > { > struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); > > +#ifdef CONFIG_HAVE_KVM_DIRTY_RING > if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm)) > return; > +#endif > > if (memslot && kvm_slot_dirty_track_enabled(memslot)) { > unsigned long rel_gfn = gfn - memslot->base_gfn; Paolo, are you going to pick this for next for the time being? ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] KVM: avoid warning on s390 in mark_page_dirty 2022-01-18 8:37 ` Christian Borntraeger @ 2022-01-18 8:44 ` Paolo Bonzini 2022-01-18 8:53 ` Christian Borntraeger 0 siblings, 1 reply; 27+ messages in thread From: Paolo Bonzini @ 2022-01-18 8:44 UTC (permalink / raw) To: Christian Borntraeger, dwmw2 Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck, Janosch Frank, David Hildenbrand, linux-s390, Thomas Huth, Claudio Imbrenda On 1/18/22 09:37, Christian Borntraeger wrote: > Am 13.01.22 um 13:29 schrieb Christian Borntraeger: >> Avoid warnings on s390 like >> [ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: >> G E >> 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1 >> [ 1801.980938] Workqueue: events irqfd_inject [kvm] >> [...] >> [ 1801.981057] Call Trace: >> [ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 >> [kvm] >> [ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 >> [kvm] >> [ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm] >> [ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm] >> [ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm] >> [ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470 >> [ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498 >> [ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110 >> [ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58 >> [ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40 >> >> when writing to a guest from an irqfd worker as long as we do not have >> the dirty ring. 
>> >> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> >> --- >> virt/kvm/kvm_main.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >> index 504158f0e131..1a682d3e106d 100644 >> --- a/virt/kvm/kvm_main.c >> +++ b/virt/kvm/kvm_main.c >> @@ -3163,8 +3163,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm, >> { >> struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); >> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING >> if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm)) >> return; >> +#endif >> if (memslot && kvm_slot_dirty_track_enabled(memslot)) { >> unsigned long rel_gfn = gfn - memslot->base_gfn; > > Paolo, are you going to pick this for next for the time being? > Yep, done now. Paolo ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] KVM: avoid warning on s390 in mark_page_dirty 2022-01-18 8:44 ` Paolo Bonzini @ 2022-01-18 8:53 ` Christian Borntraeger 2022-01-18 11:44 ` Paolo Bonzini 0 siblings, 1 reply; 27+ messages in thread From: Christian Borntraeger @ 2022-01-18 8:53 UTC (permalink / raw) To: Paolo Bonzini, dwmw2 Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck, Janosch Frank, David Hildenbrand, linux-s390, Thomas Huth, Claudio Imbrenda Am 18.01.22 um 09:44 schrieb Paolo Bonzini: > On 1/18/22 09:37, Christian Borntraeger wrote: >> Am 13.01.22 um 13:29 schrieb Christian Borntraeger: >>> Avoid warnings on s390 like >>> [ 1801.980931] CPU: 12 PID: 117600 Comm: kworker/12:0 Tainted: G E 5.17.0-20220113.rc0.git0.32ce2abb03cf.300.fc35.s390x+next #1 >>> [ 1801.980938] Workqueue: events irqfd_inject [kvm] >>> [...] >>> [ 1801.981057] Call Trace: >>> [ 1801.981060] [<000003ff805f0f5c>] mark_page_dirty_in_slot+0xa4/0xb0 [kvm] >>> [ 1801.981083] [<000003ff8060e9fe>] adapter_indicators_set+0xde/0x268 [kvm] >>> [ 1801.981104] [<000003ff80613c24>] set_adapter_int+0x64/0xd8 [kvm] >>> [ 1801.981124] [<000003ff805fb9aa>] kvm_set_irq+0xc2/0x130 [kvm] >>> [ 1801.981144] [<000003ff805f8d86>] irqfd_inject+0x76/0xa0 [kvm] >>> [ 1801.981164] [<0000000175e56906>] process_one_work+0x1fe/0x470 >>> [ 1801.981173] [<0000000175e570a4>] worker_thread+0x64/0x498 >>> [ 1801.981176] [<0000000175e5ef2c>] kthread+0x10c/0x110 >>> [ 1801.981180] [<0000000175de73c8>] __ret_from_fork+0x40/0x58 >>> [ 1801.981185] [<000000017698440a>] ret_from_fork+0xa/0x40 >>> >>> when writing to a guest from an irqfd worker as long as we do not have >>> the dirty ring. 
>>> >>> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com> >>> --- >>> virt/kvm/kvm_main.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >>> index 504158f0e131..1a682d3e106d 100644 >>> --- a/virt/kvm/kvm_main.c >>> +++ b/virt/kvm/kvm_main.c >>> @@ -3163,8 +3163,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm, >>> { >>> struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); >>> +#ifdef CONFIG_HAVE_KVM_DIRTY_RING >>> if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm)) >>> return; >>> +#endif >>> if (memslot && kvm_slot_dirty_track_enabled(memslot)) { >>> unsigned long rel_gfn = gfn - memslot->base_gfn; >> >> Paolo, are you going to pick this for next for the time being? >> > > Yep, done now. > > Paolo Thanks. I just realized that Davids patch meanwhile landed in Linus tree. So better take this via master and not next. Maybe also add Fixes: 2efd61a608b0 ("KVM: Warn if mark_page_dirty() is called without an active vCPU") in case the patch is picked for stable ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] KVM: avoid warning on s390 in mark_page_dirty 2022-01-18 8:53 ` Christian Borntraeger @ 2022-01-18 11:44 ` Paolo Bonzini 0 siblings, 0 replies; 27+ messages in thread From: Paolo Bonzini @ 2022-01-18 11:44 UTC (permalink / raw) To: Christian Borntraeger, dwmw2 Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck, Janosch Frank, David Hildenbrand, linux-s390, Thomas Huth, Claudio Imbrenda On 1/18/22 09:53, Christian Borntraeger wrote: >>> >>> Paolo, are you going to pick this for next for the time being? >>> >> >> Yep, done now. >> >> Paolo > > Thanks. I just realized that Davids patch meanwhile landed in Linus > tree. So better > take this via master and not next. Yeah, I understood what you meant. :) In fact, "master" right now is still on 5.16 (for patches that are destined to stable, but have conflicts merge window changes; those are pushed to master and merged from there into next). There will be another pull request this week and it will have this patch. > Maybe also add > Fixes: 2efd61a608b0 ("KVM: Warn if mark_page_dirty() is called without > an active vCPU") > in case the patch is picked for stable Ok, will do. Paolo ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: KVM: Warn if mark_page_dirty() is called without an active vCPU
  2022-01-13 12:14     ` Paolo Bonzini
  2022-01-13 12:29       ` [PATCH] KVM: avoid warning on s390 in mark_page_dirty Christian Borntraeger
@ 2022-01-13 12:30       ` David Woodhouse
  2022-01-13 12:51         ` Christian Borntraeger
  2022-01-13 14:36         ` Paolo Bonzini
  1 sibling, 2 replies; 27+ messages in thread
From: David Woodhouse @ 2022-01-13 12:30 UTC (permalink / raw)
  To: Paolo Bonzini, Christian Borntraeger
  Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck,
	Janosch Frank, David Hildenbrand, linux-s390, Thomas Huth,
	Claudio Imbrenda

[-- Attachment #1: Type: text/plain, Size: 1544 bytes --]

On Thu, 2022-01-13 at 13:14 +0100, Paolo Bonzini wrote:
> On 1/13/22 13:06, Christian Borntraeger wrote:
> > From: Christian Borntraeger <borntraeger@de.ibm.com>
> >
> > Quick heads-up.
> > The new warnon triggers on s390. Here we write to the guest from an
> > irqfd worker. Since we do not use dirty_ring yet this might be an
> > over-indication.
> > Still have to look into that.
>
> Yes, it's okay to add an #ifdef around the warning.

That would be #ifndef CONFIG_HAVE_KVM_DIRTY_RING, yes?

I already found it hard to write down the rules around how
kvm_vcpu_write_guest() doesn't use the vCPU it's passed, and how both
it and kvm_write_guest() need to be invoked on a pCPU which currently
owns *a* vCPU belonging to the same KVM... if we add "unless you're on
an architecture that doesn't support dirty ring logging", you may have
to pass me a bucket.

Are you proposing that as an officially documented part of the already
horrid API, or a temporary measure :)

Btw, that get_map_page() in arch/s390/kvm/interrupt.c looks like it has
the same use-after-free problem that kvm_map_gfn() used to have. It
probably wants converting to the new gfn_to_pfn_cache.

Take a look at how I resolve the same issue for delivering Xen event
channel interrupts. Although I gave myself a free pass on the dirty
marking in that case, by declaring that the shinfo page doesn't get
marked dirty; it should be considered *always* dirty. You might have
less fun declaring that retrospectively in your case.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: KVM: Warn if mark_page_dirty() is called without an active vCPU 2022-01-13 12:30 ` KVM: Warn if mark_page_dirty() is called without an active vCPU David Woodhouse @ 2022-01-13 12:51 ` Christian Borntraeger 2022-01-13 13:22 ` David Woodhouse 2022-01-13 14:36 ` Paolo Bonzini 1 sibling, 1 reply; 27+ messages in thread From: Christian Borntraeger @ 2022-01-13 12:51 UTC (permalink / raw) To: David Woodhouse, Paolo Bonzini Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck, Janosch Frank, David Hildenbrand, linux-s390, Thomas Huth, Claudio Imbrenda Am 13.01.22 um 13:30 schrieb David Woodhouse: > On Thu, 2022-01-13 at 13:14 +0100, Paolo Bonzini wrote: >> On 1/13/22 13:06, Christian Borntraeger wrote: >>> From: Christian Borntraeger< >>> borntraeger@de.ibm.com >>>> >>> >>> Quick heads-up. >>> The new warnon triggers on s390. Here we write to the guest from an >>> irqfd worker. Since we do not use dirty_ring yet this might be an >>> over-indication. >>> Still have to look into that. >> >> Yes, it's okay to add an #ifdef around the warning. > > That would be #ifndef CONFIG_HAVE_KVM_DIRTY_RING, yes? > > I already found it hard to write down the rules around how > kvm_vcpu_write_guest() doesn't use the vCPU it's passed, and how both > it and kvm_write_guest() need to be invoked on a pCPU which currently > owns *a* vCPU belonging to the same KVM... if we add "unless you're on > an architecture that doesn't support dirty ring logging", you may have > to pass me a bucket. > > Are you proposing that as an officially documented part of the already > horrid API, or a temporary measure :) > > Btw, that get_map_page() in arch/s390/kvm/interrupt.c looks like it has > the same use-after-free problem that kvm_map_gfn() used to have. It > probably wants converting to the new gfn_to_pfn_cache. > > Take a look at how I resolve the same issue for delivering Xen event > channel interrupts. Do you have a commit ID for your Xen event channel fix? 
>
> Although I gave myself a free pass on the dirty marking in that case,
> by declaring that the shinfo page doesn't get marked dirty; it should
> be considered *always* dirty. You might have less fun declaring that
> retrospectively in your case.
* Re: KVM: Warn if mark_page_dirty() is called without an active vCPU
  2022-01-13 12:51 ` Christian Borntraeger
@ 2022-01-13 13:22   ` David Woodhouse
  2022-01-13 15:09     ` Christian Borntraeger
  0 siblings, 1 reply; 27+ messages in thread
From: David Woodhouse @ 2022-01-13 13:22 UTC (permalink / raw)
To: Christian Borntraeger, Paolo Bonzini
Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck,
    Janosch Frank, David Hildenbrand, linux-s390, Thomas Huth,
    Claudio Imbrenda

On Thu, 2022-01-13 at 13:51 +0100, Christian Borntraeger wrote:
>> Btw, that get_map_page() in arch/s390/kvm/interrupt.c looks like it has
>> the same use-after-free problem that kvm_map_gfn() used to have. It
>> probably wants converting to the new gfn_to_pfn_cache.
>>
>> Take a look at how I resolve the same issue for delivering Xen event
>> channel interrupts.
>
> Do you have a commit ID for your Xen event channel fix?

14243b387137 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event
channel delivery") and the commits reworking the gfn_to_pfn_cache which
lead up to it.

Questions: In your kvm_set_routing_entry() where it calls
gmap_translate() to turn the summary_addr and ind_addr from guest
addresses to userspace virtual addresses, what protects those
translations? If the gmap changes before kvm_set_routing_entry() even
returns, what ensures that the IRQ gets retranslated?

And later in adapter_indicators_set() where you take that userspace
virtual address and pass it to your get_map_page() function, the same
question: what if userspace does a concurrent mmap() and changes the
physical page that the userspace address points to?

In the latter case, at least it does look like you don't support
external memory accessed only by a PFN without having a corresponding
struct page. So at least you don't end up accessing a page that can now
belong to *another* guest, because the original underlying page is
locked.
You're probably OK in that case, so it's just the gmap changing that we
need to focus on?
* Re: KVM: Warn if mark_page_dirty() is called without an active vCPU
  2022-01-13 13:22 ` David Woodhouse
@ 2022-01-13 15:09   ` Christian Borntraeger
  0 siblings, 0 replies; 27+ messages in thread
From: Christian Borntraeger @ 2022-01-13 15:09 UTC (permalink / raw)
To: David Woodhouse, Paolo Bonzini
Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck,
    Janosch Frank, David Hildenbrand, linux-s390, Thomas Huth,
    Claudio Imbrenda

On 13.01.22 14:22, David Woodhouse wrote:
> On Thu, 2022-01-13 at 13:51 +0100, Christian Borntraeger wrote:
>>> Btw, that get_map_page() in arch/s390/kvm/interrupt.c looks like it has
>>> the same use-after-free problem that kvm_map_gfn() used to have. It
>>> probably wants converting to the new gfn_to_pfn_cache.
>>>
>>> Take a look at how I resolve the same issue for delivering Xen event
>>> channel interrupts.
>>
>> Do you have a commit ID for your Xen event channel fix?
>
> 14243b387137 ("KVM: x86/xen: Add KVM_IRQ_ROUTING_XEN_EVTCHN and event
> channel delivery") and the commits reworking the gfn_to_pfn_cache which
> lead up to it.
>
> Questions: In your kvm_set_routing_entry() where it calls
> gmap_translate() to turn the summary_addr and ind_addr from guest
> addresses to userspace virtual addresses, what protects those
> translations? If the gmap changes before kvm_set_routing_entry() even
> returns, what ensures that the IRQ gets retranslated?

In the end the gmap translates between guest physical and host virtual,
just like the kvm memslots. This is done in kvm_arch_commit_memory_region.
The gmap is a method where we share the last-level page table between the
guest mapping and the user mapping. That is why our memslots have to be on
a 1 MB boundary (our page tables have 256 4k entries). So instead of the
gmap we could have used the memslots as well for translation.
I have trouble seeing a kernel integrity issue: worst case, we point to a
different address in the userspace mapping if QEMU changes the memslots,
maybe even to an invalid one. But then the access would fail (for an
invalid one), or you get self-inflicted bit flips in your own address
space if you play such games.

Well, one thing: if QEMU changes memslots, it might need to redo the
irqfd registration as well, as we do not follow these changes. Maybe this
is something that we could improve, as future QEMUs could do more changes
regarding memslots.

> And later in adapter_indicators_set() where you take that userspace
> virtual address and pass it to your get_map_page() function, the same
> question: what if userspace does a concurrent mmap() and changes the
> physical page that the userspace address points to?
>
> In the latter case, at least it does look like you don't support
> external memory accessed only by a PFN without having a corresponding
> struct page. So at least you don't end up accessing a page that can now
> belong to *another* guest, because the original underlying page is
> locked. You're probably OK in that case, so it's just the gmap changing
> that we need to focus on?
* Re: KVM: Warn if mark_page_dirty() is called without an active vCPU
  2022-01-13 12:30 ` KVM: Warn if mark_page_dirty() is called without an active vCPU David Woodhouse
  2022-01-13 12:51 ` Christian Borntraeger
@ 2022-01-13 14:36 ` Paolo Bonzini
  1 sibling, 0 replies; 27+ messages in thread
From: Paolo Bonzini @ 2022-01-13 14:36 UTC (permalink / raw)
To: David Woodhouse, Christian Borntraeger
Cc: butterflyhuangxx, kvm, linux-kernel, seanjc, Cornelia Huck,
    Janosch Frank, David Hildenbrand, linux-s390, Thomas Huth,
    Claudio Imbrenda

On 1/13/22 13:30, David Woodhouse wrote:
> Are you proposing that as an officially documented part of the already
> horrid API, or a temporary measure? :)

Hopefully temporary, but honestly you never know how these things go.

Paolo
end of thread, other threads:[~2022-01-18 11:44 UTC | newest]

Thread overview: 27+ messages:
2021-10-18 17:14 There is a null-ptr-deref bug in kvm_dirty_ring_get in virt/kvm/dirty_ring.c butt3rflyh4ck
2021-10-21 20:08 ` Paolo Bonzini
2021-10-28  7:42   ` butt3rflyh4ck
2021-11-08  5:11     ` butt3rflyh4ck
2021-11-16 15:41       ` butt3rflyh4ck
2021-11-16 16:22         ` [EXTERNAL] " David Woodhouse
2021-11-16 17:07           ` David Woodhouse
2021-11-17  9:46             ` Woodhouse, David
2021-11-17 16:49               ` Paolo Bonzini
2021-11-17 18:10                 ` Woodhouse, David
2021-11-20 10:16                   ` KVM: Warn if mark_page_dirty() is called without an active vCPU David Woodhouse
2021-11-22 17:01                     ` Sean Christopherson
2021-11-22 17:52                       ` David Woodhouse
2021-11-22 18:49                         ` Sean Christopherson
2022-01-13 12:06                     ` Christian Borntraeger
2022-01-13 12:14                       ` Paolo Bonzini
2022-01-13 12:29                         ` [PATCH] KVM: avoid warning on s390 in mark_page_dirty Christian Borntraeger
2022-01-13 12:31                           ` David Woodhouse
2022-01-18  8:37                             ` Christian Borntraeger
2022-01-18  8:44                               ` Paolo Bonzini
2022-01-18  8:53                                 ` Christian Borntraeger
2022-01-18 11:44                                   ` Paolo Bonzini
2022-01-13 12:30                         ` KVM: Warn if mark_page_dirty() is called without an active vCPU David Woodhouse
2022-01-13 12:51                           ` Christian Borntraeger
2022-01-13 13:22                             ` David Woodhouse
2022-01-13 15:09                               ` Christian Borntraeger
2022-01-13 14:36                           ` Paolo Bonzini