KVM: check userspace_addr for all memslots
diff mbox series

Message ID 20200601082146.18969-1-pbonzini@redhat.com
State In Next
Commit 09d952c971a58c897eb7ea7d41516e7d953bb65d
Headers show
Series
  • KVM: check userspace_addr for all memslots
Related show

Commit Message

Paolo Bonzini June 1, 2020, 8:21 a.m. UTC
The userspace_addr alignment and range checks are not performed for private
memory slots that are prepared by KVM itself.  This is unnecessary and makes
it questionable to use __*_user functions to access memory later on.  We also
rely on the userspace address being aligned since we have an entire family
of functions to map gfn to pfn.

Fortunately skipping the check is completely unnecessary.  Only x86 uses
private memslots and their userspace_addr is obtained from vm_mmap,
therefore it must be below PAGE_OFFSET.  In fact, any attempt to pass
an address above PAGE_OFFSET would have failed because such an address
would return true for kvm_is_error_hva.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 virt/kvm/kvm_main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Maxim Levitsky June 11, 2020, 2:44 p.m. UTC | #1
On Mon, 2020-06-01 at 04:21 -0400, Paolo Bonzini wrote:
> The userspace_addr alignment and range checks are not performed for private
> memory slots that are prepared by KVM itself.  This is unnecessary and makes
> it questionable to use __*_user functions to access memory later on.  We also
> rely on the userspace address being aligned since we have an entire family
> of functions to map gfn to pfn.
> 
> Fortunately skipping the check is completely unnecessary.  Only x86 uses
> private memslots and their userspace_addr is obtained from vm_mmap,
> therefore it must be below PAGE_OFFSET.  In fact, any attempt to pass
> an address above PAGE_OFFSET would have failed because such an address
> would return true for kvm_is_error_hva.
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

I bisected this patch to break a VM on my AMD system (3970X)

The reason it happens, is because I have avic enabled (which uses
a private KVM memslot), but it is permanently disabled for that VM,
since I enabled nesting for that VM (+svm) and that triggers the code
in __x86_set_memory_region to set userspace_addr of the disabled
memslot to non canonical address (0xdeadull << 48) which is later rejected in __kvm_set_memory_region
after that patch, and that makes it silently not disable the memslot, which hangs the guest.

The call is from avic_update_access_page, which is called from svm_pre_update_apicv_exec_ctrl
which discards the return value.


I think that the fix for this would be to either make access_ok always return
true for size==0, or __kvm_set_memory_region should treat size==0 specially
and skip that check for it.

Best regards,
	Maxim Levitsky
Paolo Bonzini June 11, 2020, 3:27 p.m. UTC | #2
On 11/06/20 16:44, Maxim Levitsky wrote:
> On Mon, 2020-06-01 at 04:21 -0400, Paolo Bonzini wrote:
>> The userspace_addr alignment and range checks are not performed for private
>> memory slots that are prepared by KVM itself.  This is unnecessary and makes
>> it questionable to use __*_user functions to access memory later on.  We also
>> rely on the userspace address being aligned since we have an entire family
>> of functions to map gfn to pfn.
>>
>> Fortunately skipping the check is completely unnecessary.  Only x86 uses
>> private memslots and their userspace_addr is obtained from vm_mmap,
>> therefore it must be below PAGE_OFFSET.  In fact, any attempt to pass
>> an address above PAGE_OFFSET would have failed because such an address
>> would return true for kvm_is_error_hva.
>>
>> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> I bisected this patch to break a VM on my AMD system (3970X)
> 
> The reason it happens, is because I have avic enabled (which uses
> a private KVM memslot), but it is permanently disabled for that VM,
> since I enabled nesting for that VM (+svm) and that triggers the code
> in __x86_set_memory_region to set userspace_addr of the disabled
> memslot to non canonical address (0xdeadull << 48) which is later rejected in __kvm_set_memory_region
> after that patch, and that makes it silently not disable the memslot, which hangs the guest.
> 
> The call is from avic_update_access_page, which is called from svm_pre_update_apicv_exec_ctrl
> which discards the return value.
> 
> 
> I think that the fix for this would be to either make access_ok always return
> true for size==0, or __kvm_set_memory_region should treat size==0 specially
> and skip that check for it.

Or just set hva to 0.  Deletion goes through kvm_delete_memslot so that
dummy hva is not used anywhere.  If we really want to poison the hva of
deleted memslots we should not do it specially in
__x86_set_memory_region.  I'll send a patch.

Paolo
Maxim Levitsky June 11, 2020, 8:11 p.m. UTC | #3
On Thu, 2020-06-11 at 17:27 +0200, Paolo Bonzini wrote:
> On 11/06/20 16:44, Maxim Levitsky wrote:
> > On Mon, 2020-06-01 at 04:21 -0400, Paolo Bonzini wrote:
> > > The userspace_addr alignment and range checks are not performed for private
> > > memory slots that are prepared by KVM itself.  This is unnecessary and makes
> > > it questionable to use __*_user functions to access memory later on.  We also
> > > rely on the userspace address being aligned since we have an entire family
> > > of functions to map gfn to pfn.
> > > 
> > > Fortunately skipping the check is completely unnecessary.  Only x86 uses
> > > private memslots and their userspace_addr is obtained from vm_mmap,
> > > therefore it must be below PAGE_OFFSET.  In fact, any attempt to pass
> > > an address above PAGE_OFFSET would have failed because such an address
> > > would return true for kvm_is_error_hva.
> > > 
> > > Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > I bisected this patch to break a VM on my AMD system (3970X)
> > 
> > The reason it happens, is because I have avic enabled (which uses
> > a private KVM memslot), but it is permanently disabled for that VM,
> > since I enabled nesting for that VM (+svm) and that triggers the code
> > in __x86_set_memory_region to set userspace_addr of the disabled
> > memslot to non canonical address (0xdeadull << 48) which is later rejected in __kvm_set_memory_region
> > after that patch, and that makes it silently not disable the memslot, which hangs the guest.
> > 
> > The call is from avic_update_access_page, which is called from svm_pre_update_apicv_exec_ctrl
> > which discards the return value.
> > 
> > 
> > I think that the fix for this would be to either make access_ok always return
> > true for size==0, or __kvm_set_memory_region should treat size==0 specially
> > and skip that check for it.
> 
> Or just set hva to 0.  Deletion goes through kvm_delete_memslot so that
> dummy hva is not used anywhere.  If we really want to poison the hva of
> deleted memslots we should not do it specially in
> __x86_set_memory_region.  I'll send a patch.

After checking exactly what access_ok does, I mostly agree with this.
There is still an implicit assumption that address 0 is a valid userspace address.
It is fair to assume that on x86 though.

Best regards,
	Maxim Levitsky

> 
> Paolo
>

Patch
diff mbox series

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 731c1e517716..08184f571669 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1223,10 +1223,9 @@  int __kvm_set_memory_region(struct kvm *kvm,
 	if (mem->guest_phys_addr & (PAGE_SIZE - 1))
 		return -EINVAL;
 	/* We can read the guest memory with __xxx_user() later on. */
-	if ((id < KVM_USER_MEM_SLOTS) &&
-	    ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
+	if ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
 	     !access_ok((void __user *)(unsigned long)mem->userspace_addr,
-			mem->memory_size)))
+			mem->memory_size))
 		return -EINVAL;
 	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
 		return -EINVAL;