From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] KVM: arm/arm64: fix unaligned hva start and end in handle_hva_to_gpa
From: Jia He
To: Suzuki K Poulose, Marc Zyngier, Christoffer Dall,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu
Cc: linux-kernel@vger.kernel.org, Jia He, li.zhang@hxt-semitech.com,
 hughd@google.com, Andrea Arcangeli, Minchan Kim, Claudio Imbrenda,
 Arvind Yadav, Mike Rapoport, akpm@linux-foundation.org
Date: Tue, 15 May 2018 20:38:32 +0800
In-Reply-To: <14954b3a-6a2a-fe05-47b7-1890375ab8a4@arm.com>
References: <1525244911-5519-1-git-send-email-hejianet@gmail.com>
 <04e6109f-cbbd-24b0-03bb-9247b930d42c@arm.com>
 <85e04362-05dd-c697-e9c4-ad5824e63819@gmail.com>
 <0f134188-12d5-7184-3fbd-0ec8204cf649@arm.com>
 <695beacb-ff51-bb2c-72ef-d268f7d4e59d@arm.com>
 <1e065e9b-4dad-611d-fc5b-26fe6c031507@gmail.com>
 <14954b3a-6a2a-fe05-47b7-1890375ab8a4@arm.com>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Hi Suzuki

On 5/15/2018 4:36 PM, Suzuki K Poulose wrote:
>
> Hi Jia
>
> On 05/15/2018 03:03 AM, Jia He wrote:
>> Hi Suzuki
>>
>> I will merge the other thread into this one and add the necessary Cc list.
>>
>> That WARN_ON call trace is very easy to reproduce on my armv8a server
>> after I start 20 guests and run memhog in the host. Of course, KSM must
>> be enabled.
>>
>> For your question about my fault-injection debug patch:
>
> Thanks for the patch, comments below.
>
> ...
>
>> index 7f6a944..ab8545e 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -290,12 +290,17 @@ static void unmap_stage2_puds(struct kvm *kvm, pgd_t *pgd,
>>    * destroying the VM), otherwise another faulting VCPU may come in and mess
>>    * with things behind our backs.
>>    */
>> +extern int trigger_by_ksm;
>>   static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>>   {
>>          pgd_t *pgd;
>>          phys_addr_t addr = start, end = start + size;
>>          phys_addr_t next;
>>
>> +       if (trigger_by_ksm) {
>> +               end -= 0x200;
>> +       }
>> +
>>          assert_spin_locked(&kvm->mmu_lock);
>>          pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>>          do {
>>
>> I need to point out that I never reproduced it without this debugging patch.
>
> That could trigger the panic iff your "size" <= 0x200, leading to the
> condition (end < start), which can make the loop go forever, as we do
> while (addr < end) and end up accessing something which may not be a PGD
> entry, and thus get a bad page with bad numbers all around. This case
> could be hit only with your change and the bug in KSM which gives us an
> address near the page boundary.

No, I injected the fault on purpose to simulate the case when size is less
than PAGE_SIZE (e.g. PAGE_SIZE - 0x200 = 65024). I got the panic info [1]
*without* the debugging patch only once.

[1] https://lkml.org/lkml/2018/5/9/992
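To make that failure mode concrete: the kernel's page-table walkers exit
with an equality test ("while (pgd++, addr = next, addr != end)"), so an
end that is not stride-aligned can be stepped over and never matched. The
following is a minimal standalone userspace sketch, not the kernel walker
itself; the 64K page size and addresses are made-up values chosen to match
the PAGE_SIZE - 0x200 = 65024 example above, and the loop guard exists
only so the demo terminates.

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 0x10000UL	/* hypothetical 64K pages: 0x10000 - 0x200 = 65024 */

int main(void)
{
	uint64_t start = 0x12340000UL;
	uint64_t end = start + PAGE_SIZE - 0x200;	/* 0x1234fe00, misaligned */
	uint64_t addr = start;
	int steps = 0;

	do {
		addr += PAGE_SIZE;	/* first stride lands at 0x12350000, past end */
		steps++;
	} while (addr != end && steps < 4);	/* "addr != end" never becomes false */

	printf("addr=0x%llx stepped over end=0x%llx after %d strides\n",
	       (unsigned long long)addr, (unsigned long long)end, steps);
	return 0;
}

Without the demo guard, addr would keep advancing past the range it was
asked to walk, which is one way to end up touching entries that were never
part of the mapping.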
> So, I think we can safely ignore the PANIC().
> More below.
>
>>>> Suzuki, thanks for the comments.
>>>>
>>>> I proposed another KSM patch: https://lkml.org/lkml/2018/5/3/1042
>>>> The root cause is that KSM adds some extra flags to indicate whether the
>>>> page is in the stable tree. This makes the address no longer aligned to
>>>> PAGE_SIZE.
>>> Thanks for the pointer. In the future, please Cc the people relevant to
>>> the discussion in the patches.
>>>
>>>> From the arm KVM MMU point of view, do you think handle_hva_to_gpa
>>>> still needs to handle the unaligned case?
>>> I don't think we should do that. Had we done this, we would never have
>>> caught this bug in KSM. Eventually, if some other new implementation
>>> comes up with a new notifier consumer which doesn't check alignment and
>>> doesn't WARN, it could simply do the wrong thing. So I believe what we
>>> have is a good measure to make sure that things are in the right order.
>>>
>>>> IMO, the PAGE_SIZE alignment is still needed because we should not let
>>>> the bottom function kvm_age_hva_handler handle the exception. Please
>>>> refer to the implementations of kvm_handle_hva_range() on x86 and
>>>> powerpc. They both align the hva with hva_to_gfn_memslot().
>>>>
>>> From an API perspective, you are passed a "start" and "end" address. So
>>> you could potentially do the wrong thing if you align the "start" and
>>> "end". Maybe those handlers should also do the same thing as we do.
>
>> But handle_hva_to_gpa has possibly adjusted the alignment partially:
>>     1750         kvm_for_each_memslot(memslot, slots) {
>>     1751                 unsigned long hva_start, hva_end;
>>     1752                 gfn_t gpa;
>>     1753
>>     1754                 hva_start = max(start, memslot->userspace_addr);
>>     1755                 hva_end = min(end, memslot->userspace_addr +
>>     1756                             (memslot->npages << PAGE_SHIFT));
>>
>> At line 1755, let us assume that end = 0x12340200 and
>> memslot->userspace_addr + (memslot->npages << PAGE_SHIFT) = 0x12340000.
>> Then hva_start is not PAGE_SIZE aligned and hva_end is aligned, so the
>> size will be PAGE_SIZE - 0x200, just as in my fault-injection debugging
>> patch.
>
> That's because we want to limit the handling of the hva/gpa range by
> memslot. So we make sure we pass on the range within the given memslot to
> hva_to_gfn_memslot(). But we do iterate over the next memslot if the
> original range falls into the next slot. So, in practice, there is no
> alignment/trimming of the range. It's just that we pass on the
> appropriate range for each slot.
>

Yes, I understand what the code does in hva_to_gfn_memslot(). What I mean
is that hva_end may be changed, so (hva_end - hva_start) will not be the
same as the parameter _size_ here:

> ret |= handler(kvm, gpa, (u64)(hva_end - hva_start), data);

Anyway, I have to admit that all the exceptions are originally caused by
the STABLE_FLAG in the KSM code. What I want to discuss here is how to
make arm KVM handle the exception more gracefully.
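To put concrete (made-up) numbers on that trimming, here is a minimal
standalone sketch of the max/min clamping quoted above. The memslot bounds
are hypothetical; start/end model a one-page range whose low bits carry the
leaked KSM flag residue, again assuming 64K pages:

#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE 0x10000UL	/* hypothetical 64K pages */

static uint64_t max64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min64(uint64_t a, uint64_t b) { return a < b ? a : b; }

int main(void)
{
	/* one-page range with 0x200 of flag residue in the low bits */
	uint64_t start = 0x12330200UL, end = 0x12340200UL;

	/* hypothetical memslot ending exactly on a page boundary */
	uint64_t slot_start = 0x12300000UL, slot_end = 0x12340000UL;

	uint64_t hva_start = max64(start, slot_start);	/* 0x12330200, unaligned */
	uint64_t hva_end = min64(end, slot_end);	/* 0x12340000, trimmed */

	/* the handler sees 0xfe00 (65024), not the caller's 0x10000 */
	printf("handler size=0x%llx, caller size=0x%llx\n",
	       (unsigned long long)(hva_end - hva_start),
	       (unsigned long long)(end - start));
	return 0;
}

So even though neither max() nor min() "aligns" anything by itself, the
size the handler receives is no longer the size the notifier passed in.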
--
Cheers,
Jia