From: Keqian Zhu <zhukeqian1@huawei.com>
To: Gavin Shan <gshan@redhat.com>, Santosh Shukla <sashukla@nvidia.com>
Cc: <maz@kernel.org>, <kvm@vger.kernel.org>,
	<kvmarm@lists.cs.columbia.edu>, <linux-kernel@vger.kernel.org>,
	<cjia@nvidia.com>, <linux-arm-kernel@lists.infradead.org>,
	"Wanghaibin (D)" <wanghaibin.wang@huawei.com>
Subject: Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting
Date: Wed, 21 Apr 2021 14:17:44 +0800
Message-ID: <ed8a8b90-8b96-4967-01f5-cd0f536c38d2@huawei.com>
In-Reply-To: <2e23aaa7-0c8d-13ba-2eae-9e6ab2adc587@redhat.com>

Hi Gavin,

On 2021/4/21 14:20, Gavin Shan wrote:
> Hi Keqian and Santosh,
> 
> On 4/21/21 12:59 PM, Keqian Zhu wrote:
>> On 2020/10/22 0:16, Santosh Shukla wrote:
>>> Commit 6d674e28 introduced a way to detect and handle device
>>> mappings. It checks whether the VM_PFNMAP flag is set in
>>> vma->flags and, if so, sets force_pte to true so that the THP
>>> adjustment (transparent_hugepage_adjust()) is skipped.
>>>
>>> There could be an issue with when the VM_PFNMAP flag is set and checked.
>>> For example, consider a case where the mdev vendor driver registers
>>> a vma fault handler named vma_mmio_fault(), which maps the host
>>> MMIO region by calling remap_pfn_range() on the MMIO's vma
>>> space. remap_pfn_range() implicitly sets the VM_PFNMAP flag in
>>> vma->flags.
>> Could you give the name of the mdev vendor driver that triggers this issue?
>> I failed to find one according to your description. Thanks.
>>
> 
> I think it should be fixed on the driver side by setting VM_PFNMAP in
> its mmap() callback (call_mmap()), like the vfio PCI driver does.
> That way the flag isn't delayed until a page fault is taken and
> remap_pfn_range() is called. It's known from the beginning
> that the vma associated with the mdev vendor driver serves the
> purpose of PFN remapping. So the vma should be populated completely,
> including the VM_PFNMAP flag, before it becomes visible to user
> space.
> 
> An example can be found in the vfio driver, drivers/vfio/pci/vfio_pci.c:
>     vfio_pci_mmap:       VM_PFNMAP is set for the vma
>     vfio_pci_mmap_fault: remap_pfn_range() is called
Right. I have discussed the above with Marc, and I would like to find the
driver so it can be fixed. However, AFAICS, no driver matches the description...
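
To make the ordering concrete, here is a minimal userspace sketch (not kernel
code) of the two driver styles discussed above. All model_* names and
MODEL_VM_PFNMAP are made up for illustration, and model_fault() merely stands
in for a fault handler that ends up calling remap_pfn_range(). It only shows
why sampling the flag before the fault matters, under the assumption that the
flow is as described in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: names mimic kernel ones but are NOT the kernel API. */
#define MODEL_VM_PFNMAP 0x1UL

struct model_vma {
	unsigned long flags;
};

/* Driver style A: mark the VMA as a PFN mapping in the mmap() callback. */
void model_mmap_early(struct model_vma *vma)
{
	assert(vma != NULL);
	vma->flags |= MODEL_VM_PFNMAP;
}

/* Driver style B: mmap() leaves the flags alone; only the fault sets them. */
void model_mmap_late(struct model_vma *vma)
{
	assert(vma != NULL);
	(void)vma;
}

/* Stand-in for a fault handler whose remap_pfn_range() call sets the flag. */
void model_fault(struct model_vma *vma)
{
	assert(vma != NULL);
	vma->flags |= MODEL_VM_PFNMAP;
}

/*
 * Models the ordering described for user_mem_abort(): the flag is sampled
 * first, then the fault runs. Returns true if the THP adjustment would be
 * attempted even though the VMA ends up being a PFN mapping.
 */
bool model_thp_path_taken(struct model_vma *vma)
{
	bool force_pte;

	assert(vma != NULL);
	force_pte = (vma->flags & MODEL_VM_PFNMAP) != 0; /* steps 0 and 1 */
	model_fault(vma);                                /* step 2 */
	return !force_pte;                               /* step 3 */
}
```

With style A, model_thp_path_taken() returns false because the flag is already
visible when it is sampled. With style B it returns true, which corresponds to
the problematic path in the report.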

Thanks,
Keqian

> 
> Thanks,
> Gavin
> 
>>>
>>> Now let's assume an MMIO fault handling flow where the guest first accesses
>>> an MMIO region whose stage-2 translation is not present.
>>> This results in the arm64 KVM hypervisor executing the guest abort handler,
>>> as shown below:
>>>
>>> kvm_handle_guest_abort() -->
>>>   user_mem_abort()--> {
>>>
>>>      ...
>>>      0. checks the vma->flags for the VM_PFNMAP.
>>>      1. Since the VM_PFNMAP flag is not yet set, force_pte _is_ false;
>>>      2. gfn_to_pfn_prot() -->
>>>          __gfn_to_pfn_memslot() -->
>>>              fixup_user_fault() -->
>>>                  handle_mm_fault()-->
>>>                      __do_fault() -->
>>>                         vma_mmio_fault() --> // vendor's mdev fault handler
>>>                          remap_pfn_range()--> // Here sets the VM_PFNMAP
>>>                         flag into vma->flags.
>>>      3. Since force_pte was left false in step 1, this
>>>         executes transparent_hugepage_adjust(), which
>>>         leads to the Oops [4].
>>>   }
>>>
>>> The proposal is to check is_iomap() before executing the THP
>>> function transparent_hugepage_adjust().
>>>
>>> [4] THP Oops:
>>>> pc: kvm_is_transparent_hugepage+0x18/0xb0
>>>> ...
>>>> ...
>>>> user_mem_abort+0x340/0x9b8
>>>> kvm_handle_guest_abort+0x248/0x468
>>>> handle_exit+0x150/0x1b0
>>>> kvm_arch_vcpu_ioctl_run+0x4d4/0x778
>>>> kvm_vcpu_ioctl+0x3c0/0x858
>>>> ksys_ioctl+0x84/0xb8
>>>> __arm64_sys_ioctl+0x28/0x38
>>>
>>> Tested on a Huawei Kunpeng Taishan-200 arm64 server, using a VFIO-mdev device.
>>> Linux tip: 583090b1
>>>
>>> Fixes: 6d674e28 ("KVM: arm/arm64: Properly handle faulting of device mappings")
>>> Signed-off-by: Santosh Shukla <sashukla@nvidia.com>
>>> ---
>>>   arch/arm64/kvm/mmu.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 3d26b47..ff15357 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -1947,7 +1947,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>        * If we are not forced to use page mapping, check if we are
>>>        * backed by a THP and thus use block mapping if possible.
>>>        */
>>> -    if (vma_pagesize == PAGE_SIZE && !force_pte)
>>> +    if (vma_pagesize == PAGE_SIZE && !force_pte && !is_iomap(flags))
>>>           vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>                                  &pfn, &fault_ipa);
>>>       if (writable)
>>>
>>
