All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gavin Shan <gshan@redhat.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Keqian Zhu <zhukeqian1@huawei.com>,
	kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-kernel@vger.kernel.org, cjia@nvidia.com,
	linux-arm-kernel@lists.infradead.org,
	"Wanghaibin (D)" <wanghaibin.wang@huawei.com>
Subject: Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting
Date: Fri, 23 Apr 2021 11:38:38 +1000	[thread overview]
Message-ID: <33eb4850-2476-01ae-e3db-5bd20c1a2c23@redhat.com> (raw)
In-Reply-To: <87v98eq0dh.wl-maz@kernel.org>

Hi Marc,

On 4/22/21 4:50 PM, Marc Zyngier wrote:
> On Thu, 22 Apr 2021 03:02:00 +0100,
> Gavin Shan <gshan@redhat.com> wrote:
>> On 4/21/21 9:59 PM, Marc Zyngier wrote:
>>> On Wed, 21 Apr 2021 07:17:44 +0100,
>>> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>>> On 2021/4/21 14:20, Gavin Shan wrote:
>>>>> On 4/21/21 12:59 PM, Keqian Zhu wrote:
>>>>>> On 2020/10/22 0:16, Santosh Shukla wrote:
>>>>>>> The Commit:6d674e28 introduces a notion to detect and handle the
>>>>>>> device mapping. The commit checks for the VM_PFNMAP flag is set
>>>>>>> in vma->flags and if set then marks force_pte to true such that
>>>>>>> if force_pte is true then ignore the THP function check
>>>>>>> (/transparent_hugepage_adjust()).
>>>>>>>
>>>>>>> There could be an issue with the VM_PFNMAP flag setting and checking.
>>>>>>> For example consider a case where the mdev vendor driver register's
>>>>>>> the vma_fault handler named vma_mmio_fault(), which maps the
>>>>>>> host MMIO region in-turn calls remap_pfn_range() and maps
>>>>>>> the MMIO's vma space. Where, remap_pfn_range implicitly sets
>>>>>>> the VM_PFNMAP flag into vma->flags.
>>>>>> Could you give the name of the mdev vendor driver that triggers this issue?
>>>>>> I failed to find one according to your description. Thanks.
>>>>>>
>>>>>
>>>>> I think it would be fixed in driver side to set VM_PFNMAP in
>>>>> its mmap() callback (call_mmap()), like vfio PCI driver does.
>>>>> It means it won't be delayed until page fault is issued and
>>>>> remap_pfn_range() is called. It's determined from the beginning
>>>>> that the vma associated the mdev vendor driver is serving as
>>>>> PFN remapping purpose. So the vma should be populated completely,
>>>>> including the VM_PFNMAP flag before it becomes visible to user
>>>>> space.
>>>
>>> Why should that be a requirement? Lazy populating of the VMA should be
>>> perfectly acceptable if the fault can only happen on the CPU side.
>>>
>>
>> It isn't a requirement and the drivers needn't follow strictly. I checked
>> several drivers before looking into the patch and found almost all the
>> drivers have VM_PFNMAP set at mmap() time. In drivers/vfio/vfio-pci.c,
>> there is a comment as below, but it doesn't reveal too much about why
>> we can't set VM_PFNMAP at fault time.
>>
>> static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>> {
>>        :
>>          /*
>>           * See remap_pfn_range(), called from vfio_pci_fault() but we can't
>>           * change vm_flags within the fault handler.  Set them now.
>>           */
>>          vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
>>          vma->vm_ops = &vfio_pci_mmap_ops;
>>
>>          return 0;
>> }
>>
>> To set these flags in advance does have advantages. For example,
>> VM_DONTEXPAND prevents the vma to be merged with another
>> one. VM_DONTDUMP make this vma isn't eligible for
>> coredump. Otherwise, the address space, which is associated with the
>> vma is accessed and unnecessary page faults are triggered on
>> coredump.  VM_IO and VM_PFNMAP avoids to walk the page frames
>> associated with the vma since we don't have valid PFN in the
>> mapping.
> 
> But PCI clearly isn't the case we are dealing with here, and not
> everything is VFIO either. I can *today* create a driver that
> implements a mmap+fault handler, call mmap() on it, pass the result to
> a memslot, and get to the exact same result Santosh describes.
> 
> No PCI, no VFIO, just a random driver. We are *required* to handle
> that.
> 

hmm, ok. I was thinking it's related to VFIO mdev driver when Santosh was
talking about "mdev driver". Anyway, it's always nice to support the case :)

Thanks,
Gavin


WARNING: multiple messages have this Message-ID (diff)
From: Gavin Shan <gshan@redhat.com>
To: Marc Zyngier <maz@kernel.org>
Cc: cjia@nvidia.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting
Date: Fri, 23 Apr 2021 11:38:38 +1000	[thread overview]
Message-ID: <33eb4850-2476-01ae-e3db-5bd20c1a2c23@redhat.com> (raw)
In-Reply-To: <87v98eq0dh.wl-maz@kernel.org>

Hi Marc,

On 4/22/21 4:50 PM, Marc Zyngier wrote:
> On Thu, 22 Apr 2021 03:02:00 +0100,
> Gavin Shan <gshan@redhat.com> wrote:
>> On 4/21/21 9:59 PM, Marc Zyngier wrote:
>>> On Wed, 21 Apr 2021 07:17:44 +0100,
>>> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>>> On 2021/4/21 14:20, Gavin Shan wrote:
>>>>> On 4/21/21 12:59 PM, Keqian Zhu wrote:
>>>>>> On 2020/10/22 0:16, Santosh Shukla wrote:
>>>>>>> The Commit:6d674e28 introduces a notion to detect and handle the
>>>>>>> device mapping. The commit checks for the VM_PFNMAP flag is set
>>>>>>> in vma->flags and if set then marks force_pte to true such that
>>>>>>> if force_pte is true then ignore the THP function check
>>>>>>> (/transparent_hugepage_adjust()).
>>>>>>>
>>>>>>> There could be an issue with the VM_PFNMAP flag setting and checking.
>>>>>>> For example consider a case where the mdev vendor driver register's
>>>>>>> the vma_fault handler named vma_mmio_fault(), which maps the
>>>>>>> host MMIO region in-turn calls remap_pfn_range() and maps
>>>>>>> the MMIO's vma space. Where, remap_pfn_range implicitly sets
>>>>>>> the VM_PFNMAP flag into vma->flags.
>>>>>> Could you give the name of the mdev vendor driver that triggers this issue?
>>>>>> I failed to find one according to your description. Thanks.
>>>>>>
>>>>>
>>>>> I think it would be fixed in driver side to set VM_PFNMAP in
>>>>> its mmap() callback (call_mmap()), like vfio PCI driver does.
>>>>> It means it won't be delayed until page fault is issued and
>>>>> remap_pfn_range() is called. It's determined from the beginning
>>>>> that the vma associated the mdev vendor driver is serving as
>>>>> PFN remapping purpose. So the vma should be populated completely,
>>>>> including the VM_PFNMAP flag before it becomes visible to user
>>>>> space.
>>>
>>> Why should that be a requirement? Lazy populating of the VMA should be
>>> perfectly acceptable if the fault can only happen on the CPU side.
>>>
>>
>> It isn't a requirement and the drivers needn't follow strictly. I checked
>> several drivers before looking into the patch and found almost all the
>> drivers have VM_PFNMAP set at mmap() time. In drivers/vfio/vfio-pci.c,
>> there is a comment as below, but it doesn't reveal too much about why
>> we can't set VM_PFNMAP at fault time.
>>
>> static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>> {
>>        :
>>          /*
>>           * See remap_pfn_range(), called from vfio_pci_fault() but we can't
>>           * change vm_flags within the fault handler.  Set them now.
>>           */
>>          vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
>>          vma->vm_ops = &vfio_pci_mmap_ops;
>>
>>          return 0;
>> }
>>
>> To set these flags in advance does have advantages. For example,
>> VM_DONTEXPAND prevents the vma to be merged with another
>> one. VM_DONTDUMP make this vma isn't eligible for
>> coredump. Otherwise, the address space, which is associated with the
>> vma is accessed and unnecessary page faults are triggered on
>> coredump.  VM_IO and VM_PFNMAP avoids to walk the page frames
>> associated with the vma since we don't have valid PFN in the
>> mapping.
> 
> But PCI clearly isn't the case we are dealing with here, and not
> everything is VFIO either. I can *today* create a driver that
> implements a mmap+fault handler, call mmap() on it, pass the result to
> a memslot, and get to the exact same result Santosh describes.
> 
> No PCI, no VFIO, just a random driver. We are *required* to handle
> that.
> 

hmm, ok. I was thinking it's related to VFIO mdev driver when Santosh was
talking about "mdev driver". Anyway, it's always nice to support the case :)

Thanks,
Gavin

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Gavin Shan <gshan@redhat.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Keqian Zhu <zhukeqian1@huawei.com>,
	kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-kernel@vger.kernel.org, cjia@nvidia.com,
	linux-arm-kernel@lists.infradead.org,
	"Wanghaibin (D)" <wanghaibin.wang@huawei.com>
Subject: Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting
Date: Fri, 23 Apr 2021 11:38:38 +1000	[thread overview]
Message-ID: <33eb4850-2476-01ae-e3db-5bd20c1a2c23@redhat.com> (raw)
In-Reply-To: <87v98eq0dh.wl-maz@kernel.org>

Hi Marc,

On 4/22/21 4:50 PM, Marc Zyngier wrote:
> On Thu, 22 Apr 2021 03:02:00 +0100,
> Gavin Shan <gshan@redhat.com> wrote:
>> On 4/21/21 9:59 PM, Marc Zyngier wrote:
>>> On Wed, 21 Apr 2021 07:17:44 +0100,
>>> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>>>> On 2021/4/21 14:20, Gavin Shan wrote:
>>>>> On 4/21/21 12:59 PM, Keqian Zhu wrote:
>>>>>> On 2020/10/22 0:16, Santosh Shukla wrote:
>>>>>>> The Commit:6d674e28 introduces a notion to detect and handle the
>>>>>>> device mapping. The commit checks for the VM_PFNMAP flag is set
>>>>>>> in vma->flags and if set then marks force_pte to true such that
>>>>>>> if force_pte is true then ignore the THP function check
>>>>>>> (/transparent_hugepage_adjust()).
>>>>>>>
>>>>>>> There could be an issue with the VM_PFNMAP flag setting and checking.
>>>>>>> For example consider a case where the mdev vendor driver register's
>>>>>>> the vma_fault handler named vma_mmio_fault(), which maps the
>>>>>>> host MMIO region in-turn calls remap_pfn_range() and maps
>>>>>>> the MMIO's vma space. Where, remap_pfn_range implicitly sets
>>>>>>> the VM_PFNMAP flag into vma->flags.
>>>>>> Could you give the name of the mdev vendor driver that triggers this issue?
>>>>>> I failed to find one according to your description. Thanks.
>>>>>>
>>>>>
>>>>> I think it would be fixed in driver side to set VM_PFNMAP in
>>>>> its mmap() callback (call_mmap()), like vfio PCI driver does.
>>>>> It means it won't be delayed until page fault is issued and
>>>>> remap_pfn_range() is called. It's determined from the beginning
>>>>> that the vma associated the mdev vendor driver is serving as
>>>>> PFN remapping purpose. So the vma should be populated completely,
>>>>> including the VM_PFNMAP flag before it becomes visible to user
>>>>> space.
>>>
>>> Why should that be a requirement? Lazy populating of the VMA should be
>>> perfectly acceptable if the fault can only happen on the CPU side.
>>>
>>
>> It isn't a requirement and the drivers needn't follow strictly. I checked
>> several drivers before looking into the patch and found almost all the
>> drivers have VM_PFNMAP set at mmap() time. In drivers/vfio/vfio-pci.c,
>> there is a comment as below, but it doesn't reveal too much about why
>> we can't set VM_PFNMAP at fault time.
>>
>> static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
>> {
>>        :
>>          /*
>>           * See remap_pfn_range(), called from vfio_pci_fault() but we can't
>>           * change vm_flags within the fault handler.  Set them now.
>>           */
>>          vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
>>          vma->vm_ops = &vfio_pci_mmap_ops;
>>
>>          return 0;
>> }
>>
>> To set these flags in advance does have advantages. For example,
>> VM_DONTEXPAND prevents the vma to be merged with another
>> one. VM_DONTDUMP make this vma isn't eligible for
>> coredump. Otherwise, the address space, which is associated with the
>> vma is accessed and unnecessary page faults are triggered on
>> coredump.  VM_IO and VM_PFNMAP avoids to walk the page frames
>> associated with the vma since we don't have valid PFN in the
>> mapping.
> 
> But PCI clearly isn't the case we are dealing with here, and not
> everything is VFIO either. I can *today* create a driver that
> implements a mmap+fault handler, call mmap() on it, pass the result to
> a memslot, and get to the exact same result Santosh describes.
> 
> No PCI, no VFIO, just a random driver. We are *required* to handle
> that.
> 

hmm, ok. I was thinking it's related to VFIO mdev driver when Santosh was
talking about "mdev driver". Anyway, it's always nice to support the case :)

Thanks,
Gavin


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2021-04-22 23:38 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-21 16:16 [PATCH] KVM: arm64: Correctly handle the mmio faulting Santosh Shukla
2020-10-21 16:16 ` Santosh Shukla
2020-10-21 16:16 ` Santosh Shukla
2020-10-23 11:29 ` Marc Zyngier
2020-10-23 11:29   ` Marc Zyngier
2020-10-23 11:29   ` Marc Zyngier
2020-10-26  4:56   ` Santosh Shukla
2020-10-26  6:50   ` Santosh Shukla
2021-04-21  2:59 ` Keqian Zhu
2021-04-21  2:59   ` Keqian Zhu
2021-04-21  2:59   ` Keqian Zhu
2021-04-21  6:20   ` Gavin Shan
2021-04-21  6:20     ` Gavin Shan
2021-04-21  6:20     ` Gavin Shan
2021-04-21  6:17     ` Keqian Zhu
2021-04-21  6:17       ` Keqian Zhu
2021-04-21  6:17       ` Keqian Zhu
2021-04-21 11:59       ` Marc Zyngier
2021-04-21 11:59         ` Marc Zyngier
2021-04-21 11:59         ` Marc Zyngier
2021-04-22  2:02         ` Gavin Shan
2021-04-22  2:02           ` Gavin Shan
2021-04-22  2:02           ` Gavin Shan
2021-04-22  6:50           ` Marc Zyngier
2021-04-22  6:50             ` Marc Zyngier
2021-04-22  6:50             ` Marc Zyngier
2021-04-22  7:36             ` Tarun Gupta (SW-GPU)
2021-04-22  7:36               ` Tarun Gupta (SW-GPU)
2021-04-22  7:36               ` Tarun Gupta (SW-GPU)
2021-04-22  8:00               ` Santosh Shukla
2021-04-22  8:00                 ` Santosh Shukla
2021-04-22  8:00                 ` Santosh Shukla
2021-04-23  1:06                 ` Keqian Zhu
2021-04-23  1:06                   ` Keqian Zhu
2021-04-23  1:06                   ` Keqian Zhu
2021-04-23  1:38             ` Gavin Shan [this message]
2021-04-23  1:38               ` Gavin Shan
2021-04-23  1:38               ` Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=33eb4850-2476-01ae-e3db-5bd20c1a2c23@redhat.com \
    --to=gshan@redhat.com \
    --cc=cjia@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=wanghaibin.wang@huawei.com \
    --cc=zhukeqian1@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.