From: Gavin Shan <gshan@redhat.com>
To: Marc Zyngier <maz@kernel.org>, Keqian Zhu <zhukeqian1@huawei.com>
Cc: kvm@vger.kernel.org, kvmarm@lists.cs.columbia.edu,
	linux-kernel@vger.kernel.org, cjia@nvidia.com,
	linux-arm-kernel@lists.infradead.org,
	"Wanghaibin (D)" <wanghaibin.wang@huawei.com>
Subject: Re: [PATCH] KVM: arm64: Correctly handle the mmio faulting
Date: Thu, 22 Apr 2021 12:02:00 +1000	[thread overview]
Message-ID: <b97415a2-7970-a741-9690-3e4514b4aa7d@redhat.com> (raw)
In-Reply-To: <871rb3rgpl.wl-maz@kernel.org>

Hi Marc,

On 4/21/21 9:59 PM, Marc Zyngier wrote:
> On Wed, 21 Apr 2021 07:17:44 +0100,
> Keqian Zhu <zhukeqian1@huawei.com> wrote:
>> On 2021/4/21 14:20, Gavin Shan wrote:
>>> On 4/21/21 12:59 PM, Keqian Zhu wrote:
>>>> On 2020/10/22 0:16, Santosh Shukla wrote:
>>>>> The Commit:6d674e28 introduces a notion to detect and handle the
>>>>> device mapping. The commit checks for the VM_PFNMAP flag is set
>>>>> in vma->flags and if set then marks force_pte to true such that
>>>>> if force_pte is true then ignore the THP function check
>>>>> (/transparent_hugepage_adjust()).
>>>>>
>>>>> There could be an issue with the VM_PFNMAP flag setting and checking.
>>>>> For example consider a case where the mdev vendor driver register's
>>>>> the vma_fault handler named vma_mmio_fault(), which maps the
>>>>> host MMIO region in-turn calls remap_pfn_range() and maps
>>>>> the MMIO's vma space. Where, remap_pfn_range implicitly sets
>>>>> the VM_PFNMAP flag into vma->flags.
>>>> Could you give the name of the mdev vendor driver that triggers this issue?
>>>> I failed to find one according to your description. Thanks.
>>>>
>>>
>>> I think it would be fixed in driver side to set VM_PFNMAP in
>>> its mmap() callback (call_mmap()), like vfio PCI driver does.
>>> It means it won't be delayed until page fault is issued and
>>> remap_pfn_range() is called. It's determined from the beginning
>>> that the vma associated the mdev vendor driver is serving as
>>> PFN remapping purpose. So the vma should be populated completely,
>>> including the VM_PFNMAP flag before it becomes visible to user
>>> space.
> 
> Why should that be a requirement? Lazy populating of the VMA should be
> perfectly acceptable if the fault can only happen on the CPU side.
> 

It isn't a requirement and drivers needn't follow it strictly. I checked
several drivers before looking into the patch and found that almost all of
them have VM_PFNMAP set at mmap() time. In drivers/vfio/pci/vfio_pci.c,
there is a comment, quoted below, but it doesn't reveal much about why
we can't set VM_PFNMAP at fault time.

static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
{
       :
         /*
          * See remap_pfn_range(), called from vfio_pci_fault() but we can't
          * change vm_flags within the fault handler.  Set them now.
          */
         vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
         vma->vm_ops = &vfio_pci_mmap_ops;

         return 0;
}

Setting these flags in advance does have advantages. For example, VM_DONTEXPAND
prevents the vma from being merged with another one. VM_DONTDUMP makes the vma
ineligible for coredump; otherwise, the address space associated with the vma
would be accessed and unnecessary page faults triggered during coredump.
VM_IO and VM_PFNMAP avoid walking the page frames associated with the vma, since
we don't have valid PFNs in the mapping.
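
For contrast, below is a rough sketch of the lazy-populating style the original
commit message describes, where the driver installs only a fault handler at
mmap() time and VM_PFNMAP only appears once remap_pfn_range() runs from that
handler. The names (my_mdev_state, my_mdev_mmap, mmio_base) are made up for
illustration and the snippet isn't taken from any in-tree driver:

#include <linux/mm.h>

/*
 * Hypothetical mdev vendor driver, only to illustrate the case being
 * discussed: nothing is populated at mmap() time, and the vma only
 * gains VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP as a side
 * effect of remap_pfn_range() in the fault handler.
 */
struct my_mdev_state {			/* made-up type */
	phys_addr_t mmio_base;
	resource_size_t mmio_size;
};

static vm_fault_t vma_mmio_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	struct my_mdev_state *state = vma->vm_private_data;

	/* remap_pfn_range() implicitly ORs VM_PFNMAP into vma->vm_flags */
	if (remap_pfn_range(vma, vma->vm_start,
			    state->mmio_base >> PAGE_SHIFT,
			    vma->vm_end - vma->vm_start,
			    vma->vm_page_prot))
		return VM_FAULT_SIGBUS;

	return VM_FAULT_NOPAGE;
}

static const struct vm_operations_struct vma_mmio_ops = {
	.fault = vma_mmio_fault,
};

static int my_mdev_mmap(struct my_mdev_state *state,
			struct vm_area_struct *vma)
{
	/* Note: no VM_PFNMAP set here, which is what trips up KVM's check */
	vma->vm_private_data = state;
	vma->vm_ops = &vma_mmio_ops;
	return 0;
}

With something like this, KVM's check of vma->vm_flags precedes the fault that
actually sets VM_PFNMAP, which is the situation the patch is trying to handle.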

>>>
>>> The example can be found from vfio driver in drivers/vfio/pci/vfio_pci.c:
>>>      vfio_pci_mmap:       VM_PFNMAP is set for the vma
>>>      vfio_pci_mmap_fault: remap_pfn_range() is called
>> Right. I have discussed the above with Marc. I want to find the driver
>> to fix it. However, AFAICS, there is no driver matches the description...
> 
> I have the feeling this is an out-of-tree driver (and Santosh email
> address is bouncing, so I guess we won't have much information from
> him).
> 
> However, the simple fact that any odd driver can provide a fault
> handler and populate it the VMA on demand makes me think that we need
> to support this case.
> 

Yeah, Santosh's email isn't reachable. I think VM_PFNMAP needs to be
rechecked after gup if we're going to support this case.
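
To make that a bit more concrete, below is the rough shape of the recheck
I have in mind. It's only a sketch (not compile-tested), hva_is_pfnmap()
is a made-up helper, and the surrounding user_mem_abort() context is elided:

/* Hypothetical helper: does the hva currently sit in a VM_PFNMAP vma? */
static bool hva_is_pfnmap(struct mm_struct *mm, unsigned long hva)
{
	struct vm_area_struct *vma;
	bool pfnmap = false;

	mmap_read_lock(mm);
	vma = find_vma(mm, hva);
	if (vma && vma->vm_start <= hva && (vma->vm_flags & VM_PFNMAP))
		pfnmap = true;
	mmap_read_unlock(mm);

	return pfnmap;
}

	/*
	 * ...in user_mem_abort(), after __gfn_to_pfn_memslot() has done
	 * the gup: the driver's fault handler may only now have called
	 * remap_pfn_range() and set VM_PFNMAP, so look again before
	 * letting transparent_hugepage_adjust() run.
	 */
	if (hva_is_pfnmap(current->mm, hva)) {
		device = true;
		force_pte = true;
	}

Whether it's better to re-read the vma flags like this or to key off the
pfn itself after gup (e.g. kvm_is_device_pfn()) is open for discussion.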

Thanks,
Gavin


Thread overview: 38+ messages

2020-10-21 16:16 [PATCH] KVM: arm64: Correctly handle the mmio faulting Santosh Shukla
2020-10-23 11:29 ` Marc Zyngier
2020-10-26  4:56   ` Santosh Shukla
2020-10-26  6:50   ` Santosh Shukla
2021-04-21  2:59 ` Keqian Zhu
2021-04-21  6:20   ` Gavin Shan
2021-04-21  6:17     ` Keqian Zhu
2021-04-21 11:59       ` Marc Zyngier
2021-04-22  2:02         ` Gavin Shan [this message]
2021-04-22  6:50           ` Marc Zyngier
2021-04-22  7:36             ` Tarun Gupta (SW-GPU)
2021-04-22  8:00               ` Santosh Shukla
2021-04-23  1:06                 ` Keqian Zhu
2021-04-23  1:38             ` Gavin Shan
