linux-kernel.vger.kernel.org archive mirror
* [PATCH] vfio/type1: avoid redundant PageReserved checking
@ 2019-08-27 12:49 Ben Luo
  2019-08-27 18:40 ` Alex Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Ben Luo @ 2019-08-27 12:49 UTC (permalink / raw)
  To: alex.williamson, cohuck; +Cc: linux-kernel

Currently, if the page is not the tail of a compound page, PageReserved()
is checked twice on the same page.

Signed-off-by: Ben Luo <luoben@linux.alibaba.com>
---
 drivers/vfio/vfio_iommu_type1.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 054391f..cbe0d88 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 static bool is_invalid_reserved_pfn(unsigned long pfn)
 {
 	if (pfn_valid(pfn)) {
-		bool reserved;
 		struct page *tail = pfn_to_page(pfn);
 		struct page *head = compound_head(tail);
-		reserved = !!(PageReserved(head));
 		if (head != tail) {
+			bool reserved = !!(PageReserved(head));
 			/*
 			 * "head" is not a dangling pointer
 			 * (compound_head takes care of that)
@@ -310,7 +309,7 @@ static bool is_invalid_reserved_pfn(unsigned long pfn)
 			if (PageTail(tail))
 				return reserved;
 		}
-		return PageReserved(tail);
+		return !!(PageReserved(tail));
 	}
 
 	return true;
-- 
1.8.3.1



* Re: [PATCH] vfio/type1: avoid redundant PageReserved checking
  2019-08-27 12:49 [PATCH] vfio/type1: avoid redundant PageReserved checking Ben Luo
@ 2019-08-27 18:40 ` Alex Williamson
  2019-08-28  4:28   ` [PATCH v2] " Ben Luo
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2019-08-27 18:40 UTC (permalink / raw)
  To: Ben Luo; +Cc: cohuck, linux-kernel

On Tue, 27 Aug 2019 20:49:48 +0800
Ben Luo <luoben@linux.alibaba.com> wrote:

> currently, if the page is not a tail of compound page, it will be
> checked twice for the same thing.
> 
> Signed-off-by: Ben Luo <luoben@linux.alibaba.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 054391f..cbe0d88 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>  static bool is_invalid_reserved_pfn(unsigned long pfn)
>  {
>  	if (pfn_valid(pfn)) {
> -		bool reserved;
>  		struct page *tail = pfn_to_page(pfn);
>  		struct page *head = compound_head(tail);
> -		reserved = !!(PageReserved(head));
>  		if (head != tail) {
> +			bool reserved = !!(PageReserved(head));
>  			/*
>  			 * "head" is not a dangling pointer
>  			 * (compound_head takes care of that)
> @@ -310,7 +309,7 @@ static bool is_invalid_reserved_pfn(unsigned long pfn)
>  			if (PageTail(tail))
>  				return reserved;
>  		}
> -		return PageReserved(tail);
> +		return !!(PageReserved(tail));
>  	}
>  
>  	return true;

Logic seems fine to me, though I'd actually prefer to get rid of the !!
at the first use rather than duplicate it at the second use.  Thanks,

Alex
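
Taking that suggestion, the resulting helper would presumably look
something like the sketch below (an illustration only, with the original
comment abridged; see the v2 patch that follows for the actual change):

static bool is_invalid_reserved_pfn(unsigned long pfn)
{
	if (pfn_valid(pfn)) {
		struct page *tail = pfn_to_page(pfn);
		struct page *head = compound_head(tail);
		if (head != tail) {
			bool reserved = PageReserved(head);
			/*
			 * The compound page may have been split under us,
			 * so only trust the head's reserved bit if the
			 * page is still a tail after the barrier.
			 */
			smp_rmb();
			if (PageTail(tail))
				return reserved;
		}
		return PageReserved(tail);
	}

	return true;
}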


* [PATCH v2] vfio/type1: avoid redundant PageReserved checking
  2019-08-27 18:40 ` Alex Williamson
@ 2019-08-28  4:28   ` Ben Luo
  2019-08-28 15:55     ` Alex Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Ben Luo @ 2019-08-28  4:28 UTC (permalink / raw)
  To: alex.williamson, cohuck; +Cc: linux-kernel

Currently, if the page is not the tail of a compound page, PageReserved()
is checked twice on the same page.

Signed-off-by: Ben Luo <luoben@linux.alibaba.com>
---
 drivers/vfio/vfio_iommu_type1.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 054391f..d0f7346 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
 static bool is_invalid_reserved_pfn(unsigned long pfn)
 {
 	if (pfn_valid(pfn)) {
-		bool reserved;
 		struct page *tail = pfn_to_page(pfn);
 		struct page *head = compound_head(tail);
-		reserved = !!(PageReserved(head));
 		if (head != tail) {
+			bool reserved = PageReserved(head);
 			/*
 			 * "head" is not a dangling pointer
 			 * (compound_head takes care of that)
-- 
1.8.3.1



* Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking
  2019-08-28  4:28   ` [PATCH v2] " Ben Luo
@ 2019-08-28 15:55     ` Alex Williamson
  2019-08-29 16:58       ` Ben Luo
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2019-08-28 15:55 UTC (permalink / raw)
  To: Ben Luo; +Cc: cohuck, linux-kernel, Andrea Arcangeli

On Wed, 28 Aug 2019 12:28:04 +0800
Ben Luo <luoben@linux.alibaba.com> wrote:

> currently, if the page is not a tail of compound page, it will be
> checked twice for the same thing.
> 
> Signed-off-by: Ben Luo <luoben@linux.alibaba.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 054391f..d0f7346 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>  static bool is_invalid_reserved_pfn(unsigned long pfn)
>  {
>  	if (pfn_valid(pfn)) {
> -		bool reserved;
>  		struct page *tail = pfn_to_page(pfn);
>  		struct page *head = compound_head(tail);
> -		reserved = !!(PageReserved(head));
>  		if (head != tail) {
> +			bool reserved = PageReserved(head);
>  			/*
>  			 * "head" is not a dangling pointer
>  			 * (compound_head takes care of that)

Thinking more about this, the code here was originally just a copy of
kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
Should we instead do the same thing here?  Thanks,

Alex

commit 11feeb498086a3a5907b8148bdf1786a9b18fc55
Author: Andrea Arcangeli <aarcange@redhat.com>
Date:   Thu Jul 25 03:04:38 2013 +0200

    kvm: optimize away THP checks in kvm_is_mmio_pfn()
    
    The checks on PG_reserved in the page structure on head and tail pages
    aren't necessary because split_huge_page wouldn't transfer the
    PG_reserved bit from head to tail anyway.
    
    This was a forward-thinking check done in the case PageReserved was
    set by a driver-owned page mapped in userland with something like
    remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
    possible right now). It was meant to be very safe, but it's overkill
    as it's unlikely split_huge_page could ever run without the driver
    noticing and tearing down the hugepage itself.
    
    And if a driver in the future will really want to map a reserved
    hugepage in userland using an huge pmd it should simply take care of
    marking all subpages reserved too to keep KVM safe. This of course
    would require such a hypothetical driver to tear down the huge pmd
    itself and splitting the hugepage itself, instead of relaying on
    split_huge_page, but that sounds very reasonable, especially
    considering split_huge_page wouldn't currently transfer the reserved
    bit anyway.
    
    Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
    Signed-off-by: Gleb Natapov <gleb@redhat.com>

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d2836788561e..0fc25aed79a8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -102,28 +102,8 @@ static bool largepages_enabled = true;
 
 bool kvm_is_mmio_pfn(pfn_t pfn)
 {
-       if (pfn_valid(pfn)) {
-               int reserved;
-               struct page *tail = pfn_to_page(pfn);
-               struct page *head = compound_trans_head(tail);
-               reserved = PageReserved(head);
-               if (head != tail) {
-                       /*
-                        * "head" is not a dangling pointer
-                        * (compound_trans_head takes care of that)
-                        * but the hugepage may have been splitted
-                        * from under us (and we may not hold a
-                        * reference count on the head page so it can
-                        * be reused before we run PageReferenced), so
-                        * we've to check PageTail before returning
-                        * what we just read.
-                        */
-                       smp_rmb();
-                       if (PageTail(tail))
-                               return reserved;
-               }
-               return PageReserved(tail);
-       }
+       if (pfn_valid(pfn))
+               return PageReserved(pfn_to_page(pfn));
 
        return true;
 }


* Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking
  2019-08-28 15:55     ` Alex Williamson
@ 2019-08-29 16:58       ` Ben Luo
  2019-08-29 17:06         ` Alex Williamson
  0 siblings, 1 reply; 9+ messages in thread
From: Ben Luo @ 2019-08-29 16:58 UTC (permalink / raw)
  To: Alex Williamson; +Cc: cohuck, linux-kernel, Andrea Arcangeli


On 2019/8/28 11:55 PM, Alex Williamson wrote:
> On Wed, 28 Aug 2019 12:28:04 +0800
> Ben Luo <luoben@linux.alibaba.com> wrote:
>
>> currently, if the page is not a tail of compound page, it will be
>> checked twice for the same thing.
>>
>> Signed-off-by: Ben Luo <luoben@linux.alibaba.com>
>> ---
>>   drivers/vfio/vfio_iommu_type1.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index 054391f..d0f7346 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>   static bool is_invalid_reserved_pfn(unsigned long pfn)
>>   {
>>   	if (pfn_valid(pfn)) {
>> -		bool reserved;
>>   		struct page *tail = pfn_to_page(pfn);
>>   		struct page *head = compound_head(tail);
>> -		reserved = !!(PageReserved(head));
>>   		if (head != tail) {
>> +			bool reserved = PageReserved(head);
>>   			/*
>>   			 * "head" is not a dangling pointer
>>   			 * (compound_head takes care of that)
> Thinking more about this, the code here was originally just a copy of
> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
> Should we instead do the same thing here?  Thanks,
>
> Alex
OK, and kvm_is_mmio_pfn() has also been updated since then; I will take
a look at that and compose a new patch.
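
(For reference, and as far as I can tell, the KVM-side descendant of
that helper, renamed kvm_is_reserved_pfn() in later kernels, had by this
point been reduced to essentially the following:)

bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
{
	if (pfn_valid(pfn))
		return PageReserved(pfn_to_page(pfn));

	return true;
}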
>
> commit 11feeb498086a3a5907b8148bdf1786a9b18fc55
> Author: Andrea Arcangeli <aarcange@redhat.com>
> Date:   Thu Jul 25 03:04:38 2013 +0200
>
>      kvm: optimize away THP checks in kvm_is_mmio_pfn()
>      
>      The checks on PG_reserved in the page structure on head and tail pages
>      aren't necessary because split_huge_page wouldn't transfer the
>      PG_reserved bit from head to tail anyway.
>      
>      This was a forward-thinking check done in the case PageReserved was
>      set by a driver-owned page mapped in userland with something like
>      remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
>      possible right now). It was meant to be very safe, but it's overkill
>      as it's unlikely split_huge_page could ever run without the driver
>      noticing and tearing down the hugepage itself.
>      
>      And if a driver in the future will really want to map a reserved
>      hugepage in userland using an huge pmd it should simply take care of
>      marking all subpages reserved too to keep KVM safe. This of course
>      would require such a hypothetical driver to tear down the huge pmd
>      itself and splitting the hugepage itself, instead of relaying on
>      split_huge_page, but that sounds very reasonable, especially
>      considering split_huge_page wouldn't currently transfer the reserved
>      bit anyway.
>      
>      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
>      Signed-off-by: Gleb Natapov <gleb@redhat.com>
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d2836788561e..0fc25aed79a8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -102,28 +102,8 @@ static bool largepages_enabled = true;
>   
>   bool kvm_is_mmio_pfn(pfn_t pfn)
>   {
> -       if (pfn_valid(pfn)) {
> -               int reserved;
> -               struct page *tail = pfn_to_page(pfn);
> -               struct page *head = compound_trans_head(tail);
> -               reserved = PageReserved(head);
> -               if (head != tail) {
> -                       /*
> -                        * "head" is not a dangling pointer
> -                        * (compound_trans_head takes care of that)
> -                        * but the hugepage may have been splitted
> -                        * from under us (and we may not hold a
> -                        * reference count on the head page so it can
> -                        * be reused before we run PageReferenced), so
> -                        * we've to check PageTail before returning
> -                        * what we just read.
> -                        */
> -                       smp_rmb();
> -                       if (PageTail(tail))
> -                               return reserved;
> -               }
> -               return PageReserved(tail);
> -       }
> +       if (pfn_valid(pfn))
> +               return PageReserved(pfn_to_page(pfn));
>   
>          return true;
>   }


* Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking
  2019-08-29 16:58       ` Ben Luo
@ 2019-08-29 17:06         ` Alex Williamson
  2019-09-02  7:58           ` Ben Luo
       [not found]           ` <5c9c57ba-ca5f-f080-3bb0-417a08788235@linux.alibaba.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Alex Williamson @ 2019-08-29 17:06 UTC (permalink / raw)
  To: Ben Luo; +Cc: cohuck, linux-kernel, Andrea Arcangeli

On Fri, 30 Aug 2019 00:58:22 +0800
Ben Luo <luoben@linux.alibaba.com> wrote:

> On 2019/8/28 11:55 PM, Alex Williamson wrote:
> > On Wed, 28 Aug 2019 12:28:04 +0800
> > Ben Luo <luoben@linux.alibaba.com> wrote:
> >  
> >> currently, if the page is not a tail of compound page, it will be
> >> checked twice for the same thing.
> >>
> >> Signed-off-by: Ben Luo <luoben@linux.alibaba.com>
> >> ---
> >>   drivers/vfio/vfio_iommu_type1.c | 3 +--
> >>   1 file changed, 1 insertion(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index 054391f..d0f7346 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> >>   static bool is_invalid_reserved_pfn(unsigned long pfn)
> >>   {
> >>   	if (pfn_valid(pfn)) {
> >> -		bool reserved;
> >>   		struct page *tail = pfn_to_page(pfn);
> >>   		struct page *head = compound_head(tail);
> >> -		reserved = !!(PageReserved(head));
> >>   		if (head != tail) {
> >> +			bool reserved = PageReserved(head);
> >>   			/*
> >>   			 * "head" is not a dangling pointer
> >>   			 * (compound_head takes care of that)  
> > Thinking more about this, the code here was originally just a copy of
> > kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
> > Should we instead do the same thing here?  Thanks,
> >
> > Alex  
> ok, and kvm_is_mmio_pfn() has also been updated since then, I will take 
> a look at that and compose a new patch

I'm not sure if the further updates are quite as relevant for vfio, but
I appreciate your review of them.  Thanks,

Alex

> >
> > commit 11feeb498086a3a5907b8148bdf1786a9b18fc55
> > Author: Andrea Arcangeli <aarcange@redhat.com>
> > Date:   Thu Jul 25 03:04:38 2013 +0200
> >
> >      kvm: optimize away THP checks in kvm_is_mmio_pfn()
> >      
> >      The checks on PG_reserved in the page structure on head and tail pages
> >      aren't necessary because split_huge_page wouldn't transfer the
> >      PG_reserved bit from head to tail anyway.
> >      
> >      This was a forward-thinking check done in the case PageReserved was
> >      set by a driver-owned page mapped in userland with something like
> >      remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
> >      possible right now). It was meant to be very safe, but it's overkill
> >      as it's unlikely split_huge_page could ever run without the driver
> >      noticing and tearing down the hugepage itself.
> >      
> >      And if a driver in the future will really want to map a reserved
> >      hugepage in userland using an huge pmd it should simply take care of
> >      marking all subpages reserved too to keep KVM safe. This of course
> >      would require such a hypothetical driver to tear down the huge pmd
> >      itself and splitting the hugepage itself, instead of relaying on
> >      split_huge_page, but that sounds very reasonable, especially
> >      considering split_huge_page wouldn't currently transfer the reserved
> >      bit anyway.
> >      
> >      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> >      Signed-off-by: Gleb Natapov <gleb@redhat.com>
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index d2836788561e..0fc25aed79a8 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -102,28 +102,8 @@ static bool largepages_enabled = true;
> >   
> >   bool kvm_is_mmio_pfn(pfn_t pfn)
> >   {
> > -       if (pfn_valid(pfn)) {
> > -               int reserved;
> > -               struct page *tail = pfn_to_page(pfn);
> > -               struct page *head = compound_trans_head(tail);
> > -               reserved = PageReserved(head);
> > -               if (head != tail) {
> > -                       /*
> > -                        * "head" is not a dangling pointer
> > -                        * (compound_trans_head takes care of that)
> > -                        * but the hugepage may have been splitted
> > -                        * from under us (and we may not hold a
> > -                        * reference count on the head page so it can
> > -                        * be reused before we run PageReferenced), so
> > -                        * we've to check PageTail before returning
> > -                        * what we just read.
> > -                        */
> > -                       smp_rmb();
> > -                       if (PageTail(tail))
> > -                               return reserved;
> > -               }
> > -               return PageReserved(tail);
> > -       }
> > +       if (pfn_valid(pfn))
> > +               return PageReserved(pfn_to_page(pfn));
> >   
> >          return true;
> >   }  



* Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking
  2019-08-29 17:06         ` Alex Williamson
@ 2019-09-02  7:58           ` Ben Luo
       [not found]           ` <5c9c57ba-ca5f-f080-3bb0-417a08788235@linux.alibaba.com>
  1 sibling, 0 replies; 9+ messages in thread
From: Ben Luo @ 2019-09-02  7:58 UTC (permalink / raw)
  To: Alex Williamson; +Cc: cohuck, linux-kernel, Andrea Arcangeli


On 2019/8/30 1:06 AM, Alex Williamson wrote:
> On Fri, 30 Aug 2019 00:58:22 +0800
> Ben Luo <luoben@linux.alibaba.com> wrote:
>
>> On 2019/8/28 11:55 PM, Alex Williamson wrote:
>>> On Wed, 28 Aug 2019 12:28:04 +0800
>>> Ben Luo <luoben@linux.alibaba.com> wrote:
>>>   
>>>> currently, if the page is not a tail of compound page, it will be
>>>> checked twice for the same thing.
>>>>
>>>> Signed-off-by: Ben Luo <luoben@linux.alibaba.com>
>>>> ---
>>>>    drivers/vfio/vfio_iommu_type1.c | 3 +--
>>>>    1 file changed, 1 insertion(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>>>> index 054391f..d0f7346 100644
>>>> --- a/drivers/vfio/vfio_iommu_type1.c
>>>> +++ b/drivers/vfio/vfio_iommu_type1.c
>>>> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
>>>>    static bool is_invalid_reserved_pfn(unsigned long pfn)
>>>>    {
>>>>    	if (pfn_valid(pfn)) {
>>>> -		bool reserved;
>>>>    		struct page *tail = pfn_to_page(pfn);
>>>>    		struct page *head = compound_head(tail);
>>>> -		reserved = !!(PageReserved(head));
>>>>    		if (head != tail) {
>>>> +			bool reserved = PageReserved(head);
>>>>    			/*
>>>>    			 * "head" is not a dangling pointer
>>>>    			 * (compound_head takes care of that)
>>> Thinking more about this, the code here was originally just a copy of
>>> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
>>> Should we instead do the same thing here?  Thanks,
>>>
>>> Alex
>> ok, and kvm_is_mmio_pfn() has also been updated since then, I will take
>> a look at that and compose a new patch
> I'm not sure if the further updates are quite as relevant for vfio, but
> appreciate your review of them.  Thanks,
>
> Alex
After studying the related code, my personal understanding is:

kvm_is_mmio_pfn() is used to find out whether a memory range is MMIO
mapped, so that the proper MTRR type can be set in the spte.

is_invalid_reserved_pfn() is used in two scenarios:
1. to tell whether a page should be counted against the user's mlock
limits; as the function's name implies, all 'invalid' PFNs that are not
backed by a struct page, as well as reserved pages (including the zero
page and those from NVDIMM DAX), should be excluded.
2. to check that we have got a valid and pinned pfn for a vma with the
VM_PFNMAP flag.

So, for the zero page and 'RAM'-backed PFNs without a 'struct page',
kvm_is_mmio_pfn() should return false because they are not MMIO and are
cacheable, but is_invalid_reserved_pfn() should return true since they
are truly reserved or invalid and should not be counted against the
user's mlock limits.

For an fsdax page, the current get_user_pages() returns -EOPNOTSUPP, and
VFIO also returns this error code to userspace, so fsdax does not seem
to be supported for now and there is currently no chance to call into
is_invalid_reserved_pfn(). If fsdax is to be supported, not only this
function but also vaddr_get_pfn() needs some changes.

Now, with the assumption that PFNs of compound pages whose head has the
reserved bit set will never be passed to is_invalid_reserved_pfn(), we
can simplify this function to:

static bool is_invalid_reserved_pfn(unsigned long pfn)
{
        if (pfn_valid(pfn))
                return PageReserved(pfn_to_page(pfn));

        return true;
}

But I am not sure whether the assumption above is true; if not, we still
need to check the reserved bit of the head for a tail page, as this
PATCH v2 does.

Thanks,

     Ben
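
As a rough illustration of scenario 1 above, a caller that accounts
pinned pages against the user's mlock limit could use the predicate as
in the sketch below (an illustration only, not the actual vfio code; the
helper name and counters are made up for this example):

static void account_pinned_pfn(unsigned long pfn,
			       long *locked, long *reserved)
{
	if (is_invalid_reserved_pfn(pfn))
		(*reserved)++;	/* MMIO, zero page, ...: not charged */
	else
		(*locked)++;	/* ordinary RAM: charged to RLIMIT_MEMLOCK */
}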



* Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking
       [not found]           ` <5c9c57ba-ca5f-f080-3bb0-417a08788235@linux.alibaba.com>
@ 2019-09-13 18:05             ` Alex Williamson
  2019-09-30 23:35               ` Andrea Arcangeli
  0 siblings, 1 reply; 9+ messages in thread
From: Alex Williamson @ 2019-09-13 18:05 UTC (permalink / raw)
  To: Ben Luo; +Cc: cohuck, linux-kernel, Andrea Arcangeli

On Mon, 2 Sep 2019 15:32:42 +0800
Ben Luo <luoben@linux.alibaba.com> wrote:

> On 2019/8/30 1:06 AM, Alex Williamson wrote:
> > On Fri, 30 Aug 2019 00:58:22 +0800
> > Ben Luo <luoben@linux.alibaba.com> wrote:
> >  
> >> On 2019/8/28 11:55 PM, Alex Williamson wrote:
> >>> On Wed, 28 Aug 2019 12:28:04 +0800
> >>> Ben Luo <luoben@linux.alibaba.com> wrote:
> >>>     
> >>>> currently, if the page is not a tail of compound page, it will be
> >>>> checked twice for the same thing.
> >>>>
> >>>> Signed-off-by: Ben Luo <luoben@linux.alibaba.com>
> >>>> ---
> >>>>    drivers/vfio/vfio_iommu_type1.c | 3 +--
> >>>>    1 file changed, 1 insertion(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>>> index 054391f..d0f7346 100644
> >>>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>>> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> >>>>    static bool is_invalid_reserved_pfn(unsigned long pfn)
> >>>>    {
> >>>>    	if (pfn_valid(pfn)) {
> >>>> -		bool reserved;
> >>>>    		struct page *tail = pfn_to_page(pfn);
> >>>>    		struct page *head = compound_head(tail);
> >>>> -		reserved = !!(PageReserved(head));
> >>>>    		if (head != tail) {
> >>>> +			bool reserved = PageReserved(head);
> >>>>    			/*
> >>>>    			 * "head" is not a dangling pointer
> >>>>    			 * (compound_head takes care of that)  
> >>> Thinking more about this, the code here was originally just a copy of
> >>> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
> >>> Should we instead do the same thing here?  Thanks,
> >>>
> >>> Alex  
> >> ok, and kvm_is_mmio_pfn() has also been updated since then, I will take
> >> a look at that and compose a new patch  
> > I'm not sure if the further updates are quite as relevant for vfio, but
> > appreciate your review of them.  Thanks,
> >
> > Alex  
> 
> After studying the related code, my personal understandings are:
> 
> kvm_is_mmio_pfn() is used to find out whether a memory range is MMIO 
> mapped so that to set
> the proper MTRR TYPE to spte.
> 
> is_invalid_reserved_pfn() is used in two scenarios:
>      1. to tell whether a page should be counted against user's mlock 
> limits, as the function's name
> implies, all 'invalid' PFNs who are not backed by struct page and those 
> reserved pages (including
> zero page and those from NVDIMM DAX) should be excluded.
> 2. to check if we have got a valid and pinned pfn for the vma 
> with VM_PFNMAP flag.
> 
> So, for the zero page and 'RAM' backed PFNs without 'struct page', 
> kvm_is_mmio_pfn() should
> return false because they are not MMIO and are cacheable, but 
> is_invalid_reserved_pfn() should
> return true since they are truely reserved or invalid and should not be 
> counted against user's
> mlock limits.
> 
> For fsdax-page, current get_user_pages() returns -EOPNOTSUPP, and VFIO 
> also returns this
> error code to user, seems not support fsdax for now, so there is no 
> chance to call into
> is_invalid_reserved_pfn() currently, if fsdax is to be supported, not 
> only this function needs to be
> updated, vaddr_get_pfn() also needs some changes.
> 
> Now, with the assumption that PFNs of compound pages with reserved bit 
> set in head will not be
> passed to is_invalid_reserved_pfn(), we can simplify this function to:
> 
> static bool is_invalid_reserved_pfn(unsigned long pfn)
> {
>          if (pfn_valid(pfn))
>                  return PageReserved(pfn_to_page(pfn));
> 
>          return true;
> }
> 
> But, I am not sure if the assumption above is true, if not, we still 
> need to check the reserved bit of
> head for a tail page as this PATCH v2 does.

I believe what you've described is correct.  Andrea, have we missed
anything?  Thanks,

Alex


> >  
> >>> commit 11feeb498086a3a5907b8148bdf1786a9b18fc55
> >>> Author: Andrea Arcangeli <aarcange@redhat.com>
> >>> Date:   Thu Jul 25 03:04:38 2013 +0200
> >>>
> >>>       kvm: optimize away THP checks in kvm_is_mmio_pfn()
> >>>       
> >>>       The checks on PG_reserved in the page structure on head and tail pages
> >>>       aren't necessary because split_huge_page wouldn't transfer the
> >>>       PG_reserved bit from head to tail anyway.
> >>>       
> >>>       This was a forward-thinking check done in the case PageReserved was
> >>>       set by a driver-owned page mapped in userland with something like
> >>>       remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
> >>>       possible right now). It was meant to be very safe, but it's overkill
> >>>       as it's unlikely split_huge_page could ever run without the driver
> >>>       noticing and tearing down the hugepage itself.
> >>>       
> >>>       And if a driver in the future will really want to map a reserved
> >>>       hugepage in userland using an huge pmd it should simply take care of
> >>>       marking all subpages reserved too to keep KVM safe. This of course
> >>>       would require such a hypothetical driver to tear down the huge pmd
> >>>       itself and splitting the hugepage itself, instead of relaying on
> >>>       split_huge_page, but that sounds very reasonable, especially
> >>>       considering split_huge_page wouldn't currently transfer the reserved
> >>>       bit anyway.
> >>>       
> >>>       Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> >>>       Signed-off-by: Gleb Natapov <gleb@redhat.com>
> >>>
> >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>> index d2836788561e..0fc25aed79a8 100644
> >>> --- a/virt/kvm/kvm_main.c
> >>> +++ b/virt/kvm/kvm_main.c
> >>> @@ -102,28 +102,8 @@ static bool largepages_enabled = true;
> >>>    
> >>>    bool kvm_is_mmio_pfn(pfn_t pfn)
> >>>    {
> >>> -       if (pfn_valid(pfn)) {
> >>> -               int reserved;
> >>> -               struct page *tail = pfn_to_page(pfn);
> >>> -               struct page *head = compound_trans_head(tail);
> >>> -               reserved = PageReserved(head);
> >>> -               if (head != tail) {
> >>> -                       /*
> >>> -                        * "head" is not a dangling pointer
> >>> -                        * (compound_trans_head takes care of that)
> >>> -                        * but the hugepage may have been splitted
> >>> -                        * from under us (and we may not hold a
> >>> -                        * reference count on the head page so it can
> >>> -                        * be reused before we run PageReferenced), so
> >>> -                        * we've to check PageTail before returning
> >>> -                        * what we just read.
> >>> -                        */
> >>> -                       smp_rmb();
> >>> -                       if (PageTail(tail))
> >>> -                               return reserved;
> >>> -               }
> >>> -               return PageReserved(tail);
> >>> -       }
> >>> +       if (pfn_valid(pfn))
> >>> +               return PageReserved(pfn_to_page(pfn));
> >>>    
> >>>           return true;
> >>>    }  



* Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking
  2019-09-13 18:05             ` Alex Williamson
@ 2019-09-30 23:35               ` Andrea Arcangeli
  0 siblings, 0 replies; 9+ messages in thread
From: Andrea Arcangeli @ 2019-09-30 23:35 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Ben Luo, cohuck, linux-kernel

Hello,

On Fri, Sep 13, 2019 at 12:05:26PM -0600, Alex Williamson wrote:
> On Mon, 2 Sep 2019 15:32:42 +0800
> Ben Luo <luoben@linux.alibaba.com> wrote:
> 
> > On 2019/8/30 1:06 AM, Alex Williamson wrote:
> > > On Fri, 30 Aug 2019 00:58:22 +0800
> > > Ben Luo <luoben@linux.alibaba.com> wrote:
> > >  
> > >> On 2019/8/28 11:55 PM, Alex Williamson wrote:
> > >>> On Wed, 28 Aug 2019 12:28:04 +0800
> > >>> Ben Luo <luoben@linux.alibaba.com> wrote:
> > >>>     
> > >>>> currently, if the page is not a tail of compound page, it will be
> > >>>> checked twice for the same thing.
> > >>>>
> > >>>> Signed-off-by: Ben Luo <luoben@linux.alibaba.com>
> > >>>> ---
> > >>>>    drivers/vfio/vfio_iommu_type1.c | 3 +--
> > >>>>    1 file changed, 1 insertion(+), 2 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > >>>> index 054391f..d0f7346 100644
> > >>>> --- a/drivers/vfio/vfio_iommu_type1.c
> > >>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> > >>>> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> > >>>>    static bool is_invalid_reserved_pfn(unsigned long pfn)
> > >>>>    {
> > >>>>    	if (pfn_valid(pfn)) {
> > >>>> -		bool reserved;
> > >>>>    		struct page *tail = pfn_to_page(pfn);
> > >>>>    		struct page *head = compound_head(tail);
> > >>>> -		reserved = !!(PageReserved(head));
> > >>>>    		if (head != tail) {
> > >>>> +			bool reserved = PageReserved(head);
> > >>>>    			/*
> > >>>>    			 * "head" is not a dangling pointer
> > >>>>    			 * (compound_head takes care of that)  
> > >>> Thinking more about this, the code here was originally just a copy of
> > >>> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
> > >>> Should we instead do the same thing here?  Thanks,
> > >>>
> > >>> Alex  
> > >> ok, and kvm_is_mmio_pfn() has also been updated since then, I will take
> > >> a look at that and compose a new patch  
> > > I'm not sure if the further updates are quite as relevant for vfio, but
> > > appreciate your review of them.  Thanks,
> > >
> > > Alex  
> > 
> > After studying the related code, my personal understandings are:
> > 
> > kvm_is_mmio_pfn() is used to find out whether a memory range is MMIO 
> > mapped so that to set
> > the proper MTRR TYPE to spte.
> > 
> > is_invalid_reserved_pfn() is used in two scenarios:
> >      1. to tell whether a page should be counted against user's mlock 
> > limits, as the function's name
> > implies, all 'invalid' PFNs who are not backed by struct page and those 
> > reserved pages (including
> > zero page and those from NVDIMM DAX) should be excluded.
> > 2. to check if we have got a valid and pinned pfn for the vma 
> > with VM_PFNMAP flag.
> > 
> > So, for the zero page and 'RAM' backed PFNs without 'struct page', 
> > kvm_is_mmio_pfn() should
> > return false because they are not MMIO and are cacheable, but 
> > is_invalid_reserved_pfn() should
> > return true since they are truely reserved or invalid and should not be 
> > counted against user's
> > mlock limits.
> > 
> > For fsdax-page, current get_user_pages() returns -EOPNOTSUPP, and VFIO 
> > also returns this
> > error code to user, seems not support fsdax for now, so there is no 
> > chance to call into
> > is_invalid_reserved_pfn() currently, if fsdax is to be supported, not 
> > only this function needs to be
> > updated, vaddr_get_pfn() also needs some changes.
> > 
> > Now, with the assumption that PFNs of compound pages with reserved bit 
> > set in head will not be
> > passed to is_invalid_reserved_pfn(), we can simplify this function to:
> > 
> > static bool is_invalid_reserved_pfn(unsigned long pfn)
> > {
> >          if (pfn_valid(pfn))
> >                  return PageReserved(pfn_to_page(pfn));
> > 
> >          return true;
> > }
> > 
> > But, I am not sure if the assumption above is true, if not, we still 
> > need to check the reserved bit of
> > head for a tail page as this PATCH v2 does.
> 
> I believe what you've described is correct.  Andrea, have we missed
> anything?  Thanks,

Yes, it looks good. The only reason for ever wanting to check the head
page's reserved bit (instead of only checking the tail page's reserved
bit) would be if some code transferred the reserved bit from head to
tail during a hugepage split; no hugepage split code does that, so
checking the head can't make a difference.

The buddy allocator wouldn't allow the driver to allocate a hugepage if
any subpage is reserved in the e820 map at boot, so non-RAM pages with a
backing struct page aren't an issue here. This was only meaningful for
PFNMAP, in the case where the PG_reserved bit was set by the driver on a
hugepage before mapping it in userland; in that case the driver needs to
set the reserved bit in all subpages to be safe (not only in the head).

Thanks,
Andrea

