* [PATCH] iommu/vt-d: Fix unmap_pages support
@ 2021-11-11  0:32 Alex Williamson
  2021-11-11  8:49 ` Ajay Garg
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Alex Williamson @ 2021-11-11  0:32 UTC (permalink / raw)
  To: baolu.lu; +Cc: iommu, dwmw2

When supporting only the .map and .unmap callbacks of iommu_ops,
the IOMMU driver can make assumptions about the size and alignment
used for mappings based on the driver provided pgsize_bitmap.  VT-d
previously used essentially PAGE_MASK for this bitmap as any power
of two mapping was acceptably filled by native page sizes.

However, with the .map_pages and .unmap_pages interface we're now
getting page-size and count arguments.  If we simply combine these
as (page-size * count) and make use of the previous map/unmap
functions internally, any size and alignment assumptions are very
different.
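
As a throwaway illustration only -- a user-space sketch, not the
driver, and the function names are hypothetical stand-ins -- the kind
of folding described above looks roughly like this:

	#include <stdio.h>

	static void legacy_unmap(unsigned long iova, unsigned long size)
	{
		/* stand-in for the pre-existing single-range unmap path */
		printf("unmap iova 0x%lx, size 0x%lx\n", iova, size);
	}

	static void unmap_pages(unsigned long iova, unsigned long pgsize,
				unsigned long pgcount)
	{
		legacy_unmap(iova, pgsize * pgcount);	/* size = page-size * count */
	}

	int main(void)
	{
		/* 1024 4KB pages at IOVA pfn 0x3fe00, as in the example below */
		unmap_pages(0x3fe00UL << 12, 0x1000, 1024);
		return 0;
	}

Here the folded size (4MB) is larger than the natural alignment of the
base IOVA (2MB), a combination the old split-per-pgsize calls never
produced.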

As an example, a given vfio device assignment VM will often create
a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
does not support IOMMU super pages, the unmap_pages interface will
ask to unmap 1024 4KB pages at the base IOVA.  dma_pte_clear_level()
will recurse down to level 2 of the page table where the first half
of the pfn range exactly matches the entire pte level.  We clear the
pte, increment the pfn by the level size, but (oops) the next pte is
on a new page, so we exit the loop and pop back up a level.  When we
then update the pfn based on that higher level, we seem to assume
that the previous pfn value was at the start of the level.  In this
case the level size is 256K pfns, which we add to the base pfn and
get a result of 0x7fe00, which is clearly greater than 0x401ff,
so we're done.  Meanwhile we never cleared the ptes for the remainder
of the range.  When the VM remaps this range, we're overwriting valid
ptes and the VT-d driver complains loudly, as seen in the user
report linked below.

The fix for this seems relatively simple: if each iteration of the
loop in dma_pte_clear_level() is assumed to clear to the end of the
level pte page, then our next pfn should be calculated from level_pfn
rather than our working pfn.
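
To make the arithmetic concrete, the following throwaway user-space
sketch (not kernel code; level_size() and level_mask() here are
simplified stand-ins modeled on the driver helpers of the same names)
replays the level-3 pfn update from the example above:

	#include <stdio.h>

	#define LEVEL_STRIDE	9

	static unsigned long level_size(int level)
	{
		return 1UL << ((level - 1) * LEVEL_STRIDE);	/* pfns covered per pte */
	}

	static unsigned long level_mask(int level)
	{
		return ~(level_size(level) - 1);	/* align a pfn down to the level */
	}

	int main(void)
	{
		unsigned long pfn = 0x3fe00, last_pfn = 0x401ff;
		int level = 3;				/* 256K pfns (1GB) per pte */
		unsigned long level_pfn = pfn & level_mask(level);	/* 0x0 */

		/* buggy advance: assumes pfn was at the start of the level */
		printf("pfn += level_size      -> 0x%lx (> 0x%lx, loop ends too early)\n",
		       pfn + level_size(level), last_pfn);
		/* fixed advance: step from the start of the level just handled */
		printf("level_pfn + level_size -> 0x%lx (<= 0x%lx, tail still cleared)\n",
		       level_pfn + level_size(level), last_pfn);
		return 0;
	}

With the corrected advance, the next loop iteration lands on the
level-3 entry that covers pfn 0x40000, so the remainder of the range
[0x40000 - 0x401ff] is cleared as well.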

Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/iommu/intel/iommu.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d75f59ae28e6..f6395f5425f0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1249,7 +1249,7 @@ static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
 						       freelist);
 		}
 next:
-		pfn += level_size(level);
+		pfn = level_pfn + level_size(level);
 	} while (!first_pte_in_page(++pte) && pfn <= last_pfn);
 
 	if (first_pte)


_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* Re: [PATCH] iommu/vt-d: Fix unmap_pages support
  2021-11-11  0:32 [PATCH] iommu/vt-d: Fix unmap_pages support Alex Williamson
@ 2021-11-11  8:49 ` Ajay Garg
  2021-11-11  9:52 ` Giovanni Cabiddu
  2021-11-12  2:59 ` Lu Baolu
  2 siblings, 0 replies; 6+ messages in thread
From: Ajay Garg @ 2021-11-11  8:49 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2

Thanks Alex, Baolu.
The patch fixes things at my end. No kernel log flooding is seen now
(tested starting/stopping the VM more than 10 times).


Thanks and Regards,
Ajay

On Thu, Nov 11, 2021 at 6:03 AM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> When supporting only the .map and .unmap callbacks of iommu_ops,
> the IOMMU driver can make assumptions about the size and alignment
> used for mappings based on the driver provided pgsize_bitmap.  VT-d
> previously used essentially PAGE_MASK for this bitmap as any power
> of two mapping was acceptably filled by native page sizes.
>
> However, with the .map_pages and .unmap_pages interface we're now
> getting page-size and count arguments.  If we simply combine these
> as (page-size * count) and make use of the previous map/unmap
> functions internally, any size and alignment assumptions are very
> different.
>
> As an example, a given vfio device assignment VM will often create
> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
> does not support IOMMU super pages, the unmap_pages interface will
> ask to unmap 1024 4KB pages at the base IOVA.  dma_pte_clear_level()
> will recurse down to level 2 of the page table where the first half
> of the pfn range exactly matches the entire pte level.  We clear the
> pte, increment the pfn by the level size, but (oops) the next pte is
> on a new page, so we exit the loop and pop back up a level.  When we
> then update the pfn based on that higher level, we seem to assume
> that the previous pfn value was at the start of the level.  In this
> case the level size is 256K pfns, which we add to the base pfn and
> get a result of 0x7fe00, which is clearly greater than 0x401ff,
> so we're done.  Meanwhile we never cleared the ptes for the remainder
> of the range.  When the VM remaps this range, we're overwriting valid
> ptes and the VT-d driver complains loudly, as reported by the user
> report linked below.
>
> The fix for this seems relatively simple, if each iteration of the
> loop in dma_pte_clear_level() is assumed to clear to the end of the
> level pte page, then our next pfn should be calculated from level_pfn
> rather than our working pfn.
>
> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/iommu/intel/iommu.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..f6395f5425f0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1249,7 +1249,7 @@ static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
>                                                        freelist);
>                 }
>  next:
> -               pfn += level_size(level);
> +               pfn = level_pfn + level_size(level);
>         } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
>
>         if (first_pte)
>
>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* Re: [PATCH] iommu/vt-d: Fix unmap_pages support
  2021-11-11  0:32 [PATCH] iommu/vt-d: Fix unmap_pages support Alex Williamson
  2021-11-11  8:49 ` Ajay Garg
@ 2021-11-11  9:52 ` Giovanni Cabiddu
  2021-11-12  2:59 ` Lu Baolu
  2 siblings, 0 replies; 6+ messages in thread
From: Giovanni Cabiddu @ 2021-11-11  9:52 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2

I was able to hit the issue that this patch fixes using the memory
allocator in QATlib (https://github.com/intel/qatlib/), which uses vfio
to map and unmap memory.

This patch fixes the problem.

On Wed, Nov 10, 2021 at 05:32:50PM -0700, Alex Williamson wrote:
> When supporting only the .map and .unmap callbacks of iommu_ops,
> the IOMMU driver can make assumptions about the size and alignment
> used for mappings based on the driver provided pgsize_bitmap.  VT-d
> previously used essentially PAGE_MASK for this bitmap as any power
> of two mapping was acceptably filled by native page sizes.
> 
> However, with the .map_pages and .unmap_pages interface we're now
> getting page-size and count arguments.  If we simply combine these
> as (page-size * count) and make use of the previous map/unmap
> functions internally, any size and alignment assumptions are very
> different.
> 
> As an example, a given vfio device assignment VM will often create
> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
> does not support IOMMU super pages, the unmap_pages interface will
> ask to unmap 1024 4KB pages at the base IOVA.  dma_pte_clear_level()
> will recurse down to level 2 of the page table where the first half
> of the pfn range exactly matches the entire pte level.  We clear the
> pte, increment the pfn by the level size, but (oops) the next pte is
> on a new page, so we exit the loop and pop back up a level.  When we
> then update the pfn based on that higher level, we seem to assume
> that the previous pfn value was at the start of the level.  In this
> case the level size is 256K pfns, which we add to the base pfn and
> get a result of 0x7fe00, which is clearly greater than 0x401ff,
> so we're done.  Meanwhile we never cleared the ptes for the remainder
> of the range.  When the VM remaps this range, we're overwriting valid
> ptes and the VT-d driver complains loudly, as reported by the user
> report linked below.
> 
> The fix for this seems relatively simple, if each iteration of the
> loop in dma_pte_clear_level() is assumed to clear to the end of the
> level pte page, then our next pfn should be calculated from level_pfn
> rather than our working pfn.
> 
> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* Re: [PATCH] iommu/vt-d: Fix unmap_pages support
  2021-11-11  0:32 [PATCH] iommu/vt-d: Fix unmap_pages support Alex Williamson
  2021-11-11  8:49 ` Ajay Garg
  2021-11-11  9:52 ` Giovanni Cabiddu
@ 2021-11-12  2:59 ` Lu Baolu
  2021-11-18 19:48   ` Jerry Snitselaar
  2 siblings, 1 reply; 6+ messages in thread
From: Lu Baolu @ 2021-11-12  2:59 UTC (permalink / raw)
  To: Alex Williamson; +Cc: iommu, dwmw2

Hi Alex,

On 11/11/21 8:32 AM, Alex Williamson wrote:
> When supporting only the .map and .unmap callbacks of iommu_ops,
> the IOMMU driver can make assumptions about the size and alignment
> used for mappings based on the driver provided pgsize_bitmap.  VT-d
> previously used essentially PAGE_MASK for this bitmap as any power
> of two mapping was acceptably filled by native page sizes.
> 
> However, with the .map_pages and .unmap_pages interface we're now
> getting page-size and count arguments.  If we simply combine these
> as (page-size * count) and make use of the previous map/unmap
> functions internally, any size and alignment assumptions are very
> different.
> 
> As an example, a given vfio device assignment VM will often create
> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
> does not support IOMMU super pages, the unmap_pages interface will
> ask to unmap 1024 4KB pages at the base IOVA.  dma_pte_clear_level()
> will recurse down to level 2 of the page table where the first half
> of the pfn range exactly matches the entire pte level.  We clear the
> pte, increment the pfn by the level size, but (oops) the next pte is
> on a new page, so we exit the loop and pop back up a level.  When we
> then update the pfn based on that higher level, we seem to assume
> that the previous pfn value was at the start of the level.  In this
> case the level size is 256K pfns, which we add to the base pfn and
> get a result of 0x7fe00, which is clearly greater than 0x401ff,
> so we're done.  Meanwhile we never cleared the ptes for the remainder
> of the range.  When the VM remaps this range, we're overwriting valid
> ptes and the VT-d driver complains loudly, as reported by the user
> report linked below.
> 
> The fix for this seems relatively simple, if each iteration of the
> loop in dma_pte_clear_level() is assumed to clear to the end of the
> level pte page, then our next pfn should be calculated from level_pfn
> rather than our working pfn.
> 
> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
> Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
> Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

Thank you for fixing this! I will queue it for v5.16.

Best regards,
baolu

> ---
>   drivers/iommu/intel/iommu.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d75f59ae28e6..f6395f5425f0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1249,7 +1249,7 @@ static struct page *dma_pte_clear_level(struct dmar_domain *domain, int level,
>   						       freelist);
>   		}
>   next:
> -		pfn += level_size(level);
> +		pfn = level_pfn + level_size(level);
>   	} while (!first_pte_in_page(++pte) && pfn <= last_pfn);
>   
>   	if (first_pte)
> 
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* Re: [PATCH] iommu/vt-d: Fix unmap_pages support
  2021-11-12  2:59 ` Lu Baolu
@ 2021-11-18 19:48   ` Jerry Snitselaar
  2021-11-19  0:52     ` Lu Baolu
  0 siblings, 1 reply; 6+ messages in thread
From: Jerry Snitselaar @ 2021-11-18 19:48 UTC (permalink / raw)
  To: Lu Baolu, Alex Williamson; +Cc: iommu, dwmw2

On Fri, 2021-11-12 at 10:59 +0800, Lu Baolu wrote:
> Hi Alex,
> 
> On 11/11/21 8:32 AM, Alex Williamson wrote:
> > When supporting only the .map and .unmap callbacks of iommu_ops,
> > the IOMMU driver can make assumptions about the size and alignment
> > used for mappings based on the driver provided pgsize_bitmap.  VT-d
> > previously used essentially PAGE_MASK for this bitmap as any power
> > of two mapping was acceptably filled by native page sizes.
> > 
> > However, with the .map_pages and .unmap_pages interface we're now
> > getting page-size and count arguments.  If we simply combine these
> > as (page-size * count) and make use of the previous map/unmap
> > functions internally, any size and alignment assumptions are very
> > different.
> > 
> > As an example, a given vfio device assignment VM will often create
> > a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
> > does not support IOMMU super pages, the unmap_pages interface will
> > ask to unmap 1024 4KB pages at the base IOVA. 
> > dma_pte_clear_level()
> > will recurse down to level 2 of the page table where the first half
> > of the pfn range exactly matches the entire pte level.  We clear
> > the
> > pte, increment the pfn by the level size, but (oops) the next pte
> > is
> > on a new page, so we exit the loop and pop back up a level.  When we
> > then update the pfn based on that higher level, we seem to assume
> > that the previous pfn value was at the start of the level.  In this
> > case the level size is 256K pfns, which we add to the base pfn and
> > get a result of 0x7fe00, which is clearly greater than 0x401ff,
> > so we're done.  Meanwhile we never cleared the ptes for the
> > remainder
> > of the range.  When the VM remaps this range, we're overwriting
> > valid
> > ptes and the VT-d driver complains loudly, as reported by the user
> > report linked below.
> > 
> > The fix for this seems relatively simple, if each iteration of the
> > loop in dma_pte_clear_level() is assumed to clear to the end of the
> > level pte page, then our next pfn should be calculated from
> > level_pfn
> > rather than our working pfn.
> > 
> > Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages()
> > iommu_ops callback")
> > Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
> > Link:
> > https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> 
> Thank you for fixing this! I will queue it for v5.16.
> 
> Best regards,
> baolu
> 

Hi Baolu,

Do you have an estimate of when this will be submitted?

Regards,
Jerry


> > ---
> >   drivers/iommu/intel/iommu.c |    2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/intel/iommu.c
> > b/drivers/iommu/intel/iommu.c
> > index d75f59ae28e6..f6395f5425f0 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1249,7 +1249,7 @@ static struct page
> > *dma_pte_clear_level(struct dmar_domain *domain, int level,
> >                                                        freelist);
> >                 }
> >   next:
> > -               pfn += level_size(level);
> > +               pfn = level_pfn + level_size(level);
> >         } while (!first_pte_in_page(++pte) && pfn <= last_pfn);
> >   
> >         if (first_pte)
> > 
> > 

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* Re: [PATCH] iommu/vt-d: Fix unmap_pages support
  2021-11-18 19:48   ` Jerry Snitselaar
@ 2021-11-19  0:52     ` Lu Baolu
  0 siblings, 0 replies; 6+ messages in thread
From: Lu Baolu @ 2021-11-19  0:52 UTC (permalink / raw)
  To: Jerry Snitselaar, Alex Williamson; +Cc: iommu, dwmw2

Hi Jerry,

On 11/19/21 3:48 AM, Jerry Snitselaar wrote:
> On Fri, 2021-11-12 at 10:59 +0800, Lu Baolu wrote:
>> Hi Alex,
>>
>> On 11/11/21 8:32 AM, Alex Williamson wrote:
>>> When supporting only the .map and .unmap callbacks of iommu_ops,
>>> the IOMMU driver can make assumptions about the size and alignment
>>> used for mappings based on the driver provided pgsize_bitmap.  VT-d
>>> previously used essentially PAGE_MASK for this bitmap as any power
>>> of two mapping was acceptably filled by native page sizes.
>>>
>>> However, with the .map_pages and .unmap_pages interface we're now
>>> getting page-size and count arguments.  If we simply combine these
>>> as (page-size * count) and make use of the previous map/unmap
>>> functions internally, any size and alignment assumptions are very
>>> different.
>>>
>>> As an example, a given vfio device assignment VM will often create
>>> a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff].  On a system that
>>> does not support IOMMU super pages, the unmap_pages interface will
>>> ask to unmap 1024 4KB pages at the base IOVA.
>>> dma_pte_clear_level()
>>> will recurse down to level 2 of the page table where the first half
>>> of the pfn range exactly matches the entire pte level.  We clear
>>> the
>>> pte, increment the pfn by the level size, but (oops) the next pte
>>> is
>>> on a new page, so we exit the loop and pop back up a level.  When we
>>> then update the pfn based on that higher level, we seem to assume
>>> that the previous pfn value was at the start of the level.  In this
>>> case the level size is 256K pfns, which we add to the base pfn and
>>> get a result of 0x7fe00, which is clearly greater than 0x401ff,
>>> so we're done.  Meanwhile we never cleared the ptes for the
>>> remainder
>>> of the range.  When the VM remaps this range, we're overwriting
>>> valid
>>> ptes and the VT-d driver complains loudly, as reported by the user
>>> report linked below.
>>>
>>> The fix for this seems relatively simple, if each iteration of the
>>> loop in dma_pte_clear_level() is assumed to clear to the end of the
>>> level pte page, then our next pfn should be calculated from
>>> level_pfn
>>> rather than our working pfn.
>>>
>>> Fixes: 3f34f1259776 ("iommu/vt-d: Implement map/unmap_pages()
>>> iommu_ops callback")
>>> Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
>>> Link:
>>> https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
>>> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
>>
>> Thank you for fixing this! I will queue it for v5.16.
>>
>> Best regards,
>> baolu
>>
> 
> Hi Baolu,
> 
> Do you have an estimate of when this will be submitted?

I will submit all fix patches in my queue to Joerg early next week.

Best regards,
baolu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


end of thread

Thread overview: 6+ messages
2021-11-11  0:32 [PATCH] iommu/vt-d: Fix unmap_pages support Alex Williamson
2021-11-11  8:49 ` Ajay Garg
2021-11-11  9:52 ` Giovanni Cabiddu
2021-11-12  2:59 ` Lu Baolu
2021-11-18 19:48   ` Jerry Snitselaar
2021-11-19  0:52     ` Lu Baolu
