linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] drm/ttm: Refcount allocated tail pages
@ 2022-08-15  9:54 Dmitry Osipenko
  2022-08-15 10:05 ` Christian König
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2022-08-15  9:54 UTC (permalink / raw)
  To: David Airlie, Huang Rui, Daniel Vetter, Christian König,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

Higher order pages allocated using alloc_pages() aren't refcounted and they
need to be refcounted, otherwise it's impossible to map them by KVM. This
patch sets the refcount of the tail pages and fixes the KVM memory mapping
faults.

Without this change guest virgl driver can't map host buffers into guest
and can't provide OpenGL 4.5 profile support to the guest. The host
mappings are also needed for enabling the Venus driver using host GPU
drivers that are utilizing TTM.

Based on a patch proposed by Trigger Huang.

Cc: stable@vger.kernel.org
Cc: Trigger Huang <Trigger.Huang@gmail.com>
Link: https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> # AMDGPU (Qemu and crosvm)
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 21b61631f73a..11e92bb149c9 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
 	struct ttm_pool_dma *dma;
 	struct page *p;
+	unsigned int i;
 	void *vaddr;
 
 	/* Don't set the __GFP_COMP flag for higher order allocations.
@@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 
 	if (!pool->use_dma_alloc) {
 		p = alloc_pages(gfp_flags, order);
-		if (p)
+		if (p) {
 			p->private = order;
+			goto ref_tail_pages;
+		}
 		return p;
 	}
 
@@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
 
 	dma->vaddr = (unsigned long)vaddr | order;
 	p->private = (unsigned long)dma;
+
+ref_tail_pages:
+	/*
+	 * KVM requires mapped tail pages to be refcounted because put_page()
+	 * is invoked on them in the end of the page fault handling, and thus,
+	 * tail pages need to be protected from the premature releasing.
+	 * In fact, KVM page fault handler refuses to map tail pages to guest
+	 * if they aren't refcounted because hva_to_pfn_remapped() checks the
+	 * refcount specifically for this case.
+	 *
+	 * In particular, unreferenced tail pages result in a KVM "Bad address"
+	 * failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
+	 * accesses mapped host TTM buffer that contains tail pages.
+	 */
+	for (i = 1; i < 1 << order; i++)
+		page_ref_inc(p + i);
+
 	return p;
 
 error_free:
@@ -133,6 +153,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
 {
 	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
 	struct ttm_pool_dma *dma;
+	unsigned int i;
 	void *vaddr;
 
 #ifdef CONFIG_X86
@@ -142,6 +163,8 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
 	if (caching != ttm_cached && !PageHighMem(p))
 		set_pages_wb(p, 1 << order);
 #endif
+	for (i = 1; i < 1 << order; i++)
+		page_ref_dec(p + i);
 
 	if (!pool || !pool->use_dma_alloc) {
 		__free_pages(p, order);
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15  9:54 [PATCH v1] drm/ttm: Refcount allocated tail pages Dmitry Osipenko
@ 2022-08-15 10:05 ` Christian König
  2022-08-15 10:09   ` Dmitry Osipenko
  2022-09-06 20:01   ` Daniel Vetter
  0 siblings, 2 replies; 27+ messages in thread
From: Christian König @ 2022-08-15 10:05 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
> Higher order pages allocated using alloc_pages() aren't refcounted and they
> need to be refcounted, otherwise it's impossible to map them by KVM. This
> patch sets the refcount of the tail pages and fixes the KVM memory mapping
> faults.
>
> Without this change guest virgl driver can't map host buffers into guest
> and can't provide OpenGL 4.5 profile support to the guest. The host
> mappings are also needed for enabling the Venus driver using host GPU
> drivers that are utilizing TTM.
>
> Based on a patch proposed by Trigger Huang.

Well I can't count how often I have repeated this: This is an absolutely 
clear NAK!

TTM pages are not reference counted in the first place and because of 
this giving them to virgl is illegal.

Please immediately stop this completely broken approach. We have 
discussed this multiple times now.

Regards,
Christian.

>
> Cc: stable@vger.kernel.org
> Cc: Trigger Huang <Trigger.Huang@gmail.com>
> Link: https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343
> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> # AMDGPU (Qemu and crosvm)
> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> ---
>   drivers/gpu/drm/ttm/ttm_pool.c | 25 ++++++++++++++++++++++++-
>   1 file changed, 24 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index 21b61631f73a..11e92bb149c9 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>   	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
>   	struct ttm_pool_dma *dma;
>   	struct page *p;
> +	unsigned int i;
>   	void *vaddr;
>   
>   	/* Don't set the __GFP_COMP flag for higher order allocations.
> @@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>   
>   	if (!pool->use_dma_alloc) {
>   		p = alloc_pages(gfp_flags, order);
> -		if (p)
> +		if (p) {
>   			p->private = order;
> +			goto ref_tail_pages;
> +		}
>   		return p;
>   	}
>   
> @@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>   
>   	dma->vaddr = (unsigned long)vaddr | order;
>   	p->private = (unsigned long)dma;
> +
> +ref_tail_pages:
> +	/*
> +	 * KVM requires mapped tail pages to be refcounted because put_page()
> +	 * is invoked on them in the end of the page fault handling, and thus,
> +	 * tail pages need to be protected from the premature releasing.
> +	 * In fact, KVM page fault handler refuses to map tail pages to guest
> +	 * if they aren't refcounted because hva_to_pfn_remapped() checks the
> +	 * refcount specifically for this case.
> +	 *
> +	 * In particular, unreferenced tail pages result in a KVM "Bad address"
> +	 * failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
> +	 * accesses mapped host TTM buffer that contains tail pages.
> +	 */
> +	for (i = 1; i < 1 << order; i++)
> +		page_ref_inc(p + i);
> +
>   	return p;
>   
>   error_free:
> @@ -133,6 +153,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
>   {
>   	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
>   	struct ttm_pool_dma *dma;
> +	unsigned int i;
>   	void *vaddr;
>   
>   #ifdef CONFIG_X86
> @@ -142,6 +163,8 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
>   	if (caching != ttm_cached && !PageHighMem(p))
>   		set_pages_wb(p, 1 << order);
>   #endif
> +	for (i = 1; i < 1 << order; i++)
> +		page_ref_dec(p + i);
>   
>   	if (!pool || !pool->use_dma_alloc) {
>   		__free_pages(p, order);


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 10:05 ` Christian König
@ 2022-08-15 10:09   ` Dmitry Osipenko
  2022-08-15 10:11     ` Christian König
  2022-09-06 20:01   ` Daniel Vetter
  1 sibling, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2022-08-15 10:09 UTC (permalink / raw)
  To: Christian König, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On 8/15/22 13:05, Christian König wrote:
> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
>> Higher order pages allocated using alloc_pages() aren't refcounted and
>> they
>> need to be refcounted, otherwise it's impossible to map them by KVM. This
>> patch sets the refcount of the tail pages and fixes the KVM memory
>> mapping
>> faults.
>>
>> Without this change guest virgl driver can't map host buffers into guest
>> and can't provide OpenGL 4.5 profile support to the guest. The host
>> mappings are also needed for enabling the Venus driver using host GPU
>> drivers that are utilizing TTM.
>>
>> Based on a patch proposed by Trigger Huang.
> 
> Well I can't count how often I have repeated this: This is an absolutely
> clear NAK!
> 
> TTM pages are not reference counted in the first place and because of
> this giving them to virgl is illegal.

A? The first page is refcounted when allocated, the tail pages are not.

> Please immediately stop this completely broken approach. We have
> discussed this multiple times now.

Could you please give me a link to these discussions?

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 10:09   ` Dmitry Osipenko
@ 2022-08-15 10:11     ` Christian König
  2022-08-15 10:14       ` Christian König
  0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2022-08-15 10:11 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

Am 15.08.22 um 12:09 schrieb Dmitry Osipenko:
> On 8/15/22 13:05, Christian König wrote:
>> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
>>> Higher order pages allocated using alloc_pages() aren't refcounted and
>>> they
>>> need to be refcounted, otherwise it's impossible to map them by KVM. This
>>> patch sets the refcount of the tail pages and fixes the KVM memory
>>> mapping
>>> faults.
>>>
>>> Without this change guest virgl driver can't map host buffers into guest
>>> and can't provide OpenGL 4.5 profile support to the guest. The host
>>> mappings are also needed for enabling the Venus driver using host GPU
>>> drivers that are utilizing TTM.
>>>
>>> Based on a patch proposed by Trigger Huang.
>> Well I can't count how often I have repeated this: This is an absolutely
>> clear NAK!
>>
>> TTM pages are not reference counted in the first place and because of
>> this giving them to virgl is illegal.
> A? The first page is refcounted when allocated, the tail pages are not.

No they aren't. The first page is just by coincident initialized with a 
refcount of 1. This refcount is completely ignored and not used at all.

Incrementing the reference count and by this mapping the page into some 
other address space is illegal and corrupts the internal state tracking 
of TTM.

>> Please immediately stop this completely broken approach. We have
>> discussed this multiple times now.
> Could you please give me a link to these discussions?

Not of hand, please search the dri-devel list for similar patches. This 
was brought up multiple times now.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 10:11     ` Christian König
@ 2022-08-15 10:14       ` Christian König
  2022-08-15 10:18         ` Dmitry Osipenko
  0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2022-08-15 10:14 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

Am 15.08.22 um 12:11 schrieb Christian König:
> Am 15.08.22 um 12:09 schrieb Dmitry Osipenko:
>> On 8/15/22 13:05, Christian König wrote:
>>> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
>>>> Higher order pages allocated using alloc_pages() aren't refcounted and
>>>> they
>>>> need to be refcounted, otherwise it's impossible to map them by 
>>>> KVM. This
>>>> patch sets the refcount of the tail pages and fixes the KVM memory
>>>> mapping
>>>> faults.
>>>>
>>>> Without this change guest virgl driver can't map host buffers into 
>>>> guest
>>>> and can't provide OpenGL 4.5 profile support to the guest. The host
>>>> mappings are also needed for enabling the Venus driver using host GPU
>>>> drivers that are utilizing TTM.
>>>>
>>>> Based on a patch proposed by Trigger Huang.
>>> Well I can't count how often I have repeated this: This is an 
>>> absolutely
>>> clear NAK!
>>>
>>> TTM pages are not reference counted in the first place and because of
>>> this giving them to virgl is illegal.
>> A? The first page is refcounted when allocated, the tail pages are not.
>
> No they aren't. The first page is just by coincident initialized with 
> a refcount of 1. This refcount is completely ignored and not used at all.
>
> Incrementing the reference count and by this mapping the page into 
> some other address space is illegal and corrupts the internal state 
> tracking of TTM.

See this comment in the source code as well:

         /* Don't set the __GFP_COMP flag for higher order allocations.
          * Mapping pages directly into an userspace process and calling
          * put_page() on a TTM allocated page is illegal.
          */

I have absolutely no idea how somebody had the idea he could do this.

Regards,
Christian.

>
>>> Please immediately stop this completely broken approach. We have
>>> discussed this multiple times now.
>> Could you please give me a link to these discussions?
>
> Not of hand, please search the dri-devel list for similar patches. 
> This was brought up multiple times now.
>
> Regards,
> Christian.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 10:14       ` Christian König
@ 2022-08-15 10:18         ` Dmitry Osipenko
  2022-08-15 10:42           ` Christian König
  2022-08-15 10:47           ` Dmitry Osipenko
  0 siblings, 2 replies; 27+ messages in thread
From: Dmitry Osipenko @ 2022-08-15 10:18 UTC (permalink / raw)
  To: Christian König, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On 8/15/22 13:14, Christian König wrote:
> Am 15.08.22 um 12:11 schrieb Christian König:
>> Am 15.08.22 um 12:09 schrieb Dmitry Osipenko:
>>> On 8/15/22 13:05, Christian König wrote:
>>>> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
>>>>> Higher order pages allocated using alloc_pages() aren't refcounted and
>>>>> they
>>>>> need to be refcounted, otherwise it's impossible to map them by
>>>>> KVM. This
>>>>> patch sets the refcount of the tail pages and fixes the KVM memory
>>>>> mapping
>>>>> faults.
>>>>>
>>>>> Without this change guest virgl driver can't map host buffers into
>>>>> guest
>>>>> and can't provide OpenGL 4.5 profile support to the guest. The host
>>>>> mappings are also needed for enabling the Venus driver using host GPU
>>>>> drivers that are utilizing TTM.
>>>>>
>>>>> Based on a patch proposed by Trigger Huang.
>>>> Well I can't count how often I have repeated this: This is an
>>>> absolutely
>>>> clear NAK!
>>>>
>>>> TTM pages are not reference counted in the first place and because of
>>>> this giving them to virgl is illegal.
>>> A? The first page is refcounted when allocated, the tail pages are not.
>>
>> No they aren't. The first page is just by coincident initialized with
>> a refcount of 1. This refcount is completely ignored and not used at all.
>>
>> Incrementing the reference count and by this mapping the page into
>> some other address space is illegal and corrupts the internal state
>> tracking of TTM.
> 
> See this comment in the source code as well:
> 
>         /* Don't set the __GFP_COMP flag for higher order allocations.
>          * Mapping pages directly into an userspace process and calling
>          * put_page() on a TTM allocated page is illegal.
>          */
> 
> I have absolutely no idea how somebody had the idea he could do this.

I saw this comment, but it doesn't make sense because it doesn't explain
why it's illegal. Hence it looks like a bogus comment since the
refcouting certainly works, at least to a some degree because I haven't
noticed any problems in practice, maybe by luck :)

I'll try to dig out the older discussions, thank you for the quick reply!

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 10:18         ` Dmitry Osipenko
@ 2022-08-15 10:42           ` Christian König
  2022-08-15 10:47           ` Dmitry Osipenko
  1 sibling, 0 replies; 27+ messages in thread
From: Christian König @ 2022-08-15 10:42 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

Am 15.08.22 um 12:18 schrieb Dmitry Osipenko:
> On 8/15/22 13:14, Christian König wrote:
>> Am 15.08.22 um 12:11 schrieb Christian König:
>>> Am 15.08.22 um 12:09 schrieb Dmitry Osipenko:
>>>> On 8/15/22 13:05, Christian König wrote:
>>>>> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
>>>>>> Higher order pages allocated using alloc_pages() aren't refcounted and
>>>>>> they
>>>>>> need to be refcounted, otherwise it's impossible to map them by
>>>>>> KVM. This
>>>>>> patch sets the refcount of the tail pages and fixes the KVM memory
>>>>>> mapping
>>>>>> faults.
>>>>>>
>>>>>> Without this change guest virgl driver can't map host buffers into
>>>>>> guest
>>>>>> and can't provide OpenGL 4.5 profile support to the guest. The host
>>>>>> mappings are also needed for enabling the Venus driver using host GPU
>>>>>> drivers that are utilizing TTM.
>>>>>>
>>>>>> Based on a patch proposed by Trigger Huang.
>>>>> Well I can't count how often I have repeated this: This is an
>>>>> absolutely
>>>>> clear NAK!
>>>>>
>>>>> TTM pages are not reference counted in the first place and because of
>>>>> this giving them to virgl is illegal.
>>>> A? The first page is refcounted when allocated, the tail pages are not.
>>> No they aren't. The first page is just by coincident initialized with
>>> a refcount of 1. This refcount is completely ignored and not used at all.
>>>
>>> Incrementing the reference count and by this mapping the page into
>>> some other address space is illegal and corrupts the internal state
>>> tracking of TTM.
>> See this comment in the source code as well:
>>
>>          /* Don't set the __GFP_COMP flag for higher order allocations.
>>           * Mapping pages directly into an userspace process and calling
>>           * put_page() on a TTM allocated page is illegal.
>>           */
>>
>> I have absolutely no idea how somebody had the idea he could do this.
> I saw this comment, but it doesn't make sense because it doesn't explain
> why it's illegal. Hence it looks like a bogus comment since the
> refcouting certainly works, at least to a some degree because I haven't
> noticed any problems in practice, maybe by luck :)

Well exactly that's the problem. It does not work, you are just lucky :)

I will provide a patch to set the reference count to zero even for 
non-compound pages. Maybe that will yield more backtrace to abusers of 
this interface.

Regards,
Christian.

>
> I'll try to dig out the older discussions, thank you for the quick reply!
>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 10:18         ` Dmitry Osipenko
  2022-08-15 10:42           ` Christian König
@ 2022-08-15 10:47           ` Dmitry Osipenko
  2022-08-15 10:51             ` Christian König
  1 sibling, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2022-08-15 10:47 UTC (permalink / raw)
  To: Christian König, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On 8/15/22 13:18, Dmitry Osipenko wrote:
> On 8/15/22 13:14, Christian König wrote:
>> Am 15.08.22 um 12:11 schrieb Christian König:
>>> Am 15.08.22 um 12:09 schrieb Dmitry Osipenko:
>>>> On 8/15/22 13:05, Christian König wrote:
>>>>> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
>>>>>> Higher order pages allocated using alloc_pages() aren't refcounted and
>>>>>> they
>>>>>> need to be refcounted, otherwise it's impossible to map them by
>>>>>> KVM. This
>>>>>> patch sets the refcount of the tail pages and fixes the KVM memory
>>>>>> mapping
>>>>>> faults.
>>>>>>
>>>>>> Without this change guest virgl driver can't map host buffers into
>>>>>> guest
>>>>>> and can't provide OpenGL 4.5 profile support to the guest. The host
>>>>>> mappings are also needed for enabling the Venus driver using host GPU
>>>>>> drivers that are utilizing TTM.
>>>>>>
>>>>>> Based on a patch proposed by Trigger Huang.
>>>>> Well I can't count how often I have repeated this: This is an
>>>>> absolutely
>>>>> clear NAK!
>>>>>
>>>>> TTM pages are not reference counted in the first place and because of
>>>>> this giving them to virgl is illegal.
>>>> A? The first page is refcounted when allocated, the tail pages are not.
>>>
>>> No they aren't. The first page is just by coincident initialized with
>>> a refcount of 1. This refcount is completely ignored and not used at all.
>>>
>>> Incrementing the reference count and by this mapping the page into
>>> some other address space is illegal and corrupts the internal state
>>> tracking of TTM.
>>
>> See this comment in the source code as well:
>>
>>         /* Don't set the __GFP_COMP flag for higher order allocations.
>>          * Mapping pages directly into an userspace process and calling
>>          * put_page() on a TTM allocated page is illegal.
>>          */
>>
>> I have absolutely no idea how somebody had the idea he could do this.
> 
> I saw this comment, but it doesn't make sense because it doesn't explain
> why it's illegal. Hence it looks like a bogus comment since the
> refcouting certainly works, at least to a some degree because I haven't
> noticed any problems in practice, maybe by luck :)
> 
> I'll try to dig out the older discussions, thank you for the quick reply!

Are you sure it was really discussed in public previously? All I can
find is yours two answers to a similar patches where you're saying that
this it's a wrong solution without in-depth explanation and further
discussions.

Maybe it was discussed privately? In this case I will be happy to get
more info from you about the root of the problem so I could start to
look at how to fix it properly. It's not apparent where the problem is
to a TTM newbie like me.

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 10:47           ` Dmitry Osipenko
@ 2022-08-15 10:51             ` Christian König
  2022-08-15 11:19               ` Dmitry Osipenko
  0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2022-08-15 10:51 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

Am 15.08.22 um 12:47 schrieb Dmitry Osipenko:
> On 8/15/22 13:18, Dmitry Osipenko wrote:
>> On 8/15/22 13:14, Christian König wrote:
>>> Am 15.08.22 um 12:11 schrieb Christian König:
>>>> Am 15.08.22 um 12:09 schrieb Dmitry Osipenko:
>>>>> On 8/15/22 13:05, Christian König wrote:
>>>>>> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
>>>>>>> Higher order pages allocated using alloc_pages() aren't refcounted and
>>>>>>> they
>>>>>>> need to be refcounted, otherwise it's impossible to map them by
>>>>>>> KVM. This
>>>>>>> patch sets the refcount of the tail pages and fixes the KVM memory
>>>>>>> mapping
>>>>>>> faults.
>>>>>>>
>>>>>>> Without this change guest virgl driver can't map host buffers into
>>>>>>> guest
>>>>>>> and can't provide OpenGL 4.5 profile support to the guest. The host
>>>>>>> mappings are also needed for enabling the Venus driver using host GPU
>>>>>>> drivers that are utilizing TTM.
>>>>>>>
>>>>>>> Based on a patch proposed by Trigger Huang.
>>>>>> Well I can't count how often I have repeated this: This is an
>>>>>> absolutely
>>>>>> clear NAK!
>>>>>>
>>>>>> TTM pages are not reference counted in the first place and because of
>>>>>> this giving them to virgl is illegal.
>>>>> A? The first page is refcounted when allocated, the tail pages are not.
>>>> No they aren't. The first page is just by coincident initialized with
>>>> a refcount of 1. This refcount is completely ignored and not used at all.
>>>>
>>>> Incrementing the reference count and by this mapping the page into
>>>> some other address space is illegal and corrupts the internal state
>>>> tracking of TTM.
>>> See this comment in the source code as well:
>>>
>>>          /* Don't set the __GFP_COMP flag for higher order allocations.
>>>           * Mapping pages directly into an userspace process and calling
>>>           * put_page() on a TTM allocated page is illegal.
>>>           */
>>>
>>> I have absolutely no idea how somebody had the idea he could do this.
>> I saw this comment, but it doesn't make sense because it doesn't explain
>> why it's illegal. Hence it looks like a bogus comment since the
>> refcouting certainly works, at least to a some degree because I haven't
>> noticed any problems in practice, maybe by luck :)
>>
>> I'll try to dig out the older discussions, thank you for the quick reply!
> Are you sure it was really discussed in public previously? All I can
> find is yours two answers to a similar patches where you're saying that
> this it's a wrong solution without in-depth explanation and further
> discussions.

Yeah, that's my problem as well I can't find that of hand.

But yes it certainly was discussed in public.

>
> Maybe it was discussed privately? In this case I will be happy to get
> more info from you about the root of the problem so I could start to
> look at how to fix it properly. It's not apparent where the problem is
> to a TTM newbie like me.
>

Well this is completely unfixable. See the whole purpose of TTM is to 
allow tracing where what is mapped of a buffer object.

If you circumvent that and increase the page reference yourself than 
that whole functionality can't work correctly any more.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 10:51             ` Christian König
@ 2022-08-15 11:19               ` Dmitry Osipenko
  2022-08-15 11:28                 ` Christian König
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2022-08-15 11:19 UTC (permalink / raw)
  To: Christian König, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On 8/15/22 13:51, Christian König wrote:
> Am 15.08.22 um 12:47 schrieb Dmitry Osipenko:
>> On 8/15/22 13:18, Dmitry Osipenko wrote:
>>> On 8/15/22 13:14, Christian König wrote:
>>>> Am 15.08.22 um 12:11 schrieb Christian König:
>>>>> Am 15.08.22 um 12:09 schrieb Dmitry Osipenko:
>>>>>> On 8/15/22 13:05, Christian König wrote:
>>>>>>> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
>>>>>>>> Higher order pages allocated using alloc_pages() aren't
>>>>>>>> refcounted and
>>>>>>>> they
>>>>>>>> need to be refcounted, otherwise it's impossible to map them by
>>>>>>>> KVM. This
>>>>>>>> patch sets the refcount of the tail pages and fixes the KVM memory
>>>>>>>> mapping
>>>>>>>> faults.
>>>>>>>>
>>>>>>>> Without this change guest virgl driver can't map host buffers into
>>>>>>>> guest
>>>>>>>> and can't provide OpenGL 4.5 profile support to the guest. The host
>>>>>>>> mappings are also needed for enabling the Venus driver using
>>>>>>>> host GPU
>>>>>>>> drivers that are utilizing TTM.
>>>>>>>>
>>>>>>>> Based on a patch proposed by Trigger Huang.
>>>>>>> Well I can't count how often I have repeated this: This is an
>>>>>>> absolutely
>>>>>>> clear NAK!
>>>>>>>
>>>>>>> TTM pages are not reference counted in the first place and
>>>>>>> because of
>>>>>>> this giving them to virgl is illegal.
>>>>>> A? The first page is refcounted when allocated, the tail pages are
>>>>>> not.
>>>>> No they aren't. The first page is just by coincident initialized with
>>>>> a refcount of 1. This refcount is completely ignored and not used
>>>>> at all.
>>>>>
>>>>> Incrementing the reference count and by this mapping the page into
>>>>> some other address space is illegal and corrupts the internal state
>>>>> tracking of TTM.
>>>> See this comment in the source code as well:
>>>>
>>>>          /* Don't set the __GFP_COMP flag for higher order allocations.
>>>>           * Mapping pages directly into an userspace process and
>>>> calling
>>>>           * put_page() on a TTM allocated page is illegal.
>>>>           */
>>>>
>>>> I have absolutely no idea how somebody had the idea he could do this.
>>> I saw this comment, but it doesn't make sense because it doesn't explain
>>> why it's illegal. Hence it looks like a bogus comment since the
>>> refcouting certainly works, at least to a some degree because I haven't
>>> noticed any problems in practice, maybe by luck :)
>>>
>>> I'll try to dig out the older discussions, thank you for the quick
>>> reply!
>> Are you sure it was really discussed in public previously? All I can
>> find is yours two answers to a similar patches where you're saying that
>> this it's a wrong solution without in-depth explanation and further
>> discussions.
> 
> Yeah, that's my problem as well I can't find that of hand.
> 
> But yes it certainly was discussed in public.

If it was only CC'd to dri-devel, then could be that emails didn't pass
the spam moderation :/

>> Maybe it was discussed privately? In this case I will be happy to get
>> more info from you about the root of the problem so I could start to
>> look at how to fix it properly. It's not apparent where the problem is
>> to a TTM newbie like me.
>>
> 
> Well this is completely unfixable. See the whole purpose of TTM is to
> allow tracing where what is mapped of a buffer object.
> 
> If you circumvent that and increase the page reference yourself than
> that whole functionality can't work correctly any more.

Are you suggesting that the problem is that TTM doesn't see the KVM page
faults/mappings?

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 11:19               ` Dmitry Osipenko
@ 2022-08-15 11:28                 ` Christian König
  2022-08-15 11:50                   ` Dmitry Osipenko
  0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2022-08-15 11:28 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

Am 15.08.22 um 13:19 schrieb Dmitry Osipenko:
> [SNIP]
>>>> I'll try to dig out the older discussions, thank you for the quick
>>>> reply!
>>> Are you sure it was really discussed in public previously? All I can
>>> find is yours two answers to a similar patches where you're saying that
>>> this it's a wrong solution without in-depth explanation and further
>>> discussions.
>> Yeah, that's my problem as well I can't find that of hand.
>>
>> But yes it certainly was discussed in public.
> If it was only CC'd to dri-devel, then could be that emails didn't pass
> the spam moderation :/

That might be possible.

>>> Maybe it was discussed privately? In this case I will be happy to get
>>> more info from you about the root of the problem so I could start to
>>> look at how to fix it properly. It's not apparent where the problem is
>>> to a TTM newbie like me.
>>>
>> Well this is completely unfixable. See the whole purpose of TTM is to
>> allow tracing where what is mapped of a buffer object.
>>
>> If you circumvent that and increase the page reference yourself than
>> that whole functionality can't work correctly any more.
> Are you suggesting that the problem is that TTM doesn't see the KVM page
> faults/mappings?

Yes, and no. It's one of the issues, but there is more behind that (e.g. 
what happens when TTM switches from pages to local memory for backing a BO).

Another question is why is KVM accessing the page structure in the first 
place? The VMA is mapped with VM_PFNMAP and VM_IO, KVM should never ever 
touch any of those pages.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 11:28                 ` Christian König
@ 2022-08-15 11:50                   ` Dmitry Osipenko
  2022-08-15 13:06                     ` Christian König
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2022-08-15 11:50 UTC (permalink / raw)
  To: Christian König, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On 8/15/22 14:28, Christian König wrote:
>>>> Maybe it was discussed privately? In this case I will be happy to get
>>>> more info from you about the root of the problem so I could start to
>>>> look at how to fix it properly. It's not apparent where the problem is
>>>> to a TTM newbie like me.
>>>>
>>> Well this is completely unfixable. See the whole purpose of TTM is to
>>> allow tracing where what is mapped of a buffer object.
>>>
>>> If you circumvent that and increase the page reference yourself than
>>> that whole functionality can't work correctly any more.
>> Are you suggesting that the problem is that TTM doesn't see the KVM page
>> faults/mappings?
> 
> Yes, and no. It's one of the issues, but there is more behind that (e.g.
> what happens when TTM switches from pages to local memory for backing a
> BO).

If KVM page fault could reach TTM, then it should be able to relocate
BO. I see now where is the problem, thanks. Although, I'm wondering
whether it already works somehow.. I'll try to play with the the AMDGPU
shrinker and see what will happen on guest mapping of a relocated BO.

> Another question is why is KVM accessing the page structure in the first
> place? The VMA is mapped with VM_PFNMAP and VM_IO, KVM should never ever
> touch any of those pages.

https://elixir.bootlin.com/linux/v5.19/source/virt/kvm/kvm_main.c#L2549

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 11:50                   ` Dmitry Osipenko
@ 2022-08-15 13:06                     ` Christian König
  2022-08-15 13:45                       ` Dmitry Osipenko
  0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2022-08-15 13:06 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

Am 15.08.22 um 13:50 schrieb Dmitry Osipenko:
> On 8/15/22 14:28, Christian König wrote:
>>>>> Maybe it was discussed privately? In this case I will be happy to get
>>>>> more info from you about the root of the problem so I could start to
>>>>> look at how to fix it properly. It's not apparent where the problem is
>>>>> to a TTM newbie like me.
>>>>>
>>>> Well this is completely unfixable. See the whole purpose of TTM is to
>>>> allow tracing where what is mapped of a buffer object.
>>>>
>>>> If you circumvent that and increase the page reference yourself than
>>>> that whole functionality can't work correctly any more.
>>> Are you suggesting that the problem is that TTM doesn't see the KVM page
>>> faults/mappings?
>> Yes, and no. It's one of the issues, but there is more behind that (e.g.
>> what happens when TTM switches from pages to local memory for backing a
>> BO).
> If KVM page fault could reach TTM, then it should be able to relocate
> BO. I see now where is the problem, thanks. Although, I'm wondering
> whether it already works somehow.. I'll try to play with the the AMDGPU
> shrinker and see what will happen on guest mapping of a relocated BO.

Well the page fault already somehow reaches TTM, otherwise the pfn 
couldn't be filled in in the first place.

The issues is more that KVM should never ever grab a page reference to 
pages mapped with VM_IO or VM_PFNMAP.

Essentially we need to apply the same restriction as with 
get_user_pages() here.

>> Another question is why is KVM accessing the page structure in the first
>> place? The VMA is mapped with VM_PFNMAP and VM_IO, KVM should never ever
>> touch any of those pages.
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv5.19%2Fsource%2Fvirt%2Fkvm%2Fkvm_main.c%23L2549&amp;data=05%7C01%7Cchristian.koenig%40amd.com%7C2f38c27f20f842fc582a08da7eb4580d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637961610314049167%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=Pu5F1EF9UvDPdOQ7sjJ1WDRt5XpFZmAMXdkexnDpEmU%3D&amp;reserved=0

Well that comment sounds like KVM is doing the right thing, so I'm 
wondering what exactly is going on here.

Regards,
Christian.

>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 13:06                     ` Christian König
@ 2022-08-15 13:45                       ` Dmitry Osipenko
  2022-08-15 13:53                         ` Christian König
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2022-08-15 13:45 UTC (permalink / raw)
  To: Christian König, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On 8/15/22 16:06, Christian König wrote:
> Am 15.08.22 um 13:50 schrieb Dmitry Osipenko:
>> On 8/15/22 14:28, Christian König wrote:
>>>>>> Maybe it was discussed privately? In this case I will be happy to get
>>>>>> more info from you about the root of the problem so I could start to
>>>>>> look at how to fix it properly. It's not apparent where the
>>>>>> problem is
>>>>>> to a TTM newbie like me.
>>>>>>
>>>>> Well this is completely unfixable. See the whole purpose of TTM is to
>>>>> allow tracing where what is mapped of a buffer object.
>>>>>
>>>>> If you circumvent that and increase the page reference yourself than
>>>>> that whole functionality can't work correctly any more.
>>>> Are you suggesting that the problem is that TTM doesn't see the KVM
>>>> page
>>>> faults/mappings?
>>> Yes, and no. It's one of the issues, but there is more behind that (e.g.
>>> what happens when TTM switches from pages to local memory for backing a
>>> BO).
>> If KVM page fault could reach TTM, then it should be able to relocate
>> BO. I see now where is the problem, thanks. Although, I'm wondering
>> whether it already works somehow.. I'll try to play with the the AMDGPU
>> shrinker and see what will happen on guest mapping of a relocated BO.
> 
> Well the page fault already somehow reaches TTM, otherwise the pfn
> couldn't be filled in in the first place.
> 
> The issues is more that KVM should never ever grab a page reference to
> pages mapped with VM_IO or VM_PFNMAP.
> 
> Essentially we need to apply the same restriction as with
> get_user_pages() here.
> 
>>> Another question is why is KVM accessing the page structure in the first
>>> place? The VMA is mapped with VM_PFNMAP and VM_IO, KVM should never ever
>>> touch any of those pages.
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Fv5.19%2Fsource%2Fvirt%2Fkvm%2Fkvm_main.c%23L2549&amp;data=05%7C01%7Cchristian.koenig%40amd.com%7C2f38c27f20f842fc582a08da7eb4580d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637961610314049167%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=Pu5F1EF9UvDPdOQ7sjJ1WDRt5XpFZmAMXdkexnDpEmU%3D&amp;reserved=0
>>
> 
> Well that comment sounds like KVM is doing the right thing, so I'm
> wondering what exactly is going on here.

KVM actually doesn't hold the page reference, it takes the temporal
reference during page fault and then drops the reference once page is
mapped, IIUC. Is it still illegal for TTM? Or there is a possibility for
a race condition here?

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 13:45                       ` Dmitry Osipenko
@ 2022-08-15 13:53                         ` Christian König
  2022-08-15 14:57                           ` Dmitry Osipenko
  0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2022-08-15 13:53 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

Am 15.08.22 um 15:45 schrieb Dmitry Osipenko:
> [SNIP]
>> Well that comment sounds like KVM is doing the right thing, so I'm
>> wondering what exactly is going on here.
> KVM actually doesn't hold the page reference, it takes the temporal
> reference during page fault and then drops the reference once page is
> mapped, IIUC. Is it still illegal for TTM? Or there is a possibility for
> a race condition here?
>

Well the question is why does KVM grab the page reference in the first 
place?

If that is to prevent the mapping from changing then yes that's illegal 
and won't work. It can always happen that you grab the address, solve 
the fault and then immediately fault again because the address you just 
grabbed is invalidated.

If it's for some other reason than we should probably investigate if we 
shouldn't stop doing this.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 13:53                         ` Christian König
@ 2022-08-15 14:57                           ` Dmitry Osipenko
  2022-08-15 15:54                             ` Dmitry Osipenko
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2022-08-15 14:57 UTC (permalink / raw)
  To: Christian König, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano, Paolo Bonzini
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On 8/15/22 16:53, Christian König wrote:
> Am 15.08.22 um 15:45 schrieb Dmitry Osipenko:
>> [SNIP]
>>> Well that comment sounds like KVM is doing the right thing, so I'm
>>> wondering what exactly is going on here.
>> KVM actually doesn't hold the page reference, it takes the temporal
>> reference during page fault and then drops the reference once page is
>> mapped, IIUC. Is it still illegal for TTM? Or there is a possibility for
>> a race condition here?
>>
> 
> Well the question is why does KVM grab the page reference in the first
> place?
> 
> If that is to prevent the mapping from changing then yes that's illegal
> and won't work. It can always happen that you grab the address, solve
> the fault and then immediately fault again because the address you just
> grabbed is invalidated.
> 
> If it's for some other reason than we should probably investigate if we
> shouldn't stop doing this.

CC: +Paolo Bonzini who introduced this code

commit add6a0cd1c5ba51b201e1361b05a5df817083618
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Tue Jun 7 17:51:18 2016 +0200

    KVM: MMU: try to fix up page faults before giving up

    The vGPU folks would like to trap the first access to a BAR by setting
    vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
handler
    then can use remap_pfn_range to place some non-reserved pages in the
VMA.

    This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
    and fixup_user_fault together help supporting it.  The patch also
supports
    VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
    reference counting.

@Paolo,
https://lore.kernel.org/dri-devel/73e5ed8d-0d25-7d44-8fa2-e1d61b1f5a04@amd.com/T/#m7647ce5f8c4749599d2c6bc15a2b45f8d8cf8154

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 14:57                           ` Dmitry Osipenko
@ 2022-08-15 15:54                             ` Dmitry Osipenko
  2022-08-17 22:57                               ` Dmitry Osipenko
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2022-08-15 15:54 UTC (permalink / raw)
  To: Christian König, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano, Paolo Bonzini
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On 8/15/22 17:57, Dmitry Osipenko wrote:
> On 8/15/22 16:53, Christian König wrote:
>> Am 15.08.22 um 15:45 schrieb Dmitry Osipenko:
>>> [SNIP]
>>>> Well that comment sounds like KVM is doing the right thing, so I'm
>>>> wondering what exactly is going on here.
>>> KVM actually doesn't hold the page reference, it takes the temporal
>>> reference during page fault and then drops the reference once page is
>>> mapped, IIUC. Is it still illegal for TTM? Or there is a possibility for
>>> a race condition here?
>>>
>>
>> Well the question is why does KVM grab the page reference in the first
>> place?
>>
>> If that is to prevent the mapping from changing then yes that's illegal
>> and won't work. It can always happen that you grab the address, solve
>> the fault and then immediately fault again because the address you just
>> grabbed is invalidated.
>>
>> If it's for some other reason than we should probably investigate if we
>> shouldn't stop doing this.
> 
> CC: +Paolo Bonzini who introduced this code
> 
> commit add6a0cd1c5ba51b201e1361b05a5df817083618
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Tue Jun 7 17:51:18 2016 +0200
> 
>     KVM: MMU: try to fix up page faults before giving up
> 
>     The vGPU folks would like to trap the first access to a BAR by setting
>     vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
> handler
>     then can use remap_pfn_range to place some non-reserved pages in the
> VMA.
> 
>     This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
>     and fixup_user_fault together help supporting it.  The patch also
> supports
>     VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
>     reference counting.
> 
> @Paolo,
> https://lore.kernel.org/dri-devel/73e5ed8d-0d25-7d44-8fa2-e1d61b1f5a04@amd.com/T/#m7647ce5f8c4749599d2c6bc15a2b45f8d8cf8154
> 

If we need to bump the refcount only for VM_MIXEDMAP and not for
VM_PFNMAP, then perhaps we could add a flag for that to the kvm_main
code that will denote to kvm_release_page_clean whether it needs to put
the page?

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 15:54                             ` Dmitry Osipenko
@ 2022-08-17 22:57                               ` Dmitry Osipenko
  2022-08-17 23:13                                 ` Dmitry Osipenko
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2022-08-17 22:57 UTC (permalink / raw)
  To: Christian König, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano, Paolo Bonzini
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On 8/15/22 18:54, Dmitry Osipenko wrote:
> On 8/15/22 17:57, Dmitry Osipenko wrote:
>> On 8/15/22 16:53, Christian König wrote:
>>> Am 15.08.22 um 15:45 schrieb Dmitry Osipenko:
>>>> [SNIP]
>>>>> Well that comment sounds like KVM is doing the right thing, so I'm
>>>>> wondering what exactly is going on here.
>>>> KVM actually doesn't hold the page reference, it takes the temporal
>>>> reference during page fault and then drops the reference once page is
>>>> mapped, IIUC. Is it still illegal for TTM? Or there is a possibility for
>>>> a race condition here?
>>>>
>>>
>>> Well the question is why does KVM grab the page reference in the first
>>> place?
>>>
>>> If that is to prevent the mapping from changing then yes that's illegal
>>> and won't work. It can always happen that you grab the address, solve
>>> the fault and then immediately fault again because the address you just
>>> grabbed is invalidated.
>>>
>>> If it's for some other reason than we should probably investigate if we
>>> shouldn't stop doing this.
>>
>> CC: +Paolo Bonzini who introduced this code
>>
>> commit add6a0cd1c5ba51b201e1361b05a5df817083618
>> Author: Paolo Bonzini <pbonzini@redhat.com>
>> Date:   Tue Jun 7 17:51:18 2016 +0200
>>
>>     KVM: MMU: try to fix up page faults before giving up
>>
>>     The vGPU folks would like to trap the first access to a BAR by setting
>>     vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
>> handler
>>     then can use remap_pfn_range to place some non-reserved pages in the
>> VMA.
>>
>>     This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
>>     and fixup_user_fault together help supporting it.  The patch also
>> supports
>>     VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
>>     reference counting.
>>
>> @Paolo,
>> https://lore.kernel.org/dri-devel/73e5ed8d-0d25-7d44-8fa2-e1d61b1f5a04@amd.com/T/#m7647ce5f8c4749599d2c6bc15a2b45f8d8cf8154
>>
> 
> If we need to bump the refcount only for VM_MIXEDMAP and not for
> VM_PFNMAP, then perhaps we could add a flag for that to the kvm_main
> code that will denote to kvm_release_page_clean whether it needs to put
> the page?

The other variant that kind of works is to mark TTM pages reserved using
SetPageReserved/ClearPageReserved, telling KVM not to mess with the page
struct. But the potential consequences of doing this are unclear to me.

Christian, do you think we can do it?

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-17 22:57                               ` Dmitry Osipenko
@ 2022-08-17 23:13                                 ` Dmitry Osipenko
  2022-08-18  9:41                                   ` Christian König
  0 siblings, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2022-08-17 23:13 UTC (permalink / raw)
  To: Christian König, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano, Paolo Bonzini
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On 8/18/22 01:57, Dmitry Osipenko wrote:
> On 8/15/22 18:54, Dmitry Osipenko wrote:
>> On 8/15/22 17:57, Dmitry Osipenko wrote:
>>> On 8/15/22 16:53, Christian König wrote:
>>>> Am 15.08.22 um 15:45 schrieb Dmitry Osipenko:
>>>>> [SNIP]
>>>>>> Well that comment sounds like KVM is doing the right thing, so I'm
>>>>>> wondering what exactly is going on here.
>>>>> KVM actually doesn't hold the page reference, it takes the temporal
>>>>> reference during page fault and then drops the reference once page is
>>>>> mapped, IIUC. Is it still illegal for TTM? Or there is a possibility for
>>>>> a race condition here?
>>>>>
>>>>
>>>> Well the question is why does KVM grab the page reference in the first
>>>> place?
>>>>
>>>> If that is to prevent the mapping from changing then yes that's illegal
>>>> and won't work. It can always happen that you grab the address, solve
>>>> the fault and then immediately fault again because the address you just
>>>> grabbed is invalidated.
>>>>
>>>> If it's for some other reason than we should probably investigate if we
>>>> shouldn't stop doing this.
>>>
>>> CC: +Paolo Bonzini who introduced this code
>>>
>>> commit add6a0cd1c5ba51b201e1361b05a5df817083618
>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>> Date:   Tue Jun 7 17:51:18 2016 +0200
>>>
>>>     KVM: MMU: try to fix up page faults before giving up
>>>
>>>     The vGPU folks would like to trap the first access to a BAR by setting
>>>     vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
>>> handler
>>>     then can use remap_pfn_range to place some non-reserved pages in the
>>> VMA.
>>>
>>>     This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
>>>     and fixup_user_fault together help supporting it.  The patch also
>>> supports
>>>     VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
>>>     reference counting.
>>>
>>> @Paolo,
>>> https://lore.kernel.org/dri-devel/73e5ed8d-0d25-7d44-8fa2-e1d61b1f5a04@amd.com/T/#m7647ce5f8c4749599d2c6bc15a2b45f8d8cf8154
>>>
>>
>> If we need to bump the refcount only for VM_MIXEDMAP and not for
>> VM_PFNMAP, then perhaps we could add a flag for that to the kvm_main
>> code that will denote to kvm_release_page_clean whether it needs to put
>> the page?
> 
> The other variant that kind of works is to mark TTM pages reserved using
> SetPageReserved/ClearPageReserved, telling KVM not to mess with the page
> struct. But the potential consequences of doing this are unclear to me.
> 
> Christian, do you think we can do it?

Although, no. It also doesn't work with KVM without additional changes
to KVM.

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-17 23:13                                 ` Dmitry Osipenko
@ 2022-08-18  9:41                                   ` Christian König
  2023-01-11 17:05                                     ` Sean Christopherson
  0 siblings, 1 reply; 27+ messages in thread
From: Christian König @ 2022-08-18  9:41 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano, Paolo Bonzini
  Cc: dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

Am 18.08.22 um 01:13 schrieb Dmitry Osipenko:
> On 8/18/22 01:57, Dmitry Osipenko wrote:
>> On 8/15/22 18:54, Dmitry Osipenko wrote:
>>> On 8/15/22 17:57, Dmitry Osipenko wrote:
>>>> On 8/15/22 16:53, Christian König wrote:
>>>>> Am 15.08.22 um 15:45 schrieb Dmitry Osipenko:
>>>>>> [SNIP]
>>>>>>> Well that comment sounds like KVM is doing the right thing, so I'm
>>>>>>> wondering what exactly is going on here.
>>>>>> KVM actually doesn't hold the page reference, it takes the temporal
>>>>>> reference during page fault and then drops the reference once page is
>>>>>> mapped, IIUC. Is it still illegal for TTM? Or there is a possibility for
>>>>>> a race condition here?
>>>>>>
>>>>> Well the question is why does KVM grab the page reference in the first
>>>>> place?
>>>>>
>>>>> If that is to prevent the mapping from changing then yes that's illegal
>>>>> and won't work. It can always happen that you grab the address, solve
>>>>> the fault and then immediately fault again because the address you just
>>>>> grabbed is invalidated.
>>>>>
>>>>> If it's for some other reason than we should probably investigate if we
>>>>> shouldn't stop doing this.
>>>> CC: +Paolo Bonzini who introduced this code
>>>>
>>>> commit add6a0cd1c5ba51b201e1361b05a5df817083618
>>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>>> Date:   Tue Jun 7 17:51:18 2016 +0200
>>>>
>>>>      KVM: MMU: try to fix up page faults before giving up
>>>>
>>>>      The vGPU folks would like to trap the first access to a BAR by setting
>>>>      vm_ops on the VMAs produced by mmap-ing a VFIO device.  The fault
>>>> handler
>>>>      then can use remap_pfn_range to place some non-reserved pages in the
>>>> VMA.
>>>>
>>>>      This kind of VM_PFNMAP mapping is not handled by KVM, but follow_pfn
>>>>      and fixup_user_fault together help supporting it.  The patch also
>>>> supports
>>>>      VM_MIXEDMAP vmas where the pfns are not reserved and thus subject to
>>>>      reference counting.
>>>>
>>>> @Paolo,
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fdri-devel%2F73e5ed8d-0d25-7d44-8fa2-e1d61b1f5a04%40amd.com%2FT%2F%23m7647ce5f8c4749599d2c6bc15a2b45f8d8cf8154&amp;data=05%7C01%7Cchristian.koenig%40amd.com%7Cecb0f0eb6c2d43afa15b08da80a625ff%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637963748360952327%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=52Dvisa7p%2BckmBxMvDJFScGSNy9JRPDdnPK05C%2F6n7A%3D&amp;reserved=0
>>>>
>>> If we need to bump the refcount only for VM_MIXEDMAP and not for
>>> VM_PFNMAP, then perhaps we could add a flag for that to the kvm_main
>>> code that will denote to kvm_release_page_clean whether it needs to put
>>> the page?
>> The other variant that kind of works is to mark TTM pages reserved using
>> SetPageReserved/ClearPageReserved, telling KVM not to mess with the page
>> struct. But the potential consequences of doing this are unclear to me.
>>
>> Christian, do you think we can do it?
> Although, no. It also doesn't work with KVM without additional changes
> to KVM.

Well my fundamental problem is that I can't fit together why KVM is 
grabing a page reference in the first place.

See the idea of the page reference is that you have one reference is 
that you count the reference so that the memory is not reused while you 
access it, e.g. for I/O or mapping it into different address spaces etc...

But none of those use cases seem to apply to KVM. If I'm not totally 
mistaken in KVM you want to make sure that the address space mapping, 
e.g. the translation between virtual and physical address, don't change 
while you handle it, but grabbing a page reference is the completely 
wrong approach for that.

Regards,
Christian.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-15 10:05 ` Christian König
  2022-08-15 10:09   ` Dmitry Osipenko
@ 2022-09-06 20:01   ` Daniel Vetter
  2022-09-06 20:05     ` Daniel Vetter
  2022-09-08 11:04     ` Rob Clark
  1 sibling, 2 replies; 27+ messages in thread
From: Daniel Vetter @ 2022-09-06 20:01 UTC (permalink / raw)
  To: Christian König
  Cc: Dmitry Osipenko, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano, dri-devel,
	linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On Mon, Aug 15, 2022 at 12:05:19PM +0200, Christian König wrote:
> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
> > Higher order pages allocated using alloc_pages() aren't refcounted and they
> > need to be refcounted, otherwise it's impossible to map them by KVM. This
> > patch sets the refcount of the tail pages and fixes the KVM memory mapping
> > faults.
> > 
> > Without this change guest virgl driver can't map host buffers into guest
> > and can't provide OpenGL 4.5 profile support to the guest. The host
> > mappings are also needed for enabling the Venus driver using host GPU
> > drivers that are utilizing TTM.
> > 
> > Based on a patch proposed by Trigger Huang.
> 
> Well I can't count how often I have repeated this: This is an absolutely
> clear NAK!
> 
> TTM pages are not reference counted in the first place and because of this
> giving them to virgl is illegal.
> 
> Please immediately stop this completely broken approach. We have discussed
> this multiple times now.

Yeah we need to get this stuff closed for real by tagging them all with
VM_IO or VM_PFNMAP asap.

It seems ot be a recurring amount of fun that people try to mmap dma-buf
and then call get_user_pages on them.

Which just doesn't work. I guess this is also why Rob Clark send out that
dma-buf patch to expos mapping information (i.e. wc vs wb vs uncached).

There seems to be some serious bonghits going on :-/
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > Cc: stable@vger.kernel.org
> > Cc: Trigger Huang <Trigger.Huang@gmail.com>
> > Link: https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343
> > Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> # AMDGPU (Qemu and crosvm)
> > Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> > ---
> >   drivers/gpu/drm/ttm/ttm_pool.c | 25 ++++++++++++++++++++++++-
> >   1 file changed, 24 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> > index 21b61631f73a..11e92bb149c9 100644
> > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > @@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> >   	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> >   	struct ttm_pool_dma *dma;
> >   	struct page *p;
> > +	unsigned int i;
> >   	void *vaddr;
> >   	/* Don't set the __GFP_COMP flag for higher order allocations.
> > @@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> >   	if (!pool->use_dma_alloc) {
> >   		p = alloc_pages(gfp_flags, order);
> > -		if (p)
> > +		if (p) {
> >   			p->private = order;
> > +			goto ref_tail_pages;
> > +		}
> >   		return p;
> >   	}
> > @@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> >   	dma->vaddr = (unsigned long)vaddr | order;
> >   	p->private = (unsigned long)dma;
> > +
> > +ref_tail_pages:
> > +	/*
> > +	 * KVM requires mapped tail pages to be refcounted because put_page()
> > +	 * is invoked on them in the end of the page fault handling, and thus,
> > +	 * tail pages need to be protected from the premature releasing.
> > +	 * In fact, KVM page fault handler refuses to map tail pages to guest
> > +	 * if they aren't refcounted because hva_to_pfn_remapped() checks the
> > +	 * refcount specifically for this case.
> > +	 *
> > +	 * In particular, unreferenced tail pages result in a KVM "Bad address"
> > +	 * failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
> > +	 * accesses mapped host TTM buffer that contains tail pages.
> > +	 */
> > +	for (i = 1; i < 1 << order; i++)
> > +		page_ref_inc(p + i);
> > +
> >   	return p;
> >   error_free:
> > @@ -133,6 +153,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
> >   {
> >   	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> >   	struct ttm_pool_dma *dma;
> > +	unsigned int i;
> >   	void *vaddr;
> >   #ifdef CONFIG_X86
> > @@ -142,6 +163,8 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
> >   	if (caching != ttm_cached && !PageHighMem(p))
> >   		set_pages_wb(p, 1 << order);
> >   #endif
> > +	for (i = 1; i < 1 << order; i++)
> > +		page_ref_dec(p + i);
> >   	if (!pool || !pool->use_dma_alloc) {
> >   		__free_pages(p, order);
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-09-06 20:01   ` Daniel Vetter
@ 2022-09-06 20:05     ` Daniel Vetter
  2022-09-07  6:48       ` Christian König
  2023-01-11 17:13       ` Sean Christopherson
  2022-09-08 11:04     ` Rob Clark
  1 sibling, 2 replies; 27+ messages in thread
From: Daniel Vetter @ 2022-09-06 20:05 UTC (permalink / raw)
  To: Christian König, Dmitry Osipenko, David Airlie, Huang Rui,
	Trigger Huang, Gert Wollny, Antonio Caggiano, dri-devel,
	linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On Tue, Sep 06, 2022 at 10:01:47PM +0200, Daniel Vetter wrote:
> On Mon, Aug 15, 2022 at 12:05:19PM +0200, Christian König wrote:
> > Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
> > > Higher order pages allocated using alloc_pages() aren't refcounted and they
> > > need to be refcounted, otherwise it's impossible to map them by KVM. This
> > > patch sets the refcount of the tail pages and fixes the KVM memory mapping
> > > faults.
> > > 
> > > Without this change guest virgl driver can't map host buffers into guest
> > > and can't provide OpenGL 4.5 profile support to the guest. The host
> > > mappings are also needed for enabling the Venus driver using host GPU
> > > drivers that are utilizing TTM.
> > > 
> > > Based on a patch proposed by Trigger Huang.
> > 
> > Well I can't count how often I have repeated this: This is an absolutely
> > clear NAK!
> > 
> > TTM pages are not reference counted in the first place and because of this
> > giving them to virgl is illegal.
> > 
> > Please immediately stop this completely broken approach. We have discussed
> > this multiple times now.
> 
> Yeah we need to get this stuff closed for real by tagging them all with
> VM_IO or VM_PFNMAP asap.

For a bit more context: Anything mapping a bo should be VM_SPECIAL. And I
think we should add the checks to the gem and dma-buf mmap functions to
validate for that, and fix all the fallout.

Otherwise this dragon keeps resurrecting ...

VM_SPECIAL _will_ block get_user_pages, which will block everyone from
even trying to refcount this stuff.

Minimally we need to fix this for all ttm drivers, and it sounds like
that's still not yet the case :-( Iirc last time around some funky amdkfd
userspace was the hold-up because regressions?
-Daniel

> 
> It seems ot be a recurring amount of fun that people try to mmap dma-buf
> and then call get_user_pages on them.
> 
> Which just doesn't work. I guess this is also why Rob Clark send out that
> dma-buf patch to expos mapping information (i.e. wc vs wb vs uncached).
> 
> There seems to be some serious bonghits going on :-/
> -Daniel
> 
> > 
> > Regards,
> > Christian.
> > 
> > > 
> > > Cc: stable@vger.kernel.org
> > > Cc: Trigger Huang <Trigger.Huang@gmail.com>
> > > Link: https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343
> > > Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> # AMDGPU (Qemu and crosvm)
> > > Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_pool.c | 25 ++++++++++++++++++++++++-
> > >   1 file changed, 24 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> > > index 21b61631f73a..11e92bb149c9 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > > @@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > >   	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> > >   	struct ttm_pool_dma *dma;
> > >   	struct page *p;
> > > +	unsigned int i;
> > >   	void *vaddr;
> > >   	/* Don't set the __GFP_COMP flag for higher order allocations.
> > > @@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > >   	if (!pool->use_dma_alloc) {
> > >   		p = alloc_pages(gfp_flags, order);
> > > -		if (p)
> > > +		if (p) {
> > >   			p->private = order;
> > > +			goto ref_tail_pages;
> > > +		}
> > >   		return p;
> > >   	}
> > > @@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > >   	dma->vaddr = (unsigned long)vaddr | order;
> > >   	p->private = (unsigned long)dma;
> > > +
> > > +ref_tail_pages:
> > > +	/*
> > > +	 * KVM requires mapped tail pages to be refcounted because put_page()
> > > +	 * is invoked on them in the end of the page fault handling, and thus,
> > > +	 * tail pages need to be protected from the premature releasing.
> > > +	 * In fact, KVM page fault handler refuses to map tail pages to guest
> > > +	 * if they aren't refcounted because hva_to_pfn_remapped() checks the
> > > +	 * refcount specifically for this case.
> > > +	 *
> > > +	 * In particular, unreferenced tail pages result in a KVM "Bad address"
> > > +	 * failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
> > > +	 * accesses mapped host TTM buffer that contains tail pages.
> > > +	 */
> > > +	for (i = 1; i < 1 << order; i++)
> > > +		page_ref_inc(p + i);
> > > +
> > >   	return p;
> > >   error_free:
> > > @@ -133,6 +153,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
> > >   {
> > >   	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> > >   	struct ttm_pool_dma *dma;
> > > +	unsigned int i;
> > >   	void *vaddr;
> > >   #ifdef CONFIG_X86
> > > @@ -142,6 +163,8 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
> > >   	if (caching != ttm_cached && !PageHighMem(p))
> > >   		set_pages_wb(p, 1 << order);
> > >   #endif
> > > +	for (i = 1; i < 1 << order; i++)
> > > +		page_ref_dec(p + i);
> > >   	if (!pool || !pool->use_dma_alloc) {
> > >   		__free_pages(p, order);
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-09-06 20:05     ` Daniel Vetter
@ 2022-09-07  6:48       ` Christian König
  2023-01-11 17:13       ` Sean Christopherson
  1 sibling, 0 replies; 27+ messages in thread
From: Christian König @ 2022-09-07  6:48 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Huang Rui, Trigger Huang,
	Gert Wollny, Antonio Caggiano, dri-devel, linux-kernel,
	Dmitry Osipenko, kvm, kernel, virtualization

Am 06.09.22 um 22:05 schrieb Daniel Vetter:
> On Tue, Sep 06, 2022 at 10:01:47PM +0200, Daniel Vetter wrote:
>> On Mon, Aug 15, 2022 at 12:05:19PM +0200, Christian König wrote:
>>> Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
>>>> Higher order pages allocated using alloc_pages() aren't refcounted and they
>>>> need to be refcounted, otherwise it's impossible to map them by KVM. This
>>>> patch sets the refcount of the tail pages and fixes the KVM memory mapping
>>>> faults.
>>>>
>>>> Without this change guest virgl driver can't map host buffers into guest
>>>> and can't provide OpenGL 4.5 profile support to the guest. The host
>>>> mappings are also needed for enabling the Venus driver using host GPU
>>>> drivers that are utilizing TTM.
>>>>
>>>> Based on a patch proposed by Trigger Huang.
>>> Well I can't count how often I have repeated this: This is an absolutely
>>> clear NAK!
>>>
>>> TTM pages are not reference counted in the first place and because of this
>>> giving them to virgl is illegal.
>>>
>>> Please immediately stop this completely broken approach. We have discussed
>>> this multiple times now.
>> Yeah we need to get this stuff closed for real by tagging them all with
>> VM_IO or VM_PFNMAP asap.
> For a bit more context: Anything mapping a bo should be VM_SPECIAL. And I
> think we should add the checks to the gem and dma-buf mmap functions to
> validate for that, and fix all the fallout.
>
> Otherwise this dragon keeps resurrecting ...
>
> VM_SPECIAL _will_ block get_user_pages, which will block everyone from
> even trying to refcount this stuff.
>
> Minimally we need to fix this for all ttm drivers, and it sounds like
> that's still not yet the case :-( Iirc last time around some funky amdkfd
> userspace was the hold-up because regressions?

My recollection is that Felix and I fixed this with a KFD specific 
workaround. But I can double check with Felix on Monday.

Christian.

> -Daniel
>
>> It seems ot be a recurring amount of fun that people try to mmap dma-buf
>> and then call get_user_pages on them.
>>
>> Which just doesn't work. I guess this is also why Rob Clark send out that
>> dma-buf patch to expos mapping information (i.e. wc vs wb vs uncached).
>>
>> There seems to be some serious bonghits going on :-/
>> -Daniel
>>
>>> Regards,
>>> Christian.
>>>
>>>> Cc: stable@vger.kernel.org
>>>> Cc: Trigger Huang <Trigger.Huang@gmail.com>
>>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.collabora.com%2Fnews-and-blog%2Fblog%2F2021%2F11%2F26%2Fvenus-on-qemu-enabling-new-virtual-vulkan-driver%2F%23qcom1343&amp;data=05%7C01%7Cchristian.koenig%40amd.com%7C37a7d9b0f91249da415b08da90432d3a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637980915471280078%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=XN6wFiWc6Jljekmst0aOCPSTsFLlmkUjD9F%2Fl9nluAs%3D&amp;reserved=0
>>>> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> # AMDGPU (Qemu and crosvm)
>>>> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
>>>> ---
>>>>    drivers/gpu/drm/ttm/ttm_pool.c | 25 ++++++++++++++++++++++++-
>>>>    1 file changed, 24 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
>>>> index 21b61631f73a..11e92bb149c9 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_pool.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
>>>> @@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>>>>    	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
>>>>    	struct ttm_pool_dma *dma;
>>>>    	struct page *p;
>>>> +	unsigned int i;
>>>>    	void *vaddr;
>>>>    	/* Don't set the __GFP_COMP flag for higher order allocations.
>>>> @@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>>>>    	if (!pool->use_dma_alloc) {
>>>>    		p = alloc_pages(gfp_flags, order);
>>>> -		if (p)
>>>> +		if (p) {
>>>>    			p->private = order;
>>>> +			goto ref_tail_pages;
>>>> +		}
>>>>    		return p;
>>>>    	}
>>>> @@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
>>>>    	dma->vaddr = (unsigned long)vaddr | order;
>>>>    	p->private = (unsigned long)dma;
>>>> +
>>>> +ref_tail_pages:
>>>> +	/*
>>>> +	 * KVM requires mapped tail pages to be refcounted because put_page()
>>>> +	 * is invoked on them in the end of the page fault handling, and thus,
>>>> +	 * tail pages need to be protected from the premature releasing.
>>>> +	 * In fact, KVM page fault handler refuses to map tail pages to guest
>>>> +	 * if they aren't refcounted because hva_to_pfn_remapped() checks the
>>>> +	 * refcount specifically for this case.
>>>> +	 *
>>>> +	 * In particular, unreferenced tail pages result in a KVM "Bad address"
>>>> +	 * failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
>>>> +	 * accesses mapped host TTM buffer that contains tail pages.
>>>> +	 */
>>>> +	for (i = 1; i < 1 << order; i++)
>>>> +		page_ref_inc(p + i);
>>>> +
>>>>    	return p;
>>>>    error_free:
>>>> @@ -133,6 +153,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
>>>>    {
>>>>    	unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
>>>>    	struct ttm_pool_dma *dma;
>>>> +	unsigned int i;
>>>>    	void *vaddr;
>>>>    #ifdef CONFIG_X86
>>>> @@ -142,6 +163,8 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
>>>>    	if (caching != ttm_cached && !PageHighMem(p))
>>>>    		set_pages_wb(p, 1 << order);
>>>>    #endif
>>>> +	for (i = 1; i < 1 << order; i++)
>>>> +		page_ref_dec(p + i);
>>>>    	if (!pool || !pool->use_dma_alloc) {
>>>>    		__free_pages(p, order);
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Fblog.ffwll.ch%2F&amp;data=05%7C01%7Cchristian.koenig%40amd.com%7C37a7d9b0f91249da415b08da90432d3a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637980915471280078%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=bGZ1OAxL%2Fd99Nqu49soWZVqvvUKjuD6n6BKkAhMv4fs%3D&amp;reserved=0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-09-06 20:01   ` Daniel Vetter
  2022-09-06 20:05     ` Daniel Vetter
@ 2022-09-08 11:04     ` Rob Clark
  1 sibling, 0 replies; 27+ messages in thread
From: Rob Clark @ 2022-09-08 11:04 UTC (permalink / raw)
  To: Christian König, Dmitry Osipenko, David Airlie, Huang Rui,
	Trigger Huang, Gert Wollny, Antonio Caggiano, dri-devel,
	linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On Tue, Sep 6, 2022 at 1:01 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Aug 15, 2022 at 12:05:19PM +0200, Christian König wrote:
> > Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
> > > Higher order pages allocated using alloc_pages() aren't refcounted and they
> > > need to be refcounted, otherwise it's impossible to map them by KVM. This
> > > patch sets the refcount of the tail pages and fixes the KVM memory mapping
> > > faults.
> > >
> > > Without this change guest virgl driver can't map host buffers into guest
> > > and can't provide OpenGL 4.5 profile support to the guest. The host
> > > mappings are also needed for enabling the Venus driver using host GPU
> > > drivers that are utilizing TTM.
> > >
> > > Based on a patch proposed by Trigger Huang.
> >
> > Well I can't count how often I have repeated this: This is an absolutely
> > clear NAK!
> >
> > TTM pages are not reference counted in the first place and because of this
> > giving them to virgl is illegal.
> >
> > Please immediately stop this completely broken approach. We have discussed
> > this multiple times now.
>
> Yeah we need to get this stuff closed for real by tagging them all with
> VM_IO or VM_PFNMAP asap.
>
> It seems ot be a recurring amount of fun that people try to mmap dma-buf
> and then call get_user_pages on them.
>
> Which just doesn't work. I guess this is also why Rob Clark send out that
> dma-buf patch to expos mapping information (i.e. wc vs wb vs uncached).

No, not really.. my patch was simply so that the VMM side of virtgpu
could send the correct cache mode to the guest when handed a dma-buf
;-)

BR,
-R

>
> There seems to be some serious bonghits going on :-/
> -Daniel
>
> >
> > Regards,
> > Christian.
> >
> > >
> > > Cc: stable@vger.kernel.org
> > > Cc: Trigger Huang <Trigger.Huang@gmail.com>
> > > Link: https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343
> > > Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> # AMDGPU (Qemu and crosvm)
> > > Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_pool.c | 25 ++++++++++++++++++++++++-
> > >   1 file changed, 24 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> > > index 21b61631f73a..11e92bb149c9 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_pool.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> > > @@ -81,6 +81,7 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > >     unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> > >     struct ttm_pool_dma *dma;
> > >     struct page *p;
> > > +   unsigned int i;
> > >     void *vaddr;
> > >     /* Don't set the __GFP_COMP flag for higher order allocations.
> > > @@ -93,8 +94,10 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > >     if (!pool->use_dma_alloc) {
> > >             p = alloc_pages(gfp_flags, order);
> > > -           if (p)
> > > +           if (p) {
> > >                     p->private = order;
> > > +                   goto ref_tail_pages;
> > > +           }
> > >             return p;
> > >     }
> > > @@ -120,6 +123,23 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
> > >     dma->vaddr = (unsigned long)vaddr | order;
> > >     p->private = (unsigned long)dma;
> > > +
> > > +ref_tail_pages:
> > > +   /*
> > > +    * KVM requires mapped tail pages to be refcounted because put_page()
> > > +    * is invoked on them in the end of the page fault handling, and thus,
> > > +    * tail pages need to be protected from the premature releasing.
> > > +    * In fact, KVM page fault handler refuses to map tail pages to guest
> > > +    * if they aren't refcounted because hva_to_pfn_remapped() checks the
> > > +    * refcount specifically for this case.
> > > +    *
> > > +    * In particular, unreferenced tail pages result in a KVM "Bad address"
> > > +    * failure for VMMs that use VirtIO-GPU when guest's Mesa VirGL driver
> > > +    * accesses mapped host TTM buffer that contains tail pages.
> > > +    */
> > > +   for (i = 1; i < 1 << order; i++)
> > > +           page_ref_inc(p + i);
> > > +
> > >     return p;
> > >   error_free:
> > > @@ -133,6 +153,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
> > >   {
> > >     unsigned long attr = DMA_ATTR_FORCE_CONTIGUOUS;
> > >     struct ttm_pool_dma *dma;
> > > +   unsigned int i;
> > >     void *vaddr;
> > >   #ifdef CONFIG_X86
> > > @@ -142,6 +163,8 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
> > >     if (caching != ttm_cached && !PageHighMem(p))
> > >             set_pages_wb(p, 1 << order);
> > >   #endif
> > > +   for (i = 1; i < 1 << order; i++)
> > > +           page_ref_dec(p + i);
> > >     if (!pool || !pool->use_dma_alloc) {
> > >             __free_pages(p, order);
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-08-18  9:41                                   ` Christian König
@ 2023-01-11 17:05                                     ` Sean Christopherson
  2023-01-11 21:24                                       ` Dmitry Osipenko
  0 siblings, 1 reply; 27+ messages in thread
From: Sean Christopherson @ 2023-01-11 17:05 UTC (permalink / raw)
  To: Christian König
  Cc: Dmitry Osipenko, David Airlie, Huang Rui, Daniel Vetter,
	Trigger Huang, Gert Wollny, Antonio Caggiano, Paolo Bonzini,
	dri-devel, linux-kernel, Dmitry Osipenko, kvm, kernel,
	virtualization

On Thu, Aug 18, 2022, Christian König wrote:
> Am 18.08.22 um 01:13 schrieb Dmitry Osipenko:
> > On 8/18/22 01:57, Dmitry Osipenko wrote:
> > > On 8/15/22 18:54, Dmitry Osipenko wrote:
> > > > On 8/15/22 17:57, Dmitry Osipenko wrote:
> > > > > On 8/15/22 16:53, Christian König wrote:
> > > > > > Am 15.08.22 um 15:45 schrieb Dmitry Osipenko:
> > > > > > > [SNIP]
> > > > > > > > Well that comment sounds like KVM is doing the right thing, so I'm
> > > > > > > > wondering what exactly is going on here.
> > > > > > > KVM actually doesn't hold the page reference, it takes the temporal
> > > > > > > reference during page fault and then drops the reference once page is
> > > > > > > mapped, IIUC. Is it still illegal for TTM? Or there is a possibility for
> > > > > > > a race condition here?
> > > > > > > 
> > > > > > Well the question is why does KVM grab the page reference in the first
> > > > > > place?
> > > > > > 
> > > > > > If that is to prevent the mapping from changing then yes that's illegal
> > > > > > and won't work. It can always happen that you grab the address, solve
> > > > > > the fault and then immediately fault again because the address you just
> > > > > > grabbed is invalidated.
> > > > > > 
> > > > > > If it's for some other reason than we should probably investigate if we
> > > > > > shouldn't stop doing this.

...

> > > > If we need to bump the refcount only for VM_MIXEDMAP and not for
> > > > VM_PFNMAP, then perhaps we could add a flag for that to the kvm_main
> > > > code that will denote to kvm_release_page_clean whether it needs to put
> > > > the page?
> > > The other variant that kind of works is to mark TTM pages reserved using
> > > SetPageReserved/ClearPageReserved, telling KVM not to mess with the page
> > > struct. But the potential consequences of doing this are unclear to me.
> > > 
> > > Christian, do you think we can do it?
> > Although, no. It also doesn't work with KVM without additional changes
> > to KVM.
> 
> Well my fundamental problem is that I can't fit together why KVM is grabing
> a page reference in the first place.

It's to workaround a deficiency in KVM.

> See the idea of the page reference is that you have one reference is that
> you count the reference so that the memory is not reused while you access
> it, e.g. for I/O or mapping it into different address spaces etc...
> 
> But none of those use cases seem to apply to KVM. If I'm not totally
> mistaken in KVM you want to make sure that the address space mapping, e.g.
> the translation between virtual and physical address, don't change while you
> handle it, but grabbing a page reference is the completely wrong approach
> for that.

TL;DR: 100% agree, and we're working on fixing this in KVM, but were still months
away from a full solution.

Yep.  KVM uses mmu_notifiers to react to mapping changes, with a few caveats that
we are (slowly) fixing, though those caveats are only tangentially related.

The deficiency in KVM is that KVM's internal APIs to translate a virtual address
to a physical address spit out only the resulting host PFN.  The details of _how_
that PFN was acquired are not captured.  Specifically, KVM loses track of whether
or not a PFN was acquired via gup() or follow_pte() (KVM is very permissive when
it comes to backing guest memory).

Because gup() gifts the caller a reference, that means KVM also loses track of
whether or not KVM holds a page refcount.  To avoid pinning guest memory, KVM does
quickly put the reference gifted by gup(), but because KVM doesn't _know_ if it
holds a reference, KVM uses a heuristic, which is essentially "is the PFN associated
with a 'normal' struct page?".

   /*
    * Returns a 'struct page' if the pfn is "valid" and backed by a refcounted
    * page, NULL otherwise.  Note, the list of refcounted PG_reserved page types
    * is likely incomplete, it has been compiled purely through people wanting to
    * back guest with a certain type of memory and encountering issues.
    */
   struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn)

That heuristic also triggers if follow_pte() resolves to a PFN that is associated
with a "struct page", and so to avoid putting a reference it doesn't own, KVM does
the silly thing of manually getting a reference immediately after follow_pte().

And that in turn gets tripped up non-refcounted tail pages because KVM sees a
normal, valid "struct page" and assumes it's refcounted.  To fudge around that
issue, KVM requires "struct page" memory to be refcounted.

The long-term solution is to refactor KVM to precisely track whether or not KVM
holds a reference.  Patches have been prosposed to do exactly that[1], but they
were put on hold due to the aforementioned caveats with mmu_notifiers.  The
caveats are that most flows where KVM plumbs a physical address into hardware
structures aren't wired up to KVM's mmu_notifier.

KVM could support non-refcounted struct page memory without first fixing the
mmu_notifier issues, but I was (and still am) concerned that that would create an
even larger hole in KVM until the mmu_notifier issues are sorted out[2].
 
[1] https://lore.kernel.org/all/20211129034317.2964790-1-stevensd@google.com
[2] https://lore.kernel.org/all/Ydhq5aHW+JFo15UF@google.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2022-09-06 20:05     ` Daniel Vetter
  2022-09-07  6:48       ` Christian König
@ 2023-01-11 17:13       ` Sean Christopherson
  1 sibling, 0 replies; 27+ messages in thread
From: Sean Christopherson @ 2023-01-11 17:13 UTC (permalink / raw)
  To: Christian König, Dmitry Osipenko, David Airlie, Huang Rui,
	Trigger Huang, Gert Wollny, Antonio Caggiano, dri-devel,
	linux-kernel, Dmitry Osipenko, kvm, kernel, virtualization

On Tue, Sep 06, 2022, Daniel Vetter wrote:
> On Tue, Sep 06, 2022 at 10:01:47PM +0200, Daniel Vetter wrote:
> > On Mon, Aug 15, 2022 at 12:05:19PM +0200, Christian König wrote:
> > > Am 15.08.22 um 11:54 schrieb Dmitry Osipenko:
> > > > Higher order pages allocated using alloc_pages() aren't refcounted and they
> > > > need to be refcounted, otherwise it's impossible to map them by KVM. This
> > > > patch sets the refcount of the tail pages and fixes the KVM memory mapping
> > > > faults.
> > > > 
> > > > Without this change guest virgl driver can't map host buffers into guest
> > > > and can't provide OpenGL 4.5 profile support to the guest. The host
> > > > mappings are also needed for enabling the Venus driver using host GPU
> > > > drivers that are utilizing TTM.
> > > > 
> > > > Based on a patch proposed by Trigger Huang.
> > > 
> > > Well I can't count how often I have repeated this: This is an absolutely
> > > clear NAK!
> > > 
> > > TTM pages are not reference counted in the first place and because of this
> > > giving them to virgl is illegal.
> > > 
> > > Please immediately stop this completely broken approach. We have discussed
> > > this multiple times now.
> > 
> > Yeah we need to get this stuff closed for real by tagging them all with
> > VM_IO or VM_PFNMAP asap.
> 
> For a bit more context: Anything mapping a bo should be VM_SPECIAL. And I
> think we should add the checks to the gem and dma-buf mmap functions to
> validate for that, and fix all the fallout.
> 
> Otherwise this dragon keeps resurrecting ...
> 
> VM_SPECIAL _will_ block get_user_pages, which will block everyone from
> even trying to refcount this stuff.

FWIW, IIUC that won't change the KVM story.  KVM acquires the PFN for these pages
via follow_pte(), not by gup().  Details are in a different strand of this thread[*].

If TTM pages aren't tied into mmu_notifiers, then I believe the only solution is
to not allow them to be mapped into user page tables.  If they are tied into
mmu_notifiers, then this is fully a KVM limitation that we are (slowly) resolving.

[*] https://lore.kernel.org/all/Y77sQZI0IfFVx7Jo@google.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v1] drm/ttm: Refcount allocated tail pages
  2023-01-11 17:05                                     ` Sean Christopherson
@ 2023-01-11 21:24                                       ` Dmitry Osipenko
  0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Osipenko @ 2023-01-11 21:24 UTC (permalink / raw)
  To: Sean Christopherson, Christian König
  Cc: David Airlie, Huang Rui, Daniel Vetter, Trigger Huang,
	Gert Wollny, Paolo Bonzini, dri-devel, linux-kernel,
	Dmitry Osipenko, kvm, kernel, virtualization, Bob Beckett

Hello Sean,

On 1/11/23 20:05, Sean Christopherson wrote:
> On Thu, Aug 18, 2022, Christian König wrote:
>> Am 18.08.22 um 01:13 schrieb Dmitry Osipenko:
>>> On 8/18/22 01:57, Dmitry Osipenko wrote:
>>>> On 8/15/22 18:54, Dmitry Osipenko wrote:
>>>>> On 8/15/22 17:57, Dmitry Osipenko wrote:
>>>>>> On 8/15/22 16:53, Christian König wrote:
>>>>>>> Am 15.08.22 um 15:45 schrieb Dmitry Osipenko:
>>>>>>>> [SNIP]
>>>>>>>>> Well that comment sounds like KVM is doing the right thing, so I'm
>>>>>>>>> wondering what exactly is going on here.
>>>>>>>> KVM actually doesn't hold the page reference, it takes the temporal
>>>>>>>> reference during page fault and then drops the reference once page is
>>>>>>>> mapped, IIUC. Is it still illegal for TTM? Or there is a possibility for
>>>>>>>> a race condition here?
>>>>>>>>
>>>>>>> Well the question is why does KVM grab the page reference in the first
>>>>>>> place?
>>>>>>>
>>>>>>> If that is to prevent the mapping from changing then yes that's illegal
>>>>>>> and won't work. It can always happen that you grab the address, solve
>>>>>>> the fault and then immediately fault again because the address you just
>>>>>>> grabbed is invalidated.
>>>>>>>
>>>>>>> If it's for some other reason than we should probably investigate if we
>>>>>>> shouldn't stop doing this.
> 
> ...
> 
>>>>> If we need to bump the refcount only for VM_MIXEDMAP and not for
>>>>> VM_PFNMAP, then perhaps we could add a flag for that to the kvm_main
>>>>> code that will denote to kvm_release_page_clean whether it needs to put
>>>>> the page?
>>>> The other variant that kind of works is to mark TTM pages reserved using
>>>> SetPageReserved/ClearPageReserved, telling KVM not to mess with the page
>>>> struct. But the potential consequences of doing this are unclear to me.
>>>>
>>>> Christian, do you think we can do it?
>>> Although, no. It also doesn't work with KVM without additional changes
>>> to KVM.
>>
>> Well my fundamental problem is that I can't fit together why KVM is grabing
>> a page reference in the first place.
> 
> It's to workaround a deficiency in KVM.
> 
>> See the idea of the page reference is that you have one reference is that
>> you count the reference so that the memory is not reused while you access
>> it, e.g. for I/O or mapping it into different address spaces etc...
>>
>> But none of those use cases seem to apply to KVM. If I'm not totally
>> mistaken in KVM you want to make sure that the address space mapping, e.g.
>> the translation between virtual and physical address, don't change while you
>> handle it, but grabbing a page reference is the completely wrong approach
>> for that.
> 
> TL;DR: 100% agree, and we're working on fixing this in KVM, but were still months
> away from a full solution.
> 
> Yep.  KVM uses mmu_notifiers to react to mapping changes, with a few caveats that
> we are (slowly) fixing, though those caveats are only tangentially related.
> 
> The deficiency in KVM is that KVM's internal APIs to translate a virtual address
> to a physical address spit out only the resulting host PFN.  The details of _how_
> that PFN was acquired are not captured.  Specifically, KVM loses track of whether
> or not a PFN was acquired via gup() or follow_pte() (KVM is very permissive when
> it comes to backing guest memory).
> 
> Because gup() gifts the caller a reference, that means KVM also loses track of
> whether or not KVM holds a page refcount.  To avoid pinning guest memory, KVM does
> quickly put the reference gifted by gup(), but because KVM doesn't _know_ if it
> holds a reference, KVM uses a heuristic, which is essentially "is the PFN associated
> with a 'normal' struct page?".
> 
>    /*
>     * Returns a 'struct page' if the pfn is "valid" and backed by a refcounted
>     * page, NULL otherwise.  Note, the list of refcounted PG_reserved page types
>     * is likely incomplete, it has been compiled purely through people wanting to
>     * back guest with a certain type of memory and encountering issues.
>     */
>    struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn)
> 
> That heuristic also triggers if follow_pte() resolves to a PFN that is associated
> with a "struct page", and so to avoid putting a reference it doesn't own, KVM does
> the silly thing of manually getting a reference immediately after follow_pte().
> 
> And that in turn gets tripped up non-refcounted tail pages because KVM sees a
> normal, valid "struct page" and assumes it's refcounted.  To fudge around that
> issue, KVM requires "struct page" memory to be refcounted.
> 
> The long-term solution is to refactor KVM to precisely track whether or not KVM
> holds a reference.  Patches have been prosposed to do exactly that[1], but they
> were put on hold due to the aforementioned caveats with mmu_notifiers.  The
> caveats are that most flows where KVM plumbs a physical address into hardware
> structures aren't wired up to KVM's mmu_notifier.
> 
> KVM could support non-refcounted struct page memory without first fixing the
> mmu_notifier issues, but I was (and still am) concerned that that would create an
> even larger hole in KVM until the mmu_notifier issues are sorted out[2].
>  
> [1] https://lore.kernel.org/all/20211129034317.2964790-1-stevensd@google.com
> [2] https://lore.kernel.org/all/Ydhq5aHW+JFo15UF@google.com

Thanks for the summary! Indeed, it's the KVM side that needs to be
patched. Couple months ago I found that a non-TTM i915 driver also
suffers from the same problem because it uses huge pages that we want
map to a guest. So we definitely will need to fix the KVM side.

-- 
Best regards,
Dmitry


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-01-11 21:24 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-15  9:54 [PATCH v1] drm/ttm: Refcount allocated tail pages Dmitry Osipenko
2022-08-15 10:05 ` Christian König
2022-08-15 10:09   ` Dmitry Osipenko
2022-08-15 10:11     ` Christian König
2022-08-15 10:14       ` Christian König
2022-08-15 10:18         ` Dmitry Osipenko
2022-08-15 10:42           ` Christian König
2022-08-15 10:47           ` Dmitry Osipenko
2022-08-15 10:51             ` Christian König
2022-08-15 11:19               ` Dmitry Osipenko
2022-08-15 11:28                 ` Christian König
2022-08-15 11:50                   ` Dmitry Osipenko
2022-08-15 13:06                     ` Christian König
2022-08-15 13:45                       ` Dmitry Osipenko
2022-08-15 13:53                         ` Christian König
2022-08-15 14:57                           ` Dmitry Osipenko
2022-08-15 15:54                             ` Dmitry Osipenko
2022-08-17 22:57                               ` Dmitry Osipenko
2022-08-17 23:13                                 ` Dmitry Osipenko
2022-08-18  9:41                                   ` Christian König
2023-01-11 17:05                                     ` Sean Christopherson
2023-01-11 21:24                                       ` Dmitry Osipenko
2022-09-06 20:01   ` Daniel Vetter
2022-09-06 20:05     ` Daniel Vetter
2022-09-07  6:48       ` Christian König
2023-01-11 17:13       ` Sean Christopherson
2022-09-08 11:04     ` Rob Clark

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).