All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <christian.koenig@amd.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@linux.ie>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Daniel Vetter" <daniel.vetter@intel.com>
Subject: Re: [PATCH v4 3/4] drm/shmem-helpers: Allocate wc pages on x86
Date: Wed, 14 Jul 2021 18:16:59 +0200	[thread overview]
Message-ID: <YO8N+/1S+HvwxfWM@phenom.ffwll.local> (raw)
In-Reply-To: <31608068-97ba-70e7-b496-650bac88e0db@amd.com>

On Wed, Jul 14, 2021 at 02:58:26PM +0200, Christian König wrote:
> Am 14.07.21 um 14:48 schrieb Daniel Vetter:
> > On Wed, Jul 14, 2021 at 01:54:50PM +0200, Christian König wrote:
> > > Am 13.07.21 um 22:51 schrieb Daniel Vetter:
> > > > intel-gfx-ci realized that something is not quite coherent anymore on
> > > > some platforms for our i915+vgem tests, when I tried to switch vgem
> > > > over to shmem helpers.
> > > > 
> > > > After lots of head-scratching I realized that I've removed calls to
> > > > drm_clflush. And we need those. To make this a bit cleaner use the
> > > > same page allocation tooling as ttm, which does internally clflush
> > > > (and more, as neeeded on any platform instead of just the intel x86
> > > > cpus i915 can be combined with).
> > > > 
> > > > Unfortunately this doesn't exist on arm, or as a generic feature. For
> > > > that I think only the dma-api can get at wc memory reliably, so maybe
> > > > we'd need some kind of GFP_WC flag to do this properly.
> > > The problem is that this stuff is extremely architecture specific. So GFP_WC
> > > and GFP_UNCACHED are really what we should aim for in the long term.
> > > 
> > > And as far as I know we have at least the following possibilities how it is
> > > implemented:
> > > 
> > > * A fixed amount of registers which tells the CPU the caching behavior for a
> > > memory region, e.g. MTRR.
> > > * Some bits of the memory pointers used, e.g. you see the same memory at
> > > different locations with different caching attributes.
> > > * Some bits in the CPUs page table.
> > > * Some bits in a separate page table.
> > > 
> > > On top of that there is the PCIe specification which defines non-cache
> > > snooping access as an extension.
> > Yeah dma-buf is extremely ill-defined even on x86 if you combine these
> > all. We just play a game of whack-a-mole with the cacheline dirt until
> > it's gone.
> > 
> > That's the other piece here, how do you even make sure that the page is
> > properly flushed and ready for wc access:
> > - easy case is x86 with clflush available pretty much everywhere (since
> >    10+ years at least)
> > - next are cpus which have some cache flush instructions, but it's highly
> >    cpu model specific
> > - next up is the same, but you absolutely have to make sure there's no
> >    other mapping around anymore or the coherency fabric just dies
> > - and I'm pretty sure there's worse stuff where you defacto can only
> >    allocate wc memory that's set aside at boot-up and that's all you ever
> >    get.
> 
> Well long story short you don't make sure that the page is flushed at all.
> 
> What you do is to allocate the page as WC in the first place, if you fail to
> do this you can't use it.

Oh sure, but even when you allocate as wc you need to make sure the page
you have is actually wc coherent from the start. I'm chasing some fun
trying to convert vgem over to shmem helpers right now (i.e. this patch
series), and if you don't start out with flushed pages some of the vgem +
i915 igts just fail on the less coherent igpu platforms we have.

And if you look into what set_pages_wc actually does, then you spot the
clflush somewhere deep down (aside from all the other things it does).

On some ARM platforms that's just not possible, and you have to do a
carveout that you never even map as wb (so needs to be excluded from the
kernel map too and treated as highmem). There's some really bonkers stuff
here.

> The whole idea TTM try to sell until a while ago that you can actually
> change that on the fly only works on x86 and even there only very very
> limited.

Yeah that's clear, this is why we're locking down the i915 gem uapi a lot
for dgpu. All the tricks are out the window.
-Daniel


> 
> Cheers,
> Christian.
> 
> > 
> > Cheers, Daniel
> > 
> > > Mixing that with the CPU caching behavior gets you some really nice ways to
> > > break a driver. In general x86 seems to be rather graceful, but arm and
> > > PowerPC are easily pissed if you mess that up.
> > > 
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > Cc: Christian König <christian.koenig@amd.com>
> > > > Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
> > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > Cc: Maxime Ripard <mripard@kernel.org>
> > > > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > > > Cc: David Airlie <airlied@linux.ie>
> > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > Acked-by: Christian könig <christian.koenig@amd.com>
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > ---
> > > >    drivers/gpu/drm/drm_gem_shmem_helper.c | 14 ++++++++++++++
> > > >    1 file changed, 14 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > > > index 296ab1b7c07f..657d2490aaa5 100644
> > > > --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> > > > +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > > > @@ -10,6 +10,10 @@
> > > >    #include <linux/slab.h>
> > > >    #include <linux/vmalloc.h>
> > > > +#ifdef CONFIG_X86
> > > > +#include <asm/set_memory.h>
> > > > +#endif
> > > > +
> > > >    #include <drm/drm.h>
> > > >    #include <drm/drm_device.h>
> > > >    #include <drm/drm_drv.h>
> > > > @@ -162,6 +166,11 @@ static int drm_gem_shmem_get_pages_locked(struct drm_gem_shmem_object *shmem)
> > > >    		return PTR_ERR(pages);
> > > >    	}
> > > > +#ifdef CONFIG_X86
> > > > +	if (shmem->map_wc)
> > > > +		set_pages_array_wc(pages, obj->size >> PAGE_SHIFT);
> > > > +#endif
> > > > +
> > > >    	shmem->pages = pages;
> > > >    	return 0;
> > > > @@ -203,6 +212,11 @@ static void drm_gem_shmem_put_pages_locked(struct drm_gem_shmem_object *shmem)
> > > >    	if (--shmem->pages_use_count > 0)
> > > >    		return;
> > > > +#ifdef CONFIG_X86
> > > > +	if (shmem->map_wc)
> > > > +		set_pages_array_wb(shmem->pages, obj->size >> PAGE_SHIFT);
> > > > +#endif
> > > > +
> > > >    	drm_gem_put_pages(obj, shmem->pages,
> > > >    			  shmem->pages_mark_dirty_on_put,
> > > >    			  shmem->pages_mark_accessed_on_put);
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

WARNING: multiple messages have this Message-ID (diff)
From: Daniel Vetter <daniel@ffwll.ch>
To: "Christian König" <christian.koenig@amd.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Thomas Zimmermann" <tzimmermann@suse.de>,
	"David Airlie" <airlied@linux.ie>,
	"Daniel Vetter" <daniel.vetter@ffwll.ch>,
	"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
	"DRI Development" <dri-devel@lists.freedesktop.org>,
	"Maxime Ripard" <mripard@kernel.org>,
	"Daniel Vetter" <daniel.vetter@intel.com>
Subject: Re: [Intel-gfx] [PATCH v4 3/4] drm/shmem-helpers: Allocate wc pages on x86
Date: Wed, 14 Jul 2021 18:16:59 +0200	[thread overview]
Message-ID: <YO8N+/1S+HvwxfWM@phenom.ffwll.local> (raw)
In-Reply-To: <31608068-97ba-70e7-b496-650bac88e0db@amd.com>

On Wed, Jul 14, 2021 at 02:58:26PM +0200, Christian König wrote:
> Am 14.07.21 um 14:48 schrieb Daniel Vetter:
> > On Wed, Jul 14, 2021 at 01:54:50PM +0200, Christian König wrote:
> > > Am 13.07.21 um 22:51 schrieb Daniel Vetter:
> > > > intel-gfx-ci realized that something is not quite coherent anymore on
> > > > some platforms for our i915+vgem tests, when I tried to switch vgem
> > > > over to shmem helpers.
> > > > 
> > > > After lots of head-scratching I realized that I've removed calls to
> > > > drm_clflush. And we need those. To make this a bit cleaner use the
> > > > same page allocation tooling as ttm, which does internally clflush
> > > > (and more, as neeeded on any platform instead of just the intel x86
> > > > cpus i915 can be combined with).
> > > > 
> > > > Unfortunately this doesn't exist on arm, or as a generic feature. For
> > > > that I think only the dma-api can get at wc memory reliably, so maybe
> > > > we'd need some kind of GFP_WC flag to do this properly.
> > > The problem is that this stuff is extremely architecture specific. So GFP_WC
> > > and GFP_UNCACHED are really what we should aim for in the long term.
> > > 
> > > And as far as I know we have at least the following possibilities how it is
> > > implemented:
> > > 
> > > * A fixed amount of registers which tells the CPU the caching behavior for a
> > > memory region, e.g. MTRR.
> > > * Some bits of the memory pointers used, e.g. you see the same memory at
> > > different locations with different caching attributes.
> > > * Some bits in the CPUs page table.
> > > * Some bits in a separate page table.
> > > 
> > > On top of that there is the PCIe specification which defines non-cache
> > > snooping access as an extension.
> > Yeah dma-buf is extremely ill-defined even on x86 if you combine these
> > all. We just play a game of whack-a-mole with the cacheline dirt until
> > it's gone.
> > 
> > That's the other piece here, how do you even make sure that the page is
> > properly flushed and ready for wc access:
> > - easy case is x86 with clflush available pretty much everywhere (since
> >    10+ years at least)
> > - next are cpus which have some cache flush instructions, but it's highly
> >    cpu model specific
> > - next up is the same, but you absolutely have to make sure there's no
> >    other mapping around anymore or the coherency fabric just dies
> > - and I'm pretty sure there's worse stuff where you defacto can only
> >    allocate wc memory that's set aside at boot-up and that's all you ever
> >    get.
> 
> Well long story short you don't make sure that the page is flushed at all.
> 
> What you do is to allocate the page as WC in the first place, if you fail to
> do this you can't use it.

Oh sure, but even when you allocate as wc you need to make sure the page
you have is actually wc coherent from the start. I'm chasing some fun
trying to convert vgem over to shmem helpers right now (i.e. this patch
series), and if you don't start out with flushed pages some of the vgem +
i915 igts just fail on the less coherent igpu platforms we have.

And if you look into what set_pages_wc actually does, then you spot the
clflush somewhere deep down (aside from all the other things it does).

On some ARM platforms that's just not possible, and you have to do a
carveout that you never even map as wb (so needs to be excluded from the
kernel map too and treated as highmem). There's some really bonkers stuff
here.

> The whole idea TTM try to sell until a while ago that you can actually
> change that on the fly only works on x86 and even there only very very
> limited.

Yeah that's clear, this is why we're locking down the i915 gem uapi a lot
for dgpu. All the tricks are out the window.
-Daniel


> 
> Cheers,
> Christian.
> 
> > 
> > Cheers, Daniel
> > 
> > > Mixing that with the CPU caching behavior gets you some really nice ways to
> > > break a driver. In general x86 seems to be rather graceful, but arm and
> > > PowerPC are easily pissed if you mess that up.
> > > 
> > > > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > > > Cc: Christian König <christian.koenig@amd.com>
> > > > Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
> > > > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > > > Cc: Maxime Ripard <mripard@kernel.org>
> > > > Cc: Thomas Zimmermann <tzimmermann@suse.de>
> > > > Cc: David Airlie <airlied@linux.ie>
> > > > Cc: Daniel Vetter <daniel@ffwll.ch>
> > > Acked-by: Christian könig <christian.koenig@amd.com>
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > ---
> > > >    drivers/gpu/drm/drm_gem_shmem_helper.c | 14 ++++++++++++++
> > > >    1 file changed, 14 insertions(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > > > index 296ab1b7c07f..657d2490aaa5 100644
> > > > --- a/drivers/gpu/drm/drm_gem_shmem_helper.c
> > > > +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
> > > > @@ -10,6 +10,10 @@
> > > >    #include <linux/slab.h>
> > > >    #include <linux/vmalloc.h>
> > > > +#ifdef CONFIG_X86
> > > > +#include <asm/set_memory.h>
> > > > +#endif
> > > > +
> > > >    #include <drm/drm.h>
> > > >    #include <drm/drm_device.h>
> > > >    #include <drm/drm_drv.h>
> > > > @@ -162,6 +166,11 @@ static int drm_gem_shmem_get_pages_locked(struct drm_gem_shmem_object *shmem)
> > > >    		return PTR_ERR(pages);
> > > >    	}
> > > > +#ifdef CONFIG_X86
> > > > +	if (shmem->map_wc)
> > > > +		set_pages_array_wc(pages, obj->size >> PAGE_SHIFT);
> > > > +#endif
> > > > +
> > > >    	shmem->pages = pages;
> > > >    	return 0;
> > > > @@ -203,6 +212,11 @@ static void drm_gem_shmem_put_pages_locked(struct drm_gem_shmem_object *shmem)
> > > >    	if (--shmem->pages_use_count > 0)
> > > >    		return;
> > > > +#ifdef CONFIG_X86
> > > > +	if (shmem->map_wc)
> > > > +		set_pages_array_wb(shmem->pages, obj->size >> PAGE_SHIFT);
> > > > +#endif
> > > > +
> > > >    	drm_gem_put_pages(obj, shmem->pages,
> > > >    			  shmem->pages_mark_dirty_on_put,
> > > >    			  shmem->pages_mark_accessed_on_put);
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2021-07-14 16:17 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-13 20:51 [PATCH v4 0/4] shmem helpers for vgem Daniel Vetter
2021-07-13 20:51 ` [Intel-gfx] " Daniel Vetter
2021-07-13 20:51 ` [PATCH v4 1/4] dma-buf: Require VM_PFNMAP vma for mmap Daniel Vetter
2021-07-13 20:51   ` [Intel-gfx] " Daniel Vetter
2021-07-13 20:51   ` Daniel Vetter
2021-07-23 18:45   ` Thomas Zimmermann
2021-07-23 18:45     ` [Intel-gfx] " Thomas Zimmermann
2021-07-23 18:45     ` Thomas Zimmermann
2021-07-13 20:51 ` [PATCH v4 2/4] drm/shmem-helper: Switch to vmf_insert_pfn Daniel Vetter
2021-07-13 20:51   ` [Intel-gfx] " Daniel Vetter
2021-07-22 18:22   ` Thomas Zimmermann
2021-07-22 18:22     ` [Intel-gfx] " Thomas Zimmermann
2021-07-23  7:32     ` Daniel Vetter
2021-07-23  7:32       ` [Intel-gfx] " Daniel Vetter
2021-08-12 13:05     ` Daniel Vetter
2021-08-12 13:05       ` [Intel-gfx] " Daniel Vetter
2021-07-13 20:51 ` [PATCH v4 3/4] drm/shmem-helpers: Allocate wc pages on x86 Daniel Vetter
2021-07-13 20:51   ` [Intel-gfx] " Daniel Vetter
2021-07-14 11:54   ` Christian König
2021-07-14 11:54     ` [Intel-gfx] " Christian König
2021-07-14 12:48     ` Daniel Vetter
2021-07-14 12:48       ` [Intel-gfx] " Daniel Vetter
2021-07-14 12:58       ` Christian König
2021-07-14 12:58         ` [Intel-gfx] " Christian König
2021-07-14 16:16         ` Daniel Vetter [this message]
2021-07-14 16:16           ` Daniel Vetter
2021-07-22 18:40   ` Thomas Zimmermann
2021-07-22 18:40     ` [Intel-gfx] " Thomas Zimmermann
2021-07-23  7:36     ` Daniel Vetter
2021-07-23  7:36       ` [Intel-gfx] " Daniel Vetter
2021-07-23  8:02       ` Christian König
2021-07-23  8:02         ` [Intel-gfx] " Christian König
2021-07-23  8:34         ` Daniel Vetter
2021-07-23  8:34           ` [Intel-gfx] " Daniel Vetter
2021-08-05 18:40       ` Thomas Zimmermann
2021-08-05 18:40         ` [Intel-gfx] " Thomas Zimmermann
2021-07-13 20:51 ` [PATCH v4 4/4] drm/vgem: use shmem helpers Daniel Vetter
2021-07-13 20:51   ` [Intel-gfx] " Daniel Vetter
2021-07-14 12:45   ` [PATCH] " Daniel Vetter
2021-07-14 12:45     ` [Intel-gfx] " Daniel Vetter
2021-07-22 18:50   ` [PATCH v4 4/4] " Thomas Zimmermann
2021-07-22 18:50     ` [Intel-gfx] " Thomas Zimmermann
2021-07-23  7:38     ` Daniel Vetter
2021-07-23  7:38       ` [Intel-gfx] " Daniel Vetter
2021-07-13 23:43 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for shmem helpers for vgem (rev6) Patchwork
2021-07-14  0:11 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
2021-07-16 13:29 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for shmem helpers for vgem (rev8) Patchwork
2021-07-16 13:58 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-07-16 16:43 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YO8N+/1S+HvwxfWM@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=airlied@linux.ie \
    --cc=christian.koenig@amd.com \
    --cc=daniel.vetter@ffwll.ch \
    --cc=daniel.vetter@intel.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.