From: Kenneth Graunke
Subject: Re: [PATCH v3] drm/i915: Remove partial attempt to swizzle on pread/pwrite
Date: Fri, 16 Nov 2018 07:59:08 -0800
Message-ID: <3927245.3ZmsaQhG20@kirito>
In-Reply-To: <154238052406.14497.7203847457523975435@jlahtine-desk.ger.corp.intel.com>
References: <20180723090208.27226-1-chris@chris-wilson.co.uk>
 <20181012115621.24435-1-chris@chris-wilson.co.uk>
 <154238052406.14497.7203847457523975435@jlahtine-desk.ger.corp.intel.com>
To: Joonas Lahtinen
Cc: intel-gfx@lists.freedesktop.org, Ian Romanick

On Friday, November 16, 2018 7:02:04 AM PST Joonas Lahtinen wrote:
> Quoting Chris Wilson (2018-10-12 14:56:21)
> > Our attempt to account for bit17 swizzling of pread/pwrite onto tiled
> > objects was flawed due to the simple fact that we do not always know the
> > swizzling for a particular page (due to the swizzling varying based on
> > location in certain unbalanced configurations). Furthermore, the
> > pread/pwrite paths are now unbalanced in that we are required to use the
> > GTT as in some cases we do not have direct CPU access to the backing
> > physical pages (thus some paths trying to account for the swizzle, but
> > others neglecting, chaos ensues).
> >
> > There are no known users who do use pread/pwrite into a tiled object
> > (you need to manually detile anyway, so why not just use mmap and avoid
> > the copy?) and no user bug reports to indicate that it is being used in
> > the wild. As no one is hitting the buggy path, we can just remove the
> > buggy code.
>
> Adding a few Mesa folks for an Ack, they should remember if they've done
> it in the past.

Eheheh...bit17 and 9xx hardware are before my time. :(  I've copied Ian
in case he remembers anything from that era.

It looks like the only users of pwrite and pread in Mesa's i915 driver
are for linear buffer objects, not miptrees, which could be tiled.  So
I think this is fine...(famous last words...)

Acked-by: Kenneth Graunke

> If we have good enough confidence that it was unused in the userspace
> drivers, should be OK to be dropped. And it would have been broken
> anyway, which pretty much automatically means it didn't see use, but
> doesn't hurt to hear from them.
>
> Regards, Joonas
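For anyone who hasn't paged this back in: the helpers being removed only
compensate for the bit-17 component of the swizzle, which depends on where
each backing page happens to sit in physical memory - something userspace
cannot predict, and which can change when a page is swapped out and back
in. A minimal sketch of the idea, assuming I915_BIT_6_SWIZZLE_9_10_17
(the helper name below is mine, not the kernel's; the real code in the
quoted patch additionally walks the copy in 64-byte cacheline chunks):

    #include <linux/bits.h> /* BIT() */
    #include <linux/mm.h>   /* struct page */
    #include <linux/io.h>   /* page_to_phys() */

    /*
     * Illustrative only: on these machines the memory controller XORs
     * bit 6 of an address with bit 17 (among others) of the physical
     * address, so a CPU copy through kmap() has to flip bit 6 of its
     * offset - i.e. XOR with 64, swapping adjacent 64-byte cachelines -
     * whenever the backing page lands at a physical address with
     * bit 17 set.
     */
    static inline unsigned int
    bit17_swizzled_offset(struct page *page, unsigned int offset)
    {
            if (page_to_phys(page) & BIT(17))
                    offset ^= 64;
            return offset;
    }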
>
> >
> > v2: Just use the fault allowing kmap() + normal copy_(to|from)_user
> > v3: Avoid int overflow in computing 'length' from 'remain' (Tvrtko)
> >
> > References: fe115628d567 ("drm/i915: Implement pwrite without struct-mutex")
> > Signed-off-by: Chris Wilson
> > Cc: Joonas Lahtinen
> > Cc: Tvrtko Ursulin
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 195 ++++----------------------------
> >  1 file changed, 25 insertions(+), 170 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 7d45e71100bc..8049458ab6ac 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -859,58 +859,6 @@ flush_write_domain(struct drm_i915_gem_object *obj, unsigned int flush_domains)
> >          obj->write_domain = 0;
> >  }
> >
> > -static inline int
> > -__copy_to_user_swizzled(char __user *cpu_vaddr,
> > -                        const char *gpu_vaddr, int gpu_offset,
> > -                        int length)
> > -{
> > -        int ret, cpu_offset = 0;
> > -
> > -        while (length > 0) {
> > -                int cacheline_end = ALIGN(gpu_offset + 1, 64);
> > -                int this_length = min(cacheline_end - gpu_offset, length);
> > -                int swizzled_gpu_offset = gpu_offset ^ 64;
> > -
> > -                ret = __copy_to_user(cpu_vaddr + cpu_offset,
> > -                                     gpu_vaddr + swizzled_gpu_offset,
> > -                                     this_length);
> > -                if (ret)
> > -                        return ret + length;
> > -
> > -                cpu_offset += this_length;
> > -                gpu_offset += this_length;
> > -                length -= this_length;
> > -        }
> > -
> > -        return 0;
> > -}
> > -
> > -static inline int
> > -__copy_from_user_swizzled(char *gpu_vaddr, int gpu_offset,
> > -                          const char __user *cpu_vaddr,
> > -                          int length)
> > -{
> > -        int ret, cpu_offset = 0;
> > -
> > -        while (length > 0) {
> > -                int cacheline_end = ALIGN(gpu_offset + 1, 64);
> > -                int this_length = min(cacheline_end - gpu_offset, length);
> > -                int swizzled_gpu_offset = gpu_offset ^ 64;
> > -
> > -                ret = __copy_from_user(gpu_vaddr + swizzled_gpu_offset,
> > -                                       cpu_vaddr + cpu_offset,
> > -                                       this_length);
> > -                if (ret)
> > -                        return ret + length;
> > -
> > -                cpu_offset += this_length;
> > -                gpu_offset += this_length;
> > -                length -= this_length;
> > -        }
> > -
> > -        return 0;
> > -}
> > -
> >  /*
> >   * Pins the specified object's pages and synchronizes the object with
> >   * GPU accesses. Sets needs_clflush to non-zero if the caller should
> > @@ -1030,72 +978,23 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
> >          return ret;
> >  }
> >
> > -static void
> > -shmem_clflush_swizzled_range(char *addr, unsigned long length,
> > -                             bool swizzled)
> > -{
> > -        if (unlikely(swizzled)) {
> > -                unsigned long start = (unsigned long) addr;
> > -                unsigned long end = (unsigned long) addr + length;
> > -
> > -                /* For swizzling simply ensure that we always flush both
> > -                 * channels. Lame, but simple and it works. Swizzled
> > -                 * pwrite/pread is far from a hotpath - current userspace
> > -                 * doesn't use it at all. */
> > -                start = round_down(start, 128);
> > -                end = round_up(end, 128);
> > -
> > -                drm_clflush_virt_range((void *)start, end - start);
> > -        } else {
> > -                drm_clflush_virt_range(addr, length);
> > -        }
> > -
> > -}
> > -
> > -/* Only difference to the fast-path function is that this can handle bit17
> > - * and uses non-atomic copy and kmap functions.
> > - */
> >  static int
> > -shmem_pread_slow(struct page *page, int offset, int length,
> > -                 char __user *user_data,
> > -                 bool page_do_bit17_swizzling, bool needs_clflush)
> > +shmem_pread(struct page *page, int offset, int len, char __user *user_data,
> > +            bool needs_clflush)
> >  {
> >          char *vaddr;
> >          int ret;
> >
> >          vaddr = kmap(page);
> > -        if (needs_clflush)
> > -                shmem_clflush_swizzled_range(vaddr + offset, length,
> > -                                             page_do_bit17_swizzling);
> >
> > -        if (page_do_bit17_swizzling)
> > -                ret = __copy_to_user_swizzled(user_data, vaddr, offset, length);
> > -        else
> > -                ret = __copy_to_user(user_data, vaddr + offset, length);
> > -        kunmap(page);
> > -
> > -        return ret ? -EFAULT : 0;
> > -}
> > -
> > -static int
> > -shmem_pread(struct page *page, int offset, int length, char __user *user_data,
> > -            bool page_do_bit17_swizzling, bool needs_clflush)
> > -{
> > -        int ret;
> > +        if (needs_clflush)
> > +                drm_clflush_virt_range(vaddr + offset, len);
> >
> > -        ret = -ENODEV;
> > -        if (!page_do_bit17_swizzling) {
> > -                char *vaddr = kmap_atomic(page);
> > +        ret = __copy_to_user(user_data, vaddr + offset, len);
> >
> > -                if (needs_clflush)
> > -                        drm_clflush_virt_range(vaddr + offset, length);
> > -                ret = __copy_to_user_inatomic(user_data, vaddr + offset, length);
> > -                kunmap_atomic(vaddr);
> > -        }
> > -        if (ret == 0)
> > -                return 0;
> > +        kunmap(page);
> >
> > -        return shmem_pread_slow(page, offset, length, user_data,
> > -                                page_do_bit17_swizzling, needs_clflush);
> > +        return ret ? -EFAULT : 0;
> >  }
> >
> >  static int
> > @@ -1104,15 +1003,10 @@ i915_gem_shmem_pread(struct drm_i915_gem_object *obj,
> >  {
> >          char __user *user_data;
> >          u64 remain;
> > -        unsigned int obj_do_bit17_swizzling;
> >          unsigned int needs_clflush;
> >          unsigned int idx, offset;
> >          int ret;
> >
> > -        obj_do_bit17_swizzling = 0;
> > -        if (i915_gem_object_needs_bit17_swizzle(obj))
> > -                obj_do_bit17_swizzling = BIT(17);
> > -
> >          ret = mutex_lock_interruptible(&obj->base.dev->struct_mutex);
> >          if (ret)
> >                  return ret;
> > @@ -1127,14 +1021,14 @@ i915_gem_shmem_pread(struct drm_i915_gem_object *obj,
> >          offset = offset_in_page(args->offset);
> >          for (idx = args->offset >> PAGE_SHIFT; remain; idx++) {
> >                  struct page *page = i915_gem_object_get_page(obj, idx);
> > -                int length;
> > +                unsigned int length;
> >
> > -                length = remain;
> > -                if (offset + length > PAGE_SIZE)
> > +                if (remain > PAGE_SIZE - offset)
> >                          length = PAGE_SIZE - offset;
> > +                else
> > +                        length = remain;
> >
> >                  ret = shmem_pread(page, offset, length, user_data,
> > -                                  page_to_phys(page) & obj_do_bit17_swizzling,
> >                                    needs_clflush);
> >                  if (ret)
> >                          break;
> > @@ -1475,33 +1369,6 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj,
> >          return ret;
> >  }
> >
> > -static int
> > -shmem_pwrite_slow(struct page *page, int offset, int length,
> > -                  char __user *user_data,
> > -                  bool page_do_bit17_swizzling,
> > -                  bool needs_clflush_before,
> > -                  bool needs_clflush_after)
> > -{
> > -        char *vaddr;
> > -        int ret;
> > -
> > -        vaddr = kmap(page);
> > -        if (unlikely(needs_clflush_before || page_do_bit17_swizzling))
> > -                shmem_clflush_swizzled_range(vaddr + offset, length,
> > -                                             page_do_bit17_swizzling);
> > -        if (page_do_bit17_swizzling)
> > -                ret = __copy_from_user_swizzled(vaddr, offset, user_data,
> > -                                                length);
> > -        else
> > -                ret = __copy_from_user(vaddr + offset, user_data, length);
> > -        if (needs_clflush_after)
> > -                shmem_clflush_swizzled_range(vaddr + offset, length,
> > -                                             page_do_bit17_swizzling);
> > -        kunmap(page);
> > -
> > -        return ret ? -EFAULT : 0;
> > -}
> > -
> >  /* Per-page copy function for the shmem pwrite fastpath.
> >   * Flushes invalid cachelines before writing to the target if
> >   * needs_clflush_before is set and flushes out any written cachelines after
> > @@ -1509,31 +1376,24 @@ shmem_pwrite_slow(struct page *page, int offset, int length,
> >   */
> >  static int
> >  shmem_pwrite(struct page *page, int offset, int len, char __user *user_data,
> > -             bool page_do_bit17_swizzling,
> >               bool needs_clflush_before,
> >               bool needs_clflush_after)
> >  {
> > +        char *vaddr;
> >          int ret;
> >
> > -        ret = -ENODEV;
> > -        if (!page_do_bit17_swizzling) {
> > -                char *vaddr = kmap_atomic(page);
> > +        vaddr = kmap(page);
> >
> > -                if (needs_clflush_before)
> > -                        drm_clflush_virt_range(vaddr + offset, len);
> > -                ret = __copy_from_user_inatomic(vaddr + offset, user_data, len);
> > -                if (needs_clflush_after)
> > -                        drm_clflush_virt_range(vaddr + offset, len);
> > +        if (needs_clflush_before)
> > +                drm_clflush_virt_range(vaddr + offset, len);
> >
> > -                kunmap_atomic(vaddr);
> > -        }
> > -        if (ret == 0)
> > -                return ret;
> > +        ret = __copy_from_user(vaddr + offset, user_data, len);
> > +        if (!ret && needs_clflush_after)
> > +                drm_clflush_virt_range(vaddr + offset, len);
> > +
> > +        kunmap(page);
> >
> > -        return shmem_pwrite_slow(page, offset, len, user_data,
> > -                                 page_do_bit17_swizzling,
> > -                                 needs_clflush_before,
> > -                                 needs_clflush_after);
> > +        return ret ? -EFAULT : 0;
> >  }
> >
> >  static int
> > @@ -1543,7 +1403,6 @@ i915_gem_shmem_pwrite(struct drm_i915_gem_object *obj,
> >  {
> >          struct drm_i915_private *i915 = to_i915(obj->base.dev);
> >          void __user *user_data;
> >          u64 remain;
> > -        unsigned int obj_do_bit17_swizzling;
> >          unsigned int partial_cacheline_write;
> >          unsigned int needs_clflush;
> >          unsigned int offset, idx;
> > @@ -1558,10 +1417,6 @@ i915_gem_shmem_pwrite(struct drm_i915_gem_object *obj,
> >          if (ret)
> >                  return ret;
> >
> > -        obj_do_bit17_swizzling = 0;
> > -        if (i915_gem_object_needs_bit17_swizzle(obj))
> > -                obj_do_bit17_swizzling = BIT(17);
> > -
> >          /* If we don't overwrite a cacheline completely we need to be
> >           * careful to have up-to-date data by first clflushing. Don't
> >           * overcomplicate things and flush the entire patch.
> > @@ -1575,14 +1430,14 @@ i915_gem_shmem_pwrite(struct drm_i915_gem_object *obj,
> >          offset = offset_in_page(args->offset);
> >          for (idx = args->offset >> PAGE_SHIFT; remain; idx++) {
> >                  struct page *page = i915_gem_object_get_page(obj, idx);
> > -                int length;
> > +                unsigned int length;
> >
> > -                length = remain;
> > -                if (offset + length > PAGE_SIZE)
> > +                if (remain > PAGE_SIZE - offset)
> >                          length = PAGE_SIZE - offset;
> > +                else
> > +                        length = remain;
> >
> >                  ret = shmem_pwrite(page, offset, length, user_data,
> > -                                   page_to_phys(page) & obj_do_bit17_swizzling,
> >                                     (offset | length) & partial_cacheline_write,
> >                                     needs_clflush & CLFLUSH_AFTER);
> >                  if (ret)
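Re Chris's "why not just use mmap" aside above: for a tiled object the
usual route is a GTT mapping, where accesses go through the fence and
the hardware does the de/tiling, so the CPU never needs to know the
swizzle at all. A rough userspace sketch against the uapi headers from
memory - map_bo_gtt is my own made-up helper, and error handling is
minimal:

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <drm/i915_drm.h>

    /* Map a GEM object through the GTT. 'fd' is the opened DRM device
     * and 'handle' an existing GEM handle, both set up elsewhere. */
    static void *map_bo_gtt(int fd, uint32_t handle, size_t size)
    {
            struct drm_i915_gem_mmap_gtt arg = { .handle = handle };
            void *ptr;

            /* Ask the kernel for a fake mmap offset for this object. */
            if (ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_GTT, &arg))
                    return NULL;

            /* Accesses through this mapping are detiled by the fence,
             * so plain memcpy() into it replaces pwrite entirely. */
            ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd, arg.offset);
            return ptr == MAP_FAILED ? NULL : ptr;
    }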