Re: [PATCH 15/16] drm/i915: fixup in-line clflushing on bit17 swizzled bos

From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	Intel Graphics Development <intel-gfx@lists.freedesktop.org>
Subject: Re: [PATCH 15/16] drm/i915: fixup in-line clflushing on bit17 swizzled bos
Date: Mon, 26 Mar 2012 11:26:33 +0200
Message-ID: <20120326092633.GG4014@phenom.ffwll.local>
In-Reply-To: <1332753528_97351@CP5-2952>

On Mon, Mar 26, 2012 at 10:18:39AM +0100, Chris Wilson wrote:
> On Sun, 25 Mar 2012 19:47:42 +0200, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > The issue is that the inline clflushing isn't properly swizzled. Fix
> > this by
> > - always clflushing entire 128-byte chunks and
> > - unconditionally flushing before writes when swizzling a given page.
> >   We could be clever and check whether we pwrite a partial 128-byte
> >   chunk instead of a partial cacheline, but I figured that's not
> >   worth it.
> 
> There's some black magic here that I haven't fully grasped. We only ever
> swizzle the gpu address (by whole cachelines), so why do we need to
> invalidate a pair of cachelines for a single cacheline write?

Well, we do swizzle when doing the actual copy_to|from_user, so strictly
speaking we should also swizzle the clflushing in this case. Now, bit17
swizzling in pwrite/pread is pretty much only around for backwards-compat
with ancient userspace, so I figured I'd just unconditionally align the
clflush range to even cachelines (i.e. whole 128-byte chunks) when bit17
swizzling is in effect on the current page, instead of adding a complex
and rather untested swizzled clflush helper.

> Also we have a lot of assumptions that the cacheline is 64 bytes. Have
> we tested on gen2 where the GPU cacheline is 32 bytes?

We're lucking out in that regard because gen2 doesn't do swizzling. At
least my i855gm doesn't, and I could run the corresponding i-g-t tests on
it. And there's a comment in the code saying that i865g is unswizzled, too.

So as long as people build DRAM controllers where 64 bytes is the most
efficient transaction size, we should be fine. It'll be a fun day, though,
when that changes. On the other hand, with gen5+ we don't have any bit17
swizzling nonsense anymore because the GPU is much more tightly integrated
with the CPU. I hope that trend continues and spares us any bit17 madness
in the future.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch