From: Daniel Vetter
Subject: Re: [PATCH 4/5] drm/i915: rewrite shmem_pwrite_slow to use copy_from_user
Date: Sun, 18 Sep 2011 10:44:57 +0200
Message-ID: <20110918084457.GA2815@phenom.ffwll.local>
References: <1316285749-30130-1-git-send-email-daniel.vetter@ffwll.ch> <1316285749-30130-5-git-send-email-daniel.vetter@ffwll.ch>
To: Chris Wilson
Cc: Daniel Vetter, intel-gfx
List-Id: intel-gfx@lists.freedesktop.org

On Sat, Sep 17, 2011 at 10:00:48PM +0100, Chris Wilson wrote:
> On Sat, 17 Sep 2011 20:55:48 +0200, Daniel Vetter wrote:
> > +static inline int
> > +copy_from_user_swizzled(char __user *gpu_vaddr, int gpu_offset,
>
> __copy_from_user_swizzled as we use the fast/non-checking variant of
> __copy_from_user. (Ditto for the following patch)

Good idea.

> >  out:
> > -	for (i = 0; i < pinned_pages; i++)
> > -		page_cache_release(user_pages[i]);
> > -	drm_free_large(user_pages);
> > +	mutex_lock(&dev->struct_mutex);
> > +	/* Fixup: Kill any reinstated backing storage pages */
> > +	if (obj->madv == __I915_MADV_PURGED)
> > +		i915_gem_object_truncate(obj);
> > +	/* and flush dirty cachelines in case the object isn't in the cpu write
> > +	 * domain anymore. */
> > +	if (obj->base.write_domain != I915_GEM_DOMAIN_CPU)
> > +		i915_gem_clflush_object(obj);
>
> This is a little too scary. And enough for me to have doubts over the
> safety of the patch.

I agree in terms of scariness. The only other option I'm seeing is copying
the data from userspace into some temporary storage outside of
struct_mutex, so that we can then safely move it into place under the
protection of struct_mutex. I find that approach even more unsatisfactory.
So here's why I think this is safe:

First, about not breaking the userspace ABI: we already drop the
struct_mutex when falling back to the slow paths, so userspace already
faces the risk of seeing partial writes when racing with itself. Anything
that reads/writes data concurrently is therefore already undefined, so no
problem there. For the same reason, set_tiling can already race with
pwrite/pread, and furthermore set_tiling is only called on unused buffers
by libdrm. This imo only leaves set_domain(GTT) without any following
reads/writes as a sensible ioctl to consider, for example when one process
syncs up with gpu execution while another does some s/w fallback
uploading. That is obviously a nice case of userspace racing with itself,
so the only thing we have to guarantee is that the pwrite has completed
when the pwrite ioctl finishes with errno=0, and that it is coherent wrt
the domain tracking. I.e. we're not allowed to leave unflushed dirty
cachelines behind if the write_domain is no longer the cpu domain. The
clflush (and the missing chipset flush) take care of that.

The second aspect is whether userspace can screw over the kernel. So let's
carefully check which invariants the code outside struct_mutex depends on
and which state it is mangling: the code only needs the shmem mapping
(which cannot disappear, because we're holding a ref on the gem_object,
which in turn holds a ref to the underlying shmem file for its entire
lifetime) and the userspace pointers (which are handled by copy_*_user).
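As an aside, here's roughly what the swizzled copy from the patch does,
sketched as self-contained userspace C. The name copy_swizzled and the use
of plain memcpy are stand-ins for illustration only: in the kernel this is
__copy_from_user per 64-byte cacheline, and the XOR is only applied for
pages whose physical address has bit 17 set.

```c
#include <string.h>

/* Hypothetical userspace sketch of the patch's swizzled copy: on
 * bit17-swizzling machines the two 64-byte halves of each 128-byte
 * chunk are swapped, so we copy at most one cacheline at a time and
 * flip bit 6 of the destination offset.  memcpy stands in for
 * __copy_from_user, which can fault partway through. */
static int copy_swizzled(char *gpu_vaddr, int gpu_offset,
                         const char *cpu_vaddr, int length)
{
	int cpu_offset = 0;

	while (length > 0) {
		/* never cross a 64-byte cacheline in one step */
		int cacheline_end = (gpu_offset + 64) & ~63;
		int this_length = cacheline_end - gpu_offset;
		if (this_length > length)
			this_length = length;

		/* bit17 swizzle: swap the 64-byte halves of each
		 * 128-byte chunk */
		int swizzled_gpu_offset = gpu_offset ^ 64;

		memcpy(gpu_vaddr + swizzled_gpu_offset,
		       cpu_vaddr + cpu_offset, this_length);

		cpu_offset += this_length;
		gpu_offset += this_length;
		length -= this_length;
	}
	return 0;
}
```

The XOR with 64 is the whole trick, and it's also why the swizzle state
matters for correctness: a racing set_tiling could change whether the XOR
should be applied mid-copy.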
For correctness we also depend upon the bit17 swizzling state, but that
can only go stale if userspace races a set_tiling call with the ongoing
pwrite, and the kernel doesn't care about writing undefined stuff into the
backing storage if that is what userspace demands. Also note that we
already need to handle the error case from shmem_read_mapping, so in case
we change our madv truncating behaviour in the future to actually truncate
the underlying inode (and not just drop the backing storage) we're still
safe.

So let's look at what state we're touching outside the struct_mutex. I see
two things relevant to i915/gem: we might (re-)instate backing storage
pages, and we obviously pollute the cpu caches. Both get fixed up in case
userspace tries to screw over the kernel. After these two fixups the bo
state is consistent and we can return.

Besides the fact that this fixes the bug without some ugly temp storage
hack, I see a few other things in favour of this approach:

- The shmem slow & fast paths are now almost identical, code-flow wise.
  Unifying them saves quite a few lines of not-so-heavily-used code, and
  it opens up the possibility of further cleanups and codesize reductions
  I have in mind.

- pwrite/pread now work rather similarly to pagecache read/write
  (especially with the unified slow/fast paths), and I like consistency.

So in conclusion I really think this is the best way to fix this, despite
the scariness of the fixup path.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48