From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?ISO-8859-1?Q?Christian_K=F6nig?= <deathsimple@vodafone.de>
Subject: Re: [PATCH 13/13] drm/radeon: rework recursive gpu reset handling
Date: Fri, 20 Apr 2012 11:38:23 +0200
Message-ID: <4F912E8F.4020807@vodafone.de>
References: <1334875160-5454-1-git-send-email-deathsimple@vodafone.de>
	<1334875160-5454-14-git-send-email-deathsimple@vodafone.de>
	<CAPM=9txr5EMjY4T8CtEtphkzTcxK8vRerzurXSNnLfHXY7CTxw@mail.gmail.com>
	<20120420075052.GD4217@phenom.ffwll.local>
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
Content-Transfer-Encoding: quoted-printable
Return-path: <dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org>
Received: from outgoing.email.vodafone.de (outgoing.email.vodafone.de
	[139.7.28.128])
	by gabe.freedesktop.org (Postfix) with ESMTP id 7461FA0E95
	for <dri-devel@lists.freedesktop.org>;
	Fri, 20 Apr 2012 02:38:27 -0700 (PDT)
In-Reply-To: <20120420075052.GD4217@phenom.ffwll.local>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
To: Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 20.04.2012 09:50, Daniel Vetter wrote:
> On Fri, Apr 20, 2012 at 07:57:09AM +0100, Dave Airlie wrote:
>> 2012/4/19 Christian König <deathsimple@vodafone.de>:
>>> Instead of all this humpy pumpy with recursive
>>> mutex (which also fixes only halve of the problem)
>>> move the actual gpu reset out of the fence code,
>>> return -EDEADLK and then reset the gpu in the
>>> calling ioctl function.
>> I'm trying to figure out if this has any disadvantages over doing what
>> I proposed before and just kicking a thread to reset the gpu.
>>
>> It seems like this should also avoid the locking problems, I'd like to
>> make sure we don't return -EDEADLK to userspace by accident anywhere,
>> since I don't think it prepared for it and it would be an ABI change.
> Fyi, the trick i915 uses to solve the reset problem is to bail out with
> -EAGAIN and rely on drmIOCtl restarting the ioctl. This way we use the
> same codepaths we use to bail out when getting a signal, and thanks to X
> these are rather well-tested. The hangcheck code also fires of a work item to
> do all the reset magic. In all the ioctls that might wait for the gpu we
> have a fancy piece of code which checks whether a gpu reset is pending,
> and if so waits for that to complete. It also checks whether the reset
> succeeded and if not bails out with -EIO.
> -Daniel
Well, I also considered using an asynchronous work item, but didn't know
how to properly prevent multiple GPU resets at the same time, signal
the result back to the ioctls, etc. It just seemed to be more
complicated without any real benefit (except maybe that you don't have
to check every ioctl result separately, but there are not so many).
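
For reference, the two open points above (allowing only one reset at a
time and signaling the result back to the ioctls) could be sketched
roughly as below. This is a plain userspace C11 illustration, not
radeon or i915 code: struct reset_state and the reset_* helpers are
hypothetical names, and the check returns -EAGAIN right away instead of
waiting for the pending reset to complete as Daniel describes.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-device reset state; these names exist in no driver. */
struct reset_state {
	atomic_bool reset_pending; /* true while a reset is queued or running */
	atomic_int last_result;    /* 0 on success, negative errno on failure */
};

/*
 * Called from lockup detection. Returns true only for the single caller
 * that flipped the flag and therefore has to queue the reset work.
 */
static bool reset_request(struct reset_state *rs)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&rs->reset_pending,
					      &expected, true);
}

/* Called by the reset worker once the reset has finished. */
static void reset_complete(struct reset_state *rs, int result)
{
	atomic_store(&rs->last_result, result);
	atomic_store(&rs->reset_pending, false);
}

/* Called at the top of every ioctl that might wait for the GPU. */
static int reset_check(struct reset_state *rs)
{
	if (atomic_load(&rs->reset_pending))
		return -EAGAIN; /* reset in flight, caller should retry */
	if (atomic_load(&rs->last_result) != 0)
		return -EIO;    /* last reset failed */
	return 0;
}
```

The compare-and-exchange is what keeps a second lockup report from
queueing a second reset while the first one is still pending.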

Also, I didn't know what to return to tell userspace to retry the current
operation, but if it's already prepared for -EAGAIN then this sounds
like the proper solution here.
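
The userspace side of that restart behaviour boils down to a loop of
roughly the following shape. This is only a sketch of the retry
pattern: retry_ioctl, ioctl_fn and fake_ioctl are hypothetical
stand-ins so that the example runs without a DRM device, and the real
wrapper calls ioctl(2) on a file descriptor instead.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for ioctl(fd, request, arg): returns 0 on
 * success, or -1 with errno set on failure.
 */
typedef int (*ioctl_fn)(void *arg);

/* Restart the call as long as it was interrupted or asked for a retry. */
static int retry_ioctl(ioctl_fn fn, void *arg)
{
	int ret;

	do {
		ret = fn(arg);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/*
 * Fake ioctl for illustration: fails with EAGAIN until the counter that
 * arg points to reaches zero, then succeeds.
 */
static int fake_ioctl(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EAGAIN;
		return -1;
	}
	return 0;
}
```

Because signals already exercise the EINTR branch of such a loop, the
EAGAIN path after a reset reuses code that userspace runs all the time.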

And regarding returning -EDEADLK to userspace: I think I handle every
ioctl that could cause the lockup detection to run, but checking that
again won't hurt.
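
One way to make that check easier would be a single place at the ioctl
boundary that translates the error, roughly as sketched below. All
names (fake_dev, fake_gpu_reset, ioctl_boundary and the two example
bodies) are hypothetical, and turning a successful reset into -EAGAIN
is an assumption based on Daniel's suggestion, not what the posted
patch does.

```c
#include <errno.h>

/* Hypothetical device, only tracking what the sketch needs. */
struct fake_dev {
	int resets;       /* number of resets performed so far */
	int reset_result; /* what the next reset returns (0 or -errno) */
};

/* Stand-in for the real GPU reset function. */
static int fake_gpu_reset(struct fake_dev *dev)
{
	dev->resets++;
	return dev->reset_result;
}

typedef int (*ioctl_body)(struct fake_dev *dev);

/*
 * Run the ioctl body; if it reports a lockup with -EDEADLK, reset the
 * GPU here and never let -EDEADLK escape to the caller. A failed reset
 * passes its own error code through unchanged.
 */
static int ioctl_boundary(struct fake_dev *dev, ioctl_body body)
{
	int r = body(dev);

	if (r == -EDEADLK) {
		r = fake_gpu_reset(dev);
		if (!r)
			r = -EAGAIN; /* assumption: ask userspace to retry */
	}
	return r;
}

/* Example bodies: one that succeeds and one that detects a lockup. */
static int body_ok(struct fake_dev *dev)
{
	(void)dev;
	return 0;
}

static int body_lockup(struct fake_dev *dev)
{
	(void)dev;
	return -EDEADLK;
}
```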

Christian.