From: Christian König
Subject: Re: [PATCH 13/13] drm/radeon: rework recursive gpu reset handling
Date: Fri, 20 Apr 2012 11:38:23 +0200
Message-ID: <4F912E8F.4020807@vodafone.de>
References: <1334875160-5454-1-git-send-email-deathsimple@vodafone.de> <1334875160-5454-14-git-send-email-deathsimple@vodafone.de> <20120420075052.GD4217@phenom.ffwll.local>
In-Reply-To: <20120420075052.GD4217@phenom.ffwll.local>
To: Daniel Vetter
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

On 20.04.2012 09:50, Daniel Vetter wrote:
> On Fri, Apr 20, 2012 at 07:57:09AM +0100, Dave Airlie wrote:
>> 2012/4/19 Christian König:
>>> Instead of all this humpy pumpy with recursive
>>> mutex (which also fixes only half of the problem)
>>> move the actual gpu reset out of the fence code,
>>> return -EDEADLK and then reset the gpu in the
>>> calling ioctl function.
>> I'm trying to figure out if this has any disadvantages over doing what
>> I proposed before and just kicking a thread to reset the gpu.
>>
>> It seems like this should also avoid the locking problems, I'd like to
>> make sure we don't return -EDEADLK to userspace by accident anywhere,
>> since I don't think it's prepared for it and it would be an ABI change.
> Fyi, the trick i915 uses to solve the reset problem is to bail out with
> -EAGAIN and rely on drmIoctl restarting the ioctl.
> This way we use the same codepaths we use to bail out when getting a
> signal, and thanks to X these are rather well-tested. The hangcheck code
> also fires off a work item to do all the reset magic. In all the ioctls
> that might wait for the gpu we have a fancy piece of code which checks
> whether a gpu reset is pending, and if so waits for that to complete. It
> also checks whether the reset succeeded and if not bails out with -EIO.
> -Daniel

Well, I considered using an asynchronous work item as well, but didn't know
how to properly prevent multiple GPU resets at the same time, signal the
result back to the ioctls, etc. It just seemed more complicated without
any real benefit (except maybe that you don't have to check every ioctl
result separately, but there are not that many).

Also, I didn't know how to tell userspace to retry the current operation,
but if it's already prepared for -EAGAIN then this sounds like the proper
solution here.

And regarding returning -EDEADLK to userspace: I think I handle every
ioctl that could cause the lockup detection to run, but checking that
again won't hurt.

Christian.
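
For illustration, the -EAGAIN scheme discussed above can be modelled in a
few lines of plain userspace C. This is only a sketch under stated
assumptions, not the i915 or radeon code: struct fake_gpu and every fake_*
function are hypothetical names, the work item is run inline instead of on
a workqueue, and the retry loop compares a returned negative errno, whereas
libdrm's drmIoctl tests for a -1 return with errno set to EINTR or EAGAIN.
The compare-and-exchange on reset_pending shows one possible way to keep
concurrent hang detections from queueing more than one reset.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device model; none of these fields exist in a real driver. */
struct fake_gpu {
    atomic_bool reset_pending;  /* set on hang detection, cleared by the reset */
    bool hung;                  /* the GPU is currently locked up */
    bool wedged;                /* the last reset attempt failed */
    bool reset_will_succeed;    /* test knob: outcome of the next reset */
    int resets_done;            /* how many resets actually ran */
};

void fake_gpu_init(struct fake_gpu *gpu, bool hung, bool reset_will_succeed)
{
    atomic_init(&gpu->reset_pending, false);
    gpu->hung = hung;
    gpu->wedged = false;
    gpu->reset_will_succeed = reset_will_succeed;
    gpu->resets_done = 0;
}

/* Hang detection side: only the caller that flips the flag from false to
 * true gets to queue the reset, so simultaneous detections yield one reset. */
bool fake_gpu_schedule_reset(struct fake_gpu *gpu)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&gpu->reset_pending, &expected, true);
}

/* Stand-in for the asynchronous work item that performs the reset. */
void fake_gpu_reset_work(struct fake_gpu *gpu)
{
    assert(atomic_load(&gpu->reset_pending));
    gpu->resets_done++;
    if (gpu->reset_will_succeed)
        gpu->hung = false;
    else
        gpu->wedged = true;
    atomic_store(&gpu->reset_pending, false);
}

/* ioctl side: wait for a pending reset, report -EIO if it failed, and bail
 * out with -EAGAIN when a new hang is detected so the caller restarts. */
int fake_gpu_ioctl(struct fake_gpu *gpu)
{
    if (atomic_load(&gpu->reset_pending))
        fake_gpu_reset_work(gpu);   /* model of blocking until the work is done */
    if (gpu->wedged)
        return -EIO;
    if (gpu->hung) {
        fake_gpu_schedule_reset(gpu);
        return -EAGAIN;
    }
    return 0;
}

/* Userspace side: restart the call on -EAGAIN or -EINTR, in the style of
 * the drmIoctl retry loop. */
int fake_drm_ioctl(struct fake_gpu *gpu)
{
    int ret;
    do {
        ret = fake_gpu_ioctl(gpu);
    } while (ret == -EAGAIN || ret == -EINTR);
    return ret;
}
```

In this model userspace never sees -EAGAIN from fake_drm_ioctl: a hung GPU
costs one restart and then the call either succeeds or fails with -EIO,
which is the only new error the caller has to handle.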