From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751713AbaHDIzf (ORCPT ); Mon, 4 Aug 2014 04:55:35 -0400
Received: from youngberry.canonical.com ([91.189.89.112]:37316 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751470AbaHDIzc (ORCPT ); Mon, 4 Aug 2014 04:55:32 -0400
Message-ID: <53DF4A7D.3040505@canonical.com>
Date: Mon, 04 Aug 2014 10:55:25 +0200
From: Maarten Lankhorst
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: Christian König, airlied@linux.ie
CC: thellstrom@vmware.com, nouveau@lists.freedesktop.org, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, bskeggs@redhat.com, alexander.deucher@amd.com
Subject: Re: [PATCH 09/19] drm/radeon: handle lockup in delayed work, v2
References: <20140731153245.15061.63023.stgit@patser> <20140731153342.15061.54264.stgit@patser> <53DBC1EC.1010001@amd.com> <53DBD269.80807@canonical.com> <53DF462B.2060102@amd.com>
In-Reply-To: <53DF462B.2060102@amd.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 04-08-14 10:36, Christian König wrote:
> Hi Maarten,
>
> Sorry for the delay. I've had way too much to do recently.
>
> On 01.08.2014 at 19:46, Maarten Lankhorst wrote:
>>
>> On 01-08-14 18:35, Christian König wrote:
>>> On 31.07.2014 at 17:33, Maarten Lankhorst wrote:
>>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
>>>> ---
>>>> V1 had a nasty bug breaking gpu lockup recovery. The fix is not
>>>> allowing radeon_fence_driver_check_lockup to take exclusive_lock,
>>>> and kill it during lockup recovery instead.
>>> That looks like the delayed work starts running as soon as we submit a fence, and not when it's needed for waiting.
>>>
>>> Since it's a backup for failing IRQs I would rather put it into radeon_irq_kms.c and start/stop it when the IRQs are started/stopped.
>> The delayed work is not just for failing IRQs, it's also the handler that's used to detect lockups, which is why I trigger after processing fences, and reset the timer after processing.
>
> The idea was turning the delayed work on and off when we turn the irq on and off as well, processing of the delayed work handler can still happen in radeon_fence.c
>
>>
>> Specifically what happened was this scenario:
>>
>> - lock up occurs
>> - write lock taken by gpu_reset
>> - delayed work runs, tries to acquire read lock, blocks
>> - gpu_reset tries to cancel delayed work synchronously
>> - has to wait for delayed work to finish -> deadlock
>
> Why do you want to disable the work item from the lockup handler in the first place?
>
> Just take the exclusive lock in the work item, when it concurrently runs with the lockup handler it will just block for the lockup handler to complete.

With the delayed work, radeon_fence_wait no longer handles unreliable interrupts itself, so it has to run from the lockup handler.
An alternative solution could be adding a radeon_fence_wait_timeout that ignores the timeout and checks whether the fence is signaled once the timeout expires.
This would probably be a cleaner solution.
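
To make the alternative concrete, here is a rough userspace sketch of the "wait with a timeout, ignore the result, then re-check the fence" idea. It is only a model: struct model_fence, the model_* helpers and the tick-based timeout are made up for illustration and are not the radeon structures or the kernel wait API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a fence; not the real radeon_fence. */
struct model_fence {
	unsigned long signaled_seq; /* last sequence number written back */
	unsigned long seq;          /* sequence number this fence waits for */
	bool irq_delivered;         /* false models a lost interrupt */
};

enum model_wait_result { MODEL_WAIT_SIGNALED, MODEL_WAIT_STALLED };

/* The fence state is always read directly, never inferred from a wake-up. */
static bool model_fence_signaled(const struct model_fence *f)
{
	return f->signaled_seq >= f->seq;
}

/*
 * Stand-in for a timed sleep: returns the remaining ticks if the waiter
 * was woken, or 0 if the timeout expired. A wake-up only happens when
 * the interrupt actually arrived.
 */
static long model_wait_timeout(const struct model_fence *f, long timeout_ticks)
{
	assert(timeout_ticks > 0);
	if (f->irq_delivered && model_fence_signaled(f))
		return timeout_ticks;
	return 0;
}

/*
 * The timeout result is deliberately ignored. Whether we were woken or
 * timed out, the fence is re-checked, so a lost interrupt cannot turn a
 * completed fence into a false stall.
 */
static enum model_wait_result
model_fence_wait(const struct model_fence *f, long timeout_ticks)
{
	(void)model_wait_timeout(f, timeout_ticks);
	return model_fence_signaled(f) ? MODEL_WAIT_SIGNALED
				       : MODEL_WAIT_STALLED;
}
```

In this model a fence whose interrupt was lost still reports as signaled after the timeout, while a fence that never completed reports as stalled, which is the point where a lockup check would presumably take over.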
~Maarten