From mboxrd@z Thu Jan 1 00:00:00 1970 From: Gustavo Bittencourt Subject: Re: [PATCH] rtmutex: enable deadlock detection in ww_mutex_lock functions Date: Mon, 26 Jan 2015 13:06:35 -0200 Message-ID: <54C657FB.2060106@gmail.com> References: <20150120200229.GA17473@gbitten> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Cc: linux-rt-users To: Paul Gortmaker Return-path: Received: from mail-qa0-f41.google.com ([209.85.216.41]:39332 "EHLO mail-qa0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751221AbbAZPGm (ORCPT ); Mon, 26 Jan 2015 10:06:42 -0500 Received: by mail-qa0-f41.google.com with SMTP id bm13so7063848qab.0 for ; Mon, 26 Jan 2015 07:06:41 -0800 (PST) In-Reply-To: Sender: linux-rt-users-owner@vger.kernel.org List-ID: I'm not sure if those patches are connected, so I am posting my findings for anyone. ABOUT NOUVEAU AND WW_MUTEX: The nouveau driver intensely uses the ww_mutex. According the ww-mutex-design.txt documentation, the ww_mutex makes the following steps when try to lock a resource: 1) It calls the ww_mutex_lock (or ww_mutex_lock_interruptible) function for normal lock acquisition. 2) If it returns -EDEADLK, the wounded task must drop all already acquired locks and call the ww_mutex_lock_slow (or ww_mutex_lock_slow_interruptible) function. In the nouveau driver, those steps happen in the validate_init function (drivers/gpu/drm/nouveau_gem.c) when it calls the functions ttm_bo_reserve (step 1) and ttm_bo_reserve_slowpath (step 2): validate_init(...) { ... ret = ttm_bo_reserve(&nvbo->bo, true, false, true, &op->ticket); if (ret) { validate_fini_no_ticket(op, NULL); if (unlikely(ret == -EDEADLK)) { ret = ttm_bo_reserve_slowpath(&nvbo->bo, true, &op->ticket); ... } THE PROBLEM: When I run the PREEMPT_RT in my computer, it freezes as soon I start some big graphical application (e.g. Firefox or Chrome). The following execution path happens until reaching an infinite loop: a) nouveau_gem_ioctl_pushbuf() b) nouveau_gem_pushbuf_validate() c) validate_init() d) ttm_bo_reserve() e) ttm_bo_reserve_nolru() f) ww_mutex_lock_interruptible() g) __ww_mutex_lock_interruptible() h) rt_mutex_slowlock() i) rt_mutex_handle_deadlock() -> infinite loop The infinite loop that I mentioned above is the while(1){...} inside the rt_mutex_handle_deadlock function: static void rt_mutex_handle_deadlock(int res, int detect_deadlock, struct rt_mutex_waiter *w) { /* * If the result is not -EDEADLOCK or the caller requested * deadlock detection, nothing to do here. */ if (res != -EDEADLOCK || detect_deadlock) return; /* * Yell lowdly and stop the task right here. */ rt_mutex_print_deadlock(w); while (1) { set_current_state(TASK_INTERRUPTIBLE); schedule(); } } CONCLUSION: My conclusion was the detect_deadlock should be enabled to the rt_mutex_handle_deadlock function returns -EDEADLK and hence the ww_mutex_lock_interruptible returns -EDEADLK too. BTW, since I have applied my patch, I've not had anymore freezing issue. PS: I am a kernelnewbie so it is possible that I am writing some BS here. On 01/25/2015 06:55 PM, Paul Gortmaker wrote: > On Tue, Jan 20, 2015 at 3:02 PM, Gustavo Bittencourt wrote: >> The functions ww_mutex_lock_interruptible and ww_mutex_lock should return -EDEADLK when faced with >> a deadlock. To do so, the paramenter detect_deadlock in rt_mutex_slowlock must be TRUE. >> This patch corrects potential deadlocks when running PREEMPT_RT with nouveau driver. >> >> Kernel version: 3.14.25-rt22 > > You might want to have a look at this: > > http://www.spinics.net/lists/linux-rt-users/msg12259.html > > as it also impacts what the deadlock detection flag does. > > Paul. > -- > >> >> Signed-off-by: Gustavo Bittencourt >> --- >> kernel/locking/rtmutex.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c >> index 6c40660..3f6ef91 100644 >> --- a/kernel/locking/rtmutex.c >> +++ b/kernel/locking/rtmutex.c >> @@ -1965,7 +1965,7 @@ __ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ww_c >> might_sleep(); >> >> mutex_acquire(&lock->base.dep_map, 0, 0, _RET_IP_); >> - ret = rt_mutex_slowlock(&lock->base.lock, TASK_INTERRUPTIBLE, NULL, 0, ww_ctx); >> + ret = rt_mutex_slowlock(&lock->base.lock, TASK_INTERRUPTIBLE, NULL, 1, ww_ctx); >> if (ret) >> mutex_release(&lock->base.dep_map, 1, _RET_IP_); >> else if (!ret && ww_ctx->acquired > 1) >> @@ -1984,7 +1984,7 @@ __ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx) >> >> mutex_acquire_nest(&lock->base.dep_map, 0, 0, &ww_ctx->dep_map, >> _RET_IP_); >> - ret = rt_mutex_slowlock(&lock->base.lock, TASK_UNINTERRUPTIBLE, NULL, 0, ww_ctx); >> + ret = rt_mutex_slowlock(&lock->base.lock, TASK_UNINTERRUPTIBLE, NULL, 1, ww_ctx); >> if (ret) >> mutex_release(&lock->base.dep_map, 1, _RET_IP_); >> else if (!ret && ww_ctx->acquired > 1) >> -- >> 1.9.1 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html