From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751961AbcFNNHM (ORCPT ); Tue, 14 Jun 2016 09:07:12 -0400
Received: from foss.arm.com ([217.140.101.70]:58598 "EHLO foss.arm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751282AbcFNNHK (ORCPT ); Tue, 14 Jun 2016 09:07:10 -0400
Date: Tue, 14 Jun 2016 14:07:07 +0100
From: Juri Lelli
To: xlpang@redhat.com
Cc: Peter Zijlstra, mingo@kernel.org, tglx@linutronix.de,
	rostedt@goodmis.org, linux-kernel@vger.kernel.org,
	mathieu.desnoyers@efficios.com, jdesfossez@efficios.com,
	bristot@redhat.com, Ingo Molnar
Subject: Re: [RFC][PATCH 2/8] sched/rtmutex/deadline: Fix a PI crash for deadline tasks
Message-ID: <20160614130707.GJ5981@e106622-lin>
References: <20160607195635.710022345@infradead.org>
	<20160607200215.719626477@infradead.org>
	<20160614102109.GF5981@e106622-lin>
	<575FFE58.1090102@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <575FFE58.1090102@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 14/06/16 20:53, Xunlei Pang wrote:
> On 2016/06/14 at 18:21, Juri Lelli wrote:
> > Hi,
> >
> > On 07/06/16 21:56, Peter Zijlstra wrote:
> >> From: Xunlei Pang
> >>
> >> A crash happened while I was playing with deadline PI rtmutex.
> >>
> >> BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> >> IP: [] rt_mutex_get_top_task+0x1f/0x30
> >> PGD 232a75067 PUD 230947067 PMD 0
> >> Oops: 0000 [#1] SMP
> >> CPU: 1 PID: 10994 Comm: a.out Not tainted
> >>
> >> Call Trace:
> >> [] enqueue_task+0x2c/0x80
> >> [] activate_task+0x23/0x30
> >> [] pull_dl_task+0x1d5/0x260
> >> [] pre_schedule_dl+0x16/0x20
> >> [] __schedule+0xd3/0x900
> >> [] schedule+0x29/0x70
> >> [] __rt_mutex_slowlock+0x4b/0xc0
> >> [] rt_mutex_slowlock+0xd1/0x190
> >> [] rt_mutex_timed_lock+0x53/0x60
> >> [] futex_lock_pi.isra.18+0x28c/0x390
> >> [] do_futex+0x190/0x5b0
> >> [] SyS_futex+0x80/0x180
> >>
> > This seems to be caused by the race condition you detail below between
> > load balancing and PI code. I tried to reproduce the BUG on my box, but
> > it looks hard to get. Do you have a reproducer I can give a try?
> >
> >> This is because rt_mutex_enqueue_pi() and rt_mutex_dequeue_pi()
> >> hold only pi_lock when operating on the pi waiters, while
> >> rt_mutex_get_top_task() accesses them with the rq lock held but
> >> without holding pi_lock.
> >>
> >> In order to tackle this, we introduce a new "pi_top_task" pointer
> >> cached in task_struct, and add a new rt_mutex_update_top_task()
> >> to update its value; it can be called by rt_mutex_setprio(),
> >> which holds both the owner's pi_lock and the rq lock. Thus
> >> "pi_top_task" can be safely accessed by enqueue_task_dl() under
> >> the rq lock.
> >>
> >> [XXX this next section is unparsable]
> > Yes, a bit hard to understand. However, am I correct in assuming this
> > patch and the previous one should fix this problem? Or are there still
> > other races causing issues?
>
> Yes, these two patches can fix the problem.
>
> >
> >> One problem is that by the time rt_mutex_adjust_prio()->...->
> >> rt_mutex_setprio() runs, the rtmutex lock has been released and the
> >> owner marked off, so the dereferenced "pi_top_task" can be a running
> >> task (as it can be falsely woken up by others before
> >> rt_mutex_setprio() gets to update "pi_top_task").
> >> We solve this by calling __rt_mutex_adjust_prio() directly in
> >> mark_wakeup_next_waiter(), which holds both pi_lock and the rtmutex
> >> lock, and by removing rt_mutex_adjust_prio(). Since this moves the
> >> deboost point, we also move preempt_disable() before the rtmutex
> >> unlock, to avoid current being preempted due to the earlier deboost
> >> before wake_up_q().
> >>
> >> Cc: Steven Rostedt
> >> Cc: Ingo Molnar
> >> Cc: Juri Lelli
> >> Originally-From: Peter Zijlstra
> >> Signed-off-by: Xunlei Pang
> >> Signed-off-by: Peter Zijlstra (Intel)
> >> Link: http://lkml.kernel.org/r/1461659449-19497-2-git-send-email-xlpang@redhat.com
> > The idea of this fix makes sense to me, but I would like to be able to
> > see the BUG and test the fix. What I have is a test in which I create N
> > DEADLINE workers that share a PI mutex. They get migrated around and
> > seem to stress the PI code, but I couldn't hit the BUG yet. Maybe I'll
> > let it run for some more time.
>
> You can use the attached reproducer (gcc crash_deadline_pi.c -lpthread -lrt).
> Start multiple instances and it will hit the bug very soon.
>

Great, thanks! I'll use it.

Best,

- Juri
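
(For context on the fix being discussed: below is a minimal sketch of the
"pi_top_task" caching idea from the changelog quoted above, assuming the
rtmutex internal helpers task_has_pi_waiters() and task_top_pi_waiter().
It only illustrates the approach and is not the posted patch.)

/* New field in struct task_struct, next to the existing pi_waiters tree: */
	struct task_struct	*pi_top_task;	/* top PI waiter, or NULL */

/*
 * Refresh the cached top waiter. Called with both p->pi_lock and the rq
 * lock held (e.g. from rt_mutex_setprio()), so readers that hold only
 * the rq lock see a stable value.
 */
void rt_mutex_update_top_task(struct task_struct *p)
{
	if (!task_has_pi_waiters(p)) {
		p->pi_top_task = NULL;
		return;
	}

	p->pi_top_task = task_top_pi_waiter(p)->task;
}

/* Readers such as enqueue_task_dl() then need only the rq lock: */
struct task_struct *rt_mutex_get_top_task(struct task_struct *task)
{
	return task->pi_top_task;
}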
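
(The crash_deadline_pi.c attachment is not included in the archive. What
follows is a hypothetical sketch, not the actual reproducer, of a stress
test along the lines Juri describes: N SCHED_DEADLINE workers contending
on a PTHREAD_PRIO_INHERIT mutex. The thread count and runtime/period
values are arbitrary, and sched_setattr() needs root. Build with
"gcc -O2 dl_pi_stress.c -lpthread" and start several instances so the
workers get migrated while being boosted/deboosted.)

#define _GNU_SOURCE
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Layout expected by sched_setattr(); glibc had no wrapper at the time. */
struct dl_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
};

static pthread_mutex_t pi_mutex;

/* Switch the calling thread to SCHED_DEADLINE. */
static int set_deadline(uint64_t runtime_ns, uint64_t period_ns)
{
	struct dl_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy = SCHED_DEADLINE;
	attr.sched_runtime = runtime_ns;
	attr.sched_deadline = period_ns;
	attr.sched_period = period_ns;

	return syscall(SYS_sched_setattr, 0, &attr, 0);
}

static void *worker(void *arg)
{
	if (set_deadline(2000000ULL, 10000000ULL)) {
		perror("sched_setattr");
		return NULL;
	}

	for (;;) {
		/* PI mutex -> futex_lock_pi() -> rtmutex boosting in the kernel. */
		pthread_mutex_lock(&pi_mutex);
		for (volatile int i = 0; i < 10000; i++)
			;	/* short critical section, so deboosts happen often */
		pthread_mutex_unlock(&pi_mutex);
	}

	return NULL;
}

int main(int argc, char **argv)
{
	int i, nthreads = argc > 1 ? atoi(argv[1]) : 8;
	pthread_mutexattr_t mattr;
	pthread_t tid;

	pthread_mutexattr_init(&mattr);
	pthread_mutexattr_setprotocol(&mattr, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&pi_mutex, &mattr);

	for (i = 0; i < nthreads; i++)
		pthread_create(&tid, NULL, worker, NULL);

	pause();
	return 0;
}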