Date: Mon, 27 Jun 2016 14:23:35 +0200
From: Peter Zijlstra
To: Juri Lelli
Cc: mingo@kernel.org, tglx@linutronix.de, rostedt@goodmis.org,
	xlpang@redhat.com, linux-kernel@vger.kernel.org,
	mathieu.desnoyers@efficios.com, jdesfossez@efficios.com,
	bristot@redhat.com
Subject: Re: [RFC][PATCH 8/8] rtmutex: Fix PI chain order integrity
Message-ID: <20160627122335.GB30154@twins.programming.kicks-ass.net>
References: <20160607195635.710022345@infradead.org>
	<20160607200216.117270606@infradead.org>
	<20160614173908.GQ5981@e106622-lin>
	<20160614194401.GL30921@twins.programming.kicks-ass.net>
	<20160615072507.GS5981@e106622-lin>
In-Reply-To: <20160615072507.GS5981@e106622-lin>
User-Agent: Mutt/1.5.23.1 (2014-03-12)

On Wed, Jun 15, 2016 at 08:25:07AM +0100, Juri Lelli wrote:
> I guess it's not that likely, but yes it could potentially happen that a
> waiter is optimistically spinning, depletes its runtime, gets throttled
> and then replenished when still spinning. Maybe it doesn't really make
> sense continuing spinning in this situation, but I guess things get
> really complicated. :-/
> 
> Anyway, as said, I think this patch is OK. Maybe we want to add a
> comment just to remember what situation can cause an issue if we don't
> do this? Patch changelog would be OK as well for such a comment IMHO.

OK, so I went to write a simple comment and ended up with the below :/

While writing the comment I noticed two issues:

 - we update the waiter order fields while the entry is still enqueued
   on the pi_waiters tree, which is also sorted by these exact fields
   (see the sketch after the patch).

 - another one of these pure ->prio comparisons (see the toy right
   below).

Please double check, there be dragons here.
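To make the second point concrete, here is a stand-alone toy (plain
userspace C, invented names, not the kernel code): all SCHED_DEADLINE
tasks share one ->prio value, so a pure prio test cannot order two DL
waiters; only the deadline tie-break can.

#include <stdio.h>
#include <stdbool.h>

struct waiter {
	int prio;			/* lower means more important */
	unsigned long long deadline;	/* absolute deadline */
};

static bool dl_prio(int prio)
{
	return prio < 0;	/* DL tasks all sit below the RT prio range */
}

/* same shape as rt_mutex_waiter_less(): prio first, deadline tie-break */
static bool waiter_less(const struct waiter *l, const struct waiter *r)
{
	if (l->prio < r->prio)
		return true;
	if (l->prio == r->prio && dl_prio(l->prio))
		return (long long)(l->deadline - r->deadline) < 0;
	return false;
}

int main(void)
{
	struct waiter a = { .prio = -1, .deadline = 1000 };
	struct waiter b = { .prio = -1, .deadline = 2000 };

	/* old test: "not less" both ways, so order depends on who asks */
	printf("prio only: a>=b:%d b>=a:%d\n", a.prio >= b.prio, b.prio >= a.prio);

	/* deadline tie-break: a, the earlier deadline, wins consistently */
	printf("tie-break: a<b:%d b<a:%d\n",
	       waiter_less(&a, &b), waiter_less(&b, &a));
	return 0;
}

The first printf shows the old test calling them unordered both ways;
the second shows the earlier deadline winning consistently, which is
what the cmp_task() change below buys us.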
---

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -158,6 +158,9 @@ static inline bool unlock_rt_mutex_safe(
 }
 #endif
 
+#define cmp_task(p) \
+	&(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+
 static inline int
 rt_mutex_waiter_less(struct rt_mutex_waiter *left,
 		     struct rt_mutex_waiter *right)
@@ -583,8 +586,26 @@ static int rt_mutex_adjust_prio_chain(st
 
 	/* [7] Requeue the waiter in the lock waiter tree. */
 	rt_mutex_dequeue(lock, waiter);
+
+	/*
+	 * Update the waiter prio fields now that we're dequeued.
+	 *
+	 * These values can have changed through either:
+	 *
+	 *   sys_sched_set_scheduler() / sys_sched_setattr()
+	 *
+	 * or
+	 *
+	 *   DL CBS enforcement advancing the effective deadline.
+	 *
+	 * Even though pi_waiters also uses these fields, and that tree is only
+	 * updated in [11], we can do this here, since we hold [L], which
+	 * serializes all pi_waiters access and rb_erase() does not care about
+	 * the values of the node being removed.
+	 */
 	waiter->prio = task->prio;
 	waiter->deadline = task->dl.deadline;
+
 	rt_mutex_enqueue(lock, waiter);
 
 	/* [8] Release the task */
@@ -711,6 +732,8 @@ static int rt_mutex_adjust_prio_chain(st
 static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
 				struct rt_mutex_waiter *waiter)
 {
+	lockdep_assert_held(&lock->wait_lock);
+
 	/*
 	 * Before testing whether we can acquire @lock, we set the
 	 * RT_MUTEX_HAS_WAITERS bit in @lock->owner. This forces all
@@ -770,7 +793,8 @@ static int try_to_take_rt_mutex(struct r
 			 * the top waiter priority (kernel view),
 			 * @task lost.
 			 */
-			if (task->prio >= rt_mutex_top_waiter(lock)->prio)
+			if (!rt_mutex_waiter_less(cmp_task(task),
+						  rt_mutex_top_waiter(lock)))
 				return 0;
 
 			/*
@@ -838,6 +862,8 @@ static int task_blocks_on_rt_mutex(struc
 	struct rt_mutex *next_lock;
 	int chain_walk = 0, res;
 
+	lockdep_assert_held(&lock->wait_lock);
+
 	/*
 	 * Early deadlock detection. We really don't want the task to
 	 * enqueue on itself just to untangle the mess later. It's not
@@ -973,6 +999,8 @@ static void remove_waiter(struct rt_mute
 	struct task_struct *owner = rt_mutex_owner(lock);
 	struct rt_mutex *next_lock;
 
+	lockdep_assert_held(&lock->wait_lock);
+
 	raw_spin_lock(&current->pi_lock);
 	rt_mutex_dequeue(lock, waiter);
 	current->pi_blocked_on = NULL;
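And the first point as a stand-alone sketch (again userspace C with
invented names; a sorted list stands in for the rbtree, the invariant is
the same): a node's sort key must not change while it is enqueued, or
the structure's ordering silently rots; between dequeue and enqueue the
update is harmless.

#include <stdio.h>
#include <stdbool.h>

struct waiter {
	int prio;
	struct waiter *next;
};

/* insert keeping ascending ->prio order (most important first) */
static void enqueue(struct waiter **head, struct waiter *w)
{
	struct waiter **p = head;

	while (*p && (*p)->prio <= w->prio)
		p = &(*p)->next;
	w->next = *p;
	*p = w;
}

static void dequeue(struct waiter **head, struct waiter *w)
{
	for (struct waiter **p = head; *p; p = &(*p)->next) {
		if (*p == w) {
			*p = w->next;
			w->next = NULL;
			return;
		}
	}
}

static bool sorted(const struct waiter *w)
{
	for (; w && w->next; w = w->next)
		if (w->prio > w->next->prio)
			return false;
	return true;
}

int main(void)
{
	struct waiter a = { .prio = 10 }, b = { .prio = 20 }, c = { .prio = 30 };
	struct waiter *head = NULL;

	enqueue(&head, &a);
	enqueue(&head, &b);
	enqueue(&head, &c);

	b.prio = 40;				/* key update while enqueued */
	printf("in-place: sorted=%d\n", sorted(head));	/* 0: order broken */

	dequeue(&head, &b);
	b.prio = 5;				/* key update while off the list */
	enqueue(&head, &b);
	printf("requeue:  sorted=%d\n", sorted(head));	/* 1: order holds */

	return 0;
}

Same discipline as hunk [7] above: rt_mutex_dequeue(), then update
->prio / ->deadline, then rt_mutex_enqueue(), all while holding [L].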