linux-kernel.vger.kernel.org archive mirror
From: Esben Nielsen <simlo@phys.au.dk>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org, robustmutexes@lists.osdl.org,
	Ingo Molnar <mingo@elte.hu>,
	Dinakar Guniguntala <dino@in.ibm.com>
Subject: Re: Recursion bug in -rt
Date: Tue, 20 Dec 2005 21:42:07 +0100 (MET)	[thread overview]
Message-ID: <Pine.OSF.4.05.10512202042300.1720-100000@da410.phys.au.dk> (raw)
In-Reply-To: <1135107232.13138.348.camel@localhost.localdomain>

On Tue, 20 Dec 2005, Steven Rostedt wrote:

> On Tue, 2005-12-20 at 18:43 +0100, Esben Nielsen wrote:
> > 
> > On Tue, 20 Dec 2005, Dinakar Guniguntala wrote:
> > 
> > > On Tue, Dec 20, 2005 at 02:19:56PM +0100, Ingo Molnar wrote:
> > > > 
> > > > hm, i'm looking at -rf4 - these changes look fishy:
> > > > 
> > > > -       _raw_spin_lock(&lock_owner(lock)->task->pi_lock);
> > > > +       if (current != lock_owner(lock)->task)
> > > > +               _raw_spin_lock(&lock_owner(lock)->task->pi_lock);
> > > > 
> > > > why is this done?
> > > >
> > >  
> > > Ingo, this is to prevent a kernel hang due to application error.
> > > 
> > > Basically when an application does a pthread_mutex_lock twice on a
> > > _nonrecursive_ mutex with robust/PI attributes the whole system hangs.
> > > Of course the application clearly should not be doing anything like
> > > that, but it should not end up hanging the system either.
> > >
> > 
> > Hmm, reading the comment on the function, wouldn't it be more natural to
> > use 
> >     if(task != lock_owner(lock)->task)
> > as it assumes that task->pi_lock is locked, not that current->pi_lock is
> > locked.
> > 
> > By the way:
> >  task->pi_lock is taken. lock_owner(lock)->task->pi_lock will be taken.
> > What if the task lock_owner(lock)->task tries to lock another futex
> > (lock2) for which lock_owner(lock2)->task == task?
> > Can't you promote a user space futex deadlock into a kernel spin deadlock
> > this way?
> 
> Yes!
> 
> The locking code of the pi locks in the rt.c code is VERY dependent on
> the order of locks taken.  It works by assuming the order of locks taken
> will not itself cause a deadlock.  I just recently submitted a patch
> to Ingo because I found that mutex_trylock can cause a deadlock, since
> it is not bound to the order of locks.
> 
> So, to answer your question more formally this time.  If the futex code
> uses the pi_lock code in rt.c and the futex causes a deadlock, then the
> kernel can deadlock too.
> 
This is OK for kernel mutexes, which are not supposed to cause deadlocks.
But user space deadlocks must not cause kernel deadlocks. Therefore the
robust futex code _must_ be fixed.
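
Just to make the scenario concrete: the kind of application bug I have in
mind is nothing more exotic than an AB-BA ordering between two PI futexes.
A purely illustrative user space sketch (not from any real application):

#include <pthread.h>

/* Two mutexes created with the PI (and robust) attributes. */
pthread_mutex_t m1, m2;

void *thread_a(void *arg)
{
	pthread_mutex_lock(&m1);
	pthread_mutex_lock(&m2);	/* blocks: thread B owns m2 */
	return NULL;
}

void *thread_b(void *arg)
{
	pthread_mutex_lock(&m2);
	pthread_mutex_lock(&m1);	/* blocks: thread A owns m1 */
	return NULL;
}

The application deadlock is its own fault, but the kernel side chain walk
over the pi_locks must not deadlock along with it.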

> The benefit of this locking order is that we got rid of the global
> pi_lock, and that was worth the problems you face today.
>

I believe this problem can be solved in the pi_lock code - but it will
require quite a bit of recoding. I started on it a little while ago but
never got the time to get anything even to compile :-(
I don't have time to finish any code at all, but I guess I can plant the
idea instead:

When resolving the mutex chain (task A locks mutex 1 owned by B, which is
blocked on mutex 2 owned by C, etc.) for PI boosting, and also when looking
for deadlocks, release _all_ locks before going to the next step in the
chain: do get_task_struct() on the next task in the chain, release the
locks, take the pi_locks in a fixed order (sorted by address, for instance),
do the PI boosting, do get_task_struct() on the owner of the next lock,
release all the locks, do put_task_struct() on the previous task, and so on.
This is a more expensive approach, as it involves roughly twice as many
spinlock operations plus the get/put_task_struct() calls, but it has the
added benefit of reducing the overall system latency for long locking
chains, since there are spots where interrupts and preemption can be
enabled for each step in the chain. Remember also that deep locking chains
should be considered an exception, so the code doesn't have to be optimal
wrt. performance.
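
To make the "fixed order" part concrete, here is a minimal sketch of how
the two pi_locks could be taken sorted by address (the helper names are
made up for illustration, they don't exist in rt.c):

/* Take the pi_locks of two tasks in a fixed (address-sorted) order so
 * that two chain walkers can never deadlock against each other. */
static void pi_lock_two(task_t *a, task_t *b)
{
	if (a == b) {
		/* e.g. a task blocked on a lock it already owns -
		   the very recursion bug this thread started with */
		_raw_spin_lock(&a->pi_lock);
		return;
	}
	if (a < b) {
		_raw_spin_lock(&a->pi_lock);
		_raw_spin_lock(&b->pi_lock);
	} else {
		_raw_spin_lock(&b->pi_lock);
		_raw_spin_lock(&a->pi_lock);
	}
}

static void pi_unlock_two(task_t *a, task_t *b)
{
	_raw_spin_unlock(&a->pi_lock);
	if (a != b)
		_raw_spin_unlock(&b->pi_lock);
}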

I have added my feeble attempt to implement this below. I have no chance of
ever getting the time to finish it :-(

> -- Steve
> 
> 

This is a "near" replacement for task_blocks_on_lock(). It returns with all
spinlocks unlocked. A lot of other code needs to be changed as well to
accommodate the new locking order.

static void
task_blocks_on_lock2(struct rt_mutex_waiter *waiter, struct thread_info *ti,
		    struct rt_mutex *lock __EIP_DECL__)
{
	task_t *owner, *blocked_task = ti->task;

#ifdef CONFIG_RT_DEADLOCK_DETECT
	/* mark the current thread as blocked on the lock */
	waiter->eip = eip;
#endif
	
	blocked_task->blocked_on = waiter;
	waiter->lock = lock;
	waiter->ti = blocked_task->thread_info;
	waiter->on_pi_waiters = 0;
	plist_init(&waiter->pi_list, blocked_task->prio);
	plist_init(&waiter->list, blocked_task->prio);

	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&blocked_task->pi_lock));
	SMP_TRACE_BUG_ON_LOCKED(!spin_is_locked(&lock->wait_lock));

	plist_add(&waiter->list, &lock->wait_list);
	set_lock_owner_pending(lock);

	owner = lock_owner_task(lock);
	/* Put code to grab the lock and return on success here */
	
	/* Code for waiting: */
	get_task_struct(owner);
	get_task_struct(blocked_task);
	blocked_task->blocked_on = waiter;
	
	_raw_spin_unlock(&lock->wait_lock);
	_raw_spin_unlock(&blocked_task->pi_lock);
	
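	/*
	 * Walk the locking chain one step per iteration.  At the top of
	 * each iteration we hold task references on both 'blocked_task'
	 * and 'owner', but no spinlocks, so the walk can be preempted
	 * between steps of the chain.
	 */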
	while(owner) {
		/* No spin-locks held here! */
		
		struct rt_mutex *this_lock;
		task_t *next_owner = NULL;
		
		/* Take both pi_locks in a fixed (address) order so that
		   two chain walkers cannot deadlock against each other. */
		if(owner < blocked_task) {
			_raw_spin_lock(&owner->pi_lock);
			_raw_spin_lock(&blocked_task->pi_lock);
		}
		else {
			_raw_spin_lock(&blocked_task->pi_lock);
			_raw_spin_lock(&owner->pi_lock);
		}

		if(!blocked_task->blocked_on ||
		   !blocked_task->blocked_on->ti) {
			/* The lock has been released! */
			_raw_spin_unlock(&blocked_task->pi_lock);
			_raw_spin_unlock(&owner->pi_lock);
			put_task_struct(owner);
			break;			
		}
		waiter = blocked_task->blocked_on;
		this_lock = waiter->lock;
		_raw_spin_lock(&this_lock->wait_lock);
		if(lock_owner_task(this_lock)!=owner) {
			/* Oops, the owner has changed -
			   redo it with the new owner */
			next_owner = lock_owner_task(this_lock);
			if(next_owner)
				get_task_struct(next_owner);
			_raw_spin_unlock(&this_lock->wait_lock);
			_raw_spin_unlock(&blocked_task->pi_lock);
			_raw_spin_unlock(&owner->pi_lock);
			put_task_struct(owner);
			owner = next_owner;
			continue;
		}

		/* The prio might have changed - the boosting step of the
		   previous iteration can do that.  So the list needs to be
		   re-sorted. */
		plist_del(&waiter->list, &this_lock->wait_list);
		plist_init(&waiter->list, blocked_task->prio);
		plist_add(&waiter->list, &this_lock->wait_list);

		if(waiter->on_pi_waiters) {
			plist_del(&waiter->pi_list,
				  &owner->pi_waiters);
			waiter->on_pi_waiters = 0;
		}

		/* We only need the highest priority waiter on
		   owner->pi_waiters */
		if(plist_first_entry(&this_lock->wait_list,
				     struct rt_mutex_waiter, list)
		   == waiter) {
			plist_init(&waiter->pi_list, blocked_task->prio);
			plist_add(&waiter->pi_list,
				  &owner->pi_waiters);
			waiter->on_pi_waiters = 1;
		}

		_raw_spin_unlock(&this_lock->wait_lock);
		_raw_spin_unlock(&blocked_task->pi_lock);
		if(DEADLOCK_DETECT || owner->prio > blocked_task->prio) {
			if(owner->prio > blocked_task->prio)
				mutex_setprio(owner, blocked_task->prio);
			if(owner->blocked_on) {
				/* Step one lock further up the chain and
				   take a reference on its owner before
				   all the spinlocks are dropped again. */
				waiter = owner->blocked_on;
				_raw_spin_lock(&waiter->lock->wait_lock);
				next_owner = lock_owner_task(waiter->lock);
				if(next_owner)
					get_task_struct(next_owner);
				_raw_spin_unlock(&waiter->lock->wait_lock);
			}
		}
		/* owner->pi_lock was still held across the boosting step */
		_raw_spin_unlock(&owner->pi_lock);
		put_task_struct(blocked_task);
		blocked_task = owner;
		owner = next_owner;
	}
	BUG_ON(!blocked_task);
	put_task_struct(blocked_task);
}




Thread overview: 37+ messages
2005-12-14 22:39 Recursion bug in -rt Dinakar Guniguntala
2005-12-15  1:03 ` david singleton
2005-12-15 19:44   ` Dinakar Guniguntala
2005-12-15 20:40     ` David Singleton
2005-12-16  0:02     ` david singleton
2005-12-16 18:42       ` Dinakar Guniguntala
2005-12-16 21:26         ` David Singleton
2005-12-19 11:56           ` Dinakar Guniguntala
2005-12-19 20:11             ` David Singleton
2005-12-15 19:00 ` David Singleton
2005-12-15 19:52   ` Dinakar Guniguntala
2005-12-20 13:19   ` Ingo Molnar
2005-12-20 15:50     ` Dinakar Guniguntala
2005-12-20 17:43       ` Esben Nielsen
2005-12-20 19:33         ` Steven Rostedt
2005-12-20 20:42           ` Esben Nielsen [this message]
2005-12-20 21:20             ` Steven Rostedt
2005-12-20 21:55               ` david singleton
2005-12-20 22:56                 ` Esben Nielsen
2005-12-20 23:12                   ` Steven Rostedt
2005-12-20 23:55                     ` Esben Nielsen
2005-12-22  4:37                       ` david singleton
2005-12-20 22:43               ` Esben Nielsen
2005-12-20 22:59                 ` Steven Rostedt
2006-01-03  1:54       ` david singleton
2006-01-05  2:14       ` david singleton
2006-01-05  9:43         ` Esben Nielsen
2006-01-05 17:11           ` david singleton
2006-01-05 17:47             ` Esben Nielsen
2006-01-05 18:26               ` david singleton
2006-01-07  2:40               ` robust futex deadlock detection patch david singleton
     [not found]                 ` <a36005b50601071145y7e2ead9an4a4ca7896f35a85e@mail.gmail.com>
2006-01-07 19:49                   ` Ulrich Drepper
2006-01-09  9:23                 ` Esben Nielsen
2006-01-09 20:01                   ` David Singleton
2006-01-09 20:16                     ` Esben Nielsen
2006-01-09 21:08                       ` Steven Rostedt
2006-01-09 21:19                         ` Esben Nielsen
