linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Davidlohr Bueso <dave@stgolabs.net>
Cc: tglx@linutronix.de, mingo@kernel.org, bigeasy@linutronix.de,
	umgwanakikbuti@gmail.com, paulmck@linux.vnet.ibm.com,
	linux-kernel@vger.kernel.org, kmo@daterainc.com,
	heiko.carstens@de.ibm.com
Subject: Re: [PATCH 4/3] rtmutex: Avoid barrier in rt_mutex_handle_deadlock
Date: Tue, 22 Mar 2016 11:21:53 +0100	[thread overview]
Message-ID: <20160322102153.GL6344@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20160321181622.GB32012@linux-uzut.site>

On Mon, Mar 21, 2016 at 11:16:22AM -0700, Davidlohr Bueso wrote:

> +/*
> + * Helpers for modifying the state of either the current task, or a foreign
> + * task. Each of these calls come in both full barrier and weak flavors:
> + *
> + *                                           Weak
> + *     set_task_state()           __set_task_state()
> + *     set_current_state()        __set_current_state()
> + *
> + * Where set_current_state() and set_task_state() includes a full smp barrier
> + * -after- the write of ->state is correctly serialized with the later test
> + * of whether to actually sleep:
> + *
> + *	for (;;) {
> + *		set_current_state(TASK_UNINTERRUPTIBLE);
> + *		if (event_indicated)
> + *			break;
> + *		schedule();
> + *	}
> + *
> + * This is commonly necessary for processes sleeping and waking through flag
> + * based events. If the caller does not need such serialization, then use
> + * weaker counterparts, which simply writes the state.
> + *
> + * Refer to Documentation/memory-barriers.txt
> + */

I would prefer to pretend set_task_state() does not exist, using it on
anything other than task==current is very very tricky.

With the below patch; we're only left with:

arch/s390/mm/fault.c:			__set_task_state(tsk, TASK_UNINTERRUPTIBLE);
arch/s390/mm/fault.c:			__set_task_state(tsk, TASK_UNINTERRUPTIBLE);
drivers/md/bcache/btree.c:	set_task_state(c->gc_thread, TASK_INTERRUPTIBLE);
kernel/exit.c:			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
kernel/exit.c:		__set_task_state(tsk, TASK_RUNNING);

exit most probably also has tsk==current, but I didn't check.

bacache seems to rely on the fact that the task is not running after
kthread_create() to change the state. But I've no idea why; the only
think I can come up with is because load accounting, a new thread blocks
in UNINTERRUPTIBLE which adds to load. But by setting it to
INTERRUPTIBLE before waking up it can actually mess that up. This really
should be fixed.

And s390 does something entirely vile, no idea what.

---
 arch/um/drivers/random.c                                 |  2 +-
 drivers/md/dm-bufio.c                                    |  2 +-
 drivers/md/persistent-data/dm-block-manager.c            |  4 ++--
 drivers/staging/lustre/lustre/libcfs/linux/linux-debug.c |  6 ++++--
 drivers/tty/tty_ldsem.c                                  | 10 +++++-----
 kernel/locking/mutex.c                                   |  4 ++--
 kernel/locking/rwsem-spinlock.c                          | 12 +++++-------
 kernel/locking/rwsem-xadd.c                              |  4 ++--
 kernel/locking/semaphore.c                               |  2 +-
 9 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/um/drivers/random.c b/arch/um/drivers/random.c
index dd16c902ff70..19d41a583288 100644
--- a/arch/um/drivers/random.c
+++ b/arch/um/drivers/random.c
@@ -76,7 +76,7 @@ static ssize_t rng_dev_read (struct file *filp, char __user *buf, size_t size,
 			add_sigio_fd(random_fd);
 
 			add_wait_queue(&host_read_wait, &wait);
-			set_task_state(current, TASK_INTERRUPTIBLE);
+			set_current_state(TASK_INTERRUPTIBLE);
 
 			schedule();
 			remove_wait_queue(&host_read_wait, &wait);
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index cd77216beff1..c5e89f358d98 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -807,7 +807,7 @@ static void __wait_for_free_buffer(struct dm_bufio_client *c)
 	DECLARE_WAITQUEUE(wait, current);
 
 	add_wait_queue(&c->free_buffer_wait, &wait);
-	set_task_state(current, TASK_UNINTERRUPTIBLE);
+	set_current_state(TASK_UNINTERRUPTIBLE);
 	dm_bufio_unlock(c);
 
 	io_schedule();
diff --git a/drivers/md/persistent-data/dm-block-manager.c b/drivers/md/persistent-data/dm-block-manager.c
index 1e33dd51c21f..821a26b934c2 100644
--- a/drivers/md/persistent-data/dm-block-manager.c
+++ b/drivers/md/persistent-data/dm-block-manager.c
@@ -118,7 +118,7 @@ static int __check_holder(struct block_lock *lock)
 static void __wait(struct waiter *w)
 {
 	for (;;) {
-		set_task_state(current, TASK_UNINTERRUPTIBLE);
+		set_current_state(TASK_UNINTERRUPTIBLE);
 
 		if (!w->task)
 			break;
@@ -126,7 +126,7 @@ static void __wait(struct waiter *w)
 		schedule();
 	}
 
-	set_task_state(current, TASK_RUNNING);
+	set_current_state(TASK_RUNNING);
 }
 
 static void __wake_waiter(struct waiter *w)
diff --git a/drivers/staging/lustre/lustre/libcfs/linux/linux-debug.c b/drivers/staging/lustre/lustre/libcfs/linux/linux-debug.c
index 59c7bf3cbc1f..087d7e49cf3e 100644
--- a/drivers/staging/lustre/lustre/libcfs/linux/linux-debug.c
+++ b/drivers/staging/lustre/lustre/libcfs/linux/linux-debug.c
@@ -162,9 +162,11 @@ void __noreturn lbug_with_loc(struct libcfs_debug_msg_data *msgdata)
 	libcfs_run_lbug_upcall(msgdata);
 	if (libcfs_panic_on_lbug)
 		panic("LBUG");
-	set_task_state(current, TASK_UNINTERRUPTIBLE);
-	while (1)
+
+	while (1) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
 		schedule();
+	}
 }
 
 static int panic_notifier(struct notifier_block *self, unsigned long unused1,
diff --git a/drivers/tty/tty_ldsem.c b/drivers/tty/tty_ldsem.c
index 1bf8ed13f827..c94bc0eef85d 100644
--- a/drivers/tty/tty_ldsem.c
+++ b/drivers/tty/tty_ldsem.c
@@ -232,7 +232,7 @@ down_read_failed(struct ld_semaphore *sem, long count, long timeout)
 
 	/* wait to be given the lock */
 	for (;;) {
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		set_current_state(TASK_UNINTERRUPTIBLE);
 
 		if (!waiter.task)
 			break;
@@ -241,7 +241,7 @@ down_read_failed(struct ld_semaphore *sem, long count, long timeout)
 		timeout = schedule_timeout(timeout);
 	}
 
-	__set_task_state(tsk, TASK_RUNNING);
+	__set_current_state(TASK_RUNNING);
 
 	if (!timeout) {
 		/* lock timed out but check if this task was just
@@ -291,14 +291,14 @@ down_write_failed(struct ld_semaphore *sem, long count, long timeout)
 
 	waiter.task = tsk;
 
-	set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+	set_current_state(TASK_UNINTERRUPTIBLE);
 	for (;;) {
 		if (!timeout)
 			break;
 		raw_spin_unlock_irq(&sem->wait_lock);
 		timeout = schedule_timeout(timeout);
 		raw_spin_lock_irq(&sem->wait_lock);
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		set_current_state(TASK_UNINTERRUPTIBLE);
 		locked = writer_trylock(sem);
 		if (locked)
 			break;
@@ -309,7 +309,7 @@ down_write_failed(struct ld_semaphore *sem, long count, long timeout)
 	list_del(&waiter.list);
 	raw_spin_unlock_irq(&sem->wait_lock);
 
-	__set_task_state(tsk, TASK_RUNNING);
+	__set_current_state(TASK_RUNNING);
 
 	/* lock wait may have timed out */
 	if (!locked)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index e364b424b019..c10fe056c34a 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -572,14 +572,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 				goto err;
 		}
 
-		__set_task_state(task, state);
+		__set_current_state(state);
 
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
 		spin_lock_mutex(&lock->wait_lock, flags);
 	}
-	__set_task_state(task, TASK_RUNNING);
+	__set_current_state(TASK_RUNNING);
 
 	mutex_remove_waiter(lock, &waiter, current_thread_info());
 	/* set it to 0 if there are no waiters left: */
diff --git a/kernel/locking/rwsem-spinlock.c b/kernel/locking/rwsem-spinlock.c
index 3a5048572065..dfe5ea3736a8 100644
--- a/kernel/locking/rwsem-spinlock.c
+++ b/kernel/locking/rwsem-spinlock.c
@@ -141,7 +141,7 @@ void __sched __down_read(struct rw_semaphore *sem)
 	}
 
 	tsk = current;
-	set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+	set_current_state(TASK_UNINTERRUPTIBLE);
 
 	/* set up my own style of waitqueue */
 	waiter.task = tsk;
@@ -158,10 +158,10 @@ void __sched __down_read(struct rw_semaphore *sem)
 		if (!waiter.task)
 			break;
 		schedule();
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		set_current_state(TASK_UNINTERRUPTIBLE);
 	}
 
-	__set_task_state(tsk, TASK_RUNNING);
+	__set_current_state(TASK_RUNNING);
  out:
 	;
 }
@@ -194,14 +194,12 @@ int __down_read_trylock(struct rw_semaphore *sem)
 void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 {
 	struct rwsem_waiter waiter;
-	struct task_struct *tsk;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&sem->wait_lock, flags);
 
 	/* set up my own style of waitqueue */
-	tsk = current;
-	waiter.task = tsk;
+	waiter.task = current;
 	waiter.type = RWSEM_WAITING_FOR_WRITE;
 	list_add_tail(&waiter.list, &sem->wait_list);
 
@@ -215,7 +213,7 @@ void __sched __down_write_nested(struct rw_semaphore *sem, int subclass)
 		 */
 		if (sem->count == 0)
 			break;
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		set_current_state(TASK_UNINTERRUPTIBLE);
 		raw_spin_unlock_irqrestore(&sem->wait_lock, flags);
 		schedule();
 		raw_spin_lock_irqsave(&sem->wait_lock, flags);
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index a4d4de05b2d1..a33ffc2ee236 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -244,13 +244,13 @@ struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
 
 	/* wait to be given the lock */
 	while (true) {
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
+		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!waiter.task)
 			break;
 		schedule();
 	}
 
-	__set_task_state(tsk, TASK_RUNNING);
+	__set_current_state(TASK_RUNNING);
 	return sem;
 }
 EXPORT_SYMBOL(rwsem_down_read_failed);
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index b8120abe594b..2f8cdb712b63 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -216,7 +216,7 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
 			goto interrupted;
 		if (unlikely(timeout <= 0))
 			goto timed_out;
-		__set_task_state(task, state);
+		__set_current_state(state);
 		raw_spin_unlock_irq(&sem->lock);
 		timeout = schedule_timeout(timeout);
 		raw_spin_lock_irq(&sem->lock);

  reply	other threads:[~2016-03-22 10:22 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-08 18:20 [PATCH -tip 0/3] locking/rtmutex: Another crack at spin on owner Davidlohr Bueso
2016-03-08 18:20 ` [PATCH 1/3] rtmutex: Delete save_state member of struct rt_mutex Davidlohr Bueso
2016-03-08 18:20 ` [PATCH 2/3] rtmutex: Add rt_mutex_init_waiter helper Davidlohr Bueso
2016-03-14 13:16   ` Peter Zijlstra
2016-03-08 18:20 ` [PATCH 3/3] rtmutex: Reduce top-waiter blocking on a lock Davidlohr Bueso
2016-03-14 13:23   ` Peter Zijlstra
2016-03-08 22:05 ` [PATCH 4/3] rtmutex: Avoid barrier in rt_mutex_handle_deadlock Davidlohr Bueso
2016-03-14 13:40   ` Peter Zijlstra
2016-03-21 18:16     ` Davidlohr Bueso
2016-03-22 10:21       ` Peter Zijlstra [this message]
2016-03-22 11:32         ` Heiko Carstens
2016-03-22 12:20           ` Peter Zijlstra
2016-03-22 13:26             ` Heiko Carstens
2016-03-22 13:55               ` Peter Zijlstra
2016-03-22 14:45                 ` Heiko Carstens
2016-03-22 16:41                   ` Peter Zijlstra
2016-03-22 21:46                     ` Heiko Carstens
2016-03-25  2:30         ` Davidlohr Bueso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160322102153.GL6344@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=bigeasy@linutronix.de \
    --cc=dave@stgolabs.net \
    --cc=heiko.carstens@de.ibm.com \
    --cc=kmo@daterainc.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).