* [PATCH 0/5][v3] rq-qos memory barrier shenanigans
@ 2019-07-16 20:19 Josef Bacik
  2019-07-16 20:19 ` [PATCH 1/5] wait: add wq_has_single_sleeper helper Josef Bacik
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Josef Bacik @ 2019-07-16 20:19 UTC (permalink / raw)
  To: axboe, kernel-team, linux-block, peterz, oleg

This is the patch series to address the hang we saw in production because of
missed wakeups, and the other issues that Oleg noticed while reviewing the code.

v2->v3:
- apparently I don't understand what READ/WRITE_ONCE does
- set ourselves to TASK_UNINTERRUPTIBLE on wakeup just in case
- add a comment about why we don't need a mb for the first data.token check
  which I'm sure Oleg will tell me is wrong and I'll have to send a v4

v1->v2:
- rename wq_has_multiple_sleepers to wq_has_single_sleeper
- fix the check for has_sleepers in the missed wake-ups patch
- fix the barrier issues around got_token that Oleg noticed
- dropped the has_sleepers reset that Oleg noticed we didn't need

Thanks,

Josef

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/5] wait: add wq_has_single_sleeper helper
  2019-07-16 20:19 [PATCH 0/5][v3] rq-qos memory barrier shenanigans Josef Bacik
@ 2019-07-16 20:19 ` Josef Bacik
  2019-07-16 20:19 ` [PATCH 2/5] rq-qos: fix missed wake-ups in rq_qos_throttle Josef Bacik
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2019-07-16 20:19 UTC (permalink / raw)
  To: axboe, kernel-team, linux-block, peterz, oleg

rq-qos sits in the io path so we want to take locks as sparingly as
possible.  To accomplish this we try not to take the waitqueue head lock
unless we are sure we need to go to sleep, and we have an optimization
to make sure that we don't starve out existing waiters.  Since the check
for existing waiters is done locklessly, we need a way to re-check our
view of the waitqueue list once we have actually added ourselves to it.
Add this helper so callers can tell whether there is more than just
themselves on the list.
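
As an illustration, a rough sketch of the pattern the helper is meant for
(this is not the rq-qos code; the try_acquire() callback and the lack of
a retry loop are simplifications made up for this example):

	#include <linux/sched.h>
	#include <linux/wait.h>

	/* try_acquire() is a hypothetical "grab a slot without sleeping" callback. */
	static void example_wait_for_slot(struct wait_queue_head *wq,
					  bool (*try_acquire)(void))
	{
		DEFINE_WAIT(wait);

		/* Optimistic, lockless: only take the fast path if nobody is queued. */
		if (!wq_has_sleeper(wq) && try_acquire())
			return;

		/* Queueing ourselves goes through wq->lock, so the list can now be trusted. */
		prepare_to_wait_exclusive(wq, &wait, TASK_UNINTERRUPTIBLE);

		/*
		 * If we are the only sleeper, whoever we thought was ahead of
		 * us is gone; retry the fast path instead of waiting for a
		 * wakeup that may never come.
		 */
		if (!wq_has_single_sleeper(wq) || !try_acquire())
			io_schedule();

		finish_wait(wq, &wait);
	}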

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 include/linux/wait.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/wait.h b/include/linux/wait.h
index b6f77cf60dd7..30c515520fb2 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -126,6 +126,19 @@ static inline int waitqueue_active(struct wait_queue_head *wq_head)
 	return !list_empty(&wq_head->head);
 }
 
+/**
+ * wq_has_single_sleeper - check if there is only one sleeper
+ * @wq_head: wait queue head
+ *
+ * Returns true if wq_head has only one sleeper on the list.
+ *
+ * Please refer to the comment for waitqueue_active.
+ */
+static inline bool wq_has_single_sleeper(struct wait_queue_head *wq_head)
+{
+	return list_is_singular(&wq_head->head);
+}
+
 /**
  * wq_has_sleeper - check if there are any waiting processes
  * @wq_head: wait queue head
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/5] rq-qos: fix missed wake-ups in rq_qos_throttle
  2019-07-16 20:19 [PATCH 0/5][v3] rq-qos memory barrier shenanigans Josef Bacik
  2019-07-16 20:19 ` [PATCH 1/5] wait: add wq_has_single_sleeper helper Josef Bacik
@ 2019-07-16 20:19 ` Josef Bacik
  2019-07-16 20:19 ` [PATCH 3/5] rq-qos: don't reset has_sleepers on spurious wakeups Josef Bacik
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2019-07-16 20:19 UTC (permalink / raw)
  To: axboe, kernel-team, linux-block, peterz, oleg

We saw a hang in production with WBT where there was only one waiter in
the throttle path and no outstanding IO.  This is because of the
has_sleepers optimization that is used to make sure new submitters don't
steal an inflight counter from the people already on the list.

There is a window between our lockless check for existing waiters and
the moment we actually add ourselves to the waitqueue.  If the last
waiter leaves in that window, we go to sleep with no outstanding IO left
to wake us up.

Fix this by checking whether we are the only sleeper on the list after
we add ourselves; that way we have an up-to-date view of the list.
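
The fixed ordering on the waiter side then looks roughly like this
(abridged sketch; the names follow rq_qos_wait() but the surrounding
code is elided):

	has_sleeper = wq_has_sleeper(&rqw->wait);	/* lockless peek */
	if (!has_sleeper && acquire_inflight_cb(rqw, private_data))
		return;
	/*
	 * Race window: the sleeper we just saw can be woken, take the last
	 * inflight counter and leave the queue right here, leaving no
	 * outstanding IO to wake us once we go to sleep.
	 */
	prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
	/*
	 * Adding ourselves went through wq->lock, so this read reflects
	 * everything that happened before we were queued: if we are the only
	 * sleeper, nobody is ahead of us and we must try to take an inflight
	 * counter ourselves rather than wait for a wakeup.
	 */
	has_sleeper = !wq_has_single_sleeper(&rqw->wait);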

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 block/blk-rq-qos.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 659ccb8b693f..67a0a4c07060 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -244,6 +244,7 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 		return;
 
 	prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
+	has_sleeper = !wq_has_single_sleeper(&rqw->wait);
 	do {
 		if (data.got_token)
 			break;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/5] rq-qos: don't reset has_sleepers on spurious wakeups
  2019-07-16 20:19 [PATCH 0/5][v3] rq-qos memory barrier shenanigans Josef Bacik
  2019-07-16 20:19 ` [PATCH 1/5] wait: add wq_has_single_sleeper helper Josef Bacik
  2019-07-16 20:19 ` [PATCH 2/5] rq-qos: fix missed wake-ups in rq_qos_throttle Josef Bacik
@ 2019-07-16 20:19 ` Josef Bacik
  2019-07-16 20:19 ` [PATCH 4/5] rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule Josef Bacik
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2019-07-16 20:19 UTC (permalink / raw)
  To: axboe, kernel-team, linux-block, peterz, oleg

If we raced with somebody else getting an inflight counter we could fail
to get one even though there were no sleepers on the list, and thus need
to go to sleep.  In that case has_sleepers should be true because we are
now relying on the waker to get our inflight counter for us.  The same
holds after a spurious wakeup.  So set has_sleepers to true whenever we
have gone to sleep, to make sure we're woken up the proper way.
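
For context, a rough sketch of the wake side this relies on (abridged
from rq_qos_wake_function(); only the tail of the function is visible in
the patch 5 hunk, the rest is reconstructed here for illustration):

	static int rq_qos_wake_function(struct wait_queue_entry *curr,
					unsigned int mode, int wake_flags,
					void *key)
	{
		struct rq_qos_wait_data *data = container_of(curr,
						struct rq_qos_wait_data, wq);

		/* Try to take an inflight counter on the waiter's behalf. */
		if (!data->cb(data->rqw, data->private_data))
			return -1;	/* nothing free, stop the wake-up loop */

		data->got_token = true;	/* the waiter's counter is already taken */
		list_del_init(&curr->entry);
		wake_up_process(data->task);
		return 1;
	}

Once a task is on the list, this is how it gets its counter, which is
why has_sleepers must stay true after any sleep.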

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 block/blk-rq-qos.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 67a0a4c07060..69a0f0b77795 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -261,7 +261,7 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 			break;
 		}
 		io_schedule();
-		has_sleeper = false;
+		has_sleeper = true;
 	} while (1);
 	finish_wait(&rqw->wait, &data.wq);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/5] rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule
  2019-07-16 20:19 [PATCH 0/5][v3] rq-qos memory barrier shenanigans Josef Bacik
                   ` (2 preceding siblings ...)
  2019-07-16 20:19 ` [PATCH 3/5] rq-qos: don't reset has_sleepers on spurious wakeups Josef Bacik
@ 2019-07-16 20:19 ` Josef Bacik
  2019-07-16 20:19 ` [PATCH 5/5] rq-qos: use a mb for got_token Josef Bacik
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2019-07-16 20:19 UTC (permalink / raw)
  To: axboe, kernel-team, linux-block, peterz, oleg

In case we get a spurious wakeup we need to make sure to re-set
ourselves to TASK_UNINTERRUPTIBLE so we don't busy wait.
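
For reference, a minimal sketch of the canonical sleep loop this
restores ("condition" is just a stand-in for data.got_token here): the
state is re-armed before every schedule, so a spurious wakeup leaves us
back in TASK_UNINTERRUPTIBLE instead of spinning through io_schedule()
in TASK_RUNNING.

	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (condition)
			break;
		io_schedule();
	}
	__set_current_state(TASK_RUNNING);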

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 block/blk-rq-qos.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 69a0f0b77795..c450b8952eae 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -262,6 +262,7 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 		}
 		io_schedule();
 		has_sleeper = true;
+		set_current_state(TASK_UNINTERRUPTIBLE);
 	} while (1);
 	finish_wait(&rqw->wait, &data.wq);
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 5/5] rq-qos: use a mb for got_token
  2019-07-16 20:19 [PATCH 0/5][v3] rq-qos memory barrier shenanigans Josef Bacik
                   ` (3 preceding siblings ...)
  2019-07-16 20:19 ` [PATCH 4/5] rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule Josef Bacik
@ 2019-07-16 20:19 ` Josef Bacik
  2019-07-18 15:56 ` [PATCH 0/5][v3] rq-qos memory barrier shenanigans Oleg Nesterov
  2019-07-18 16:20 ` Jens Axboe
  6 siblings, 0 replies; 8+ messages in thread
From: Josef Bacik @ 2019-07-16 20:19 UTC (permalink / raw)
  To: axboe, kernel-team, linux-block, peterz, oleg

Oleg noticed that our check of data.got_token is unsafe in the cleanup
case and really needs a memory barrier.  Use an smp_wmb() on the write
side and an smp_rmb() on the read side.  We don't need one in the main
loop since we're saved by the barrier in set_current_state().
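
Pulling the two sides of the pairing out of the hunks below, with the
surrounding code elided:

	/* write side, in rq_qos_wake_function(), under the waitqueue lock: */
	data->got_token = true;
	smp_wmb();				/* publish the flag before the list removal */
	list_del_init(&curr->entry);
	wake_up_process(data->task);

	/* read side, cleanup path in rq_qos_wait(): */
	smp_rmb();				/* pairs with the smp_wmb() above */
	if (data.got_token)			/* the waker handed us a token too... */
		cleanup_cb(rqw, private_data);	/* ...so put the extra one back */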

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 block/blk-rq-qos.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index c450b8952eae..3954c0dc1443 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -202,6 +202,7 @@ static int rq_qos_wake_function(struct wait_queue_entry *curr,
 		return -1;
 
 	data->got_token = true;
+	smp_wmb();
 	list_del_init(&curr->entry);
 	wake_up_process(data->task);
 	return 1;
@@ -246,6 +247,7 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 	prepare_to_wait_exclusive(&rqw->wait, &data.wq, TASK_UNINTERRUPTIBLE);
 	has_sleeper = !wq_has_single_sleeper(&rqw->wait);
 	do {
+		/* The memory barrier in set_task_state saves us here. */
 		if (data.got_token)
 			break;
 		if (!has_sleeper && acquire_inflight_cb(rqw, private_data)) {
@@ -256,6 +258,7 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 			 * which means we now have two. Put our local token
 			 * and wake anyone else potentially waiting for one.
 			 */
+			smp_rmb();
 			if (data.got_token)
 				cleanup_cb(rqw, private_data);
 			break;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/5][v3] rq-qos memory barrier shenanigans
  2019-07-16 20:19 [PATCH 0/5][v3] rq-qos memory barrier shenanigans Josef Bacik
                   ` (4 preceding siblings ...)
  2019-07-16 20:19 ` [PATCH 5/5] rq-qos: use a mb for got_token Josef Bacik
@ 2019-07-18 15:56 ` Oleg Nesterov
  2019-07-18 16:20 ` Jens Axboe
  6 siblings, 0 replies; 8+ messages in thread
From: Oleg Nesterov @ 2019-07-18 15:56 UTC (permalink / raw)
  To: Josef Bacik; +Cc: axboe, kernel-team, linux-block, peterz

On 07/16, Josef Bacik wrote:
>
> - add a comment about why we don't need a mb for the first data.token check
>   which I'm sure Oleg will tell me is wrong and I'll have to send a v4

we don't need it because prepare_to_wait_exclusive() does set_current_state()
and this implies mb().

(in fact I think in this particular case it is not needed at all because
 rq_qos_wake_function() sets condition == T under wq_head->lock)

I see nothing wrong in this series. Feel free to add

Reviewed-by: Oleg Nesterov <oleg@redhat.com>

But why does it use wq_has_sleeper()?  I do not understand why we need the
mb() before waitqueue_active() in this particular case...
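
(For reference, wq_has_sleeper() is essentially waitqueue_active() with a
full barrier in front of it; a rough sketch, see include/linux/wait.h for
the real definition:

	static inline bool wq_has_sleeper_sketch(struct wait_queue_head *wq_head)
	{
		smp_mb();	/* pairs with a barrier on the waiting side */
		return waitqueue_active(wq_head);	/* !list_empty(&wq_head->head) */
	}

The question is whether that smp_mb() actually buys anything here.)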


And just for the record: iiuc acquire_inflight_cb() can't block, so it is
not clear to me why, performance-wise, this all is actually better than just

	void rq_qos_wait(struct rq_wait *rqw, void *private_data,
			 acquire_inflight_cb_t *acquire_inflight_cb)
	{
		struct rq_qos_wait_data data = {
			.wq = {
				.flags	= WQ_FLAG_EXCLUSIVE,
				.func	= rq_qos_wake_function,
				.entry	= LIST_HEAD_INIT(data.wq.entry),
			},
			.task = current,
			.rqw = rqw,
			.cb = acquire_inflight_cb,
			.private_data = private_data,
		};

		if (!wq_has_sleeper(&rqw->wait) && acquire_inflight_cb(rqw, private_data))
			return;

		spin_lock_irq(&rqw->wait.lock);
		if (list_empty(&rqw->wait.head) && acquire_inflight_cb(rqw, private_data))
			data.got_token = true;
		else
			__add_wait_queue_entry_tail(&rqw->wait, &data.wq);
		spin_unlock_irq(&rqw->wait.lock);

		for (;;) {
			set_current_state(TASK_UNINTERRUPTIBLE);
			if (data.got_token)
				break;
			io_schedule();
		}
		finish_wait(&rqw->wait, &data.wq);
	}

note also that the cleanup_cb argument goes away.

Nevermind, feel free to ignore.

Oleg.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/5][v3] rq-qos memory barrier shenanigans
  2019-07-16 20:19 [PATCH 0/5][v3] rq-qos memory barrier shenanigans Josef Bacik
                   ` (5 preceding siblings ...)
  2019-07-18 15:56 ` [PATCH 0/5][v3] rq-qos memory barrier shenanigans Oleg Nesterov
@ 2019-07-18 16:20 ` Jens Axboe
  6 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2019-07-18 16:20 UTC (permalink / raw)
  To: Josef Bacik, kernel-team, linux-block, peterz, oleg

On 7/16/19 2:19 PM, Josef Bacik wrote:
> This is the patch series to address the hang we saw in production because of
> missed wakeups, and the other issues that Oleg noticed while reviewing the code.
> 
> v2->v3:
> - apparently I don't understand what READ/WRITE_ONCE does
> - set ourselves to TASK_UNINTERRUPTIBLE on wakeup just in case
> - add a comment about why we don't need a mb for the first data.token check
>    which I'm sure Oleg will tell me is wrong and I'll have to send a v4
> 
> v1->v2:
> - rename wq_has_multiple_sleepers to wq_has_single_sleeper
> - fix the check for has_sleepers in the missed wake-ups patch
> - fix the barrier issues around got_token that Oleg noticed
> - dropped the has_sleepers reset that Oleg noticed we didn't need

Thanks Josef, applied for 5.3.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

