* [PATCH BUGFIX 0/3] null_blk: fix throughput losses and hangs
@ 2015-11-02 14:31 Paolo Valente
  2015-11-02 14:31 ` [PATCH BUGFIX 1/3] null_blk: set a separate timer for each command Paolo Valente
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Paolo Valente @ 2015-11-02 14:31 UTC (permalink / raw)
  To: Jens Axboe, Matias Bjørling, Arianna Avanzini
  Cc: Paolo Valente, Akinobu Mita, Luis R. Rodriguez, Ming Lei,
	Mike Krinkin, linux-kernel

Hi,
while doing some tests with the null_blk device driver, we ran into
two problems: first, unjustified and in some cases severe throughput
losses; second, actual hangs. These problems seem to be the
consequence of the combination of three causes, and this patchset
introduces a fix for each of them. In particular, the changes
address:
. an apparent flaw in the logic with which delayed completions are
  implemented: this flaw causes, with unlucky but non-pathological
  workloads, actual request-completion delays to become arbitrarily
  larger than the configured delay;
. the missing restart of the device queue on the completion of a request in
  single-queue non-delayed mode;
. the overflow of the request-delay parameter, when extremely high values
  are used (e.g., to spot bugs).

To avoid possible confusion, we stress that these fixes *do not* have
anything to do with the problems highlighted in [1] (tests of the
multiqueue xen-blkfront and xen-blkback modules with null_blk).

You can find more details in the patch descriptions.

Thanks,
Paolo and Arianna

[1] https://lkml.org/lkml/2015/8/19/181

Arianna Avanzini (2):
  null_blk: guarantee device restart in all irq modes
  null_blk: change type of completion_nsec to unsigned long

Paolo Valente (1):
  null_blk: set a separate timer for each command

 drivers/block/null_blk.c | 94 +++++++++++++++++-------------------------------
 1 file changed, 33 insertions(+), 61 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH BUGFIX 1/3] null_blk: set a separate timer for each command
  2015-11-02 14:31 [PATCH BUGFIX 0/3] null_blk: fix throughput losses and hangs Paolo Valente
@ 2015-11-02 14:31 ` Paolo Valente
  2015-11-02 16:14   ` Jens Axboe
  2015-11-02 14:31 ` [PATCH BUGFIX 2/3] null_blk: guarantee device restart in all irq modes Paolo Valente
  2015-11-02 14:31 ` [PATCH BUGFIX 3/3] null_blk: change type of completion_nsec to unsigned long Paolo Valente
  2 siblings, 1 reply; 15+ messages in thread
From: Paolo Valente @ 2015-11-02 14:31 UTC (permalink / raw)
  To: Jens Axboe, Matias Bjørling, Arianna Avanzini
  Cc: Paolo Valente, Akinobu Mita, Luis R. Rodriguez, Ming Lei,
	Mike Krinkin, linux-kernel

For the Timer IRQ mode (i.e., when command completions are delayed),
there is one timer for each CPU. Each of these timers
. has a completion queue associated with it, containing all the
  command completions to be executed when the timer fires;
. is set, and a new completion-to-execute is inserted into its
  completion queue, every time the dispatch code for a new command
  happens to be executed on the CPU related to the timer.

This implies that, if the dispatch of a new command happens to be
executed on a CPU whose timer has already been set, but has not yet
fired, then the timer is set again, to the completion time of the
newly arrived command. When the timer eventually fires, all its queued
completions are executed.

This way of handling delayed command completions entails the following
problem: if more than one command completion is inserted into the
queue of a timer before the timer fires, then the expiration time for
the timer is moved forward every time each of these completions is
enqueued. As a consequence, only the last completion enqueued enjoys a
correct execution time, while all previous completions are unjustly
delayed until the last completion is executed (and at that time they
are executed all together).

Specifically, if all the above completions are enqueued almost at the
same time, then the problem is negligible. On the opposite end, if
every completion is enqueued a while after the previous completion was
enqueued (in the extreme case, it is enqueued only right before the
timer would have expired), then every enqueued completion, except for
the last one, experiences an inflated delay, proportional to the number
of completions enqueued after it. In the end, commands, and thus I/O
requests, may be completed at an arbitrarily lower rate than the
desired one.

This commit addresses this issue by replacing per-CPU timers with
per-command timers, i.e., by associating an individual timer with each
command.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini@google.com>
---
 drivers/block/null_blk.c | 79 +++++++++++++++---------------------------------
 1 file changed, 24 insertions(+), 55 deletions(-)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 1c9e4fe..7db5a77 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -17,6 +17,7 @@ struct nullb_cmd {
 	struct bio *bio;
 	unsigned int tag;
 	struct nullb_queue *nq;
+	struct hrtimer timer;
 };
 
 struct nullb_queue {
@@ -46,17 +47,6 @@ static struct mutex lock;
 static int null_major;
 static int nullb_indexes;
 
-struct completion_queue {
-	struct llist_head list;
-	struct hrtimer timer;
-};
-
-/*
- * These are per-cpu for now, they will need to be configured by the
- * complete_queues parameter and appropriately mapped.
- */
-static DEFINE_PER_CPU(struct completion_queue, completion_queues);
-
 enum {
 	NULL_IRQ_NONE		= 0,
 	NULL_IRQ_SOFTIRQ	= 1,
@@ -173,6 +163,8 @@ static void free_cmd(struct nullb_cmd *cmd)
 	put_tag(cmd->nq, cmd->tag);
 }
 
+static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer);
+
 static struct nullb_cmd *__alloc_cmd(struct nullb_queue *nq)
 {
 	struct nullb_cmd *cmd;
@@ -183,6 +175,11 @@ static struct nullb_cmd *__alloc_cmd(struct nullb_queue *nq)
 		cmd = &nq->cmds[tag];
 		cmd->tag = tag;
 		cmd->nq = nq;
+		if (irqmode == NULL_IRQ_TIMER) {
+			hrtimer_init(&cmd->timer, CLOCK_MONOTONIC,
+				     HRTIMER_MODE_REL);
+			cmd->timer.function = null_cmd_timer_expired;
+		}
 		return cmd;
 	}
 
@@ -231,47 +228,28 @@ static void end_cmd(struct nullb_cmd *cmd)
 
 static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer)
 {
-	struct completion_queue *cq;
-	struct llist_node *entry;
-	struct nullb_cmd *cmd;
-
-	cq = &per_cpu(completion_queues, smp_processor_id());
-
-	while ((entry = llist_del_all(&cq->list)) != NULL) {
-		entry = llist_reverse_order(entry);
-		do {
-			struct request_queue *q = NULL;
+	struct nullb_cmd *cmd = container_of(timer, struct nullb_cmd, timer);
+	struct request_queue *q = NULL;
 
-			cmd = container_of(entry, struct nullb_cmd, ll_list);
-			entry = entry->next;
-			if (cmd->rq)
-				q = cmd->rq->q;
-			end_cmd(cmd);
+	if (cmd->rq)
+		q = cmd->rq->q;
 
-			if (q && !q->mq_ops && blk_queue_stopped(q)) {
-				spin_lock(q->queue_lock);
-				if (blk_queue_stopped(q))
-					blk_start_queue(q);
-				spin_unlock(q->queue_lock);
-			}
-		} while (entry);
+	if (q && !q->mq_ops && blk_queue_stopped(q)) {
+		spin_lock(q->queue_lock);
+		if (blk_queue_stopped(q))
+			blk_start_queue(q);
+		spin_unlock(q->queue_lock);
 	}
+	end_cmd(cmd);
 
 	return HRTIMER_NORESTART;
 }
 
 static void null_cmd_end_timer(struct nullb_cmd *cmd)
 {
-	struct completion_queue *cq = &per_cpu(completion_queues, get_cpu());
-
-	cmd->ll_list.next = NULL;
-	if (llist_add(&cmd->ll_list, &cq->list)) {
-		ktime_t kt = ktime_set(0, completion_nsec);
+	ktime_t kt = ktime_set(0, completion_nsec);
 
-		hrtimer_start(&cq->timer, kt, HRTIMER_MODE_REL_PINNED);
-	}
-
-	put_cpu();
+	hrtimer_start(&cmd->timer, kt, HRTIMER_MODE_REL);
 }
 
 static void null_softirq_done_fn(struct request *rq)
@@ -368,6 +346,10 @@ static int null_queue_rq(struct blk_mq_hw_ctx *hctx,
 {
 	struct nullb_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
 
+	if (irqmode == NULL_IRQ_TIMER) {
+		hrtimer_init(&cmd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+		cmd->timer.function = null_cmd_timer_expired;
+	}
 	cmd->rq = bd->rq;
 	cmd->nq = hctx->driver_data;
 
@@ -637,19 +619,6 @@ static int __init null_init(void)
 
 	mutex_init(&lock);
 
-	/* Initialize a separate list for each CPU for issuing softirqs */
-	for_each_possible_cpu(i) {
-		struct completion_queue *cq = &per_cpu(completion_queues, i);
-
-		init_llist_head(&cq->list);
-
-		if (irqmode != NULL_IRQ_TIMER)
-			continue;
-
-		hrtimer_init(&cq->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-		cq->timer.function = null_cmd_timer_expired;
-	}
-
 	null_major = register_blkdev(0, "nullb");
 	if (null_major < 0)
 		return null_major;
-- 
1.9.1



* [PATCH BUGFIX 2/3] null_blk: guarantee device restart in all irq modes
  2015-11-02 14:31 [PATCH BUGFIX 0/3] null_blk: fix throughput losses and hangs Paolo Valente
  2015-11-02 14:31 ` [PATCH BUGFIX 1/3] null_blk: set a separate timer for each command Paolo Valente
@ 2015-11-02 14:31 ` Paolo Valente
  2015-11-02 16:25   ` Jens Axboe
  2015-11-02 14:31 ` [PATCH BUGFIX 3/3] null_blk: change type of completion_nsec to unsigned long Paolo Valente
  2 siblings, 1 reply; 15+ messages in thread
From: Paolo Valente @ 2015-11-02 14:31 UTC (permalink / raw)
  To: Jens Axboe, Matias Bjørling, Arianna Avanzini
  Cc: Paolo Valente, Akinobu Mita, Luis R. Rodriguez, Ming Lei,
	Mike Krinkin, linux-kernel

From: Arianna Avanzini <avanzini@google.com>

In single-queue (block layer) mode, the function null_rq_prep_fn stops
the device if alloc_cmd fails. Then, once stopped, the device must be
restarted on the next command completion, so that the request(s) for
which alloc_cmd failed can be requeued. Otherwise the device hangs.

Unfortunately, device restart is currently performed only for delayed
completions, i.e., in irqmode==2. For the above reasons, this causes
hangs with the other irqmodes in combination with the single-queue
block layer.

This commit addresses this issue by making sure that, if stopped, the
device is properly restarted on completions for all irqmodes.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini@google.com>
---
 drivers/block/null_blk.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 7db5a77..caaff67 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -210,6 +210,8 @@ static struct nullb_cmd *alloc_cmd(struct nullb_queue *nq, int can_wait)
 
 static void end_cmd(struct nullb_cmd *cmd)
 {
+	struct request_queue *q = NULL;
+
 	switch (queue_mode)  {
 	case NULL_Q_MQ:
 		blk_mq_end_request(cmd->rq, 0);
@@ -220,27 +222,28 @@ static void end_cmd(struct nullb_cmd *cmd)
 		break;
 	case NULL_Q_BIO:
 		bio_endio(cmd->bio);
-		break;
+		goto free_cmd;
 	}
 
-	free_cmd(cmd);
-}
-
-static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer)
-{
-	struct nullb_cmd *cmd = container_of(timer, struct nullb_cmd, timer);
-	struct request_queue *q = NULL;
-
 	if (cmd->rq)
 		q = cmd->rq->q;
 
-	if (q && !q->mq_ops && blk_queue_stopped(q)) {
-		spin_lock(q->queue_lock);
+	/* Restart queue if needed, as we are freeing a tag */
+	if (q && blk_queue_stopped(q)) {
+		unsigned long flags;
+
+		spin_lock_irqsave(q->queue_lock, flags);
 		if (blk_queue_stopped(q))
 			blk_start_queue(q);
-		spin_unlock(q->queue_lock);
+		spin_unlock_irqrestore(q->queue_lock, flags);
 	}
-	end_cmd(cmd);
+free_cmd:
+	free_cmd(cmd);
+}
+
+static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer)
+{
+	end_cmd(container_of(timer, struct nullb_cmd, timer));
 
 	return HRTIMER_NORESTART;
 }
-- 
1.9.1



* [PATCH BUGFIX 3/3] null_blk: change type of completion_nsec to unsigned long
  2015-11-02 14:31 [PATCH BUGFIX 0/3] null_blk: fix throughput losses and hangs Paolo Valente
  2015-11-02 14:31 ` [PATCH BUGFIX 1/3] null_blk: set a separate timer for each command Paolo Valente
  2015-11-02 14:31 ` [PATCH BUGFIX 2/3] null_blk: guarantee device restart in all irq modes Paolo Valente
@ 2015-11-02 14:31 ` Paolo Valente
  2 siblings, 0 replies; 15+ messages in thread
From: Paolo Valente @ 2015-11-02 14:31 UTC (permalink / raw)
  To: Jens Axboe, Matias Bjørling, Arianna Avanzini
  Cc: Paolo Valente, Akinobu Mita, Luis R. Rodriguez, Ming Lei,
	Mike Krinkin, linux-kernel

From: Arianna Avanzini <avanzini@google.com>

This commit at least doubles the maximum value for completion_nsec,
by changing its type from int to unsigned long. This helps in special
cases where one wants/needs to emulate extremely slow I/O (for
example, to spot bugs).

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini@google.com>
---
 drivers/block/null_blk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index caaff67..6f0bb2f 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -125,8 +125,8 @@ static const struct kernel_param_ops null_irqmode_param_ops = {
 device_param_cb(irqmode, &null_irqmode_param_ops, &irqmode, S_IRUGO);
 MODULE_PARM_DESC(irqmode, "IRQ completion handler. 0-none, 1-softirq, 2-timer");
 
-static int completion_nsec = 10000;
-module_param(completion_nsec, int, S_IRUGO);
+static unsigned long completion_nsec = 10000;
+module_param(completion_nsec, ulong, S_IRUGO);
 MODULE_PARM_DESC(completion_nsec, "Time in ns to complete a request in hardware. Default: 10,000ns");
 
 static int hw_queue_depth = 64;
-- 
1.9.1



* Re: [PATCH BUGFIX 1/3] null_blk: set a separate timer for each command
  2015-11-02 14:31 ` [PATCH BUGFIX 1/3] null_blk: set a separate timer for each command Paolo Valente
@ 2015-11-02 16:14   ` Jens Axboe
  2015-11-03  9:01     ` Paolo Valente
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2015-11-02 16:14 UTC (permalink / raw)
  To: Paolo Valente, Matias Bjørling, Arianna Avanzini
  Cc: Akinobu Mita, Luis R. Rodriguez, Ming Lei, Mike Krinkin, linux-kernel

On 11/02/2015 07:31 AM, Paolo Valente wrote:
> For the Timer IRQ mode (i.e., when command completions are delayed),
> there is one timer for each CPU. Each of these timers
> . has a completion queue associated with it, containing all the
>    command completions to be executed when the timer fires;
> . is set, and a new completion-to-execute is inserted into its
>    completion queue, every time the dispatch code for a new command
>    happens to be executed on the CPU related to the timer.
>
> This implies that, if the dispatch of a new command happens to be
> executed on a CPU whose timer has already been set, but has not yet
> fired, then the timer is set again, to the completion time of the
> newly arrived command. When the timer eventually fires, all its queued
> completions are executed.
>
> This way of handling delayed command completions entails the following
> problem: if more than one command completion is inserted into the
> queue of a timer before the timer fires, then the expiration time for
> the timer is moved forward every time each of these completions is
> enqueued. As a consequence, only the last completion enqueued enjoys a
> correct execution time, while all previous completions are unjustly
> delayed until the last completion is executed (and at that time they
> are executed all together).
>
> Specifically, if all the above completions are enqueued almost at the
> same time, then the problem is negligible. On the opposite end, if
> every completion is enqueued a while after the previous completion was
> enqueued (in the extreme case, it is enqueued only right before the
> timer would have expired), then every enqueued completion, except for
> the last one, experiences an inflated delay, proportional to the number
> of completions enqueued after it. In the end, commands, and thus I/O
> requests, may be completed at an arbitrarily lower rate than the
> desired one.
>
> This commit addresses this issue by replacing per-CPU timers with
> per-command timers, i.e., by associating an individual timer with each
> command.

Functionally the patch looks fine. My only worry is that a timer per 
command would be an unnecessary slowdown compared to pushing one timer 
forward. The problem should be fixable by still doing that, just 
maintaining next-expire instead. Maybe something that would still 
roughly be precise enough, while still getting some completion batching 
going? Or maybe that would be slower, and the individual timers are 
still better.

Comments?

-- 
Jens Axboe



* Re: [PATCH BUGFIX 2/3] null_blk: guarantee device restart in all irq modes
  2015-11-02 14:31 ` [PATCH BUGFIX 2/3] null_blk: guarantee device restart in all irq modes Paolo Valente
@ 2015-11-02 16:25   ` Jens Axboe
  2015-11-03  9:02     ` Paolo Valente
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2015-11-02 16:25 UTC (permalink / raw)
  To: Paolo Valente, Matias Bjørling, Arianna Avanzini
  Cc: Akinobu Mita, Luis R. Rodriguez, Ming Lei, Mike Krinkin, linux-kernel

On 11/02/2015 07:31 AM, Paolo Valente wrote:
> From: Arianna Avanzini <avanzini@google.com>
>
> In single-queue (block layer) mode, the function null_rq_prep_fn stops
> the device if alloc_cmd fails. Then, once stopped, the device must be
> restarted on the next command completion, so that the request(s) for
> which alloc_cmd failed can be requeued. Otherwise the device hangs.
>
> Unfortunately, device restart is currently performed only for delayed
> completions, i.e., in irqmode==2. This fact causes hangs, for the
> above reasons, with the other irqmodes in combination with single-queue
> block layer.
>
> This commit addresses this issue by making sure that, if stopped, the
> device is properly restarted for all irqmodes on completions.

This looks good. I did a double take at the removal of the q->mq_ops 
check before blk_queue_stopped(), but we can't get there in MQ mode. 
Perhaps a comment would be warranted for that, the ->mq_ops check served 
as a bit of documentation for that before.

-- 
Jens Axboe



* Re: [PATCH BUGFIX 1/3] null_blk: set a separate timer for each command
  2015-11-02 16:14   ` Jens Axboe
@ 2015-11-03  9:01     ` Paolo Valente
  2015-11-29 17:27       ` Paolo Valente
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Valente @ 2015-11-03  9:01 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Matias Bjørling, Arianna Avanzini, Akinobu Mita,
	Luis R. Rodriguez, Ming Lei, Mike Krinkin, linux-kernel


On 2 Nov 2015, at 17:14, Jens Axboe <axboe@fb.com> wrote:

> On 11/02/2015 07:31 AM, Paolo Valente wrote:
>> For the Timer IRQ mode (i.e., when command completions are delayed),
>> there is one timer for each CPU. Each of these timers
>> . has a completion queue associated with it, containing all the
>>   command completions to be executed when the timer fires;
>> . is set, and a new completion-to-execute is inserted into its
>>   completion queue, every time the dispatch code for a new command
>>   happens to be executed on the CPU related to the timer.
>> 
>> This implies that, if the dispatch of a new command happens to be
>> executed on a CPU whose timer has already been set, but has not yet
>> fired, then the timer is set again, to the completion time of the
>> newly arrived command. When the timer eventually fires, all its queued
>> completions are executed.
>> 
>> This way of handling delayed command completions entails the following
>> problem: if more than one command completion is inserted into the
>> queue of a timer before the timer fires, then the expiration time for
>> the timer is moved forward every time each of these completions is
>> enqueued. As a consequence, only the last completion enqueued enjoys a
>> correct execution time, while all previous completions are unjustly
>> delayed until the last completion is executed (and at that time they
>> are executed all together).
>> 
>> Specifically, if all the above completions are enqueued almost at the
>> same time, then the problem is negligible. On the opposite end, if
>> every completion is enqueued a while after the previous completion was
>> enqueued (in the extreme case, it is enqueued only right before the
>> timer would have expired), then every enqueued completion, except for
>> the last one, experiences an inflated delay, proportional to the number
>> of completions enqueued after it. In the end, commands, and thus I/O
>> requests, may be completed at an arbitrarily lower rate than the
>> desired one.
>> 
>> This commit addresses this issue by replacing per-CPU timers with
>> per-command timers, i.e., by associating an individual timer with each
>> command.
> 
> Functionally the patch looks fine. My only worry is that a timer per command would be an unnecessary slowdown compared to pushing one timer forward. The problem should be fixable by still doing that, just maintaining next-expire instead. Maybe something that would still roughly be precise enough, while still getting some completion batching going? Or maybe that would be slower, and the individual timers are still better.
> 
> Comments?
> 

I have tried to think about these questions. Unfortunately I was not able to go beyond the following general considerations.

Given that:
1) hrtimer_start is very efficient;
2) implementing batch-by-batch execution of command completions would of course require more complex code in null_blk;
I think that, to get a perceptible improvement, command completions should probably be executed in batches of at least 3 or 4 commands each.

In this respect, commands can accumulate in a batch only if the time granularity with which delayed commands are guaranteed to be executed, i.e., the granularity with which timers eventually fire, happens to be coarser than completion_nsec. But this would lead to exactly the uncontrolled throughput-loss problem addressed by this patch, although to a lesser extent. The reason is as follows.

Consider a batch of 3 or 4 commands, executed at a certain timer expiration. If the arrival time of these commands is close to the timer expiration, then the commands happen to be executed with a delay close to the expected one. If, on the other hand, their arrival time is far from the timer expiration, then they are completed with a larger delay. In the worst case, their delay is inflated by a factor proportional to the size of the batch, namely 3 or 4.

In the end, depending on the command arrival pattern, the throughput of the emulated device may vary by 3 or 4 times. This would be a reduced version of the issue addressed by this patch. In fact, with the current implementation of delayed completions (per-cpu timers), the throughput may vary, in the worst case, by a factor equal to the queue depth (64 by default).

Finally, a side note: as for whether the increased efficiency of batched delayed completions is worth the additional code complexity as well as the throughput losses/fluctuations, the low delays for which this efficiency becomes important match the latencies of new very fast devices (or are at least of the same order). But, exactly because of the high speed of these devices, what typically matters (AFAIK) in tests with them is understanding whether the rest of the system can cope with their speed. And for this type of measurement, the best/typical choice is (again, AFAIK) not to delay request completions at all in null_blk.

I hope these considerations may be somehow useful.

Thanks,
Paolo

> -- 
> Jens Axboe


--
Paolo Valente                                                 
Algogroup
Dipartimento di Fisica, Informatica e Matematica		
Via Campi, 213/B
41125 Modena - Italy        				  
homepage:  http://algogroup.unimore.it/people/paolo/



* Re: [PATCH BUGFIX 2/3] null_blk: guarantee device restart in all irq modes
  2015-11-02 16:25   ` Jens Axboe
@ 2015-11-03  9:02     ` Paolo Valente
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Valente @ 2015-11-03  9:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Matias Bjørling, Arianna Avanzini, Akinobu Mita,
	Luis R. Rodriguez, Ming Lei, Mike Krinkin, linux-kernel


On 2 Nov 2015, at 17:25, Jens Axboe <axboe@fb.com> wrote:

> On 11/02/2015 07:31 AM, Paolo Valente wrote:
>> From: Arianna Avanzini <avanzini@google.com>
>> 
>> In single-queue (block layer) mode, the function null_rq_prep_fn stops
>> the device if alloc_cmd fails. Then, once stopped, the device must be
>> restarted on the next command completion, so that the request(s) for
>> which alloc_cmd failed can be requeued. Otherwise the device hangs.
>> 
>> Unfortunately, device restart is currently performed only for delayed
>> completions, i.e., in irqmode==2. This fact causes hangs, for the
>> above reasons, with the other irqmodes in combination with single-queue
>> block layer.
>> 
>> This commit addresses this issue by making sure that, if stopped, the
>> device is properly restarted for all irqmodes on completions.
> 
> This looks good. I did a double take at the removal of the q->mq_ops check before blk_queue_stopped(), but we can't get there in MQ mode. Perhaps a comment would be warranted for that, the ->mq_ops check served as a bit of documentation for that before.
> 

Honestly, I removed that additional check by mistake. I will put it back in the next version of this patchset, once (and if) I have the green light on patch 1/3.

Thanks,
Paolo

> -- 
> Jens Axboe
> 


--
Paolo Valente                                                 
Algogroup
Dipartimento di Fisica, Informatica e Matematica		
Via Campi, 213/B
41125 Modena - Italy        				  
homepage:  http://algogroup.unimore.it/people/paolo/



* Re: [PATCH BUGFIX 1/3] null_blk: set a separate timer for each command
  2015-11-03  9:01     ` Paolo Valente
@ 2015-11-29 17:27       ` Paolo Valente
  2015-11-30 15:55         ` Jens Axboe
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Valente @ 2015-11-29 17:27 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Matias Bjørling, Arianna Avanzini, Akinobu Mita,
	Luis R. Rodriguez, Ming Lei, Mike Krinkin, linux-kernel

Hi Jens,
this is just to ask whether you have made any decision about these patches, including the decision simply not to apply them.

Thanks,
Paolo

On 3 Nov 2015, at 10:01, Paolo Valente <paolo.valente@unimore.it> wrote:

> 
> On 2 Nov 2015, at 17:14, Jens Axboe <axboe@fb.com> wrote:
> 
>> On 11/02/2015 07:31 AM, Paolo Valente wrote:
>>> For the Timer IRQ mode (i.e., when command completions are delayed),
>>> there is one timer for each CPU. Each of these timers
>>> . has a completion queue associated with it, containing all the
>>>  command completions to be executed when the timer fires;
>>> . is set, and a new completion-to-execute is inserted into its
>>>  completion queue, every time the dispatch code for a new command
>>>  happens to be executed on the CPU related to the timer.
>>> 
>>> This implies that, if the dispatch of a new command happens to be
>>> executed on a CPU whose timer has already been set, but has not yet
>>> fired, then the timer is set again, to the completion time of the
>>> newly arrived command. When the timer eventually fires, all its queued
>>> completions are executed.
>>> 
>>> This way of handling delayed command completions entails the following
>>> problem: if more than one command completion is inserted into the
>>> queue of a timer before the timer fires, then the expiration time for
>>> the timer is moved forward every time each of these completions is
>>> enqueued. As a consequence, only the last completion enqueued enjoys a
>>> correct execution time, while all previous completions are unjustly
>>> delayed until the last completion is executed (and at that time they
>>> are executed all together).
>>> 
>>> Specifically, if all the above completions are enqueued almost at the
>>> same time, then the problem is negligible. On the opposite end, if
>>> every completion is enqueued a while after the previous completion was
>>> enqueued (in the extreme case, it is enqueued only right before the
>>> timer would have expired), then every enqueued completion, except for
>>> the last one, experiences an inflated delay, proportional to the number
>>> of completions enqueued after it. In the end, commands, and thus I/O
>>> requests, may be completed at an arbitrarily lower rate than the
>>> desired one.
>>> 
>>> This commit addresses this issue by replacing per-CPU timers with
>>> per-command timers, i.e., by associating an individual timer with each
>>> command.
>> 
>> Functionally the patch looks fine. My only worry is that a timer per command would be an unnecessary slowdown compared to pushing one timer forward. The problem should be fixable by still doing that, just maintaining next-expire instead. Maybe something that would still roughly be precise enough, while still getting some completion batching going? Or maybe that would be slower, and the individual timers are still better.
>> 
>> Comments?
>> 
> 
> I have tried to think about these questions. Unfortunately I was not able to go beyond the following general considerations.
> 
> Given that:
> 1) hrtimer_start is very efficient;
> 2) to implement a batch-by-batch execution of command completions, a more complex code would of course be needed in null_blk;
> I think that, to get a perceptible improvement, command completions should probably be executed in batches of at least 3 or 4 commands each.
> 
> In this respect, commands can accumulate in a batch only if the time granularity with which delayed commands are guaranteed to be executed, i.e., the granularity with which timers eventually fire, happens to be coarser than completion_nsec. But this would lead to exactly the uncontrolled throughput-loss problem addressed by this patch, although to a lesser extent. The reason is as follows.
> 
> Consider a batch of 3 or 4 commands, executed at a certain timer expiration. If the arrival time of these commands is close to the timer expiration, then the commands happen to be executed with a delay close to the expected one. If, on the other hand, their arrival time is far from the timer expiration, then they are completed with a larger delay. In the worst case, their delay is inflated by a factor proportional to the size of the batch, namely 3 or 4.
> 
> In the end, depending on the command arrival pattern, the throughput of the emulated device may vary by 3 or 4 times. This would be a reduced version of the issue addressed by this patch. In fact, with the current implementation of delayed completions (per-cpu timers), the throughput may vary, in the worst case, by a factor equal to the queue depth (64 by default).
> 
> Finally, a side note: as for whether the increased efficiency of batched delayed completions is worth additional code complexity as well as throughput losses/fluctuations, the low delays for which this efficiency becomes important match the latencies of new very fast devices (or are at least in the same order). But, exactly for the high speed of these devices, what typically matters (AFAIK) in tests with these devices is understanding whether the rest of the system can cope with their speed. And for this type of measures, the best/typical choice is (again, AFAIK) not to delay request completions at all in null_blk.
> 
> I hope these considerations may be somehow useful.
> 
> Thanks,
> Paolo
> 
>> -- 
>> Jens Axboe
> 
> 
> --
> Paolo Valente                                                 
> Algogroup
> Dipartimento di Fisica, Informatica e Matematica		
> Via Campi, 213/B
> 41125 Modena - Italy        				  
> homepage:  http://algogroup.unimore.it/people/paolo/


--
Paolo Valente                                                 
Algogroup
Dipartimento di Fisica, Informatica e Matematica		
Via Campi, 213/B
41125 Modena - Italy        				  
homepage:  http://algogroup.unimore.it/people/paolo/


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH BUGFIX 1/3] null_blk: set a separate timer for each command
  2015-11-29 17:27       ` Paolo Valente
@ 2015-11-30 15:55         ` Jens Axboe
  2015-12-01 10:48           ` [PATCH BUGFIX V2 0/3] null_blk: fix throughput losses and hangs Paolo Valente
  0 siblings, 1 reply; 15+ messages in thread
From: Jens Axboe @ 2015-11-30 15:55 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Matias Bjørling, Arianna Avanzini, Akinobu Mita,
	Luis R. Rodriguez, Ming Lei, Mike Krinkin, linux-kernel

On 11/29/2015 10:27 AM, Paolo Valente wrote:
> Hi Jens,
> this is just to ask you whether you have made any decision about these patches, including the decision simply not to apply them.

Let's move forward with them, I got hung up on potential drops in 
performance. We can fix that up as we go. Are you going to respin #2 to 
reinstate the mq_ops check?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH BUGFIX V2 0/3] null_blk: fix throughput losses and hangs
  2015-11-30 15:55         ` Jens Axboe
@ 2015-12-01 10:48           ` Paolo Valente
  2015-12-01 10:48             ` [PATCH BUGFIX V2 1/3] null_blk: set a separate timer for each command Paolo Valente
                               ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Paolo Valente @ 2015-12-01 10:48 UTC (permalink / raw)
  To: Jens Axboe, Matias Bjørling, Arianna Avanzini
  Cc: Paolo Valente, Akinobu Mita, Luis R. Rodriguez, Ming Lei,
	Mike Krinkin, linux-kernel

Hi,
here is an updated version of the patchset, differing from the
previous version only in that it reinstates the missing extra check
pointed out in [2]. For your convenience, the content of the cover
letter for the previous version follows.

While doing some tests with the null_blk device driver, we bumped into
two problems: first, unjustified and in some cases high throughput
losses; second, actual hangs. These problems seem to be the
consequence of the combination of three causes, and this patchset
introduces a fix for each of these causes. In particular, changes
address:
. an apparent flaw in the logic with which delayed completions are
  implemented: this flaw causes, with unlucky but non-pathological
  workloads, actual request-completion delays to become arbitrarily
  larger than the configured delay;
. the missing restart of the device queue on the completion of a request in
  single-queue non-delayed mode;
. the overflow of the request-delay parameter, when extremely high values
  are used (e.g., to spot bugs).

To avoid possible confusion, we stress that these fixes *do not* have
anything to do with the problems highlighted in [1] (tests of the
multiqueue xen-blkfront and xen-blkback modules with null_blk).

You can find more details in the patch descriptions.

Thanks,
Paolo and Arianna

[1] https://lkml.org/lkml/2015/8/19/181
[2] https://lkml.org/lkml/2015/11/2/433

Arianna Avanzini (2):
  null_blk: guarantee device restart in all irq modes
  null_blk: change type of completion_nsec to unsigned long

Paolo Valente (1):
  null_blk: set a separate timer for each command

 drivers/block/null_blk.c | 94 +++++++++++++++++-------------------------------
 1 file changed, 33 insertions(+), 61 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH BUGFIX V2 1/3] null_blk: set a separate timer for each command
  2015-12-01 10:48           ` [PATCH BUGFIX V2 0/3] null_blk: fix throughput losses and hangs Paolo Valente
@ 2015-12-01 10:48             ` Paolo Valente
  2015-12-01 10:48             ` [PATCH BUGFIX V2 2/3] null_blk: guarantee device restart in all irq modes Paolo Valente
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Paolo Valente @ 2015-12-01 10:48 UTC (permalink / raw)
  To: Jens Axboe, Matias Bjørling, Arianna Avanzini
  Cc: Paolo Valente, Akinobu Mita, Luis R. Rodriguez, Ming Lei,
	Mike Krinkin, linux-kernel

For the Timer IRQ mode (i.e., when command completions are delayed),
there is one timer for each CPU. Each of these timers
. has a completion queue associated with it, containing all the
  command completions to be executed when the timer fires;
. is set, and a new completion-to-execute is inserted into its
  completion queue, every time the dispatch code for a new command
  happens to be executed on the CPU related to the timer.

This implies that, if the dispatch of a new command happens to be
executed on a CPU whose timer has already been set, but has not yet
fired, then the timer is set again, to the completion time of the
newly arrived command. When the timer eventually fires, all its queued
completions are executed.

This way of handling delayed command completions entails the following
problem: if more than one command completion is inserted into the
queue of a timer before the timer fires, then the expiration time for
the timer is moved forward every time each of these completions is
enqueued. As a consequence, only the last completion enqueued enjoys a
correct execution time, while all previous completions are unjustly
delayed until the last completion is executed (and at that time they
are executed all together).

Specifically, if all the above completions are enqueued at almost the
same time, then the problem is negligible. At the opposite extreme, if
every completion is enqueued a while after the previous one was
enqueued (in the extreme case, only right before the timer would have
expired), then every enqueued completion, except for the last one,
experiences an inflated delay, proportional to the number of
completions enqueued after it. In the end, commands, and thus I/O
requests, may be completed at an arbitrarily lower rate than the
desired one.

This commit addresses this issue by replacing per-CPU timers with
per-command timers, i.e., by associating an individual timer with each
command.
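
The worst-case inflation described above can be sketched with a small simulation (hypothetical numbers, not code from the driver):

```python
DELAY = 100     # completion_nsec analogue (arbitrary time units)
DEPTH = 64      # default hw_queue_depth

# worst-case pattern: every new command is dispatched on this CPU just
# before the shared per-CPU timer would fire, pushing its expiry forward
arrivals = [i * (DELAY - 1) for i in range(DEPTH)]

# shared rearmed timer: everything queued completes only at
# last_arrival + DELAY, so the first command waits the longest
fire = arrivals[-1] + DELAY
worst_shared = fire - arrivals[0]

# per-command timers (this patch): every delay is exactly DELAY
print(worst_shared, DELAY)   # 6337 vs 100: ~DEPTH-fold inflation
```

With the default queue depth of 64, the first command's delay can thus be inflated by a factor close to the queue depth, which is the throughput loss this patch removes.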

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini@google.com>
---
 drivers/block/null_blk.c | 79 +++++++++++++++---------------------------------
 1 file changed, 24 insertions(+), 55 deletions(-)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 5c8ba54..08932f5 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -18,6 +18,7 @@ struct nullb_cmd {
 	struct bio *bio;
 	unsigned int tag;
 	struct nullb_queue *nq;
+	struct hrtimer timer;
 };
 
 struct nullb_queue {
@@ -49,17 +50,6 @@ static int null_major;
 static int nullb_indexes;
 static struct kmem_cache *ppa_cache;
 
-struct completion_queue {
-	struct llist_head list;
-	struct hrtimer timer;
-};
-
-/*
- * These are per-cpu for now, they will need to be configured by the
- * complete_queues parameter and appropriately mapped.
- */
-static DEFINE_PER_CPU(struct completion_queue, completion_queues);
-
 enum {
 	NULL_IRQ_NONE		= 0,
 	NULL_IRQ_SOFTIRQ	= 1,
@@ -180,6 +170,8 @@ static void free_cmd(struct nullb_cmd *cmd)
 	put_tag(cmd->nq, cmd->tag);
 }
 
+static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer);
+
 static struct nullb_cmd *__alloc_cmd(struct nullb_queue *nq)
 {
 	struct nullb_cmd *cmd;
@@ -190,6 +182,11 @@ static struct nullb_cmd *__alloc_cmd(struct nullb_queue *nq)
 		cmd = &nq->cmds[tag];
 		cmd->tag = tag;
 		cmd->nq = nq;
+		if (irqmode == NULL_IRQ_TIMER) {
+			hrtimer_init(&cmd->timer, CLOCK_MONOTONIC,
+				     HRTIMER_MODE_REL);
+			cmd->timer.function = null_cmd_timer_expired;
+		}
 		return cmd;
 	}
 
@@ -238,47 +235,28 @@ static void end_cmd(struct nullb_cmd *cmd)
 
 static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer)
 {
-	struct completion_queue *cq;
-	struct llist_node *entry;
-	struct nullb_cmd *cmd;
-
-	cq = &per_cpu(completion_queues, smp_processor_id());
-
-	while ((entry = llist_del_all(&cq->list)) != NULL) {
-		entry = llist_reverse_order(entry);
-		do {
-			struct request_queue *q = NULL;
+	struct nullb_cmd *cmd = container_of(timer, struct nullb_cmd, timer);
+	struct request_queue *q = NULL;
 
-			cmd = container_of(entry, struct nullb_cmd, ll_list);
-			entry = entry->next;
-			if (cmd->rq)
-				q = cmd->rq->q;
-			end_cmd(cmd);
+	if (cmd->rq)
+		q = cmd->rq->q;
 
-			if (q && !q->mq_ops && blk_queue_stopped(q)) {
-				spin_lock(q->queue_lock);
-				if (blk_queue_stopped(q))
-					blk_start_queue(q);
-				spin_unlock(q->queue_lock);
-			}
-		} while (entry);
+	if (q && !q->mq_ops && blk_queue_stopped(q)) {
+		spin_lock(q->queue_lock);
+		if (blk_queue_stopped(q))
+			blk_start_queue(q);
+		spin_unlock(q->queue_lock);
 	}
+	end_cmd(cmd);
 
 	return HRTIMER_NORESTART;
 }
 
 static void null_cmd_end_timer(struct nullb_cmd *cmd)
 {
-	struct completion_queue *cq = &per_cpu(completion_queues, get_cpu());
-
-	cmd->ll_list.next = NULL;
-	if (llist_add(&cmd->ll_list, &cq->list)) {
-		ktime_t kt = ktime_set(0, completion_nsec);
+	ktime_t kt = ktime_set(0, completion_nsec);
 
-		hrtimer_start(&cq->timer, kt, HRTIMER_MODE_REL_PINNED);
-	}
-
-	put_cpu();
+	hrtimer_start(&cmd->timer, kt, HRTIMER_MODE_REL);
 }
 
 static void null_softirq_done_fn(struct request *rq)
@@ -376,6 +354,10 @@ static int null_queue_rq(struct blk_mq_hw_ctx *hctx,
 {
 	struct nullb_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);
 
+	if (irqmode == NULL_IRQ_TIMER) {
+		hrtimer_init(&cmd->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+		cmd->timer.function = null_cmd_timer_expired;
+	}
 	cmd->rq = bd->rq;
 	cmd->nq = hctx->driver_data;
 
@@ -813,19 +795,6 @@ static int __init null_init(void)
 
 	mutex_init(&lock);
 
-	/* Initialize a separate list for each CPU for issuing softirqs */
-	for_each_possible_cpu(i) {
-		struct completion_queue *cq = &per_cpu(completion_queues, i);
-
-		init_llist_head(&cq->list);
-
-		if (irqmode != NULL_IRQ_TIMER)
-			continue;
-
-		hrtimer_init(&cq->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-		cq->timer.function = null_cmd_timer_expired;
-	}
-
 	null_major = register_blkdev(0, "nullb");
 	if (null_major < 0)
 		return null_major;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH BUGFIX V2 2/3] null_blk: guarantee device restart in all irq modes
  2015-12-01 10:48           ` [PATCH BUGFIX V2 0/3] null_blk: fix throughput losses and hangs Paolo Valente
  2015-12-01 10:48             ` [PATCH BUGFIX V2 1/3] null_blk: set a separate timer for each command Paolo Valente
@ 2015-12-01 10:48             ` Paolo Valente
  2015-12-01 10:48             ` [PATCH BUGFIX V2 3/3] null_blk: change type of completion_nsec to unsigned long Paolo Valente
  2015-12-01 17:52             ` [PATCH BUGFIX V2 0/3] null_blk: fix throughput losses and hangs Jens Axboe
  3 siblings, 0 replies; 15+ messages in thread
From: Paolo Valente @ 2015-12-01 10:48 UTC (permalink / raw)
  To: Jens Axboe, Matias Bjørling, Arianna Avanzini
  Cc: Paolo Valente, Akinobu Mita, Luis R. Rodriguez, Ming Lei,
	Mike Krinkin, linux-kernel

From: Arianna Avanzini <avanzini@google.com>

In single-queue (block layer) mode, the function null_rq_prep_fn stops
the device if alloc_cmd fails. Then, once stopped, the device must be
restarted on the next command completion, so that the request(s) for
which alloc_cmd failed can be requeued. Otherwise the device hangs.

Unfortunately, device restart is currently performed only for delayed
completions, i.e., in irqmode==2. For the above reasons, this causes
hangs with the other irqmodes in combination with the single-queue
block layer.

This commit addresses this issue by making sure that, if stopped, the
device is properly restarted for all irqmodes on completions.
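
The hang can be illustrated with a toy model of the stop/restart handshake (a hypothetical sketch, not kernel code: tag pool, queue, and tick loop are all invented for illustration):

```python
from collections import deque

def run(restart_on_completion, ticks=10):
    tags = 1                      # tiny tag pool to force exhaustion
    queue = deque([1, 2])         # two pending requests
    in_flight, completed = [], []
    stopped = False
    for _ in range(ticks):
        # dispatch as much as possible; prep fails without a free tag,
        # stopping the queue (null_rq_prep_fn returning DEFER)
        while queue and not stopped:
            if tags:
                tags -= 1
                in_flight.append(queue.popleft())
            else:
                stopped = True    # request stays queued
        # one completion per tick frees a tag
        if in_flight:
            completed.append(in_flight.pop())
            tags += 1
            if restart_on_completion:
                stopped = False   # blk_start_queue() analogue
    return completed

print(run(True))    # [1, 2]: queue restarted, both requests complete
print(run(False))   # [1]: request 2 is never requeued -> hang
```

If completions never restart the stopped queue (the non-delayed irqmodes before this fix), the second request is stuck forever; restarting on every completion lets it proceed.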

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini@google.com>
---
Changes V1->V2
- reinstated mq_ops check

 drivers/block/null_blk.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index 08932f5..cf65619 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -217,6 +217,8 @@ static struct nullb_cmd *alloc_cmd(struct nullb_queue *nq, int can_wait)
 
 static void end_cmd(struct nullb_cmd *cmd)
 {
+	struct request_queue *q = NULL;
+
 	switch (queue_mode)  {
 	case NULL_Q_MQ:
 		blk_mq_end_request(cmd->rq, 0);
@@ -227,27 +229,28 @@ static void end_cmd(struct nullb_cmd *cmd)
 		break;
 	case NULL_Q_BIO:
 		bio_endio(cmd->bio);
-		break;
+		goto free_cmd;
 	}
 
-	free_cmd(cmd);
-}
-
-static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer)
-{
-	struct nullb_cmd *cmd = container_of(timer, struct nullb_cmd, timer);
-	struct request_queue *q = NULL;
-
 	if (cmd->rq)
 		q = cmd->rq->q;
 
+	/* Restart queue if needed, as we are freeing a tag */
 	if (q && !q->mq_ops && blk_queue_stopped(q)) {
-		spin_lock(q->queue_lock);
+		unsigned long flags;
+
+		spin_lock_irqsave(q->queue_lock, flags);
 		if (blk_queue_stopped(q))
 			blk_start_queue(q);
-		spin_unlock(q->queue_lock);
+		spin_unlock_irqrestore(q->queue_lock, flags);
 	}
-	end_cmd(cmd);
+free_cmd:
+	free_cmd(cmd);
+}
+
+static enum hrtimer_restart null_cmd_timer_expired(struct hrtimer *timer)
+{
+	end_cmd(container_of(timer, struct nullb_cmd, timer));
 
 	return HRTIMER_NORESTART;
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH BUGFIX V2 3/3] null_blk: change type of completion_nsec to unsigned long
  2015-12-01 10:48           ` [PATCH BUGFIX V2 0/3] null_blk: fix throughput losses and hangs Paolo Valente
  2015-12-01 10:48             ` [PATCH BUGFIX V2 1/3] null_blk: set a separate timer for each command Paolo Valente
  2015-12-01 10:48             ` [PATCH BUGFIX V2 2/3] null_blk: guarantee device restart in all irq modes Paolo Valente
@ 2015-12-01 10:48             ` Paolo Valente
  2015-12-01 17:52             ` [PATCH BUGFIX V2 0/3] null_blk: fix throughput losses and hangs Jens Axboe
  3 siblings, 0 replies; 15+ messages in thread
From: Paolo Valente @ 2015-12-01 10:48 UTC (permalink / raw)
  To: Jens Axboe, Matias Bjørling, Arianna Avanzini
  Cc: Paolo Valente, Akinobu Mita, Luis R. Rodriguez, Ming Lei,
	Mike Krinkin, linux-kernel

From: Arianna Avanzini <avanzini@google.com>

This commit at least doubles the maximum value for
completion_nsec. This helps in special cases where one wants/needs to
emulate an extremely slow I/O (for example to spot bugs).
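
The overflow being avoided can be sketched as follows (a hypothetical illustration of 32-bit wraparound, assuming `int` is 32 bits as on common ABIs):

```python
delay_ns = 3 * 10**9    # emulate a 3-second completion, in nanoseconds

# wrap as a 32-bit signed int, as the old `int completion_nsec` would:
# anything above ~2.147 s (INT_MAX ns) silently wraps around
wrapped = (delay_ns + 2**31) % 2**32 - 2**31
print(wrapped)          # -1294967296: the delay becomes garbage

# the patch uses unsigned long (64 bits on LP64 kernels): no wraparound
assert delay_ns < 2**64
print(delay_ns)         # 3000000000: value preserved
```

On 32-bit kernels the switch to unsigned long doubles the representable range; on 64-bit kernels it extends it far beyond any plausible emulated delay.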

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini@google.com>
---
 drivers/block/null_blk.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c
index cf65619..0c3940e 100644
--- a/drivers/block/null_blk.c
+++ b/drivers/block/null_blk.c
@@ -132,8 +132,8 @@ static const struct kernel_param_ops null_irqmode_param_ops = {
 device_param_cb(irqmode, &null_irqmode_param_ops, &irqmode, S_IRUGO);
 MODULE_PARM_DESC(irqmode, "IRQ completion handler. 0-none, 1-softirq, 2-timer");
 
-static int completion_nsec = 10000;
-module_param(completion_nsec, int, S_IRUGO);
+static unsigned long completion_nsec = 10000;
+module_param(completion_nsec, ulong, S_IRUGO);
 MODULE_PARM_DESC(completion_nsec, "Time in ns to complete a request in hardware. Default: 10,000ns");
 
 static int hw_queue_depth = 64;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH BUGFIX V2 0/3] null_blk: fix throughput losses and hangs
  2015-12-01 10:48           ` [PATCH BUGFIX V2 0/3] null_blk: fix throughput losses and hangs Paolo Valente
                               ` (2 preceding siblings ...)
  2015-12-01 10:48             ` [PATCH BUGFIX V2 3/3] null_blk: change type of completion_nsec to unsigned long Paolo Valente
@ 2015-12-01 17:52             ` Jens Axboe
  3 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2015-12-01 17:52 UTC (permalink / raw)
  To: Paolo Valente, Matias Bjørling, Arianna Avanzini
  Cc: Akinobu Mita, Luis R. Rodriguez, Ming Lei, Mike Krinkin, linux-kernel

On 12/01/2015 03:48 AM, Paolo Valente wrote:
> Hi,
> here is an updated version of the patchset, differing from the
> previous version only in that it reinstates the missing extra check
> pointed out in [2]. For your convenience, the content of the cover
> letter for the previous version follows.
>
> While doing some tests with the null_blk device driver, we bumped into
> two problems: first, unjustified and in some cases high throughput
> losses; second, actual hangs. These problems seem to be the
> consequence of the combination of three causes, and this patchset
> introduces a fix for each of these causes. In particular, changes
> address:
> . an apparent flaw in the logic with which delayed completions are
>    implemented: this flaw causes, with unlucky but non-pathological
>    workloads, actual request-completion delays to become arbitrarily
>    larger than the configured delay;
> . the missing restart of the device queue on the completion of a request in
>    single-queue non-delayed mode;
> . the overflow of the request-delay parameter, when extremely high values
>    are used (e.g., to spot bugs).
>
> To avoid possible confusion, we stress that these fixes *do not* have
> anything to do with the problems highlighted in [1] (tests of the
> multiqueue xen-blkfront and xen-blkback modules with null_blk).
>
> You can find more details in the patch descriptions.

Thanks Paolo, added.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-12-01 17:52 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-02 14:31 [PATCH BUGFIX 0/3] null_blk: fix throughout losses and hangs Paolo Valente
2015-11-02 14:31 ` [PATCH BUGFIX 1/3] null_blk: set a separate timer for each command Paolo Valente
2015-11-02 16:14   ` Jens Axboe
2015-11-03  9:01     ` Paolo Valente
2015-11-29 17:27       ` Paolo Valente
2015-11-30 15:55         ` Jens Axboe
2015-12-01 10:48           ` [PATCH BUGFIX V2 0/3] null_blk: fix throughput losses and hangs Paolo Valente
2015-12-01 10:48             ` [PATCH BUGFIX V2 1/3] null_blk: set a separate timer for each command Paolo Valente
2015-12-01 10:48             ` [PATCH BUGFIX V2 2/3] null_blk: guarantee device restart in all irq modes Paolo Valente
2015-12-01 10:48             ` [PATCH BUGFIX V2 3/3] null_blk: change type of completion_nsec to unsigned long Paolo Valente
2015-12-01 17:52             ` [PATCH BUGFIX V2 0/3] null_blk: fix throughput losses and hangs Jens Axboe
2015-11-02 14:31 ` [PATCH BUGFIX 2/3] null_blk: guarantee device restart in all irq modes Paolo Valente
2015-11-02 16:25   ` Jens Axboe
2015-11-03  9:02     ` Paolo Valente
2015-11-02 14:31 ` [PATCH BUGFIX 3/3] null_blk: change type of completion_nsec to unsigned long Paolo Valente

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.