All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
@ 2021-08-26 14:40 Zhen Lei
  2021-08-26 18:09 ` Bart Van Assche
  2021-08-27  2:30 ` Damien Le Moal
  0 siblings, 2 replies; 30+ messages in thread
From: Zhen Lei @ 2021-08-26 14:40 UTC (permalink / raw)
  To: Jens Axboe, linux-block, linux-kernel
  Cc: Zhen Lei, Damien Le Moal, Bart Van Assche

dd_queued() traverses the percpu variable for summation. The more cores,
the higher the performance overhead. I currently have a 128-core board and
this function takes 2.5 us. If the number of high-priority requests is
small and the number of low- and medium-priority requests is large, the
performance impact is significant.

Let's maintain a non-percpu member variable 'nr_queued', which is
incremented by 1 immediately following "inserted++" and decremented by 1
immediately following "completed++". Because both the judgment dd_queued()
in dd_dispatch_request() and operation "inserted++" in dd_insert_request()
are protected by dd->lock, lock protection needs to be added only in
dd_finish_request(), which is unlikely to cause significant performance
side effects.

Tested on my 128-core board with two ssd disks.
fio bs=4k rw=read iodepth=128 cpus_allowed=0-95 <others>
Before:
[183K/0/0 iops]
[172K/0/0 iops]

After:
[258K/0/0 iops]
[258K/0/0 iops]

Fixes: fb926032b320 ("block/mq-deadline: Prioritize high-priority requests")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 block/mq-deadline.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index a09761cbdf12e58..d8f6aa12de80049 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -79,6 +79,7 @@ struct dd_per_prio {
 	struct list_head fifo_list[DD_DIR_COUNT];
 	/* Next request in FIFO order. Read, write or both are NULL. */
 	struct request *next_rq[DD_DIR_COUNT];
+	unsigned int nr_queued;
 };
 
 struct deadline_data {
@@ -277,9 +278,9 @@ deadline_move_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 }
 
 /* Number of requests queued for a given priority level. */
-static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
+static __always_inline u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
 {
-	return dd_sum(dd, inserted, prio) - dd_sum(dd, completed, prio);
+	return dd->per_prio[prio].nr_queued;
 }
 
 /*
@@ -711,6 +712,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	prio = ioprio_class_to_prio[ioprio_class];
 	dd_count(dd, inserted, prio);
+	per_prio = &dd->per_prio[prio];
+	per_prio->nr_queued++;
 
 	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
 		blk_mq_free_requests(&free);
@@ -719,7 +722,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 
 	trace_block_rq_insert(rq);
 
-	per_prio = &dd->per_prio[prio];
 	if (at_head) {
 		list_add(&rq->queuelist, &per_prio->dispatch);
 	} else {
@@ -790,12 +792,14 @@ static void dd_finish_request(struct request *rq)
 	const u8 ioprio_class = dd_rq_ioclass(rq);
 	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];
+	unsigned long flags;
 
 	dd_count(dd, completed, prio);
+	spin_lock_irqsave(&dd->lock, flags);
+	per_prio->nr_queued--;
+	spin_unlock_irqrestore(&dd->lock, flags);
 
 	if (blk_queue_is_zoned(q)) {
-		unsigned long flags;
-
 		spin_lock_irqsave(&dd->zone_lock, flags);
 		blk_req_zone_write_unlock(rq);
 		if (!list_empty(&per_prio->fifo_list[DD_WRITE]))
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-26 14:40 [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests Zhen Lei
@ 2021-08-26 18:09 ` Bart Van Assche
  2021-08-26 18:13   ` Jens Axboe
  2021-08-28  1:59   ` Leizhen (ThunderTown)
  2021-08-27  2:30 ` Damien Le Moal
  1 sibling, 2 replies; 30+ messages in thread
From: Bart Van Assche @ 2021-08-26 18:09 UTC (permalink / raw)
  To: Zhen Lei, Jens Axboe, linux-block, linux-kernel; +Cc: Damien Le Moal

On 8/26/21 7:40 AM, Zhen Lei wrote:
> lock protection needs to be added only in
> dd_finish_request(), which is unlikely to cause significant performance
> side effects.

Not sure the above is correct. Every new atomic instruction has a measurable
performance overhead. But I guess in this case that overhead is smaller than
the time needed to sum 128 per-CPU variables.

> Tested on my 128-core board with two ssd disks.
> fio bs=4k rw=read iodepth=128 cpus_allowed=0-95 <others>
> Before:
> [183K/0/0 iops]
> [172K/0/0 iops]
> 
> After:
> [258K/0/0 iops]
> [258K/0/0 iops]

Nice work!

> Fixes: fb926032b320 ("block/mq-deadline: Prioritize high-priority requests")

Shouldn't the Fixes: tag be used only for patches that modify functionality?
I'm not sure it is appropriate to use this tag for performance improvements.

>  struct deadline_data {
> @@ -277,9 +278,9 @@ deadline_move_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  }
>  
>  /* Number of requests queued for a given priority level. */
> -static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
> +static __always_inline u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
>  {
> -	return dd_sum(dd, inserted, prio) - dd_sum(dd, completed, prio);
> +	return dd->per_prio[prio].nr_queued;
>  }

Please leave out "__always_inline". Modern compilers are smart enough to
inline this function without using the "inline" keyword.

> @@ -711,6 +712,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  
>  	prio = ioprio_class_to_prio[ioprio_class];
>  	dd_count(dd, inserted, prio);
> +	per_prio = &dd->per_prio[prio];
> +	per_prio->nr_queued++;
>  
>  	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
>  		blk_mq_free_requests(&free);

I think the above is wrong - nr_queued should not be incremented if the
request is merged into another request. Please move the code that increments
nr_queued past the above if-statement.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-26 18:09 ` Bart Van Assche
@ 2021-08-26 18:13   ` Jens Axboe
  2021-08-26 18:45     ` Jens Axboe
  2021-08-28  1:59   ` Leizhen (ThunderTown)
  1 sibling, 1 reply; 30+ messages in thread
From: Jens Axboe @ 2021-08-26 18:13 UTC (permalink / raw)
  To: Bart Van Assche, Zhen Lei, linux-block, linux-kernel; +Cc: Damien Le Moal

On 8/26/21 12:09 PM, Bart Van Assche wrote:
> On 8/26/21 7:40 AM, Zhen Lei wrote:
>> lock protection needs to be added only in dd_finish_request(), which
>> is unlikely to cause significant performance side effects.
> 
> Not sure the above is correct. Every new atomic instruction has a
> measurable performance overhead. But I guess in this case that
> overhead is smaller than the time needed to sum 128 per-CPU variables.

percpu counters only really work, if the summing is not in a hot path,
or if the summing is just some "not zero" thing instead of a full sum.
They just don't scale at all for even moderately sized systems.

>> Tested on my 128-core board with two ssd disks.
>> fio bs=4k rw=read iodepth=128 cpus_allowed=0-95 <others>
>> Before:
>> [183K/0/0 iops]
>> [172K/0/0 iops]
>>
>> After:
>> [258K/0/0 iops]
>> [258K/0/0 iops]
> 
> Nice work!
> 
>> Fixes: fb926032b320 ("block/mq-deadline: Prioritize high-priority requests")
> 
> Shouldn't the Fixes: tag be used only for patches that modify
> functionality? I'm not sure it is appropriate to use this tag for
> performance improvements.

For a regression this big, I think it's the right thing. Anyone that may
backport the original commit definitely should also get the followup
fix. This isn't just a performance improvement, it's fixing a big
performance regression.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-26 18:13   ` Jens Axboe
@ 2021-08-26 18:45     ` Jens Axboe
  2021-08-26 19:17       ` Bart Van Assche
  2021-08-26 23:49       ` Bart Van Assche
  0 siblings, 2 replies; 30+ messages in thread
From: Jens Axboe @ 2021-08-26 18:45 UTC (permalink / raw)
  To: Bart Van Assche, Zhen Lei, linux-block, linux-kernel; +Cc: Damien Le Moal

On 8/26/21 12:13 PM, Jens Axboe wrote:
> On 8/26/21 12:09 PM, Bart Van Assche wrote:
>> On 8/26/21 7:40 AM, Zhen Lei wrote:
>>> lock protection needs to be added only in dd_finish_request(), which
>>> is unlikely to cause significant performance side effects.
>>
>> Not sure the above is correct. Every new atomic instruction has a
>> measurable performance overhead. But I guess in this case that
>> overhead is smaller than the time needed to sum 128 per-CPU variables.
> 
> percpu counters only really work, if the summing is not in a hot path,
> or if the summing is just some "not zero" thing instead of a full sum.
> They just don't scale at all for even moderately sized systems.

Ugh it's actually even worse in this case, since you do:

static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)               
{                                                                               
	return dd_sum(dd, inserted, prio) - dd_sum(dd, completed, prio);        
}

which ends up iterating possible CPUs _twice_!

Just ran a quick test here, and I go from 3.55M IOPS to 1.23M switching
to deadline, of which 37% of the overhead is from dd_dispatch().

With the posted patch applied, it runs at 2.3M IOPS with mq-deadline,
which is a lot better. This is on my 3970X test box, so 32 cores, 64
threads.

Bart, either we fix this up ASAP and get rid of the percpu counters in
the hot path, or we revert this patch.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-26 18:45     ` Jens Axboe
@ 2021-08-26 19:17       ` Bart Van Assche
  2021-08-26 19:32         ` Jens Axboe
  2021-08-26 23:49       ` Bart Van Assche
  1 sibling, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2021-08-26 19:17 UTC (permalink / raw)
  To: Jens Axboe, Zhen Lei, linux-block; +Cc: Damien Le Moal

On 8/26/21 11:45 AM, Jens Axboe wrote:
> Bart, either we fix this up ASAP and get rid of the percpu counters in
> the hot path, or we revert this patch.

I'm at home taking care of my son because he is sick.

Do you want to wait for Zhen to post a fixed version of his patch or do you
want me to fix his patch?

Bart.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-26 19:17       ` Bart Van Assche
@ 2021-08-26 19:32         ` Jens Axboe
  0 siblings, 0 replies; 30+ messages in thread
From: Jens Axboe @ 2021-08-26 19:32 UTC (permalink / raw)
  To: Bart Van Assche, Zhen Lei, linux-block; +Cc: Damien Le Moal

On 8/26/21 1:17 PM, Bart Van Assche wrote:
> On 8/26/21 11:45 AM, Jens Axboe wrote:
>> Bart, either we fix this up ASAP and get rid of the percpu counters in
>> the hot path, or we revert this patch.
> 
> I'm at home taking care of my son because he is sick.

Take care of that, always more important than the kernel :-)

> Do you want to wait for Zhen to post a fixed version of his patch or do you
> want me to fix his patch?

To be a bit proactive, I queued up a revert. Messing with it at this
point is making me a bit nervous. If the final patch ends up looking
clean enough, then we can still go that route. But it really has to be
unequivocally done this late in the game...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-26 18:45     ` Jens Axboe
  2021-08-26 19:17       ` Bart Van Assche
@ 2021-08-26 23:49       ` Bart Van Assche
  2021-08-26 23:51         ` Jens Axboe
  1 sibling, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2021-08-26 23:49 UTC (permalink / raw)
  To: Jens Axboe, Zhen Lei, linux-block; +Cc: Damien Le Moal

On 8/26/21 11:45 AM, Jens Axboe wrote:
> Just ran a quick test here, and I go from 3.55M IOPS to 1.23M switching
> to deadline, of which 37% of the overhead is from dd_dispatch().
> 
> With the posted patch applied, it runs at 2.3M IOPS with mq-deadline,
> which is a lot better. This is on my 3970X test box, so 32 cores, 64
> threads.

Hi Jens,

With the script below, queue depth >= 2 and an improved version of
Zhen's patch I see 970 K IOPS with the mq-deadline scheduler in an
8 core VM (i7-4790 CPU). In other words, more IOPS than what Zhen
reported with fewer CPU cores. Is that good enough?

Thanks,

Bart.

#!/bin/bash

if [ -e /sys/kernel/config/nullb ]; then
    for d in /sys/kernel/config/nullb/*; do
        [ -d "$d" ] && rmdir "$d"
    done
fi
numcpus=$(grep -c ^processor /proc/cpuinfo)
modprobe -r null_blk
[ -e /sys/module/null_blk ] && exit $?
modprobe null_blk nr_devices=0 &&
    udevadm settle &&
    cd /sys/kernel/config/nullb &&
    mkdir nullb0 &&
    cd nullb0 &&
    echo 0 > completion_nsec &&
    echo 512 > blocksize &&
    echo 0 > home_node &&
    echo 0 > irqmode &&
    echo 1024 > size &&
    echo 0 > memory_backed &&
    echo 2 > queue_mode &&
    echo 1 > power ||
    exit $?

(
    cd /sys/block/nullb0/queue &&
	echo 2 > rq_affinity
) || exit $?

iodepth=${1:-1}
runtime=30
args=()
if [ "$iodepth" = 1 ]; then
	args+=(--ioengine=psync)
else
	args+=(--ioengine=io_uring --iodepth_batch=$((iodepth/2)))
fi
args+=(--iodepth=$iodepth --name=nullb0 --filename=/dev/nullb0\
    --rw=read --bs=512 --loops=$((1<<20)) --direct=1 --numjobs=$numcpus \
    --thread --runtime=$runtime --invalidate=1 --gtod_reduce=1 \
    --group_reporting=1 --ioscheduler=mq-deadline)
if numactl -m 0 -N 0 echo >&/dev/null; then
	numactl -m 0 -N 0 -- fio "${args[@]}"
else
	fio "${args[@]}"
fi


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-26 23:49       ` Bart Van Assche
@ 2021-08-26 23:51         ` Jens Axboe
  2021-08-27  0:03           ` Bart Van Assche
  0 siblings, 1 reply; 30+ messages in thread
From: Jens Axboe @ 2021-08-26 23:51 UTC (permalink / raw)
  To: Bart Van Assche, Zhen Lei, linux-block; +Cc: Damien Le Moal

On 8/26/21 5:49 PM, Bart Van Assche wrote:
> On 8/26/21 11:45 AM, Jens Axboe wrote:
>> Just ran a quick test here, and I go from 3.55M IOPS to 1.23M switching
>> to deadline, of which 37% of the overhead is from dd_dispatch().
>>
>> With the posted patch applied, it runs at 2.3M IOPS with mq-deadline,
>> which is a lot better. This is on my 3970X test box, so 32 cores, 64
>> threads.
> 
> Hi Jens,
> 
> With the script below, queue depth >= 2 and an improved version of
> Zhen's patch I see 970 K IOPS with the mq-deadline scheduler in an
> 8 core VM (i7-4790 CPU). In other words, more IOPS than what Zhen
> reported with fewer CPU cores. Is that good enough?

That depends, what kind of IOPS are you getting if you revert the
original change?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-26 23:51         ` Jens Axboe
@ 2021-08-27  0:03           ` Bart Van Assche
  2021-08-27  0:05             ` Jens Axboe
  0 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2021-08-27  0:03 UTC (permalink / raw)
  To: Jens Axboe, Zhen Lei, linux-block; +Cc: Damien Le Moal

On 8/26/21 4:51 PM, Jens Axboe wrote:
> On 8/26/21 5:49 PM, Bart Van Assche wrote:
>> On 8/26/21 11:45 AM, Jens Axboe wrote:
>>> Just ran a quick test here, and I go from 3.55M IOPS to 1.23M switching
>>> to deadline, of which 37% of the overhead is from dd_dispatch().
>>>
>>> With the posted patch applied, it runs at 2.3M IOPS with mq-deadline,
>>> which is a lot better. This is on my 3970X test box, so 32 cores, 64
>>> threads.
>>
>> Hi Jens,
>>
>> With the script below, queue depth >= 2 and an improved version of
>> Zhen's patch I see 970 K IOPS with the mq-deadline scheduler in an
>> 8 core VM (i7-4790 CPU). In other words, more IOPS than what Zhen
>> reported with fewer CPU cores. Is that good enough?
> 
> That depends, what kind of IOPS are you getting if you revert the
> original change?

Hi Jens,

Here is an overview of the tests I ran so far, all on the same test
setup:
* No I/O scheduler:               about 5630 K IOPS.
* Kernel v5.11 + mq-deadline:     about 1100 K IOPS.
* block-for-next + mq-deadline:   about  760 K IOPS.
* block-for-next with improved mq-deadline performance: about 970 K IOPS.

Bart.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-27  0:03           ` Bart Van Assche
@ 2021-08-27  0:05             ` Jens Axboe
  2021-08-27  0:58               ` Bart Van Assche
  2021-08-27  2:48               ` Bart Van Assche
  0 siblings, 2 replies; 30+ messages in thread
From: Jens Axboe @ 2021-08-27  0:05 UTC (permalink / raw)
  To: Bart Van Assche, Zhen Lei, linux-block; +Cc: Damien Le Moal

On 8/26/21 6:03 PM, Bart Van Assche wrote:
> On 8/26/21 4:51 PM, Jens Axboe wrote:
>> On 8/26/21 5:49 PM, Bart Van Assche wrote:
>>> On 8/26/21 11:45 AM, Jens Axboe wrote:
>>>> Just ran a quick test here, and I go from 3.55M IOPS to 1.23M switching
>>>> to deadline, of which 37% of the overhead is from dd_dispatch().
>>>>
>>>> With the posted patch applied, it runs at 2.3M IOPS with mq-deadline,
>>>> which is a lot better. This is on my 3970X test box, so 32 cores, 64
>>>> threads.
>>>
>>> Hi Jens,
>>>
>>> With the script below, queue depth >= 2 and an improved version of
>>> Zhen's patch I see 970 K IOPS with the mq-deadline scheduler in an
>>> 8 core VM (i7-4790 CPU). In other words, more IOPS than what Zhen
>>> reported with fewer CPU cores. Is that good enough?
>>
>> That depends, what kind of IOPS are you getting if you revert the
>> original change?
> 
> Hi Jens,
> 
> Here is an overview of the tests I ran so far, all on the same test
> setup:
> * No I/O scheduler:               about 5630 K IOPS.
> * Kernel v5.11 + mq-deadline:     about 1100 K IOPS.
> * block-for-next + mq-deadline:   about  760 K IOPS.
> * block-for-next with improved mq-deadline performance: about 970 K IOPS.

So we're still off by about 12%, I don't think that is good enough.
That's assuming that v5.11 + mq-deadline is the same as for-next with
the mq-deadline change reverted? Because that would be the key number to
compare it with.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-27  0:05             ` Jens Axboe
@ 2021-08-27  0:58               ` Bart Van Assche
  2021-08-27  2:48               ` Bart Van Assche
  1 sibling, 0 replies; 30+ messages in thread
From: Bart Van Assche @ 2021-08-27  0:58 UTC (permalink / raw)
  To: Jens Axboe, Zhen Lei, linux-block; +Cc: Damien Le Moal

On 8/26/21 5:05 PM, Jens Axboe wrote:
>> Here is an overview of the tests I ran so far, all on the same test
>> setup:
>> * No I/O scheduler:               about 5630 K IOPS.
>> * Kernel v5.11 + mq-deadline:     about 1100 K IOPS.
>> * block-for-next + mq-deadline:   about  760 K IOPS.
>> * block-for-next with improved mq-deadline performance: about 970 K IOPS.
> 
> So we're still off by about 12%, I don't think that is good enough.
> That's assuming that v5.11 + mq-deadline is the same as for-next with
> the mq-deadline change reverted? Because that would be the key number to
> compare it with.

A quick attempt to eliminate the ktime_get_ns() call from
__dd_dispatch_request() improved performance a few percent but not as much
as I was hoping. I need a few days of time to run these measurements, to
optimize performance further and to rerun all functional and performance
tests.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-26 14:40 [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests Zhen Lei
  2021-08-26 18:09 ` Bart Van Assche
@ 2021-08-27  2:30 ` Damien Le Moal
  2021-08-28  2:14   ` Leizhen (ThunderTown)
  1 sibling, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2021-08-27  2:30 UTC (permalink / raw)
  To: axboe, linux-kernel, thunder.leizhen, linux-block; +Cc: bvanassche

On Thu, 2021-08-26 at 22:40 +0800, Zhen Lei wrote:
> dd_queued() traverses the percpu variable for summation. The more cores,
> the higher the performance overhead. I currently have a 128-core board and
> this function takes 2.5 us. If the number of high-priority requests is
> small and the number of low- and medium-priority requests is large, the
> performance impact is significant.
> 
> Let's maintain a non-percpu member variable 'nr_queued', which is
> incremented by 1 immediately following "inserted++" and decremented by 1
> immediately following "completed++". Because both the judgment dd_queued()
> in dd_dispatch_request() and operation "inserted++" in dd_insert_request()
> are protected by dd->lock, lock protection needs to be added only in
> dd_finish_request(), which is unlikely to cause significant performance
> side effects.
> 
> Tested on my 128-core board with two ssd disks.
> fio bs=4k rw=read iodepth=128 cpus_allowed=0-95 <others>
> Before:
> [183K/0/0 iops]
> [172K/0/0 iops]
> 
> After:
> [258K/0/0 iops]
> [258K/0/0 iops]
> 
> Fixes: fb926032b320 ("block/mq-deadline: Prioritize high-priority requests")
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  block/mq-deadline.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
> index a09761cbdf12e58..d8f6aa12de80049 100644
> --- a/block/mq-deadline.c
> +++ b/block/mq-deadline.c
> @@ -79,6 +79,7 @@ struct dd_per_prio {
>  	struct list_head fifo_list[DD_DIR_COUNT];
>  	/* Next request in FIFO order. Read, write or both are NULL. */
>  	struct request *next_rq[DD_DIR_COUNT];
> +	unsigned int nr_queued;
>  };
>  
>  struct deadline_data {
> @@ -277,9 +278,9 @@ deadline_move_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>  }
>  
>  /* Number of requests queued for a given priority level. */
> -static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
> +static __always_inline u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
>  {
> -	return dd_sum(dd, inserted, prio) - dd_sum(dd, completed, prio);
> +	return dd->per_prio[prio].nr_queued;
>  }
>  
>  /*
> @@ -711,6 +712,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  
>  	prio = ioprio_class_to_prio[ioprio_class];
>  	dd_count(dd, inserted, prio);
> +	per_prio = &dd->per_prio[prio];
> +	per_prio->nr_queued++;
>  
>  	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
>  		blk_mq_free_requests(&free);
> @@ -719,7 +722,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>  
>  	trace_block_rq_insert(rq);
>  
> -	per_prio = &dd->per_prio[prio];
>  	if (at_head) {
>  		list_add(&rq->queuelist, &per_prio->dispatch);
>  	} else {
> @@ -790,12 +792,14 @@ static void dd_finish_request(struct request *rq)
>  	const u8 ioprio_class = dd_rq_ioclass(rq);
>  	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
>  	struct dd_per_prio *per_prio = &dd->per_prio[prio];
> +	unsigned long flags;
>  
>  	dd_count(dd, completed, prio);
> +	spin_lock_irqsave(&dd->lock, flags);
> +	per_prio->nr_queued--;
> +	spin_unlock_irqrestore(&dd->lock, flags);

dd->lock is not taken with irqsave everywhere else. This leads to hard lockups
which I hit right away on boot. To avoid this, we need a spin_lock_irqsave()
everywhere.

Of note is that without this patch, testing on nullblk with Bart's script on
5.14.0-rc7, I get this splat:

 [  198.726920] watchdog: BUG: soft lockup - CPU#20 stuck for 26s!
[kworker/20:1H:260]
[  198.734550] Modules linked in: null_blk rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject
nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
nf_tables libcrc32c nfnetlink sunrpc vfat fat iTCO_wdt iTCO_vendor_support
ipmi_ssif x86_pkg_temp_thermal acpi_ipmi coretemp ipmi_si ioatdma i2c_i801 bfq
i2c_smbus lpc_ich intel_pch_thermal dca ipmi_devintf ipmi_msghandler
acpi_power_meter fuse ip_tables sd_mod ast i2c_algo_bit drm_vram_helper
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm
drm i40e crct10dif_pclmul mpt3sas crc32_pclmul ahci ghash_clmulni_intel libahci
libata raid_class scsi_transport_sas pkcs8_key_parser
[  198.805375] irq event stamp: 25378690
[  198.809063] hardirqs last  enabled at (25378689): [<ffffffff81149959>]
ktime_get+0x109/0x120
[  198.817545] hardirqs last disabled at (25378690): [<ffffffff8190519b>]
sysvec_apic_timer_interrupt+0xb/0x90
[  198.827327] softirqs last  enabled at (25337302): [<ffffffff810b331f>]
__irq_exit_rcu+0xbf/0xe0
[  198.836066] softirqs last disabled at (25337297): [<ffffffff810b331f>]
__irq_exit_rcu+0xbf/0xe0
[  198.844802] CPU: 20 PID: 260 Comm: kworker/20:1H Not tainted 5.14.0-rc7+
#1324
[  198.852059] Hardware name: Supermicro Super Server/X11DPL-i, BIOS 3.3
02/21/2020
[  198.859487] Workqueue: kblockd blk_mq_run_work_fn
[  198.864222] RIP: 0010:__list_add_valid+0x33/0x40
[  198.868868] Code: f2 0f 85 ec 44 44 00 4c 8b 0a 4d 39 c1 0f 85 08 45 44 00
48 39 d7 0f 84 e8 44 44 00 4c 39 cf 0f 84 df 44 44 00 b8 01 00 00 00 <c3> 66 66
2e 0f 1f 84 00 00 00 00 00 90 48 8b 17 4c 8b 47 08 48 b8
[  198.887712] RSP: 0018:ffff8883f1337d68 EFLAGS: 00000206
[  198.892963] RAX: 0000000000000001 RBX: ffff88857dae0840 RCX:
0000000000000000
[  198.900132] RDX: ffff8885387e2bc8 RSI: ffff8885387e2bc8 RDI:
ffff8885387e2d48
[  198.907300] RBP: ffff8883f1337d90 R08: ffff8883f1337d90 R09:
ffff8883f1337d90
[  198.914467] R10: 0000000000000020 R11: 0000000000000001 R12:
000000000000000a
[  198.921632] R13: ffff88857dae0800 R14: ffff8885bd3f3400 R15:
ffff8885bd276200
[  198.928801] FS:  0000000000000000(0000) GS:ffff888860100000(0000)
knlGS:0000000000000000
[  198.936929] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  198.942703] CR2: 0000000002204440 CR3: 0000000107322004 CR4:
00000000007706e0
[  198.949871] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[  198.957036] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[  198.964203] PKRU: 55555554
[  198.966933] Call Trace:
[  198.969401]  __blk_mq_do_dispatch_sched+0x234/0x2f0
[  198.974314]  __blk_mq_sched_dispatch_requests+0xf4/0x140
[  198.979662]  blk_mq_sched_dispatch_requests+0x30/0x60
[  198.984744]  __blk_mq_run_hw_queue+0x49/0x90
[  198.989041]  process_one_work+0x26c/0x570
[  198.993083]  worker_thread+0x55/0x3c0
[  198.996776]  ? process_one_work+0x570/0x570
[  199.000993]  kthread+0x140/0x160
[  199.004243]  ? set_kthread_struct+0x40/0x40
[  199.008452]  ret_from_fork+0x1f/0x30




>  
>  	if (blk_queue_is_zoned(q)) {
> -		unsigned long flags;
> -
>  		spin_lock_irqsave(&dd->zone_lock, flags);
>  		blk_req_zone_write_unlock(rq);
>  		if (!list_empty(&per_prio->fifo_list[DD_WRITE]))

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-27  0:05             ` Jens Axboe
  2021-08-27  0:58               ` Bart Van Assche
@ 2021-08-27  2:48               ` Bart Van Assche
  2021-08-27  3:13                 ` Jens Axboe
  1 sibling, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2021-08-27  2:48 UTC (permalink / raw)
  To: Jens Axboe, Zhen Lei, linux-block; +Cc: Damien Le Moal

On 8/26/21 5:05 PM, Jens Axboe wrote:
> On 8/26/21 6:03 PM, Bart Van Assche wrote:
>> Here is an overview of the tests I ran so far, all on the same test
>> setup:
>> * No I/O scheduler:               about 5630 K IOPS.
>> * Kernel v5.11 + mq-deadline:     about 1100 K IOPS.
>> * block-for-next + mq-deadline:   about  760 K IOPS.
>> * block-for-next with improved mq-deadline performance: about 970 K IOPS.
> 
> So we're still off by about 12%, I don't think that is good enough.
> That's assuming that v5.11 + mq-deadline is the same as for-next with
> the mq-deadline change reverted? Because that would be the key number to
> compare it with.

With the patch series that is available at
https://github.com/bvanassche/linux/tree/block-for-next the same test reports
1090 K IOPS or only 1% below the v5.11 result. I will post that series on the
linux-block mailing list after I have finished testing that series.

Bart.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-27  2:48               ` Bart Van Assche
@ 2021-08-27  3:13                 ` Jens Axboe
  2021-08-27  4:49                   ` Damien Le Moal
  2021-08-28  1:45                   ` Leizhen (ThunderTown)
  0 siblings, 2 replies; 30+ messages in thread
From: Jens Axboe @ 2021-08-27  3:13 UTC (permalink / raw)
  To: Bart Van Assche, Zhen Lei, linux-block; +Cc: Damien Le Moal

On 8/26/21 8:48 PM, Bart Van Assche wrote:
> On 8/26/21 5:05 PM, Jens Axboe wrote:
>> On 8/26/21 6:03 PM, Bart Van Assche wrote:
>>> Here is an overview of the tests I ran so far, all on the same test
>>> setup:
>>> * No I/O scheduler:               about 5630 K IOPS.
>>> * Kernel v5.11 + mq-deadline:     about 1100 K IOPS.
>>> * block-for-next + mq-deadline:   about  760 K IOPS.
>>> * block-for-next with improved mq-deadline performance: about 970 K IOPS.
>>
>> So we're still off by about 12%, I don't think that is good enough.
>> That's assuming that v5.11 + mq-deadline is the same as for-next with
>> the mq-deadline change reverted? Because that would be the key number to
>> compare it with.
> 
> With the patch series that is available at
> https://github.com/bvanassche/linux/tree/block-for-next the same test reports
> 1090 K IOPS or only 1% below the v5.11 result. I will post that series on the
> linux-block mailing list after I have finished testing that series.

OK sounds good. I do think we should just do the revert at this point,
any real fix is going to end up being bigger than I'd like at this
point. Then we can re-introduce the feature once we're happy with the
results.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-27  3:13                 ` Jens Axboe
@ 2021-08-27  4:49                   ` Damien Le Moal
  2021-08-27 14:34                     ` Bart Van Assche
  2021-08-28  1:45                   ` Leizhen (ThunderTown)
  1 sibling, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2021-08-27  4:49 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, Zhen Lei, linux-block

[-- Attachment #1: Type: text/plain, Size: 6039 bytes --]

On 2021/08/27 12:13, Jens Axboe wrote:
> On 8/26/21 8:48 PM, Bart Van Assche wrote:
>> On 8/26/21 5:05 PM, Jens Axboe wrote:
>>> On 8/26/21 6:03 PM, Bart Van Assche wrote:
>>>> Here is an overview of the tests I ran so far, all on the same test
>>>> setup:
>>>> * No I/O scheduler:               about 5630 K IOPS.
>>>> * Kernel v5.11 + mq-deadline:     about 1100 K IOPS.
>>>> * block-for-next + mq-deadline:   about  760 K IOPS.
>>>> * block-for-next with improved mq-deadline performance: about 970 K IOPS.
>>>
>>> So we're still off by about 12%, I don't think that is good enough.
>>> That's assuming that v5.11 + mq-deadline is the same as for-next with
>>> the mq-deadline change reverted? Because that would be the key number to
>>> compare it with.
>>
>> With the patch series that is available at
>> https://github.com/bvanassche/linux/tree/block-for-next the same test reports
>> 1090 K IOPS or only 1% below the v5.11 result. I will post that series on the
>> linux-block mailing list after I have finished testing that series.
> 
> OK sounds good. I do think we should just do the revert at this point,
> any real fix is going to end up being bigger than I'd like at this
> point. Then we can re-introduce the feature once we're happy with the
> results.

FYI, here is what I get with Bart's test script running on a dual socket
8-cores/16-threads Xeon machine (32 CPUs total):

* 5.14.0-rc7, with fb926032b320 reverted:
-----------------------------------------

QD 1: IOPS=305k (*)
QD 2: IOPS=411k
QD 4: IOPS=408k
QD 8: IOPS=414k

* 5.14.0-rc7, current (no modification):
----------------------------------------

QD 1: IOPS=296k (*)
QD 2: IOPS=207k
QD 4: IOPS=208k
QD 8: IOPS=210k

* 5.14.0-rc7, with modified patch (attached to this email):
-----------------------------------------------------------

QD 1: IOPS=287k (*)
QD 2: IOPS=334k
QD 4: IOPS=330k
QD 8: IOPS=334k

For reference, with the same test script using the none scheduler:

QD 1: IOPS=2172K
QD 2: IOPS=1075K
QD 4: IOPS=1075k
QD 8: IOPS=1077k

So the mq-deadline priority patch reduces performance by nearly half at high QD.
With the modified patch, we are back to better numbers, but still a significant
20% drop at high QD.

(*) Note: in all cases using the mq-deadline scheduler, for the first run at
QD=1, I get this splat 100% of the time.

[   95.173889] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1H:757]
[   95.181351] Modules linked in: null_blk rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4
nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject
nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
nf_tables libcrc32c nfnetlink sunrpc vfat fat iTCO_wdt iTCO_vendor_support
ipmi_ssif x86_pkg_temp_thermal coretemp i2c_i801 acpi_ipmi bfq i2c_smbus ioatdma
lpc_ich ipmi_si intel_pch_thermal dca ipmi_devintf ipmi_msghandler
acpi_power_meter fuse ip_tables sd_mod ast i2c_algo_bit drm_vram_helper
drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm
drm i40e crct10dif_pclmul mpt3sas crc32_pclmul ahci ghash_clmulni_intel libahci
raid_class scsi_transport_sas libata pkcs8_key_parser
[   95.252173] irq event stamp: 30500990
[   95.255860] hardirqs last  enabled at (30500989): [<ffffffff81910e2d>]
_raw_spin_unlock_irqrestore+0x2d/0x40
[   95.265735] hardirqs last disabled at (30500990): [<ffffffff819050cb>]
sysvec_apic_timer_interrupt+0xb/0x90
[   95.275520] softirqs last  enabled at (30496338): [<ffffffff810b331f>]
__irq_exit_rcu+0xbf/0xe0
[   95.284259] softirqs last disabled at (30496333): [<ffffffff810b331f>]
__irq_exit_rcu+0xbf/0xe0
[   95.292994] CPU: 0 PID: 757 Comm: kworker/0:1H Not tainted 5.14.0-rc7+ #1334
[   95.300076] Hardware name: Supermicro Super Server/X11DPL-i, BIOS 3.3 02/21/2020
[   95.307504] Workqueue: kblockd blk_mq_run_work_fn
[   95.312243] RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
[   95.317844] Code: c7 18 53 48 89 f3 48 8b 74 24 10 e8 35 82 80 ff 48 89 ef e8
9d ac 80 ff 80 e7 02 74 06 e8 23 33 8b ff fb 65 ff 0d 8b 5f 70 7e <5b> 5d c3 0f
1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 fd 65 ff
[   95.336680] RSP: 0018:ffff888448cefbb0 EFLAGS: 00000202
[   95.341934] RAX: 0000000001d1687d RBX: 0000000000000287 RCX: 0000000000000006
[   95.349103] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81910e2d
[   95.356270] RBP: ffff888192649218 R08: 0000000000000001 R09: 0000000000000001
[   95.363437] R10: 0000000000000000 R11: 000000000000005c R12: 0000000000000000
[   95.370604] R13: 0000000000000287 R14: ffff888192649218 R15: ffff88885fe68e80
[   95.377771] FS:  0000000000000000(0000) GS:ffff88885fe00000(0000)
knlGS:0000000000000000
[   95.385901] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   95.391675] CR2: 00007f59bfe71f80 CR3: 000000074a91e005 CR4: 00000000007706f0
[   95.398842] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   95.406009] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   95.413176] PKRU: 55555554
[   95.415904] Call Trace:
[   95.418373]  try_to_wake_up+0x268/0x7c0
[   95.422238]  blk_update_request+0x25b/0x420
[   95.426452]  blk_mq_end_request+0x1c/0x120
[   95.430576]  null_handle_cmd+0x12d/0x270 [null_blk]
[   95.435485]  blk_mq_dispatch_rq_list+0x13c/0x7f0
[   95.440130]  ? sbitmap_get+0x86/0x190
[   95.443826]  __blk_mq_do_dispatch_sched+0xb5/0x2f0
[   95.448653]  __blk_mq_sched_dispatch_requests+0xf4/0x140
[   95.453998]  blk_mq_sched_dispatch_requests+0x30/0x60
[   95.459083]  __blk_mq_run_hw_queue+0x49/0x90
[   95.463377]  process_one_work+0x26c/0x570
[   95.467421]  worker_thread+0x55/0x3c0
[   95.471103]  ? process_one_work+0x570/0x570
[   95.475313]  kthread+0x140/0x160
[   95.478567]  ? set_kthread_struct+0x40/0x40
[   95.482774]  ret_from_fork+0x1f/0x30




-- 
Damien Le Moal
Western Digital Research

[-- Attachment #2: 0001-block-mq-deadline-Speed-up-the-dispatch-of-low-prior.patch --]
[-- Type: text/plain, Size: 9575 bytes --]

From 2ac2af2b1316adc934d0e699985567ded595fe26 Mon Sep 17 00:00:00 2001
From: Zhen Lei <thunder.leizhen@huawei.com>
Date: Thu, 26 Aug 2021 22:40:39 +0800
Subject: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority
 requests

dd_queued() traverses the percpu variable for summation. The more cores,
the higher the performance overhead. I currently have a 128-core board and
this function takes 2.5 us. If the number of high-priority requests is
small and the number of low- and medium-priority requests is large, the
performance impact is significant.

Let's maintain a non-percpu member variable 'nr_queued', which is
incremented by 1 immediately following "inserted++" and decremented by 1
immediately following "completed++". Because both the judgment dd_queued()
in dd_dispatch_request() and operation "inserted++" in dd_insert_request()
are protected by dd->lock, lock protection needs to be added only in
dd_finish_request(), which is unlikely to cause significant performance
side effects.

Tested on my 128-core board with two ssd disks.
fio bs=4k rw=read iodepth=128 cpus_allowed=0-95 <others>
Before:
[183K/0/0 iops]
[172K/0/0 iops]

After:
[258K/0/0 iops]
[258K/0/0 iops]

Fixes: fb926032b320 ("block/mq-deadline: Prioritize high-priority requests")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 block/mq-deadline.c | 81 ++++++++++++++++++++++++---------------------
 1 file changed, 43 insertions(+), 38 deletions(-)

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index a09761cbdf12..0d879f3ff340 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -79,6 +79,7 @@ struct dd_per_prio {
 	struct list_head fifo_list[DD_DIR_COUNT];
 	/* Next request in FIFO order. Read, write or both are NULL. */
 	struct request *next_rq[DD_DIR_COUNT];
+	unsigned int nr_queued;
 };
 
 struct deadline_data {
@@ -106,7 +107,6 @@ struct deadline_data {
 	int aging_expire;
 
 	spinlock_t lock;
-	spinlock_t zone_lock;
 };
 
 /* Count one event of type 'event_type' and with I/O priority 'prio' */
@@ -276,10 +276,12 @@ deadline_move_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	deadline_remove_request(rq->q, per_prio, rq);
 }
 
-/* Number of requests queued for a given priority level. */
-static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
+/*
+ * Number of requests queued for a given priority level.
+ */
+static inline u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
 {
-	return dd_sum(dd, inserted, prio) - dd_sum(dd, completed, prio);
+	return dd->per_prio[prio].nr_queued;
 }
 
 /*
@@ -309,7 +311,6 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 		      enum dd_data_dir data_dir)
 {
 	struct request *rq;
-	unsigned long flags;
 
 	if (list_empty(&per_prio->fifo_list[data_dir]))
 		return NULL;
@@ -322,16 +323,12 @@ deadline_fifo_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	 * Look for a write request that can be dispatched, that is one with
 	 * an unlocked target zone.
 	 */
-	spin_lock_irqsave(&dd->zone_lock, flags);
 	list_for_each_entry(rq, &per_prio->fifo_list[DD_WRITE], queuelist) {
 		if (blk_req_can_dispatch_to_zone(rq))
-			goto out;
+			return rq;
 	}
-	rq = NULL;
-out:
-	spin_unlock_irqrestore(&dd->zone_lock, flags);
 
-	return rq;
+	return NULL;
 }
 
 /*
@@ -343,7 +340,6 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 		      enum dd_data_dir data_dir)
 {
 	struct request *rq;
-	unsigned long flags;
 
 	rq = per_prio->next_rq[data_dir];
 	if (!rq)
@@ -356,15 +352,13 @@ deadline_next_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
 	 * Look for a write request that can be dispatched, that is one with
 	 * an unlocked target zone.
 	 */
-	spin_lock_irqsave(&dd->zone_lock, flags);
 	while (rq) {
 		if (blk_req_can_dispatch_to_zone(rq))
-			break;
+			return rq;
 		rq = deadline_latter_request(rq);
 	}
-	spin_unlock_irqrestore(&dd->zone_lock, flags);
 
-	return rq;
+	return NULL;
 }
 
 /*
@@ -497,8 +491,10 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	const u64 now_ns = ktime_get_ns();
 	struct request *rq = NULL;
 	enum dd_prio prio;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dd->lock, flags);
 
-	spin_lock(&dd->lock);
 	/*
 	 * Start with dispatching requests whose deadline expired more than
 	 * aging_expire jiffies ago.
@@ -520,7 +516,7 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	}
 
 unlock:
-	spin_unlock(&dd->lock);
+	spin_unlock_irqrestore(&dd->lock, flags);
 
 	return rq;
 }
@@ -622,7 +618,6 @@ static int dd_init_sched(struct request_queue *q, struct elevator_type *e)
 	dd->fifo_batch = fifo_batch;
 	dd->aging_expire = aging_expire;
 	spin_lock_init(&dd->lock);
-	spin_lock_init(&dd->zone_lock);
 
 	q->elevator = eq;
 	return 0;
@@ -675,10 +670,11 @@ static bool dd_bio_merge(struct request_queue *q, struct bio *bio,
 	struct deadline_data *dd = q->elevator->elevator_data;
 	struct request *free = NULL;
 	bool ret;
+	unsigned long flags;
 
-	spin_lock(&dd->lock);
+	spin_lock_irqsave(&dd->lock, flags);
 	ret = blk_mq_sched_try_merge(q, bio, nr_segs, &free);
-	spin_unlock(&dd->lock);
+	spin_unlock_irqrestore(&dd->lock, flags);
 
 	if (free)
 		blk_mq_free_request(free);
@@ -690,7 +686,7 @@ static bool dd_bio_merge(struct request_queue *q, struct bio *bio,
  * add rq to rbtree and fifo
  */
 static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
-			      bool at_head)
+			      bool at_head, struct list_head *free)
 {
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
@@ -699,7 +695,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	u8 ioprio_class = IOPRIO_PRIO_CLASS(ioprio);
 	struct dd_per_prio *per_prio;
 	enum dd_prio prio;
-	LIST_HEAD(free);
 
 	lockdep_assert_held(&dd->lock);
 
@@ -712,14 +707,14 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
 	prio = ioprio_class_to_prio[ioprio_class];
 	dd_count(dd, inserted, prio);
 
-	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
-		blk_mq_free_requests(&free);
+	if (blk_mq_sched_try_insert_merge(q, rq, free))
 		return;
-	}
+
+	per_prio = &dd->per_prio[prio];
+	per_prio->nr_queued++;
 
 	trace_block_rq_insert(rq);
 
-	per_prio = &dd->per_prio[prio];
 	if (at_head) {
 		list_add(&rq->queuelist, &per_prio->dispatch);
 	} else {
@@ -747,16 +742,23 @@ static void dd_insert_requests(struct blk_mq_hw_ctx *hctx,
 {
 	struct request_queue *q = hctx->queue;
 	struct deadline_data *dd = q->elevator->elevator_data;
+	unsigned long flags;
+	LIST_HEAD(free);
 
-	spin_lock(&dd->lock);
+	spin_lock_irqsave(&dd->lock, flags);
 	while (!list_empty(list)) {
 		struct request *rq;
 
 		rq = list_first_entry(list, struct request, queuelist);
 		list_del_init(&rq->queuelist);
-		dd_insert_request(hctx, rq, at_head);
+		dd_insert_request(hctx, rq, at_head, &free);
+		if (!list_empty(&free)) {
+			spin_unlock_irqrestore(&dd->lock, flags);
+			blk_mq_free_requests(&free);
+			spin_lock_irqsave(&dd->lock, flags);
+		}
 	}
-	spin_unlock(&dd->lock);
+	spin_unlock_irqrestore(&dd->lock, flags);
 }
 
 /*
@@ -790,18 +792,21 @@ static void dd_finish_request(struct request *rq)
 	const u8 ioprio_class = dd_rq_ioclass(rq);
 	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];
+	unsigned long flags;
 
 	dd_count(dd, completed, prio);
 
-	if (blk_queue_is_zoned(q)) {
-		unsigned long flags;
+	spin_lock_irqsave(&dd->lock, flags);
 
-		spin_lock_irqsave(&dd->zone_lock, flags);
+	per_prio->nr_queued--;
+
+	if (blk_queue_is_zoned(q)) {
 		blk_req_zone_write_unlock(rq);
 		if (!list_empty(&per_prio->fifo_list[DD_WRITE]))
 			blk_mq_sched_mark_restart_hctx(rq->mq_hctx);
-		spin_unlock_irqrestore(&dd->zone_lock, flags);
 	}
+
+	spin_unlock_irqrestore(&dd->lock, flags);
 }
 
 static bool dd_has_work_for_prio(struct dd_per_prio *per_prio)
@@ -899,7 +904,7 @@ static void *deadline_##name##_fifo_start(struct seq_file *m,		\
 	struct deadline_data *dd = q->elevator->elevator_data;		\
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];		\
 									\
-	spin_lock(&dd->lock);						\
+	spin_lock_irq(&dd->lock);					\
 	return seq_list_start(&per_prio->fifo_list[data_dir], *pos);	\
 }									\
 									\
@@ -919,7 +924,7 @@ static void deadline_##name##_fifo_stop(struct seq_file *m, void *v)	\
 	struct request_queue *q = m->private;				\
 	struct deadline_data *dd = q->elevator->elevator_data;		\
 									\
-	spin_unlock(&dd->lock);						\
+	spin_unlock_irq(&dd->lock);					\
 }									\
 									\
 static const struct seq_operations deadline_##name##_fifo_seq_ops = {	\
@@ -1015,7 +1020,7 @@ static void *deadline_dispatch##prio##_start(struct seq_file *m,	\
 	struct deadline_data *dd = q->elevator->elevator_data;		\
 	struct dd_per_prio *per_prio = &dd->per_prio[prio];		\
 									\
-	spin_lock(&dd->lock);						\
+	spin_lock_irq(&dd->lock);					\
 	return seq_list_start(&per_prio->dispatch, *pos);		\
 }									\
 									\
@@ -1035,7 +1040,7 @@ static void deadline_dispatch##prio##_stop(struct seq_file *m, void *v)	\
 	struct request_queue *q = m->private;				\
 	struct deadline_data *dd = q->elevator->elevator_data;		\
 									\
-	spin_unlock(&dd->lock);						\
+	spin_unlock_irq(&dd->lock);					\
 }									\
 									\
 static const struct seq_operations deadline_dispatch##prio##_seq_ops = { \
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-27  4:49                   ` Damien Le Moal
@ 2021-08-27 14:34                     ` Bart Van Assche
  2021-08-29 23:02                       ` Damien Le Moal
  0 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2021-08-27 14:34 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe, Zhen Lei, linux-block

On 8/26/21 9:49 PM, Damien Le Moal wrote:
> So the mq-deadline priority patch reduces performance by nearly half at high QD.
> With the modified patch, we are back to better numbers, but still a significant
> 20% drop at high QD.

Hi Damien,

An implementation of I/O priority for the deadline scheduler that reduces the
IOPS drop to 1% on my test setup is available here:
https://github.com/bvanassche/linux/tree/block-for-next

> (*) Note: in all cases using the mq-deadline scheduler, for the first run at
> QD=1, I get this splat 100% of the time.
> 
> [   95.173889] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1H:757]
> [   95.292994] CPU: 0 PID: 757 Comm: kworker/0:1H Not tainted 5.14.0-rc7+ #1334
> [   95.307504] Workqueue: kblockd blk_mq_run_work_fn
> [   95.312243] RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
> [   95.415904] Call Trace:
> [   95.418373]  try_to_wake_up+0x268/0x7c0
> [   95.422238]  blk_update_request+0x25b/0x420
> [   95.426452]  blk_mq_end_request+0x1c/0x120
> [   95.430576]  null_handle_cmd+0x12d/0x270 [null_blk]
> [   95.435485]  blk_mq_dispatch_rq_list+0x13c/0x7f0
> [   95.443826]  __blk_mq_do_dispatch_sched+0xb5/0x2f0
> [   95.448653]  __blk_mq_sched_dispatch_requests+0xf4/0x140
> [   95.453998]  blk_mq_sched_dispatch_requests+0x30/0x60
> [   95.459083]  __blk_mq_run_hw_queue+0x49/0x90
> [   95.463377]  process_one_work+0x26c/0x570
> [   95.467421]  worker_thread+0x55/0x3c0
> [   95.475313]  kthread+0x140/0x160
> [   95.482774]  ret_from_fork+0x1f/0x30

I don't see any function names in the above call stack that refer to the
mq-deadline scheduler? Did I perhaps overlook something? Anyway, if you can
tell me how to reproduce this (kernel commit + kernel config) I will take a
look.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-27  3:13                 ` Jens Axboe
  2021-08-27  4:49                   ` Damien Le Moal
@ 2021-08-28  1:45                   ` Leizhen (ThunderTown)
  2021-08-28  2:19                     ` Bart Van Assche
  1 sibling, 1 reply; 30+ messages in thread
From: Leizhen (ThunderTown) @ 2021-08-28  1:45 UTC (permalink / raw)
  To: Jens Axboe, Bart Van Assche, linux-block; +Cc: Damien Le Moal



On 2021/8/27 11:13, Jens Axboe wrote:
> On 8/26/21 8:48 PM, Bart Van Assche wrote:
>> On 8/26/21 5:05 PM, Jens Axboe wrote:
>>> On 8/26/21 6:03 PM, Bart Van Assche wrote:
>>>> Here is an overview of the tests I ran so far, all on the same test
>>>> setup:
>>>> * No I/O scheduler:               about 5630 K IOPS.
>>>> * Kernel v5.11 + mq-deadline:     about 1100 K IOPS.
>>>> * block-for-next + mq-deadline:   about  760 K IOPS.
>>>> * block-for-next with improved mq-deadline performance: about 970 K IOPS.
>>>
>>> So we're still off by about 12%, I don't think that is good enough.
>>> That's assuming that v5.11 + mq-deadline is the same as for-next with
>>> the mq-deadline change reverted? Because that would be the key number to
>>> compare it with.
>>
>> With the patch series that is available at
>> https://github.com/bvanassche/linux/tree/block-for-next the same test reports
>> 1090 K IOPS or only 1% below the v5.11 result. I will post that series on the
>> linux-block mailing list after I have finished testing that series.
> 
> OK sounds good. I do think we should just do the revert at this point,
> any real fix is going to end up being bigger than I'd like at this
> point. Then we can re-introduce the feature once we're happy with the
> results.

Yes, it's already rc7 and it's no longer good for big changes. Revert is the
best solution, and applying my patch is a compromise solution.

> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-26 18:09 ` Bart Van Assche
  2021-08-26 18:13   ` Jens Axboe
@ 2021-08-28  1:59   ` Leizhen (ThunderTown)
  2021-08-28  2:41     ` Bart Van Assche
  1 sibling, 1 reply; 30+ messages in thread
From: Leizhen (ThunderTown) @ 2021-08-28  1:59 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe, linux-block, linux-kernel; +Cc: Damien Le Moal



On 2021/8/27 2:09, Bart Van Assche wrote:
> On 8/26/21 7:40 AM, Zhen Lei wrote:
>> lock protection needs to be added only in
>> dd_finish_request(), which is unlikely to cause significant performance
>> side effects.
> 
> Not sure the above is correct. Every new atomic instruction has a measurable
> performance overhead. But I guess in this case that overhead is smaller than
> the time needed to sum 128 per-CPU variables.
> 
>> Tested on my 128-core board with two ssd disks.
>> fio bs=4k rw=read iodepth=128 cpus_allowed=0-95 <others>
>> Before:
>> [183K/0/0 iops]
>> [172K/0/0 iops]
>>
>> After:
>> [258K/0/0 iops]
>> [258K/0/0 iops]
> 
> Nice work!
> 
>> Fixes: fb926032b320 ("block/mq-deadline: Prioritize high-priority requests")
> 
> Shouldn't the Fixes: tag be used only for patches that modify functionality?
> I'm not sure it is appropriate to use this tag for performance improvements.
> 
>>  struct deadline_data {
>> @@ -277,9 +278,9 @@ deadline_move_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>>  }
>>  
>>  /* Number of requests queued for a given priority level. */
>> -static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
>> +static __always_inline u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
>>  {
>> -	return dd_sum(dd, inserted, prio) - dd_sum(dd, completed, prio);
>> +	return dd->per_prio[prio].nr_queued;
>>  }
> 
> Please leave out "__always_inline". Modern compilers are smart enough to
> inline this function without using the "inline" keyword.

Yes.

> 
>> @@ -711,6 +712,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>  
>>  	prio = ioprio_class_to_prio[ioprio_class];
>>  	dd_count(dd, inserted, prio);
>> +	per_prio = &dd->per_prio[prio];
>> +	per_prio->nr_queued++;
>>  
>>  	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
>>  		blk_mq_free_requests(&free);
> 
> I think the above is wrong - nr_queued should not be incremented if the
> request is merged into another request. Please move the code that increments
> nr_queued past the above if-statement.

So dd_count(dd, inserted, prio) needs to be moved past the if-statement as well?

> 
> Thanks,
> 
> Bart.
> .
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-27  2:30 ` Damien Le Moal
@ 2021-08-28  2:14   ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 30+ messages in thread
From: Leizhen (ThunderTown) @ 2021-08-28  2:14 UTC (permalink / raw)
  To: Damien Le Moal, axboe, linux-kernel, linux-block; +Cc: bvanassche



On 2021/8/27 10:30, Damien Le Moal wrote:
> On Thu, 2021-08-26 at 22:40 +0800, Zhen Lei wrote:
>> dd_queued() traverses the percpu variable for summation. The more cores,
>> the higher the performance overhead. I currently have a 128-core board and
>> this function takes 2.5 us. If the number of high-priority requests is
>> small and the number of low- and medium-priority requests is large, the
>> performance impact is significant.
>>
>> Let's maintain a non-percpu member variable 'nr_queued', which is
>> incremented by 1 immediately following "inserted++" and decremented by 1
>> immediately following "completed++". Because both the judgment dd_queued()
>> in dd_dispatch_request() and operation "inserted++" in dd_insert_request()
>> are protected by dd->lock, lock protection needs to be added only in
>> dd_finish_request(), which is unlikely to cause significant performance
>> side effects.
>>
>> Tested on my 128-core board with two ssd disks.
>> fio bs=4k rw=read iodepth=128 cpus_allowed=0-95 <others>
>> Before:
>> [183K/0/0 iops]
>> [172K/0/0 iops]
>>
>> After:
>> [258K/0/0 iops]
>> [258K/0/0 iops]
>>
>> Fixes: fb926032b320 ("block/mq-deadline: Prioritize high-priority requests")
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  block/mq-deadline.c | 14 +++++++++-----
>>  1 file changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/block/mq-deadline.c b/block/mq-deadline.c
>> index a09761cbdf12e58..d8f6aa12de80049 100644
>> --- a/block/mq-deadline.c
>> +++ b/block/mq-deadline.c
>> @@ -79,6 +79,7 @@ struct dd_per_prio {
>>  	struct list_head fifo_list[DD_DIR_COUNT];
>>  	/* Next request in FIFO order. Read, write or both are NULL. */
>>  	struct request *next_rq[DD_DIR_COUNT];
>> +	unsigned int nr_queued;
>>  };
>>  
>>  struct deadline_data {
>> @@ -277,9 +278,9 @@ deadline_move_request(struct deadline_data *dd, struct dd_per_prio *per_prio,
>>  }
>>  
>>  /* Number of requests queued for a given priority level. */
>> -static u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
>> +static __always_inline u32 dd_queued(struct deadline_data *dd, enum dd_prio prio)
>>  {
>> -	return dd_sum(dd, inserted, prio) - dd_sum(dd, completed, prio);
>> +	return dd->per_prio[prio].nr_queued;
>>  }
>>  
>>  /*
>> @@ -711,6 +712,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>  
>>  	prio = ioprio_class_to_prio[ioprio_class];
>>  	dd_count(dd, inserted, prio);
>> +	per_prio = &dd->per_prio[prio];
>> +	per_prio->nr_queued++;
>>  
>>  	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
>>  		blk_mq_free_requests(&free);
>> @@ -719,7 +722,6 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>  
>>  	trace_block_rq_insert(rq);
>>  
>> -	per_prio = &dd->per_prio[prio];
>>  	if (at_head) {
>>  		list_add(&rq->queuelist, &per_prio->dispatch);
>>  	} else {
>> @@ -790,12 +792,14 @@ static void dd_finish_request(struct request *rq)
>>  	const u8 ioprio_class = dd_rq_ioclass(rq);
>>  	const enum dd_prio prio = ioprio_class_to_prio[ioprio_class];
>>  	struct dd_per_prio *per_prio = &dd->per_prio[prio];
>> +	unsigned long flags;
>>  
>>  	dd_count(dd, completed, prio);
>> +	spin_lock_irqsave(&dd->lock, flags);
>> +	per_prio->nr_queued--;
>> +	spin_unlock_irqrestore(&dd->lock, flags);
> 
> dd->lock is not taken with irqsave everywhere else. This leads to hard lockups
> which I hit right away on boot. To avoid this, we need a spin_lock_irqsave()
> everywhere.

Yes, it's safer to add interrupt protection. I noticed that too. But I thought
there was a convention that the upper-layer functions turn off interrupts, so I
didn't touch it.

> 
> Of note is that without this patch, testing on nullblk with Bart's script on
> 5.14.0-rc7, I get this splat:
> 
>  [  198.726920] watchdog: BUG: soft lockup - CPU#20 stuck for 26s!
> [kworker/20:1H:260]
> [  198.734550] Modules linked in: null_blk rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver nfs lockd grace fscache netfs nft_fib_inet nft_fib_ipv4
> nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject
> nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set
> nf_tables libcrc32c nfnetlink sunrpc vfat fat iTCO_wdt iTCO_vendor_support
> ipmi_ssif x86_pkg_temp_thermal acpi_ipmi coretemp ipmi_si ioatdma i2c_i801 bfq
> i2c_smbus lpc_ich intel_pch_thermal dca ipmi_devintf ipmi_msghandler
> acpi_power_meter fuse ip_tables sd_mod ast i2c_algo_bit drm_vram_helper
> drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm
> drm i40e crct10dif_pclmul mpt3sas crc32_pclmul ahci ghash_clmulni_intel libahci
> libata raid_class scsi_transport_sas pkcs8_key_parser
> [  198.805375] irq event stamp: 25378690
> [  198.809063] hardirqs last  enabled at (25378689): [<ffffffff81149959>]
> ktime_get+0x109/0x120
> [  198.817545] hardirqs last disabled at (25378690): [<ffffffff8190519b>]
> sysvec_apic_timer_interrupt+0xb/0x90
> [  198.827327] softirqs last  enabled at (25337302): [<ffffffff810b331f>]
> __irq_exit_rcu+0xbf/0xe0
> [  198.836066] softirqs last disabled at (25337297): [<ffffffff810b331f>]
> __irq_exit_rcu+0xbf/0xe0
> [  198.844802] CPU: 20 PID: 260 Comm: kworker/20:1H Not tainted 5.14.0-rc7+
> #1324
> [  198.852059] Hardware name: Supermicro Super Server/X11DPL-i, BIOS 3.3
> 02/21/2020
> [  198.859487] Workqueue: kblockd blk_mq_run_work_fn
> [  198.864222] RIP: 0010:__list_add_valid+0x33/0x40
> [  198.868868] Code: f2 0f 85 ec 44 44 00 4c 8b 0a 4d 39 c1 0f 85 08 45 44 00
> 48 39 d7 0f 84 e8 44 44 00 4c 39 cf 0f 84 df 44 44 00 b8 01 00 00 00 <c3> 66 66
> 2e 0f 1f 84 00 00 00 00 00 90 48 8b 17 4c 8b 47 08 48 b8
> [  198.887712] RSP: 0018:ffff8883f1337d68 EFLAGS: 00000206
> [  198.892963] RAX: 0000000000000001 RBX: ffff88857dae0840 RCX:
> 0000000000000000
> [  198.900132] RDX: ffff8885387e2bc8 RSI: ffff8885387e2bc8 RDI:
> ffff8885387e2d48
> [  198.907300] RBP: ffff8883f1337d90 R08: ffff8883f1337d90 R09:
> ffff8883f1337d90
> [  198.914467] R10: 0000000000000020 R11: 0000000000000001 R12:
> 000000000000000a
> [  198.921632] R13: ffff88857dae0800 R14: ffff8885bd3f3400 R15:
> ffff8885bd276200
> [  198.928801] FS:  0000000000000000(0000) GS:ffff888860100000(0000)
> knlGS:0000000000000000
> [  198.936929] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  198.942703] CR2: 0000000002204440 CR3: 0000000107322004 CR4:
> 00000000007706e0
> [  198.949871] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [  198.957036] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [  198.964203] PKRU: 55555554
> [  198.966933] Call Trace:
> [  198.969401]  __blk_mq_do_dispatch_sched+0x234/0x2f0
> [  198.974314]  __blk_mq_sched_dispatch_requests+0xf4/0x140
> [  198.979662]  blk_mq_sched_dispatch_requests+0x30/0x60
> [  198.984744]  __blk_mq_run_hw_queue+0x49/0x90
> [  198.989041]  process_one_work+0x26c/0x570
> [  198.993083]  worker_thread+0x55/0x3c0
> [  198.996776]  ? process_one_work+0x570/0x570
> [  199.000993]  kthread+0x140/0x160
> [  199.004243]  ? set_kthread_struct+0x40/0x40
> [  199.008452]  ret_from_fork+0x1f/0x30
> 
> 
> 
> 
>>  
>>  	if (blk_queue_is_zoned(q)) {
>> -		unsigned long flags;
>> -
>>  		spin_lock_irqsave(&dd->zone_lock, flags);
>>  		blk_req_zone_write_unlock(rq);
>>  		if (!list_empty(&per_prio->fifo_list[DD_WRITE]))
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-28  1:45                   ` Leizhen (ThunderTown)
@ 2021-08-28  2:19                     ` Bart Van Assche
  2021-08-28  2:42                       ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2021-08-28  2:19 UTC (permalink / raw)
  To: Leizhen (ThunderTown), Jens Axboe, linux-block; +Cc: Damien Le Moal

On 8/27/21 6:45 PM, Leizhen (ThunderTown) wrote:
> On 2021/8/27 11:13, Jens Axboe wrote:
>> On 8/26/21 8:48 PM, Bart Van Assche wrote:
>>> With the patch series that is available at
>>> https://github.com/bvanassche/linux/tree/block-for-next the same test reports
>>> 1090 K IOPS or only 1% below the v5.11 result. I will post that series on the
>>> linux-block mailing list after I have finished testing that series.
>>
>> OK sounds good. I do think we should just do the revert at this point,
>> any real fix is going to end up being bigger than I'd like at this
>> point. Then we can re-introduce the feature once we're happy with the
>> results.
> 
> Yes, it's already rc7 and it's no longer good for big changes. Revert is the
> best solution, and applying my patch is a compromise solution.

Please take a look at the patch series that is available at
https://github.com/bvanassche/linux/tree/block-for-next. Performance for
that patch series is significantly better than with your patch.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-28  1:59   ` Leizhen (ThunderTown)
@ 2021-08-28  2:41     ` Bart Van Assche
  0 siblings, 0 replies; 30+ messages in thread
From: Bart Van Assche @ 2021-08-28  2:41 UTC (permalink / raw)
  To: Leizhen (ThunderTown), Jens Axboe, linux-block, linux-kernel
  Cc: Damien Le Moal

On 8/27/21 6:59 PM, Leizhen (ThunderTown) wrote:
>>> @@ -711,6 +712,8 @@ static void dd_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
>>>  
>>>  	prio = ioprio_class_to_prio[ioprio_class];
>>>  	dd_count(dd, inserted, prio);
>>> +	per_prio = &dd->per_prio[prio];
>>> +	per_prio->nr_queued++;
>>>  
>>>  	if (blk_mq_sched_try_insert_merge(q, rq, &free)) {
>>>  		blk_mq_free_requests(&free);
>>
>> I think the above is wrong - nr_queued should not be incremented if the
>> request is merged into another request. Please move the code that increments
>> nr_queued past the above if-statement.
> 
> So dd_count(dd, inserted, prio) needs to be moved behind "if-statement" as well?

dd_insert_request() is called if a request is inserted and also if it is
requeued. dd_finish_request() is called once per request. Keeping
dd_count() before blk_mq_sched_try_insert_merge() is fine since
blk_mq_free_requests() will call dd_finish_request() indirectly if a
request is merged. However, dd_count() must only happen once per request
and must not be used if a request is requeued.

Additionally, since dd_insert_request() is called with dd->lock held and
since dd_finish_request() is called directly from inside
dd_insert_request() if a request is merged, acquiring dd->lock from
inside dd_finish_request() may trigger a deadlock. A convenient way to
trigger this code path is by running test block/015 from
https://github.com/osandov/blktests/.

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-28  2:19                     ` Bart Van Assche
@ 2021-08-28  2:42                       ` Leizhen (ThunderTown)
  2021-08-28 13:14                         ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 30+ messages in thread
From: Leizhen (ThunderTown) @ 2021-08-28  2:42 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe, linux-block; +Cc: Damien Le Moal



On 2021/8/28 10:19, Bart Van Assche wrote:
> On 8/27/21 6:45 PM, Leizhen (ThunderTown) wrote:
>> On 2021/8/27 11:13, Jens Axboe wrote:
>>> On 8/26/21 8:48 PM, Bart Van Assche wrote:
>>>> With the patch series that is available at
>>>> https://github.com/bvanassche/linux/tree/block-for-next the same test reports
>>>> 1090 K IOPS or only 1% below the v5.11 result. I will post that series on the
>>>> linux-block mailing list after I have finished testing that series.
>>>
>>> OK sounds good. I do think we should just do the revert at this point,
>>> any real fix is going to end up being bigger than I'd like at this
>>> point. Then we can re-introduce the feature once we're happy with the
>>> results.
>>
>> Yes, It's already rc7 and it's no longer good for big changes. Revert is the
>> best solution, and apply my patch is a compromise solution.
> 
> Please take a look at the patch series that is available at
> https://github.com/bvanassche/linux/tree/block-for-next. Performance for
> that patch series is significantly better than with your patch.

Yes, this patch is better than mine. However, Jens prefers to avoid any risk to
functional stability in v5.14. v5.15 doesn't need my patch or the revert.

I'll test your patch this afternoon. I don't have the environment yet.

> 
> Thanks,
> 
> Bart.
> .
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-28  2:42                       ` Leizhen (ThunderTown)
@ 2021-08-28 13:14                         ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 30+ messages in thread
From: Leizhen (ThunderTown) @ 2021-08-28 13:14 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe, linux-block; +Cc: Damien Le Moal



On 2021/8/28 10:42, Leizhen (ThunderTown) wrote:
> 
> 
> On 2021/8/28 10:19, Bart Van Assche wrote:
>> On 8/27/21 6:45 PM, Leizhen (ThunderTown) wrote:
>>> On 2021/8/27 11:13, Jens Axboe wrote:
>>>> On 8/26/21 8:48 PM, Bart Van Assche wrote:
>>>>> With the patch series that is available at
>>>>> https://github.com/bvanassche/linux/tree/block-for-next the same test reports
>>>>> 1090 K IOPS or only 1% below the v5.11 result. I will post that series on the
>>>>> linux-block mailing list after I have finished testing that series.
>>>>
>>>> OK sounds good. I do think we should just do the revert at this point,
>>>> any real fix is going to end up being bigger than I'd like at this
>>>> point. Then we can re-introduce the feature once we're happy with the
>>>> results.
>>>
>>> Yes, It's already rc7 and it's no longer good for big changes. Revert is the
>>> best solution, and apply my patch is a compromise solution.
>>
>> Please take a look at the patch series that is available at
>> https://github.com/bvanassche/linux/tree/block-for-next. Performance for
>> that patch series is significantly better than with your patch.
> 
> Yes, this patch is better than mine. However, Jens prefers to avoid the risk of
> functional stability in v5.14. v5.15 doesn't need my patch or revert.
> 
> I'll test your patch this afternoon. I don't have the environment yet.


Revert:
253K/0/0
258K/0/0

With your patch:
258K/0/0
252K/0/0

With my patch:
245K/0/0
258K/0/0
244K/0/0

I see that Jens has already pushed "revert" into v5.14-rc8.


> 
>>
>> Thanks,
>>
>> Bart.
>> .
>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-27 14:34                     ` Bart Van Assche
@ 2021-08-29 23:02                       ` Damien Le Moal
  2021-08-30  2:31                         ` Keith Busch
  2021-08-30  2:40                         ` Bart Van Assche
  0 siblings, 2 replies; 30+ messages in thread
From: Damien Le Moal @ 2021-08-29 23:02 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe, Zhen Lei, linux-block

[-- Attachment #1: Type: text/plain, Size: 2327 bytes --]

On 2021/08/27 23:34, Bart Van Assche wrote:
> On 8/26/21 9:49 PM, Damien Le Moal wrote:
>> So the mq-deadline priority patch reduces performance by nearly half at high QD.
>> With the modified patch, we are back to better numbers, but still a significant
>> 20% drop at high QD.
> 
> Hi Damien,
> 
> An implementation of I/O priority for the deadline scheduler that reduces the
> IOPS drop to 1% on my test setup is available here:
> https://github.com/bvanassche/linux/tree/block-for-next
> 
>> (*) Note: in all cases using the mq-deadline scheduler, for the first run at
>> QD=1, I get this splat 100% of the time.
>>
>> [   95.173889] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1H:757]
>> [   95.292994] CPU: 0 PID: 757 Comm: kworker/0:1H Not tainted 5.14.0-rc7+ #1334
>> [   95.307504] Workqueue: kblockd blk_mq_run_work_fn
>> [   95.312243] RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
>> [   95.415904] Call Trace:
>> [   95.418373]  try_to_wake_up+0x268/0x7c0
>> [   95.422238]  blk_update_request+0x25b/0x420
>> [   95.426452]  blk_mq_end_request+0x1c/0x120
>> [   95.430576]  null_handle_cmd+0x12d/0x270 [null_blk]
>> [   95.435485]  blk_mq_dispatch_rq_list+0x13c/0x7f0
>> [   95.443826]  __blk_mq_do_dispatch_sched+0xb5/0x2f0
>> [   95.448653]  __blk_mq_sched_dispatch_requests+0xf4/0x140
>> [   95.453998]  blk_mq_sched_dispatch_requests+0x30/0x60
>> [   95.459083]  __blk_mq_run_hw_queue+0x49/0x90
>> [   95.463377]  process_one_work+0x26c/0x570
>> [   95.467421]  worker_thread+0x55/0x3c0
>> [   95.475313]  kthread+0x140/0x160
>> [   95.482774]  ret_from_fork+0x1f/0x30
> 
> I don't see any function names in the above call stack that refer to the
> mq-deadline scheduler? Did I perhaps overlook something? Anyway, if you can
> tell me how to reproduce this (kernel commit + kernel config) I will take a
> look.

Indeed, the stack trace does not show any mq-deadline function. But the
workqueue is stuck on _raw_spin_unlock_irqrestore() in the blk_mq_run_work_fn()
function. I suspect that the spinlock is dd->lock, so the CPU may be stuck on
entry to mq-deadline dispatch or finish request methods. Not entirely sure.

I got this splat with 5.14.0-rc7 (Linus tag, unpatched) with the attached config.


-- 
Damien Le Moal
Western Digital Research

[-- Attachment #2: config --]
[-- Type: text/plain, Size: 24715 bytes --]

# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_WATCH_QUEUE=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_NO_HZ_FULL=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_BPF_JIT=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=12
CONFIG_CGROUPS=y
CONFIG_MEMCG=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_PERF=y
CONFIG_NAMESPACES=y
CONFIG_USER_NS=y
CONFIG_BLK_DEV_INITRD=y
# CONFIG_RD_LZMA is not set
# CONFIG_RD_LZO is not set
# CONFIG_RD_LZ4 is not set
# CONFIG_RD_ZSTD is not set
CONFIG_EXPERT=y
# CONFIG_PCSPKR_PLATFORM is not set
CONFIG_USERFAULTFD=y
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB=y
CONFIG_SLAB_FREELIST_RANDOM=y
CONFIG_PROFILING=y
CONFIG_SMP=y
# CONFIG_X86_MPPARSE is not set
# CONFIG_RETPOLINE is not set
# CONFIG_X86_EXTENDED_PLATFORM is not set
CONFIG_X86_AMD_PLATFORM_DEVICE=y
# CONFIG_SCHED_OMIT_FRAME_POINTER is not set
CONFIG_HYPERVISOR_GUEST=y
CONFIG_PROCESSOR_SELECT=y
# CONFIG_CPU_SUP_HYGON is not set
# CONFIG_CPU_SUP_CENTAUR is not set
# CONFIG_CPU_SUP_ZHAOXIN is not set
CONFIG_MAXSMP=y
# CONFIG_PERF_EVENTS_INTEL_UNCORE is not set
# CONFIG_PERF_EVENTS_INTEL_RAPL is not set
# CONFIG_PERF_EVENTS_INTEL_CSTATE is not set
# CONFIG_X86_VSYSCALL_EMULATION is not set
# CONFIG_X86_IOPL_IOPERM is not set
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
# CONFIG_X86_5LEVEL is not set
CONFIG_NUMA=y
# CONFIG_AMD_NUMA is not set
CONFIG_EFI=y
CONFIG_EFI_STUB=y
CONFIG_EFI_MIXED=y
CONFIG_HZ_1000=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_KEXEC_SIG=y
CONFIG_KEXEC_BZIMAGE_VERIFY_SIG=y
CONFIG_CRASH_DUMP=y
# CONFIG_RANDOMIZE_BASE is not set
CONFIG_PHYSICAL_ALIGN=0x1000000
CONFIG_LEGACY_VSYSCALL_EMULATE=y
# CONFIG_MODIFY_LDT_SYSCALL is not set
# CONFIG_SUSPEND is not set
CONFIG_ACPI_EC_DEBUGFS=m
# CONFIG_ACPI_AC is not set
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_IPMI=m
CONFIG_ACPI_PCI_SLOT=y
CONFIG_ACPI_CUSTOM_METHOD=m
CONFIG_ACPI_BGRT=y
CONFIG_ACPI_NFIT=m
CONFIG_ACPI_HMAT=y
CONFIG_ACPI_APEI=y
CONFIG_ACPI_APEI_GHES=y
CONFIG_ACPI_APEI_PCIEAER=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_X86_PCC_CPUFREQ=m
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_CPU_IDLE_GOV_MENU=y
CONFIG_INTEL_IDLE=y
CONFIG_X86_SYSFB=y
# CONFIG_FIRMWARE_MEMMAP is not set
# CONFIG_DMIID is not set
CONFIG_EFI_RCI2_TABLE=y
# CONFIG_EFI_CUSTOM_SSDT_OVERLAYS is not set
# CONFIG_VIRTUALIZATION is not set
CONFIG_JUMP_LABEL=y
CONFIG_COMPAT_32BIT_TIME=y
# CONFIG_VMAP_STACK is not set
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_SIG=y
CONFIG_MODULE_SIG_SHA256=y
CONFIG_BLK_DEV_ZONED=y
CONFIG_PARTITION_ADVANCED=y
CONFIG_MQ_IOSCHED_KYBER=m
CONFIG_IOSCHED_BFQ=m
CONFIG_BFQ_GROUP_IOSCHED=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=65536
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
CONFIG_CLEANCACHE=y
CONFIG_FRONTSWAP=y
CONFIG_ZSMALLOC=m
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_PACKET_DIAG=m
CONFIG_UNIX=y
CONFIG_UNIX_DIAG=m
CONFIG_XFRM_USER=y
CONFIG_XFRM_STATISTICS=y
CONFIG_NET_KEY=m
CONFIG_NET_KEY_MIGRATE=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_IP_FIB_TRIE_STATS=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE_DEMUX=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_MROUTE_MULTIPLE_TABLES=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
CONFIG_NET_IPVTI=m
CONFIG_NET_FOU_IP_TUNNELS=y
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_ESP_OFFLOAD=m
CONFIG_INET_ESPINTCP=y
CONFIG_INET_IPCOMP=m
CONFIG_INET_DIAG=m
CONFIG_INET_UDP_DIAG=m
CONFIG_INET_RAW_DIAG=m
CONFIG_INET_DIAG_DESTROY=y
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_NV=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
CONFIG_TCP_CONG_YEAH=m
CONFIG_TCP_CONG_ILLINOIS=m
CONFIG_TCP_CONG_DCTCP=m
CONFIG_TCP_CONG_CDG=m
CONFIG_TCP_CONG_BBR=m
CONFIG_TCP_MD5SIG=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_ESP_OFFLOAD=m
CONFIG_INET6_IPCOMP=m
CONFIG_IPV6_MIP6=y
CONFIG_IPV6_ILA=m
CONFIG_IPV6_VTI=m
CONFIG_IPV6_SIT=m
CONFIG_IPV6_SIT_6RD=y
CONFIG_IPV6_GRE=m
CONFIG_IPV6_SUBTREES=y
CONFIG_IPV6_MROUTE=y
CONFIG_IPV6_MROUTE_MULTIPLE_TABLES=y
CONFIG_IPV6_PIMSM_V2=y
CONFIG_IPV6_SEG6_LWTUNNEL=y
CONFIG_IPV6_SEG6_HMAC=y
CONFIG_NETFILTER=y
CONFIG_NF_CONNTRACK=m
CONFIG_NF_CONNTRACK_ZONES=y
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NF_CONNTRACK_TIMESTAMP=y
CONFIG_NF_CONNTRACK_AMANDA=m
CONFIG_NF_CONNTRACK_FTP=m
CONFIG_NF_CONNTRACK_H323=m
CONFIG_NF_CONNTRACK_IRC=m
CONFIG_NF_CONNTRACK_NETBIOS_NS=m
CONFIG_NF_CONNTRACK_SNMP=m
CONFIG_NF_CONNTRACK_PPTP=m
CONFIG_NF_CONNTRACK_SANE=m
CONFIG_NF_CONNTRACK_SIP=m
CONFIG_NF_CONNTRACK_TFTP=m
CONFIG_NF_CT_NETLINK=m
CONFIG_NF_TABLES=m
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NFT_NUMGEN=m
CONFIG_NFT_CT=m
CONFIG_NFT_FLOW_OFFLOAD=m
CONFIG_NFT_COUNTER=m
CONFIG_NFT_LOG=m
CONFIG_NFT_LIMIT=m
CONFIG_NFT_MASQ=m
CONFIG_NFT_REDIR=m
CONFIG_NFT_NAT=m
CONFIG_NFT_TUNNEL=m
CONFIG_NFT_OBJREF=m
CONFIG_NFT_QUEUE=m
CONFIG_NFT_QUOTA=m
CONFIG_NFT_REJECT=m
CONFIG_NFT_COMPAT=m
CONFIG_NFT_HASH=m
CONFIG_NFT_FIB_INET=m
CONFIG_NFT_XFRM=m
CONFIG_NFT_SOCKET=m
CONFIG_NFT_TPROXY=m
CONFIG_NFT_SYNPROXY=m
CONFIG_NFT_DUP_NETDEV=m
CONFIG_NFT_FWD_NETDEV=m
CONFIG_NFT_FIB_NETDEV=m
CONFIG_NF_FLOW_TABLE_INET=m
CONFIG_NF_FLOW_TABLE=m
CONFIG_NETFILTER_XTABLES=y
CONFIG_NETFILTER_XT_SET=m
CONFIG_NETFILTER_XT_TARGET_AUDIT=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
CONFIG_NETFILTER_XT_TARGET_CONNMARK=m
CONFIG_NETFILTER_XT_TARGET_DSCP=m
CONFIG_NETFILTER_XT_TARGET_HMARK=m
CONFIG_NETFILTER_XT_TARGET_IDLETIMER=m
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_TARGET_MARK=m
CONFIG_NETFILTER_XT_TARGET_NFLOG=m
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
CONFIG_NETFILTER_XT_TARGET_NOTRACK=m
CONFIG_NETFILTER_XT_TARGET_TEE=m
CONFIG_NETFILTER_XT_TARGET_TPROXY=m
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_NETFILTER_XT_TARGET_TCPMSS=m
CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP=m
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NETFILTER_XT_MATCH_CGROUP=m
CONFIG_NETFILTER_XT_MATCH_CLUSTER=m
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_CONNBYTES=m
CONFIG_NETFILTER_XT_MATCH_CONNLABEL=m
CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=m
CONFIG_NETFILTER_XT_MATCH_CONNMARK=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_CPU=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
CONFIG_NETFILTER_XT_MATCH_DEVGROUP=m
CONFIG_NETFILTER_XT_MATCH_DSCP=m
CONFIG_NETFILTER_XT_MATCH_ESP=m
CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=m
CONFIG_NETFILTER_XT_MATCH_HELPER=m
CONFIG_NETFILTER_XT_MATCH_IPCOMP=m
CONFIG_NETFILTER_XT_MATCH_IPRANGE=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
CONFIG_NETFILTER_XT_MATCH_NFACCT=m
CONFIG_NETFILTER_XT_MATCH_OSF=m
CONFIG_NETFILTER_XT_MATCH_OWNER=m
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
CONFIG_NETFILTER_XT_MATCH_RATEEST=m
CONFIG_NETFILTER_XT_MATCH_REALM=m
CONFIG_NETFILTER_XT_MATCH_RECENT=m
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_SOCKET=m
CONFIG_NETFILTER_XT_MATCH_STATE=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
CONFIG_NETFILTER_XT_MATCH_TIME=m
CONFIG_NETFILTER_XT_MATCH_U32=m
CONFIG_IP_SET=m
CONFIG_IP_SET_BITMAP_IP=m
CONFIG_IP_SET_BITMAP_IPMAC=m
CONFIG_IP_SET_BITMAP_PORT=m
CONFIG_IP_SET_HASH_IP=m
CONFIG_IP_SET_HASH_IPMARK=m
CONFIG_IP_SET_HASH_IPPORT=m
CONFIG_IP_SET_HASH_IPPORTIP=m
CONFIG_IP_SET_HASH_IPPORTNET=m
CONFIG_IP_SET_HASH_IPMAC=m
CONFIG_IP_SET_HASH_MAC=m
CONFIG_IP_SET_HASH_NETPORTNET=m
CONFIG_IP_SET_HASH_NET=m
CONFIG_IP_SET_HASH_NETNET=m
CONFIG_IP_SET_HASH_NETPORT=m
CONFIG_IP_SET_HASH_NETIFACE=m
CONFIG_IP_SET_LIST_SET=m
CONFIG_IP_VS=m
CONFIG_IP_VS_IPV6=y
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y
CONFIG_IP_VS_PROTO_SCTP=y
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_FO=m
CONFIG_IP_VS_OVF=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
CONFIG_IP_VS_MH=m
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m
CONFIG_IP_VS_FTP=m
CONFIG_IP_VS_PE_SIP=m
CONFIG_NFT_DUP_IPV4=m
CONFIG_NFT_FIB_IPV4=m
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_FLOW_TABLE_IPV4=m
CONFIG_NF_LOG_ARP=m
CONFIG_NF_LOG_IPV4=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_RPFILTER=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_SYNPROXY=m
CONFIG_IP_NF_NAT=m
CONFIG_IP_NF_TARGET_MASQUERADE=m
CONFIG_IP_NF_TARGET_NETMAP=m
CONFIG_IP_NF_TARGET_REDIRECT=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_CLUSTERIP=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
CONFIG_IP_NF_SECURITY=m
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m
CONFIG_NFT_DUP_IPV6=m
CONFIG_NFT_FIB_IPV6=m
CONFIG_NF_FLOW_TABLE_IPV6=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
CONFIG_IP6_NF_MATCH_MH=m
CONFIG_IP6_NF_MATCH_RPFILTER=m
CONFIG_IP6_NF_MATCH_RT=m
CONFIG_IP6_NF_MATCH_SRH=m
CONFIG_IP6_NF_TARGET_HL=m
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_TARGET_SYNPROXY=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
CONFIG_IP6_NF_SECURITY=m
CONFIG_IP6_NF_NAT=m
CONFIG_IP6_NF_TARGET_MASQUERADE=m
CONFIG_IP6_NF_TARGET_NPT=m
CONFIG_NF_CONNTRACK_BRIDGE=m
CONFIG_L2TP=m
CONFIG_L2TP_DEBUGFS=m
CONFIG_L2TP_V3=y
CONFIG_L2TP_IP=m
CONFIG_L2TP_ETH=m
CONFIG_NETLINK_DIAG=m
CONFIG_NET_L3_MASTER_DEV=y
CONFIG_CGROUP_NET_PRIO=y
# CONFIG_WIRELESS is not set
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_FW_LOADER_COMPRESS=y
# CONFIG_PNP_DEBUG_MESSAGES is not set
CONFIG_BLK_DEV_NULL_BLK=m
CONFIG_ZRAM=m
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_LOOP_MIN_COUNT=0
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_RAM=m
CONFIG_BLK_DEV_RAM_SIZE=16384
CONFIG_VIRTIO_BLK=y
CONFIG_BLK_DEV_NVME=y
CONFIG_NVME_TCP=m
CONFIG_NVME_TARGET=m
CONFIG_NVME_TARGET_PASSTHRU=y
CONFIG_NVME_TARGET_LOOP=m
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=m
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_ATA=y
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=m
CONFIG_MEGARAID_MAILBOX=m
CONFIG_MEGARAID_LEGACY=m
CONFIG_MEGARAID_SAS=m
CONFIG_SCSI_MPT3SAS=m
CONFIG_SCSI_SMARTPQI=m
CONFIG_SCSI_DEBUG=m
CONFIG_SCSI_VIRTIO=m
CONFIG_ATA=m
# CONFIG_SATA_PMP is not set
CONFIG_SATA_AHCI=m
# CONFIG_ATA_SFF is not set
CONFIG_MD=y
CONFIG_MD_LINEAR=m
CONFIG_MD_MULTIPATH=m
CONFIG_BLK_DEV_DM=m
CONFIG_DM_DEBUG=y
CONFIG_DM_UNSTRIPED=m
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_THIN_PROVISIONING=m
CONFIG_DM_CACHE=m
CONFIG_DM_WRITECACHE=m
CONFIG_DM_EBS=m
CONFIG_DM_ERA=m
CONFIG_DM_CLONE=m
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_USERSPACE=m
CONFIG_DM_RAID=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_MULTIPATH_QL=m
CONFIG_DM_MULTIPATH_ST=m
CONFIG_DM_MULTIPATH_HST=m
CONFIG_DM_MULTIPATH_IOA=m
CONFIG_DM_DELAY=m
CONFIG_DM_DUST=m
CONFIG_DM_UEVENT=y
CONFIG_DM_FLAKEY=m
CONFIG_DM_VERITY=m
CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG=y
CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG_SECONDARY_KEYRING=y
CONFIG_DM_VERITY_FEC=y
CONFIG_DM_SWITCH=m
CONFIG_DM_LOG_WRITES=m
CONFIG_DM_INTEGRITY=m
CONFIG_DM_ZONED=m
CONFIG_TARGET_CORE=m
CONFIG_TCM_IBLOCK=m
CONFIG_TCM_FILEIO=m
CONFIG_TCM_PSCSI=m
CONFIG_TCM_USER2=m
CONFIG_LOOPBACK_TARGET=m
CONFIG_ISCSI_TARGET=m
CONFIG_NETDEVICES=y
CONFIG_TUN=m
CONFIG_VIRTIO_NET=m
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_NET_VENDOR_ADAPTEC is not set
# CONFIG_NET_VENDOR_AGERE is not set
# CONFIG_NET_VENDOR_ALACRITECH is not set
# CONFIG_NET_VENDOR_ALTEON is not set
# CONFIG_NET_VENDOR_AMAZON is not set
# CONFIG_NET_VENDOR_AMD is not set
# CONFIG_NET_VENDOR_AQUANTIA is not set
# CONFIG_NET_VENDOR_ARC is not set
# CONFIG_NET_VENDOR_ATHEROS is not set
CONFIG_BNXT=m
# CONFIG_NET_VENDOR_BROCADE is not set
# CONFIG_NET_VENDOR_CADENCE is not set
# CONFIG_NET_VENDOR_CAVIUM is not set
# CONFIG_NET_VENDOR_CHELSIO is not set
# CONFIG_NET_VENDOR_CISCO is not set
# CONFIG_NET_VENDOR_CORTINA is not set
# CONFIG_NET_VENDOR_DEC is not set
# CONFIG_NET_VENDOR_DLINK is not set
# CONFIG_NET_VENDOR_EMULEX is not set
# CONFIG_NET_VENDOR_EZCHIP is not set
# CONFIG_NET_VENDOR_GOOGLE is not set
# CONFIG_NET_VENDOR_HUAWEI is not set
# CONFIG_NET_VENDOR_I825XX is not set
CONFIG_E1000=m
CONFIG_E1000E=m
CONFIG_IGB=m
CONFIG_IXGB=m
CONFIG_IXGBE=m
CONFIG_I40E=m
CONFIG_ICE=m
CONFIG_IGC=m
# CONFIG_NET_VENDOR_MICROSOFT is not set
# CONFIG_NET_VENDOR_MARVELL is not set
# CONFIG_NET_VENDOR_MELLANOX is not set
# CONFIG_NET_VENDOR_MICREL is not set
# CONFIG_NET_VENDOR_MICROCHIP is not set
# CONFIG_NET_VENDOR_MICROSEMI is not set
# CONFIG_NET_VENDOR_MYRI is not set
# CONFIG_NET_VENDOR_NATSEMI is not set
# CONFIG_NET_VENDOR_NETERION is not set
# CONFIG_NET_VENDOR_NETRONOME is not set
# CONFIG_NET_VENDOR_NI is not set
# CONFIG_NET_VENDOR_NVIDIA is not set
# CONFIG_NET_VENDOR_OKI is not set
# CONFIG_NET_VENDOR_PACKET_ENGINES is not set
# CONFIG_NET_VENDOR_PENSANDO is not set
# CONFIG_NET_VENDOR_QLOGIC is not set
# CONFIG_NET_VENDOR_QUALCOMM is not set
# CONFIG_NET_VENDOR_RDC is not set
# CONFIG_NET_VENDOR_REALTEK is not set
# CONFIG_NET_VENDOR_RENESAS is not set
# CONFIG_NET_VENDOR_ROCKER is not set
# CONFIG_NET_VENDOR_SAMSUNG is not set
# CONFIG_NET_VENDOR_SEEQ is not set
# CONFIG_NET_VENDOR_SOLARFLARE is not set
# CONFIG_NET_VENDOR_SILAN is not set
# CONFIG_NET_VENDOR_SIS is not set
# CONFIG_NET_VENDOR_SMSC is not set
# CONFIG_NET_VENDOR_SOCIONEXT is not set
# CONFIG_NET_VENDOR_STMICRO is not set
# CONFIG_NET_VENDOR_SUN is not set
# CONFIG_NET_VENDOR_SYNOPSYS is not set
# CONFIG_NET_VENDOR_TEHUTI is not set
# CONFIG_NET_VENDOR_TI is not set
# CONFIG_NET_VENDOR_VIA is not set
# CONFIG_NET_VENDOR_WIZNET is not set
# CONFIG_NET_VENDOR_XILINX is not set
# CONFIG_USB_NET_DRIVERS is not set
# CONFIG_WLAN is not set
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_EVDEV=y
# CONFIG_MOUSE_PS2 is not set
# CONFIG_LEGACY_PTYS is not set
# CONFIG_LDISC_AUTOLOAD is not set
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_EXAR=m
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=32
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
# CONFIG_SERIAL_8250_LPSS is not set
# CONFIG_SERIAL_8250_MID is not set
CONFIG_VIRTIO_CONSOLE=y
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_SSIF=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
CONFIG_HW_RANDOM_TIMERIOMEM=m
CONFIG_HW_RANDOM_INTEL=m
# CONFIG_HW_RANDOM_AMD is not set
# CONFIG_HW_RANDOM_VIA is not set
CONFIG_HW_RANDOM_VIRTIO=y
CONFIG_NVRAM=y
CONFIG_HPET=y
# CONFIG_HPET_MMAP is not set
CONFIG_HANGCHECK_TIMER=m
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
CONFIG_I2C_AMD8111=m
CONFIG_I2C_AMD_MP2=m
CONFIG_I2C_I801=m
CONFIG_I2C_ISCH=m
CONFIG_I2C_ISMT=m
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
CONFIG_I2C_NFORCE2_S4985=m
CONFIG_I2C_NVIDIA_GPU=m
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m
CONFIG_I2C_SCMI=m
CONFIG_I2C_DESIGNWARE_SLAVE=y
CONFIG_I2C_DESIGNWARE_PLATFORM=y
CONFIG_I2C_DESIGNWARE_PCI=y
CONFIG_I2C_PCA_PLATFORM=m
CONFIG_I2C_SIMTEC=m
CONFIG_I2C_DIOLAN_U2C=m
CONFIG_I2C_TINY_USB=m
CONFIG_I2C_MLXCPLD=m
# CONFIG_PTP_1588_CLOCK is not set
CONFIG_SENSORS_DRIVETEMP=m
CONFIG_SENSORS_I5500=m
CONFIG_SENSORS_CORETEMP=m
CONFIG_SENSORS_ACPI_POWER=m
CONFIG_THERMAL_STATISTICS=y
CONFIG_THERMAL_GOV_FAIR_SHARE=y
CONFIG_THERMAL_GOV_BANG_BANG=y
CONFIG_INTEL_PCH_THERMAL=m
CONFIG_WATCHDOG=y
CONFIG_WATCHDOG_CORE=y
CONFIG_WATCHDOG_SYSFS=y
CONFIG_SOFT_WATCHDOG=m
CONFIG_WDAT_WDT=m
CONFIG_I6300ESB_WDT=m
CONFIG_ITCO_WDT=m
CONFIG_ITCO_VENDOR_SUPPORT=y
CONFIG_LPC_ICH=m
CONFIG_MFD_INTEL_LPSS_ACPI=y
CONFIG_MFD_INTEL_LPSS_PCI=y
CONFIG_AGP=y
CONFIG_AGP_AMD64=m
CONFIG_AGP_INTEL=m
CONFIG_VGA_SWITCHEROO=y
CONFIG_DRM=m
CONFIG_DRM_DP_AUX_CHARDEV=y
CONFIG_DRM_LOAD_EDID_FIRMWARE=y
CONFIG_DRM_I2C_CH7006=m
CONFIG_DRM_I2C_SIL164=m
CONFIG_DRM_I915=m
CONFIG_DRM_GMA500=m
CONFIG_DRM_AST=m
CONFIG_DRM_QXL=m
CONFIG_DRM_VIRTIO_GPU=m
CONFIG_FB=y
CONFIG_FB_TILEBLITTING=y
CONFIG_FB_VGA16=m
CONFIG_FB_VESA=y
CONFIG_FB_EFI=y
CONFIG_FB_VIRTUAL=m
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER=y
CONFIG_HIDRAW=y
CONFIG_UHID=m
CONFIG_HID_PID=y
CONFIG_USB_HIDDEV=y
CONFIG_USB_ULPI_BUS=m
CONFIG_USB=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_XHCI_DBGCAP=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_OHCI_HCD=y
CONFIG_USB_UHCI_HCD=y
CONFIG_USB_STORAGE=m
CONFIG_USB_UAS=m
CONFIG_RTC_CLASS=y
# CONFIG_RTC_SYSTOHC is not set
# CONFIG_RTC_NVMEM is not set
CONFIG_DMADEVICES=y
CONFIG_INTEL_IDMA64=m
CONFIG_INTEL_IDXD=m
CONFIG_INTEL_IOATDMA=m
CONFIG_UDMABUF=y
CONFIG_UIO=m
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_PMEM=m
CONFIG_VIRTIO_BALLOON=y
CONFIG_VIRTIO_INPUT=y
CONFIG_VIRTIO_MMIO=y
CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=y
CONFIG_VHOST_NET=m
CONFIG_VHOST_SCSI=m
# CONFIG_X86_PLATFORM_DEVICES is not set
# CONFIG_SURFACE_PLATFORMS is not set
CONFIG_AMD_IOMMU=y
CONFIG_AMD_IOMMU_V2=m
CONFIG_INTEL_IOMMU=y
CONFIG_INTEL_IOMMU_SVM=y
# CONFIG_INTEL_IOMMU_DEFAULT_ON is not set
CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON=y
CONFIG_VIRTIO_IOMMU=y
# CONFIG_BLK_DEV_PMEM is not set
# CONFIG_ND_BLK is not set
# CONFIG_BTT is not set
CONFIG_DAX=y
CONFIG_VALIDATE_FS_PARSER=y
CONFIG_EXT2_FS=m
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_FS_POSIX_ACL=y
CONFIG_EXT4_FS_SECURITY=y
CONFIG_XFS_FS=m
CONFIG_XFS_QUOTA=y
CONFIG_XFS_POSIX_ACL=y
CONFIG_XFS_ONLINE_SCRUB=y
CONFIG_BTRFS_FS=m
CONFIG_BTRFS_FS_POSIX_ACL=y
CONFIG_BTRFS_DEBUG=y
CONFIG_BTRFS_ASSERT=y
CONFIG_F2FS_FS=m
CONFIG_F2FS_FS_SECURITY=y
CONFIG_ZONEFS_FS=m
# CONFIG_MANDATORY_FILE_LOCKING is not set
CONFIG_FANOTIFY=y
CONFIG_AUTOFS4_FS=y
CONFIG_FUSE_FS=m
CONFIG_CUSE=m
CONFIG_FSCACHE=m
CONFIG_FSCACHE_STATS=y
CONFIG_ISO9660_FS=m
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
CONFIG_EXFAT_FS=m
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE_DEVICE_DUMP=y
CONFIG_PROC_CHILDREN=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_CONFIGFS_FS=y
CONFIG_EFIVAR_FS=y
# CONFIG_PSTORE_DEFLATE_COMPRESS is not set
CONFIG_NFS_FS=m
# CONFIG_NFS_V2 is not set
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=m
CONFIG_NFS_SWAP=y
CONFIG_NFS_V4_1=y
CONFIG_NFS_V4_2=y
CONFIG_NFS_FSCACHE=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=y
CONFIG_NLS_CODEPAGE_775=y
CONFIG_NLS_CODEPAGE_850=y
CONFIG_NLS_CODEPAGE_852=y
CONFIG_NLS_CODEPAGE_855=y
CONFIG_NLS_CODEPAGE_857=y
CONFIG_NLS_CODEPAGE_860=y
CONFIG_NLS_CODEPAGE_861=y
CONFIG_NLS_CODEPAGE_862=y
CONFIG_NLS_CODEPAGE_863=y
CONFIG_NLS_CODEPAGE_864=y
CONFIG_NLS_CODEPAGE_865=y
CONFIG_NLS_CODEPAGE_866=y
CONFIG_NLS_CODEPAGE_869=y
CONFIG_NLS_CODEPAGE_936=y
CONFIG_NLS_CODEPAGE_950=y
CONFIG_NLS_CODEPAGE_932=y
CONFIG_NLS_CODEPAGE_949=y
CONFIG_NLS_CODEPAGE_874=y
CONFIG_NLS_ISO8859_8=y
CONFIG_NLS_CODEPAGE_1250=y
CONFIG_NLS_CODEPAGE_1251=y
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_2=y
CONFIG_NLS_ISO8859_3=y
CONFIG_NLS_ISO8859_4=y
CONFIG_NLS_ISO8859_5=y
CONFIG_NLS_ISO8859_6=y
CONFIG_NLS_ISO8859_7=y
CONFIG_NLS_ISO8859_9=y
CONFIG_NLS_ISO8859_13=y
CONFIG_NLS_ISO8859_14=y
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_KOI8_R=y
CONFIG_NLS_KOI8_U=y
CONFIG_NLS_MAC_ROMAN=y
CONFIG_NLS_MAC_CELTIC=y
CONFIG_NLS_MAC_CENTEURO=y
CONFIG_NLS_MAC_CROATIAN=y
CONFIG_NLS_MAC_CYRILLIC=y
CONFIG_NLS_MAC_GAELIC=y
CONFIG_NLS_MAC_GREEK=y
CONFIG_NLS_MAC_ICELAND=y
CONFIG_NLS_MAC_INUIT=y
CONFIG_NLS_MAC_ROMANIAN=y
CONFIG_NLS_MAC_TURKISH=y
CONFIG_NLS_UTF8=y
CONFIG_KEYS_REQUEST_CACHE=y
CONFIG_PERSISTENT_KEYRINGS=y
CONFIG_ENCRYPTED_KEYS=y
CONFIG_KEY_DH_OPERATIONS=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_NETWORK_XFRM=y
CONFIG_HARDENED_USERCOPY=y
CONFIG_FORTIFY_SOURCE=y
# CONFIG_INTEGRITY is not set
CONFIG_LSM="yama,loadpin,safesetid,integrity,selinux,smack,tomoyo,apparmor"
CONFIG_CRYPTO_USER=y
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_PCRYPT=m
CONFIG_CRYPTO_ECRDSA=m
CONFIG_CRYPTO_CURVE25519=m
CONFIG_CRYPTO_CURVE25519_X86=m
CONFIG_CRYPTO_CCM=m
CONFIG_CRYPTO_GCM=y
CONFIG_CRYPTO_CHACHA20POLY1305=m
CONFIG_CRYPTO_AEGIS128=m
CONFIG_CRYPTO_AEGIS128_AESNI_SSE2=m
CONFIG_CRYPTO_SEQIV=y
CONFIG_CRYPTO_CFB=m
CONFIG_CRYPTO_CTS=y
CONFIG_CRYPTO_LRW=y
CONFIG_CRYPTO_OFB=m
CONFIG_CRYPTO_PCBC=m
CONFIG_CRYPTO_XTS=y
CONFIG_CRYPTO_KEYWRAP=m
CONFIG_CRYPTO_NHPOLY1305_SSE2=m
CONFIG_CRYPTO_NHPOLY1305_AVX2=m
CONFIG_CRYPTO_ADIANTUM=m
CONFIG_CRYPTO_CMAC=m
CONFIG_CRYPTO_XCBC=m
CONFIG_CRYPTO_VMAC=m
CONFIG_CRYPTO_CRC32C_INTEL=y
CONFIG_CRYPTO_CRC32_PCLMUL=m
CONFIG_CRYPTO_BLAKE2S=m
CONFIG_CRYPTO_BLAKE2S_X86=m
CONFIG_CRYPTO_CRCT10DIF_PCLMUL=m
CONFIG_CRYPTO_POLY1305_X86_64=m
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MICHAEL_MIC=m
CONFIG_CRYPTO_RMD160=m
CONFIG_CRYPTO_SHA1_SSSE3=m
CONFIG_CRYPTO_SHA256_SSSE3=m
CONFIG_CRYPTO_SHA512_SSSE3=m
CONFIG_CRYPTO_SHA3=m
CONFIG_CRYPTO_SM3=m
CONFIG_CRYPTO_WP512=m
CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL=m
CONFIG_CRYPTO_AES_TI=m
CONFIG_CRYPTO_AES_NI_INTEL=y
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
CONFIG_CRYPTO_BLOWFISH_X86_64=m
CONFIG_CRYPTO_CAMELLIA=m
CONFIG_CRYPTO_CAMELLIA_AESNI_AVX2_X86_64=m
CONFIG_CRYPTO_CAST5_AVX_X86_64=m
CONFIG_CRYPTO_CAST6_AVX_X86_64=m
CONFIG_CRYPTO_DES=m
CONFIG_CRYPTO_DES3_EDE_X86_64=m
CONFIG_CRYPTO_FCRYPT=m
CONFIG_CRYPTO_KHAZAD=m
CONFIG_CRYPTO_CHACHA20_X86_64=m
CONFIG_CRYPTO_SEED=m
CONFIG_CRYPTO_SERPENT_SSE2_X86_64=m
CONFIG_CRYPTO_SERPENT_AVX2_X86_64=m
CONFIG_CRYPTO_SM4=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_AVX_X86_64=m
CONFIG_CRYPTO_DEFLATE=y
CONFIG_CRYPTO_LZO=y
CONFIG_CRYPTO_842=y
CONFIG_CRYPTO_LZ4=m
CONFIG_CRYPTO_LZ4HC=m
CONFIG_CRYPTO_ZSTD=m
CONFIG_CRYPTO_ANSI_CPRNG=m
CONFIG_CRYPTO_DRBG_HASH=y
CONFIG_CRYPTO_DRBG_CTR=y
CONFIG_CRYPTO_USER_API_HASH=y
CONFIG_CRYPTO_USER_API_SKCIPHER=y
CONFIG_CRYPTO_USER_API_RNG=y
CONFIG_CRYPTO_USER_API_AEAD=y
CONFIG_CRYPTO_LIB_BLAKE2S=m
CONFIG_CRYPTO_LIB_CURVE25519=m
CONFIG_CRYPTO_LIB_CHACHA20POLY1305=m
CONFIG_CRYPTO_DEV_PADLOCK=m
CONFIG_CRYPTO_DEV_PADLOCK_AES=m
CONFIG_CRYPTO_DEV_PADLOCK_SHA=m
CONFIG_CRYPTO_DEV_ATMEL_ECC=m
CONFIG_CRYPTO_DEV_ATMEL_SHA204A=m
CONFIG_CRYPTO_DEV_CCP=y
CONFIG_CRYPTO_DEV_QAT_DH895xCC=y
CONFIG_CRYPTO_DEV_QAT_C3XXX=y
CONFIG_CRYPTO_DEV_QAT_C62X=y
CONFIG_CRYPTO_DEV_QAT_DH895xCCVF=y
CONFIG_CRYPTO_DEV_QAT_C3XXXVF=y
CONFIG_CRYPTO_DEV_QAT_C62XVF=y
CONFIG_CRYPTO_DEV_VIRTIO=m
CONFIG_PKCS8_PRIVATE_KEY_PARSER=m
CONFIG_SIGNED_PE_FILE_VERIFICATION=y
CONFIG_SYSTEM_EXTRA_CERTIFICATE=y
CONFIG_SECONDARY_TRUSTED_KEYRING=y
CONFIG_SYSTEM_BLACKLIST_KEYRING=y
CONFIG_PACKING=y
CONFIG_CORDIC=m
CONFIG_CRC_CCITT=y
CONFIG_CRC64=m
CONFIG_CRC4=m
CONFIG_CRC7=m
CONFIG_CRC8=m
CONFIG_PRINTK_TIME=y
CONFIG_CONSOLE_LOGLEVEL_QUIET=3
CONFIG_DYNAMIC_DEBUG=y
CONFIG_DEBUG_INFO=y
CONFIG_STRIP_ASM_SYMS=y
CONFIG_HEADERS_INSTALL=y
CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x0
CONFIG_KGDB=y
CONFIG_KGDB_TESTS=y
CONFIG_KGDB_LOW_LEVEL_TRAP=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_BOOTPARAM_HARDLOCKUP_PANIC=y
# CONFIG_DETECT_HUNG_TASK is not set
CONFIG_SCHEDSTATS=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP_CHAINS_BITS=20
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_BUG_ON_DATA_CORRUPTION=y
CONFIG_RCU_CPU_STALL_TIMEOUT=60
# CONFIG_RCU_TRACE is not set
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_PROFILER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_TRACER_SNAPSHOT=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_STRICT_DEVMEM is not set
# CONFIG_X86_VERBOSE_BOOTUP is not set
CONFIG_DEBUG_BOOT_PARAMS=y
# CONFIG_X86_DEBUG_FPU is not set
CONFIG_FAULT_INJECTION=y
CONFIG_FAILSLAB=y
CONFIG_FAIL_PAGE_ALLOC=y
CONFIG_FAULT_INJECTION_USERCOPY=y
CONFIG_FAIL_MAKE_REQUEST=y
CONFIG_FAIL_IO_TIMEOUT=y
CONFIG_FAULT_INJECTION_DEBUG_FS=y
CONFIG_FAIL_FUNCTION=y
# CONFIG_RUNTIME_TESTING_MENU is not set

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-29 23:02                       ` Damien Le Moal
@ 2021-08-30  2:31                         ` Keith Busch
  2021-08-30  3:03                           ` Damien Le Moal
  2021-08-30  2:40                         ` Bart Van Assche
  1 sibling, 1 reply; 30+ messages in thread
From: Keith Busch @ 2021-08-30  2:31 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Bart Van Assche, Jens Axboe, Zhen Lei, linux-block

On Sun, Aug 29, 2021 at 11:02:22PM +0000, Damien Le Moal wrote:
> On 2021/08/27 23:34, Bart Van Assche wrote:
> > On 8/26/21 9:49 PM, Damien Le Moal wrote:
> >> So the mq-deadline priority patch reduces performance by nearly half at high QD.
> >> With the modified patch, we are back to better numbers, but still a significant
> >> 20% drop at high QD.
> > 
> > Hi Damien,
> > 
> > An implementation of I/O priority for the deadline scheduler that reduces the
> > IOPS drop to 1% on my test setup is available here:
> > https://github.com/bvanassche/linux/tree/block-for-next
> > 
> >> (*) Note: in all cases using the mq-deadline scheduler, for the first run at
> >> QD=1, I get this splat 100% of the time.
> >>
> >> [   95.173889] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1H:757]
> >> [   95.292994] CPU: 0 PID: 757 Comm: kworker/0:1H Not tainted 5.14.0-rc7+ #1334
> >> [   95.307504] Workqueue: kblockd blk_mq_run_work_fn
> >> [   95.312243] RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
> >> [   95.415904] Call Trace:
> >> [   95.418373]  try_to_wake_up+0x268/0x7c0
> >> [   95.422238]  blk_update_request+0x25b/0x420
> >> [   95.426452]  blk_mq_end_request+0x1c/0x120
> >> [   95.430576]  null_handle_cmd+0x12d/0x270 [null_blk]
> >> [   95.435485]  blk_mq_dispatch_rq_list+0x13c/0x7f0
> >> [   95.443826]  __blk_mq_do_dispatch_sched+0xb5/0x2f0
> >> [   95.448653]  __blk_mq_sched_dispatch_requests+0xf4/0x140
> >> [   95.453998]  blk_mq_sched_dispatch_requests+0x30/0x60
> >> [   95.459083]  __blk_mq_run_hw_queue+0x49/0x90
> >> [   95.463377]  process_one_work+0x26c/0x570
> >> [   95.467421]  worker_thread+0x55/0x3c0
> >> [   95.475313]  kthread+0x140/0x160
> >> [   95.482774]  ret_from_fork+0x1f/0x30
> > 
> > I don't see any function names in the above call stack that refer to the
> > mq-deadline scheduler? Did I perhaps overlook something? Anyway, if you can
> > tell me how to reproduce this (kernel commit + kernel config) I will take a
> > look.
> 
> Indeed, the stack trace does not show any mq-deadline function. But the
> workqueue is stuck on _raw_spin_unlock_irqrestore() in the blk_mq_run_work_fn()
> function. I suspect that the spinlock is dd->lock, so the CPU may be stuck on
> entry to mq-deadline dispatch or finish request methods. Not entirely sure.

I don't think you can be stuck on the *unlock* part, though. In my
experience, that function showing up in a soft lockup indicates you're
in a broken loop that's repeatedly locking and unlocking. I haven't
found anything immediately obvious in this call chain, though.

> I got this splat with 5.4.0-rc7 (Linus tag patch) with the attached config.

Surely 5.14.0-rc7, right?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-29 23:02                       ` Damien Le Moal
  2021-08-30  2:31                         ` Keith Busch
@ 2021-08-30  2:40                         ` Bart Van Assche
  2021-08-30  3:07                           ` Damien Le Moal
  1 sibling, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2021-08-30  2:40 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe, Zhen Lei, linux-block

On 8/29/21 16:02, Damien Le Moal wrote:
> On 2021/08/27 23:34, Bart Van Assche wrote:
>> On 8/26/21 9:49 PM, Damien Le Moal wrote:
>>> So the mq-deadline priority patch reduces performance by nearly half at high QD.
>>> (*) Note: in all cases using the mq-deadline scheduler, for the first run at
>>> QD=1, I get this splat 100% of the time.
>>>
>>> [   95.173889] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1H:757]
>>> [   95.292994] CPU: 0 PID: 757 Comm: kworker/0:1H Not tainted 5.14.0-rc7+ #1334
>>> [   95.307504] Workqueue: kblockd blk_mq_run_work_fn
>>> [   95.312243] RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
>>> [   95.415904] Call Trace:
>>> [   95.418373]  try_to_wake_up+0x268/0x7c0
>>> [   95.422238]  blk_update_request+0x25b/0x420
>>> [   95.426452]  blk_mq_end_request+0x1c/0x120
>>> [   95.430576]  null_handle_cmd+0x12d/0x270 [null_blk]
>>> [   95.435485]  blk_mq_dispatch_rq_list+0x13c/0x7f0
>>> [   95.443826]  __blk_mq_do_dispatch_sched+0xb5/0x2f0
>>> [   95.448653]  __blk_mq_sched_dispatch_requests+0xf4/0x140
>>> [   95.453998]  blk_mq_sched_dispatch_requests+0x30/0x60
>>> [   95.459083]  __blk_mq_run_hw_queue+0x49/0x90
>>> [   95.463377]  process_one_work+0x26c/0x570
>>> [   95.467421]  worker_thread+0x55/0x3c0
>>> [   95.475313]  kthread+0x140/0x160
>>> [   95.482774]  ret_from_fork+0x1f/0x30
>>
>> I don't see any function names in the above call stack that refer to the
>> mq-deadline scheduler? Did I perhaps overlook something? Anyway, if you can
>> tell me how to reproduce this (kernel commit + kernel config) I will take a
>> look.
> 
> Indeed, the stack trace does not show any mq-deadline function. But the
> workqueue is stuck on _raw_spin_unlock_irqrestore() in the blk_mq_run_work_fn()
> function. I suspect that the spinlock is dd->lock, so the CPU may be stuck on
> entry to mq-deadline dispatch or finish request methods. Not entirely sure.
> 
> I got this splat with 5.4.0-rc7 (Linus tag patch) with the attached config.

Hi Damien,

Thank you for having shared the kernel configuration used in your test. 
So far I have not yet been able to reproduce the above call trace in a 
VM. Could the above call trace have been triggered by the mpt3sas driver 
instead of the mq-deadline I/O scheduler?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-30  2:31                         ` Keith Busch
@ 2021-08-30  3:03                           ` Damien Le Moal
  0 siblings, 0 replies; 30+ messages in thread
From: Damien Le Moal @ 2021-08-30  3:03 UTC (permalink / raw)
  To: Keith Busch; +Cc: Bart Van Assche, Jens Axboe, Zhen Lei, linux-block

On 2021/08/30 11:32, Keith Busch wrote:
> On Sun, Aug 29, 2021 at 11:02:22PM +0000, Damien Le Moal wrote:
>> On 2021/08/27 23:34, Bart Van Assche wrote:
>>> On 8/26/21 9:49 PM, Damien Le Moal wrote:
>>>> So the mq-deadline priority patch reduces performance by nearly half at high QD.
>>>> With the modified patch, we are back to better numbers, but still a significant
>>>> 20% drop at high QD.
>>>
>>> Hi Damien,
>>>
>>> An implementation of I/O priority for the deadline scheduler that reduces the
>>> IOPS drop to 1% on my test setup is available here:
>>> https://github.com/bvanassche/linux/tree/block-for-next
>>>
>>>> (*) Note: in all cases using the mq-deadline scheduler, for the first run at
>>>> QD=1, I get this splat 100% of the time.
>>>>
>>>> [   95.173889] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1H:757]
>>>> [   95.292994] CPU: 0 PID: 757 Comm: kworker/0:1H Not tainted 5.14.0-rc7+ #1334
>>>> [   95.307504] Workqueue: kblockd blk_mq_run_work_fn
>>>> [   95.312243] RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
>>>> [   95.415904] Call Trace:
>>>> [   95.418373]  try_to_wake_up+0x268/0x7c0
>>>> [   95.422238]  blk_update_request+0x25b/0x420
>>>> [   95.426452]  blk_mq_end_request+0x1c/0x120
>>>> [   95.430576]  null_handle_cmd+0x12d/0x270 [null_blk]
>>>> [   95.435485]  blk_mq_dispatch_rq_list+0x13c/0x7f0
>>>> [   95.443826]  __blk_mq_do_dispatch_sched+0xb5/0x2f0
>>>> [   95.448653]  __blk_mq_sched_dispatch_requests+0xf4/0x140
>>>> [   95.453998]  blk_mq_sched_dispatch_requests+0x30/0x60
>>>> [   95.459083]  __blk_mq_run_hw_queue+0x49/0x90
>>>> [   95.463377]  process_one_work+0x26c/0x570
>>>> [   95.467421]  worker_thread+0x55/0x3c0
>>>> [   95.475313]  kthread+0x140/0x160
>>>> [   95.482774]  ret_from_fork+0x1f/0x30
>>>
>>> I don't see any function names in the above call stack that refer to the
>>> mq-deadline scheduler? Did I perhaps overlook something? Anyway, if you can
>>> tell me how to reproduce this (kernel commit + kernel config) I will take a
>>> look.
>>
>> Indeed, the stack trace does not show any mq-deadline function. But the
>> workqueue is stuck on _raw_spin_unlock_irqrestore() in the blk_mq_run_work_fn()
>> function. I suspect that the spinlock is dd->lock, so the CPU may be stuck on
>> entry to mq-deadline dispatch or finish request methods. Not entirely sure.
> 
> I don't think you can be stuck on the *unlock* part, though. In my
> experience, that function showing up in a soft lockup indicates you're
> in a broken loop that's repeatedly locking and unlocking. I haven't
> found anything immediately obvious in this call chain, though.

Arg. I misread the stack trace. It is an unlock, not a lock...

> 
>> I got this splat with 5.4.0-rc7 (Linus tag patch) with the attached config.
> 
> Surely 5.14.0-rc7, right?

Oops. Yes.

> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-30  2:40                         ` Bart Van Assche
@ 2021-08-30  3:07                           ` Damien Le Moal
  2021-08-30 17:14                             ` Bart Van Assche
  0 siblings, 1 reply; 30+ messages in thread
From: Damien Le Moal @ 2021-08-30  3:07 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe, Zhen Lei, linux-block

On 2021/08/30 11:40, Bart Van Assche wrote:
> On 8/29/21 16:02, Damien Le Moal wrote:
>> On 2021/08/27 23:34, Bart Van Assche wrote:
>>> On 8/26/21 9:49 PM, Damien Le Moal wrote:
>>>> So the mq-deadline priority patch reduces performance by nearly half at high QD.
>>>> (*) Note: in all cases using the mq-deadline scheduler, for the first run at
>>>> QD=1, I get this splat 100% of the time.
>>>>
>>>> [   95.173889] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1H:757]
>>>> [   95.292994] CPU: 0 PID: 757 Comm: kworker/0:1H Not tainted 5.14.0-rc7+ #1334
>>>> [   95.307504] Workqueue: kblockd blk_mq_run_work_fn
>>>> [   95.312243] RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
>>>> [   95.415904] Call Trace:
>>>> [   95.418373]  try_to_wake_up+0x268/0x7c0
>>>> [   95.422238]  blk_update_request+0x25b/0x420
>>>> [   95.426452]  blk_mq_end_request+0x1c/0x120
>>>> [   95.430576]  null_handle_cmd+0x12d/0x270 [null_blk]
>>>> [   95.435485]  blk_mq_dispatch_rq_list+0x13c/0x7f0
>>>> [   95.443826]  __blk_mq_do_dispatch_sched+0xb5/0x2f0
>>>> [   95.448653]  __blk_mq_sched_dispatch_requests+0xf4/0x140
>>>> [   95.453998]  blk_mq_sched_dispatch_requests+0x30/0x60
>>>> [   95.459083]  __blk_mq_run_hw_queue+0x49/0x90
>>>> [   95.463377]  process_one_work+0x26c/0x570
>>>> [   95.467421]  worker_thread+0x55/0x3c0
>>>> [   95.475313]  kthread+0x140/0x160
>>>> [   95.482774]  ret_from_fork+0x1f/0x30
>>>
>>> I don't see any function names in the above call stack that refer to the
>>> mq-deadline scheduler? Did I perhaps overlook something? Anyway, if you can
>>> tell me how to reproduce this (kernel commit + kernel config) I will take a
>>> look.
>>
>> Indeed, the stack trace does not show any mq-deadline function. But the
>> workqueue is stuck on _raw_spin_unlock_irqrestore() in the blk_mq_run_work_fn()
>> function. I suspect that the spinlock is dd->lock, so the CPU may be stuck on
>> entry to mq-deadline dispatch or finish request methods. Not entirely sure.
>>
>> I got this splat with 5.4.0-rc7 (Linus tag patch) with the attached config.
> 
> Hi Damien,
> 
> Thank you for having shared the kernel configuration used in your test. 
> So far I have not yet been able to reproduce the above call trace in a 
> VM. Could the above call trace have been triggered by the mpt3sas driver 
> instead of the mq-deadline I/O scheduler?

The above was triggered using nullblk with the test script you sent. I was not
using drives on the HBA or AHCI when it happens. And I can reproduce this 100%
of the time by running your script with QD=1.


> 
> Thanks,
> 
> Bart.
> 
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-30  3:07                           ` Damien Le Moal
@ 2021-08-30 17:14                             ` Bart Van Assche
  2021-08-30 21:42                               ` Damien Le Moal
  0 siblings, 1 reply; 30+ messages in thread
From: Bart Van Assche @ 2021-08-30 17:14 UTC (permalink / raw)
  To: Damien Le Moal, Jens Axboe, Zhen Lei, linux-block

On 8/29/21 8:07 PM, Damien Le Moal wrote:
> On 2021/08/30 11:40, Bart Van Assche wrote:
>> On 8/29/21 16:02, Damien Le Moal wrote:
>>> On 2021/08/27 23:34, Bart Van Assche wrote:
>>>> On 8/26/21 9:49 PM, Damien Le Moal wrote:
>>>>> So the mq-deadline priority patch reduces performance by nearly half at high QD.
>>>>> (*) Note: in all cases using the mq-deadline scheduler, for the first run at
>>>>> QD=1, I get this splat 100% of the time.
>>>>>
>>>>> [   95.173889] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1H:757]
>>>>> [   95.292994] CPU: 0 PID: 757 Comm: kworker/0:1H Not tainted 5.14.0-rc7+ #1334
>>>>> [   95.307504] Workqueue: kblockd blk_mq_run_work_fn
>>>>> [   95.312243] RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
>>>>> [   95.415904] Call Trace:
>>>>> [   95.418373]  try_to_wake_up+0x268/0x7c0
>>>>> [   95.422238]  blk_update_request+0x25b/0x420
>>>>> [   95.426452]  blk_mq_end_request+0x1c/0x120
>>>>> [   95.430576]  null_handle_cmd+0x12d/0x270 [null_blk]
>>>>> [   95.435485]  blk_mq_dispatch_rq_list+0x13c/0x7f0
>>>>> [   95.443826]  __blk_mq_do_dispatch_sched+0xb5/0x2f0
>>>>> [   95.448653]  __blk_mq_sched_dispatch_requests+0xf4/0x140
>>>>> [   95.453998]  blk_mq_sched_dispatch_requests+0x30/0x60
>>>>> [   95.459083]  __blk_mq_run_hw_queue+0x49/0x90
>>>>> [   95.463377]  process_one_work+0x26c/0x570
>>>>> [   95.467421]  worker_thread+0x55/0x3c0
>>>>> [   95.475313]  kthread+0x140/0x160
>>>>> [   95.482774]  ret_from_fork+0x1f/0x30
>>>>
>>>> I don't see any function names in the above call stack that refer to the
>>>> mq-deadline scheduler? Did I perhaps overlook something? Anyway, if you can
>>>> tell me how to reproduce this (kernel commit + kernel config) I will take a
>>>> look.
>>>
>>> Indeed, the stack trace does not show any mq-deadline function. But the
>>> workqueue is stuck on _raw_spin_unlock_irqrestore() in the blk_mq_run_work_fn()
>>> function. I suspect that the spinlock is dd->lock, so the CPU may be stuck on
>>> entry to mq-deadline dispatch or finish request methods. Not entirely sure.
>>>
>>> I got this splat with 5.4.0-rc7 (Linus tag patch) with the attached config.
>>
>> Hi Damien,
>>
>> Thank you for having shared the kernel configuration used in your test.
>> So far I have not yet been able to reproduce the above call trace in a
>> VM. Could the above call trace have been triggered by the mpt3sas driver
>> instead of the mq-deadline I/O scheduler?
> 
> The above was triggered using nullblk with the test script you sent. I was not
> using drives on the HBA or AHCI when it happens. And I can reproduce this 100%
> of the time by running your script with QD=1.

Hi Damien,

I rebuilt kernel v5.14-rc7 (tag v5.14-rc7) after having run git clean -f -d -x
and reran my nullb iops test with the mq-deadline scheduler. No kernel complaints
appeared in the kernel log. Next I enabled lockdep (CONFIG_PROVE_LOCKING=y) and
reran the nullb iops test with mq-deadline as scheduler. Again zero complaints
appeared in the kernel log. Next I ran a subset of the blktests test
(./check -q block). All tests passed and no complaints appeared in the kernel log.

Please help with root-causing this issue by rerunning the test on your setup after
having enabled lockdep.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests
  2021-08-30 17:14                             ` Bart Van Assche
@ 2021-08-30 21:42                               ` Damien Le Moal
  0 siblings, 0 replies; 30+ messages in thread
From: Damien Le Moal @ 2021-08-30 21:42 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe, Zhen Lei, linux-block

On 2021/08/31 2:14, Bart Van Assche wrote:
> On 8/29/21 8:07 PM, Damien Le Moal wrote:
>> On 2021/08/30 11:40, Bart Van Assche wrote:
>>> On 8/29/21 16:02, Damien Le Moal wrote:
>>>> On 2021/08/27 23:34, Bart Van Assche wrote:
>>>>> On 8/26/21 9:49 PM, Damien Le Moal wrote:
>>>>>> So the mq-deadline priority patch reduces performance by nearly half at high QD.
>>>>>> (*) Note: in all cases using the mq-deadline scheduler, for the first run at
>>>>>> QD=1, I get this splat 100% of the time.
>>>>>>
>>>>>> [   95.173889] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [kworker/0:1H:757]
>>>>>> [   95.292994] CPU: 0 PID: 757 Comm: kworker/0:1H Not tainted 5.14.0-rc7+ #1334
>>>>>> [   95.307504] Workqueue: kblockd blk_mq_run_work_fn
>>>>>> [   95.312243] RIP: 0010:_raw_spin_unlock_irqrestore+0x35/0x40
>>>>>> [   95.415904] Call Trace:
>>>>>> [   95.418373]  try_to_wake_up+0x268/0x7c0
>>>>>> [   95.422238]  blk_update_request+0x25b/0x420
>>>>>> [   95.426452]  blk_mq_end_request+0x1c/0x120
>>>>>> [   95.430576]  null_handle_cmd+0x12d/0x270 [null_blk]
>>>>>> [   95.435485]  blk_mq_dispatch_rq_list+0x13c/0x7f0
>>>>>> [   95.443826]  __blk_mq_do_dispatch_sched+0xb5/0x2f0
>>>>>> [   95.448653]  __blk_mq_sched_dispatch_requests+0xf4/0x140
>>>>>> [   95.453998]  blk_mq_sched_dispatch_requests+0x30/0x60
>>>>>> [   95.459083]  __blk_mq_run_hw_queue+0x49/0x90
>>>>>> [   95.463377]  process_one_work+0x26c/0x570
>>>>>> [   95.467421]  worker_thread+0x55/0x3c0
>>>>>> [   95.475313]  kthread+0x140/0x160
>>>>>> [   95.482774]  ret_from_fork+0x1f/0x30
>>>>>
>>>>> I don't see any function names in the above call stack that refer to the
>>>>> mq-deadline scheduler? Did I perhaps overlook something? Anyway, if you can
>>>>> tell me how to reproduce this (kernel commit + kernel config) I will take a
>>>>> look.
>>>>
>>>> Indeed, the stack trace does not show any mq-deadline function. But the
>>>> workqueue is stuck on _raw_spin_unlock_irqrestore() in the blk_mq_run_work_fn()
>>>> function. I suspect that the spinlock is dd->lock, so the CPU may be stuck on
>>>> entry to mq-deadline dispatch or finish request methods. Not entirely sure.
>>>>
>>>> I got this splat with 5.4.0-rc7 (Linus tag patch) with the attached config.
>>>
>>> Hi Damien,
>>>
>>> Thank you for having shared the kernel configuration used in your test.
>>> So far I have not yet been able to reproduce the above call trace in a
>>> VM. Could the above call trace have been triggered by the mpt3sas driver
>>> instead of the mq-deadline I/O scheduler?
>>
>> The above was triggered using nullblk with the test script you sent. I was not
>> using drives on the HBA or AHCI when it happens. And I can reproduce this 100%
>> of the time by running your script with QD=1.
> 
> Hi Damien,
> 
> I rebuilt kernel v5.14-rc7 (tag v5.14-rc7) after having run git clean -f -d -x
> and reran my nullb iops test with the mq-deadline scheduler. No kernel complaints
> appeared in the kernel log. Next I enabled lockdep (CONFIG_PROVE_LOCKING=y) and
> reran the nullb iops test with mq-deadline as scheduler. Again zero complaints
> appeared in the kernel log. Next I ran a subset of the blktests test
> (./check -q block). All tests passed and no complaints appeared in the kernel log.
> 
> Please help with root-causing this issue by rerunning the test on your setup after
> having enabled lockdep.

OK. Will have a look again.

> 
> Thanks,
> 
> Bart.
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-08-30 21:42 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-26 14:40 [PATCH] block/mq-deadline: Speed up the dispatch of low-priority requests Zhen Lei
2021-08-26 18:09 ` Bart Van Assche
2021-08-26 18:13   ` Jens Axboe
2021-08-26 18:45     ` Jens Axboe
2021-08-26 19:17       ` Bart Van Assche
2021-08-26 19:32         ` Jens Axboe
2021-08-26 23:49       ` Bart Van Assche
2021-08-26 23:51         ` Jens Axboe
2021-08-27  0:03           ` Bart Van Assche
2021-08-27  0:05             ` Jens Axboe
2021-08-27  0:58               ` Bart Van Assche
2021-08-27  2:48               ` Bart Van Assche
2021-08-27  3:13                 ` Jens Axboe
2021-08-27  4:49                   ` Damien Le Moal
2021-08-27 14:34                     ` Bart Van Assche
2021-08-29 23:02                       ` Damien Le Moal
2021-08-30  2:31                         ` Keith Busch
2021-08-30  3:03                           ` Damien Le Moal
2021-08-30  2:40                         ` Bart Van Assche
2021-08-30  3:07                           ` Damien Le Moal
2021-08-30 17:14                             ` Bart Van Assche
2021-08-30 21:42                               ` Damien Le Moal
2021-08-28  1:45                   ` Leizhen (ThunderTown)
2021-08-28  2:19                     ` Bart Van Assche
2021-08-28  2:42                       ` Leizhen (ThunderTown)
2021-08-28 13:14                         ` Leizhen (ThunderTown)
2021-08-28  1:59   ` Leizhen (ThunderTown)
2021-08-28  2:41     ` Bart Van Assche
2021-08-27  2:30 ` Damien Le Moal
2021-08-28  2:14   ` Leizhen (ThunderTown)

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.