* [PATCH blktests 0/2] Add scsi-stress-remove to blktests
@ 2018-12-12 23:09 Dennis Zhou
2018-12-12 23:09 ` [PATCH blktests 1/2] blktests: split out cgroup2 controller and file check Dennis Zhou
2018-12-12 23:09 ` [PATCH blktests 2/2] blktests: add Ming Lei's scsi-stress-remove Dennis Zhou
0 siblings, 2 replies; 10+ messages in thread
From: Dennis Zhou @ 2018-12-12 23:09 UTC (permalink / raw)
To: Omar Sandoval, Ming Lei; +Cc: kernel-team, linux-block, Dennis Zhou
Hi,
Ming Lei's scsi-stress-remove test found a bug in blkg destruction [1]
where bios created while the request_queue was being cleaned up
threw an NPE in blkg association. The fix is currently being discussed in
[2]. To make this test more accessible, I've ported it to blktests with
Ming Lei's copyright. I've tested this in my qemu instance and verified
we do not see the NPE on a fixed kernel.
Ming, please let me know if you have any objections.
[1] https://lore.kernel.org/linux-block/CACVXFVO_QXipD3cmPvpLyBYSiEcWPN_ThQ=0pO9AwLqN-Lv93w@mail.gmail.com
[2] https://lore.kernel.org/lkml/20181211230308.66276-1-dennis@kernel.org/
This patchset is on top of osandov#josef ad08c1fe0d9f.
diffstats below:
Dennis Zhou (2):
blktests: split out cgroup2 controller and file check
blktests: add Ming Lei's scsi-stress-remove
common/cgroup | 18 ++++++---
tests/block/022 | 96 +++++++++++++++++++++++++++++++++++++++++++++
tests/block/022.out | 2 +
3 files changed, 111 insertions(+), 5 deletions(-)
create mode 100755 tests/block/022
create mode 100644 tests/block/022.out
Thanks,
Dennis
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH blktests 1/2] blktests: split out cgroup2 controller and file check
2018-12-12 23:09 [PATCH blktests 0/2] Add scsi-stress-remove to blktests Dennis Zhou
@ 2018-12-12 23:09 ` Dennis Zhou
2018-12-19 18:34 ` Omar Sandoval
2018-12-12 23:09 ` [PATCH blktests 2/2] blktests: add Ming Lei's scsi-stress-remove Dennis Zhou
1 sibling, 1 reply; 10+ messages in thread
From: Dennis Zhou @ 2018-12-12 23:09 UTC (permalink / raw)
To: Omar Sandoval, Ming Lei; +Cc: kernel-team, linux-block, Dennis Zhou
This is a prep patch for a new test that will race blkg association and
request_queue cleanup. As blkg association is an underlying cgroup io
controller feature, we need the ability to check if the controller is
available.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
common/cgroup | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/common/cgroup b/common/cgroup
index d445093..3481458 100644
--- a/common/cgroup
+++ b/common/cgroup
@@ -37,19 +37,27 @@ _have_cgroup2()
return 0
}
-_have_cgroup2_controller_file()
+_have_cgroup2_controller()
{
- _have_cgroup2 || return 1
-
local controller="$1"
- local file="$2"
- local dir
+
+ _have_cgroup2 || return 1
dir="$(_cgroup2_base_dir)"
+
if ! grep -q "$controller" "$dir/cgroup.controllers"; then
SKIP_REASON="no support for $controller cgroup controller; if it is enabled, you may need to boot with cgroup_no_v1=$controller"
return 1
fi
+}
+
+_have_cgroup2_controller_file()
+{
+ local controller="$1"
+ local file="$2"
+ local dir
+
+ _have_cgroup_2_controller "$controller" || return 1
mkdir "$dir/blktests"
echo "+$controller" > "$dir/cgroup.subtree_control"
--
2.17.1
* [PATCH blktests 2/2] blktests: add Ming Lei's scsi-stress-remove
2018-12-12 23:09 [PATCH blktests 0/2] Add scsi-stress-remove to blktests Dennis Zhou
2018-12-12 23:09 ` [PATCH blktests 1/2] blktests: split out cgroup2 controller and file check Dennis Zhou
@ 2018-12-12 23:09 ` Dennis Zhou
2018-12-13 1:24 ` Ming Lei
2018-12-13 18:28 ` [PATCH blktests v2 " Dennis Zhou
1 sibling, 2 replies; 10+ messages in thread
From: Dennis Zhou @ 2018-12-12 23:09 UTC (permalink / raw)
To: Omar Sandoval, Ming Lei; +Cc: kernel-team, linux-block, Dennis Zhou
This test exposed a race condition with shutting down a request_queue
and the new blkg association. The issue ended up being that while the
request_queue will just start failing requests, blkg destruction sets
the q->root_blkg to %NULL. This caused an NPE when trying to reference
it. So to help prevent this from happening again, integrate Ming's test
into blktests so that it can more easily be run.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
---
tests/block/022 | 96 +++++++++++++++++++++++++++++++++++++++++++++
tests/block/022.out | 2 +
2 files changed, 98 insertions(+)
create mode 100755 tests/block/022
create mode 100644 tests/block/022.out
diff --git a/tests/block/022 b/tests/block/022
new file mode 100755
index 0000000..45bfff7
--- /dev/null
+++ b/tests/block/022
@@ -0,0 +1,96 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright (C) 2018 Ming Lei
+#
+# Regression test for patch "blkcg: handle dying request_queue when associating
+# a blkg"
+#
+# This tries to expose the race condition between blkg association and
+# request_queue shutdown. When a request_queue is shutdown, the corresponding
+# blkgs are destroyed. Any further associations should fail gracefully and not
+# cause a kernel panic.
+
+. tests/block/rc
+. common/scsi_debug
+. common/cgroup
+
+DESCRIPTION="test graceful shutdown of scsi_debug devices with running fio jobs"
+QUICK=1
+
+requires() {
+ _have_cgroup2_controller io && _have_scsi_debug && _have_fio
+}
+
+scsi_debug_stress_remove() {
+ scsi_debug_path="/sys/bus/pseudo/drivers/scsi_debug"
+ count=21
+
+ runtime=12
+ nr_fio_jobs=8
+ scsi_dbg_ndelay=10000
+
+ # set higher aio limit
+ echo 524288 > /proc/sys/fs/aio-max-nr
+
+ # figure out the CAN_QUEUE
+ can_queue=$(((count + 1) * (count / 2) / 2))
+
+ rmmod scsi_debug > /dev/null 2>&1
+ modprobe scsi_debug virtual_gb=128 max_luns=$count \
+ ndelay=$scsi_dbg_ndelay max_queue=$can_queue
+
+ # figure out scsi_debug disks
+ hosts=$(ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter0/host*)
+ hostname=$(basename "$hosts")
+ host=$(echo "$hostname" | grep -o -E '[0-9]+')
+
+ sdisks=$(ls -d $scsi_debug_path/adapter*/"$hostname"/target*/*/block/*)
+ disks=""
+ for sd in $sdisks; do
+ disks+="/dev/"$(basename "$sd")
+ disks+=" "
+ done
+
+ use_mq=$(cat /sys/module/scsi_mod/parameters/use_blk_mq)
+ if [[ $use_mq = "Y" ]]; then
+ scheds=("none" "mq-deadline" "kyber")
+ else
+ scheds=("noop" "deadline" "cfq")
+ fi
+
+ fio_jobs=""
+ cnt=0
+ for sd in $disks; do
+ cnt=$((cnt+1))
+ fio_jobs=$fio_jobs" --name=job1 --filename=$sd: "
+ dev_name=$(basename "$sd")
+ q_path=/sys/block/$dev_name/queue
+
+ sched_idx=$((cnt % ${#scheds[@]}))
+ echo "${scheds[$sched_idx]}" > "$q_path/scheduler"
+ echo $cnt > "$q_path/../device/queue_depth"
+ done
+
+ fio --rw=randread --size=128G --direct=1 --ioengine=libaio \
+ --iodepth=2048 --numjobs=$nr_fio_jobs --bs=4k \
+ --group_reporting=1 --runtime=$runtime \
+ --loops=10000 $fio_jobs > "$FULL" 2>&1 &
+
+ sleep 7
+ for sd in $disks; do
+ dev_name=$(basename "$sd")
+ dpath=/sys/block/$dev_name/device
+ [ -f "$dpath/delete" ] && echo 1 > "$dpath/delete"
+ done
+
+ wait
+}
+
+
+test() {
+ echo "Running ${TEST_NAME}"
+
+ scsi_debug_stress_remove
+
+ echo "Test complete"
+}
diff --git a/tests/block/022.out b/tests/block/022.out
new file mode 100644
index 0000000..14d43cb
--- /dev/null
+++ b/tests/block/022.out
@@ -0,0 +1,2 @@
+Running block/022
+Test complete
--
2.17.1
* Re: [PATCH blktests 2/2] blktests: add Ming Lei's scsi-stress-remove
2018-12-12 23:09 ` [PATCH blktests 2/2] blktests: add Ming Lei's scsi-stress-remove Dennis Zhou
@ 2018-12-13 1:24 ` Ming Lei
2018-12-13 18:21 ` Dennis Zhou
2018-12-13 18:28 ` [PATCH blktests v2 " Dennis Zhou
1 sibling, 1 reply; 10+ messages in thread
From: Ming Lei @ 2018-12-13 1:24 UTC (permalink / raw)
To: Dennis Zhou; +Cc: Omar Sandoval, kernel-team, linux-block
On Wed, Dec 12, 2018 at 06:09:59PM -0500, Dennis Zhou wrote:
> This test exposed a race condition with shutting down a request_queue
> and the new blkg association. The issue ended up being that while the
> request_queue will just start failing requests, blkg destruction sets
> the q->root_blkg to %NULL. This caused a NPE when trying to reference
> it. So to help prevent this from happening again, integrate Ming's test
> into blktests so that it can more easily be ran.
>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> ---
> tests/block/022 | 96 +++++++++++++++++++++++++++++++++++++++++++++
> tests/block/022.out | 2 +
> 2 files changed, 98 insertions(+)
> create mode 100755 tests/block/022
> create mode 100644 tests/block/022.out
>
> diff --git a/tests/block/022 b/tests/block/022
> new file mode 100755
> index 0000000..45bfff7
> --- /dev/null
> +++ b/tests/block/022
> @@ -0,0 +1,96 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright (C) 2018 Ming Lei
> +#
> +# Regression test for patch "blkcg: handle dying request_queue when associating
> +# a blkg"
> +#
> +# This tries to expose the race condition between blkg association and
> +# request_queue shutdown. When a request_queue is shutdown, the corresponding
> +# blkgs are destroyed. Any further associations should fail gracefully and not
> +# cause a kernel panic.
> +
> +. tests/block/rc
> +. common/scsi_debug
> +. common/cgroup
> +
> +DESCRIPTION="test graceful shutdown of scsi_debug devices with running fio jobs"
> +QUICK=1
> +
> +requires() {
> + _have_cgroup2_controller io && _have_scsi_debug && _have_fio
> +}
> +
> +scsi_debug_stress_remove() {
> + scsi_debug_path="/sys/bus/pseudo/drivers/scsi_debug"
> + count=21
> +
> + runtime=12
> + nr_fio_jobs=8
> + scsi_dbg_ndelay=10000
> +
> + # set higher aio limit
> + echo 524288 > /proc/sys/fs/aio-max-nr
> +
> + #figure out the CAN_QUEUE
> + can_queue=$(((count + 1) * (count / 2) / 2))
> +
> + rmmod scsi_debug > /dev/null 2>&1
> + modprobe scsi_debug virtual_gb=128 max_luns=$count \
> + ndelay=$scsi_dbg_ndelay max_queue=$can_queue
> +
> + # figure out scsi_debug disks
> + hosts=$(ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter0/host*)
> + hostname=$(basename "$hosts")
> + host=$(echo "$hostname" | grep -o -E '[0-9]+')
> +
> + sdisks=$(ls -d $scsi_debug_path/adapter*/"$hostname"/target*/*/block/*)
> + disks=""
> + for sd in $sdisks; do
> + disks+="/dev/"$(basename "$sd")
> + disks+=" "
> + done
> +
> + use_mq=$(cat /sys/module/scsi_mod/parameters/use_blk_mq)
> + if [[ $use_mq = "Y" ]]; then
> + scheds=("none" "mq-deadline" "kyber")
> + else
> + scheds=("noop" "deadline" "cfq")
> + fi
You may use the following to figure out all supported io schedulers,
especially we have removed all legacy io schedulers.
IOSCHEDS=`sed 's/[][]//g' $Q_PATH/scheduler`
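For illustration, the bracket-stripping can be demonstrated standalone (the sample line below is hypothetical sysfs content, not read from a live queue; in the real file the active scheduler is the bracketed entry):

```shell
# /sys/block/<dev>/queue/scheduler lists all available schedulers with
# the currently active one in square brackets, e.g.:
#   [mq-deadline] kyber bfq none
# sed 's/[][]//g' deletes both bracket characters, leaving a plain
# whitespace-separated word list that bash can split into an array.
line="[mq-deadline] kyber bfq none"   # sample content, not read from sysfs
scheds=($(echo "$line" | sed 's/[][]//g'))
echo "count=${#scheds[@]} first=${scheds[0]}"
# → count=4 first=mq-deadline
```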
Thanks,
Ming
* Re: [PATCH blktests 2/2] blktests: add Ming Lei's scsi-stress-remove
2018-12-13 1:24 ` Ming Lei
@ 2018-12-13 18:21 ` Dennis Zhou
0 siblings, 0 replies; 10+ messages in thread
From: Dennis Zhou @ 2018-12-13 18:21 UTC (permalink / raw)
To: Ming Lei; +Cc: Dennis Zhou, Omar Sandoval, kernel-team, linux-block
On Thu, Dec 13, 2018 at 09:24:09AM +0800, Ming Lei wrote:
> On Wed, Dec 12, 2018 at 06:09:59PM -0500, Dennis Zhou wrote:
> > This test exposed a race condition with shutting down a request_queue
> > and the new blkg association. The issue ended up being that while the
> > request_queue will just start failing requests, blkg destruction sets
> > the q->root_blkg to %NULL. This caused a NPE when trying to reference
> > it. So to help prevent this from happening again, integrate Ming's test
> > into blktests so that it can more easily be ran.
> >
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > Cc: Ming Lei <ming.lei@redhat.com>
> > ---
> > tests/block/022 | 96 +++++++++++++++++++++++++++++++++++++++++++++
> > tests/block/022.out | 2 +
> > 2 files changed, 98 insertions(+)
> > create mode 100755 tests/block/022
> > create mode 100644 tests/block/022.out
> >
> > diff --git a/tests/block/022 b/tests/block/022
> > new file mode 100755
> > index 0000000..45bfff7
> > --- /dev/null
> > +++ b/tests/block/022
> > @@ -0,0 +1,96 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-3.0+
> > +# Copyright (C) 2018 Ming Lei
> > +#
> > +# Regression test for patch "blkcg: handle dying request_queue when associating
> > +# a blkg"
> > +#
> > +# This tries to expose the race condition between blkg association and
> > +# request_queue shutdown. When a request_queue is shutdown, the corresponding
> > +# blkgs are destroyed. Any further associations should fail gracefully and not
> > +# cause a kernel panic.
> > +
> > +. tests/block/rc
> > +. common/scsi_debug
> > +. common/cgroup
> > +
> > +DESCRIPTION="test graceful shutdown of scsi_debug devices with running fio jobs"
> > +QUICK=1
> > +
> > +requires() {
> > + _have_cgroup2_controller io && _have_scsi_debug && _have_fio
> > +}
> > +
> > +scsi_debug_stress_remove() {
> > + scsi_debug_path="/sys/bus/pseudo/drivers/scsi_debug"
> > + count=21
> > +
> > + runtime=12
> > + nr_fio_jobs=8
> > + scsi_dbg_ndelay=10000
> > +
> > + # set higher aio limit
> > + echo 524288 > /proc/sys/fs/aio-max-nr
> > +
> > + #figure out the CAN_QUEUE
> > + can_queue=$(((count + 1) * (count / 2) / 2))
> > +
> > + rmmod scsi_debug > /dev/null 2>&1
> > + modprobe scsi_debug virtual_gb=128 max_luns=$count \
> > + ndelay=$scsi_dbg_ndelay max_queue=$can_queue
> > +
> > + # figure out scsi_debug disks
> > + hosts=$(ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter0/host*)
> > + hostname=$(basename "$hosts")
> > + host=$(echo "$hostname" | grep -o -E '[0-9]+')
> > +
> > + sdisks=$(ls -d $scsi_debug_path/adapter*/"$hostname"/target*/*/block/*)
> > + disks=""
> > + for sd in $sdisks; do
> > + disks+="/dev/"$(basename "$sd")
> > + disks+=" "
> > + done
> > +
> > + use_mq=$(cat /sys/module/scsi_mod/parameters/use_blk_mq)
> > + if [[ $use_mq = "Y" ]]; then
> > + scheds=("none" "mq-deadline" "kyber")
> > + else
> > + scheds=("noop" "deadline" "cfq")
> > + fi
>
> You may use the following to figure out all supported io schedulers,
> especially we have removed all legacy io schedulers.
>
> IOSCHEDS=`sed 's/[][]//g' $Q_PATH/scheduler`
>
>
> Thanks,
> Ming
Thanks Ming! I'll post a v2 update for this patch.
Thanks,
Dennis
* [PATCH blktests v2 2/2] blktests: add Ming Lei's scsi-stress-remove
2018-12-12 23:09 ` [PATCH blktests 2/2] blktests: add Ming Lei's scsi-stress-remove Dennis Zhou
2018-12-13 1:24 ` Ming Lei
@ 2018-12-13 18:28 ` Dennis Zhou
2018-12-14 0:31 ` Ming Lei
2018-12-19 22:49 ` Omar Sandoval
1 sibling, 2 replies; 10+ messages in thread
From: Dennis Zhou @ 2018-12-13 18:28 UTC (permalink / raw)
To: Omar Sandoval, Ming Lei; +Cc: kernel-team, linux-block, Dennis Zhou
This test exposed a race condition with shutting down a request_queue
and the new blkg association. The issue ended up being that while the
request_queue will just start failing requests, blkg destruction sets
the q->root_blkg to %NULL. This caused an NPE when trying to reference
it. So to help prevent this from happening again, integrate Ming's test
into blktests so that it can more easily be run.
Signed-off-by: Dennis Zhou <dennis@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
---
v2:
- Change scheduler retrieving logic based on Ming's comment
tests/block/022 | 90 +++++++++++++++++++++++++++++++++++++++++++++
tests/block/022.out | 2 +
2 files changed, 92 insertions(+)
create mode 100755 tests/block/022
create mode 100644 tests/block/022.out
diff --git a/tests/block/022 b/tests/block/022
new file mode 100755
index 0000000..84336e0
--- /dev/null
+++ b/tests/block/022
@@ -0,0 +1,90 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-3.0+
+# Copyright (C) 2018 Ming Lei
+#
+# Regression test for patch "blkcg: handle dying request_queue when associating
+# a blkg"
+#
+# This tries to expose the race condition between blkg association and
+# request_queue shutdown. When a request_queue is shutdown, the corresponding
+# blkgs are destroyed. Any further associations should fail gracefully and not
+# cause a kernel panic.
+
+. tests/block/rc
+. common/scsi_debug
+. common/cgroup
+
+DESCRIPTION="test graceful shutdown of scsi_debug devices with running fio jobs"
+QUICK=1
+
+requires() {
+ _have_cgroup2_controller io && _have_scsi_debug && _have_fio
+}
+
+scsi_debug_stress_remove() {
+ scsi_debug_path="/sys/bus/pseudo/drivers/scsi_debug"
+ count=21
+
+ runtime=12
+ nr_fio_jobs=8
+ scsi_dbg_ndelay=10000
+
+ # set higher aio limit
+ echo 524288 > /proc/sys/fs/aio-max-nr
+
+ # figure out the CAN_QUEUE
+ can_queue=$(((count + 1) * (count / 2) / 2))
+
+ rmmod scsi_debug > /dev/null 2>&1
+ modprobe scsi_debug virtual_gb=128 max_luns=$count \
+ ndelay=$scsi_dbg_ndelay max_queue=$can_queue
+
+ # figure out scsi_debug disks
+ hosts=$(ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter0/host*)
+ hostname=$(basename "$hosts")
+ host=$(echo "$hostname" | grep -o -E '[0-9]+')
+
+ sdisks=$(ls -d $scsi_debug_path/adapter*/"$hostname"/target*/*/block/*)
+ disks=""
+ for sd in $sdisks; do
+ disks+="/dev/"$(basename "$sd")
+ disks+=" "
+ done
+
+ fio_jobs=""
+ cnt=0
+ for sd in $disks; do
+ cnt=$((cnt+1))
+ fio_jobs=$fio_jobs" --name=job1 --filename=$sd: "
+ dev_name=$(basename "$sd")
+ q_path=/sys/block/$dev_name/queue
+
+ scheds=($(sed 's/[][]//g' "$q_path/scheduler"))
+ sched_idx=$((cnt % ${#scheds[@]}))
+ echo "${scheds[$sched_idx]}" > "$q_path/scheduler"
+ echo $cnt > "$q_path/../device/queue_depth"
+ done
+
+ fio --rw=randread --size=128G --direct=1 --ioengine=libaio \
+ --iodepth=2048 --numjobs=$nr_fio_jobs --bs=4k \
+ --group_reporting=1 --runtime=$runtime \
+ --loops=10000 $fio_jobs > "$FULL" 2>&1 &
+
+ sleep 7
+ for sd in $disks; do
+ dev_name=$(basename "$sd")
+ dpath=/sys/block/$dev_name/device
+ [ -f "$dpath/delete" ] && echo 1 > "$dpath/delete"
+ done
+
+ wait
+}
+
+
+test() {
+ echo "Running ${TEST_NAME}"
+
+ scsi_debug_stress_remove
+
+ echo "Test complete"
+}
diff --git a/tests/block/022.out b/tests/block/022.out
new file mode 100644
index 0000000..14d43cb
--- /dev/null
+++ b/tests/block/022.out
@@ -0,0 +1,2 @@
+Running block/022
+Test complete
--
2.17.1
* Re: [PATCH blktests v2 2/2] blktests: add Ming Lei's scsi-stress-remove
2018-12-13 18:28 ` [PATCH blktests v2 " Dennis Zhou
@ 2018-12-14 0:31 ` Ming Lei
2018-12-19 22:49 ` Omar Sandoval
1 sibling, 0 replies; 10+ messages in thread
From: Ming Lei @ 2018-12-14 0:31 UTC (permalink / raw)
To: Dennis Zhou; +Cc: Omar Sandoval, kernel-team, linux-block
On Thu, Dec 13, 2018 at 01:28:44PM -0500, Dennis Zhou wrote:
> This test exposed a race condition with shutting down a request_queue
> and the new blkg association. The issue ended up being that while the
> request_queue will just start failing requests, blkg destruction sets
> the q->root_blkg to %NULL. This caused a NPE when trying to reference
> it. So to help prevent this from happening again, integrate Ming's test
> into blktests so that it can more easily be ran.
>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> ---
> v2:
> - Change scheduler retrieving logic based on Ming's comment
>
> tests/block/022 | 90 +++++++++++++++++++++++++++++++++++++++++++++
> tests/block/022.out | 2 +
> 2 files changed, 92 insertions(+)
> create mode 100755 tests/block/022
> create mode 100644 tests/block/022.out
>
> diff --git a/tests/block/022 b/tests/block/022
> new file mode 100755
> index 0000000..84336e0
> --- /dev/null
> +++ b/tests/block/022
> @@ -0,0 +1,90 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright (C) 2018 Ming Lei
> +#
> +# Regression test for patch "blkcg: handle dying request_queue when associating
> +# a blkg"
> +#
> +# This tries to expose the race condition between blkg association and
> +# request_queue shutdown. When a request_queue is shutdown, the corresponding
> +# blkgs are destroyed. Any further associations should fail gracefully and not
> +# cause a kernel panic.
> +
> +. tests/block/rc
> +. common/scsi_debug
> +. common/cgroup
> +
> +DESCRIPTION="test graceful shutdown of scsi_debug devices with running fio jobs"
> +QUICK=1
> +
> +requires() {
> + _have_cgroup2_controller io && _have_scsi_debug && _have_fio
> +}
> +
> +scsi_debug_stress_remove() {
> + scsi_debug_path="/sys/bus/pseudo/drivers/scsi_debug"
> + count=21
> +
> + runtime=12
> + nr_fio_jobs=8
> + scsi_dbg_ndelay=10000
> +
> + # set higher aio limit
> + echo 524288 > /proc/sys/fs/aio-max-nr
> +
> + #figure out the CAN_QUEUE
> + can_queue=$(((count + 1) * (count / 2) / 2))
> +
> + rmmod scsi_debug > /dev/null 2>&1
> + modprobe scsi_debug virtual_gb=128 max_luns=$count \
> + ndelay=$scsi_dbg_ndelay max_queue=$can_queue
> +
> + # figure out scsi_debug disks
> + hosts=$(ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter0/host*)
> + hostname=$(basename "$hosts")
> + host=$(echo "$hostname" | grep -o -E '[0-9]+')
> +
> + sdisks=$(ls -d $scsi_debug_path/adapter*/"$hostname"/target*/*/block/*)
> + disks=""
> + for sd in $sdisks; do
> + disks+="/dev/"$(basename "$sd")
> + disks+=" "
> + done
> +
> + fio_jobs=""
> + cnt=0
> + for sd in $disks; do
> + cnt=$((cnt+1))
> + fio_jobs=$fio_jobs" --name=job1 --filename=$sd: "
> + dev_name=$(basename "$sd")
> + q_path=/sys/block/$dev_name/queue
> +
> + scheds=($(sed 's/[][]//g' "$q_path/scheduler"))
> + sched_idx=$((cnt % ${#scheds[@]}))
> + echo "${scheds[$sched_idx]}" > "$q_path/scheduler"
> + echo $cnt > "$q_path/../device/queue_depth"
> + done
> +
> + fio --rw=randread --size=128G --direct=1 --ioengine=libaio \
> + --iodepth=2048 --numjobs=$nr_fio_jobs --bs=4k \
> + --group_reporting=1 --group_reporting=1 --runtime=$runtime \
> + --loops=10000 "$fio_jobs" > "$FULL" 2>&1 &
> +
> + sleep 7
> + for sd in $disks; do
> + dev_name=$(basename "$sd")
> + dpath=/sys/block/$dev_name/device
> + [ -f "$dpath/delete" ] && echo 1 > "$dpath/delete"
> + done
> +
> + wait
> +}
> +
> +
> +test() {
> + echo "Running ${TEST_NAME}"
> +
> + scsi_debug_stress_remove
> +
> + echo "Test complete"
> +}
> diff --git a/tests/block/022.out b/tests/block/022.out
> new file mode 100644
> index 0000000..14d43cb
> --- /dev/null
> +++ b/tests/block/022.out
> @@ -0,0 +1,2 @@
> +Running block/022
> +Test complete
> --
> 2.17.1
>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
thanks,
Ming
* Re: [PATCH blktests 1/2] blktests: split out cgroup2 controller and file check
2018-12-12 23:09 ` [PATCH blktests 1/2] blktests: split out cgroup2 controller and file check Dennis Zhou
@ 2018-12-19 18:34 ` Omar Sandoval
0 siblings, 0 replies; 10+ messages in thread
From: Omar Sandoval @ 2018-12-19 18:34 UTC (permalink / raw)
To: Dennis Zhou; +Cc: Omar Sandoval, Ming Lei, kernel-team, linux-block
On Wed, Dec 12, 2018 at 06:09:58PM -0500, Dennis Zhou wrote:
> This is a prep patch for a new test that will race blkg association and
> request_queue cleanup. As blkg association is a underlying cgroup io
> controller feature, we need the ability to check if the controller is
> available.
>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> ---
> common/cgroup | 18 +++++++++++++-----
> 1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/common/cgroup b/common/cgroup
> index d445093..3481458 100644
> --- a/common/cgroup
> +++ b/common/cgroup
> @@ -37,19 +37,27 @@ _have_cgroup2()
> return 0
> }
>
> -_have_cgroup2_controller_file()
> +_have_cgroup2_controller()
> {
> - _have_cgroup2 || return 1
> -
> local controller="$1"
> - local file="$2"
> - local dir
> +
> + _have_cgroup2 || return 1
>
> dir="$(_cgroup2_base_dir)"
> +
> if ! grep -q "$controller" "$dir/cgroup.controllers"; then
> SKIP_REASON="no support for $controller cgroup controller; if it is enabled, you may need to boot with cgroup_no_v1=$controller"
> return 1
> fi
> +}
> +
> +_have_cgroup2_controller_file()
> +{
> + local controller="$1"
> + local file="$2"
> + local dir
> +
> + _have_cgroup_2_controller "$controller" || return 1
This should be _have_cgroup2_controller. I'll fix it when I apply it.
>
> mkdir "$dir/blktests"
> echo "+$controller" > "$dir/cgroup.subtree_control"
> --
> 2.17.1
>
* Re: [PATCH blktests v2 2/2] blktests: add Ming Lei's scsi-stress-remove
2018-12-13 18:28 ` [PATCH blktests v2 " Dennis Zhou
2018-12-14 0:31 ` Ming Lei
@ 2018-12-19 22:49 ` Omar Sandoval
2018-12-19 22:57 ` Dennis Zhou
1 sibling, 1 reply; 10+ messages in thread
From: Omar Sandoval @ 2018-12-19 22:49 UTC (permalink / raw)
To: Dennis Zhou; +Cc: Omar Sandoval, Ming Lei, kernel-team, linux-block
On Thu, Dec 13, 2018 at 01:28:44PM -0500, Dennis Zhou wrote:
> This test exposed a race condition with shutting down a request_queue
> and the new blkg association. The issue ended up being that while the
> request_queue will just start failing requests, blkg destruction sets
> the q->root_blkg to %NULL. This caused a NPE when trying to reference
> it. So to help prevent this from happening again, integrate Ming's test
> into blktests so that it can more easily be ran.
>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> ---
> v2:
> - Change scheduler retrieving logic based on Ming's comment
>
> tests/block/022 | 90 +++++++++++++++++++++++++++++++++++++++++++++
> tests/block/022.out | 2 +
> 2 files changed, 92 insertions(+)
> create mode 100755 tests/block/022
> create mode 100644 tests/block/022.out
>
> diff --git a/tests/block/022 b/tests/block/022
> new file mode 100755
> index 0000000..84336e0
> --- /dev/null
> +++ b/tests/block/022
> @@ -0,0 +1,90 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-3.0+
> +# Copyright (C) 2018 Ming Lei
> +#
> +# Regression test for patch "blkcg: handle dying request_queue when associating
> +# a blkg"
> +#
> +# This tries to expose the race condition between blkg association and
> +# request_queue shutdown. When a request_queue is shutdown, the corresponding
> +# blkgs are destroyed. Any further associations should fail gracefully and not
> +# cause a kernel panic.
> +
> +. tests/block/rc
> +. common/scsi_debug
> +. common/cgroup
> +
> +DESCRIPTION="test graceful shutdown of scsi_debug devices with running fio jobs"
> +QUICK=1
> +
> +requires() {
> + _have_cgroup2_controller io && _have_scsi_debug && _have_fio
> +}
> +
> +scsi_debug_stress_remove() {
> + scsi_debug_path="/sys/bus/pseudo/drivers/scsi_debug"
> + count=21
> +
> + runtime=12
> + nr_fio_jobs=8
> + scsi_dbg_ndelay=10000
> +
> + # set higher aio limit
> + echo 524288 > /proc/sys/fs/aio-max-nr
> +
> + #figure out the CAN_QUEUE
> + can_queue=$(((count + 1) * (count / 2) / 2))
> +
> + rmmod scsi_debug > /dev/null 2>&1
> + modprobe scsi_debug virtual_gb=128 max_luns=$count \
> + ndelay=$scsi_dbg_ndelay max_queue=$can_queue
> +
> + # figure out scsi_debug disks
> + hosts=$(ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter0/host*)
> + hostname=$(basename "$hosts")
> + host=$(echo "$hostname" | grep -o -E '[0-9]+')
> +
> + sdisks=$(ls -d $scsi_debug_path/adapter*/"$hostname"/target*/*/block/*)
> + disks=""
> + for sd in $sdisks; do
> + disks+="/dev/"$(basename "$sd")
> + disks+=" "
> + done
blktests has _init_scsi_debug which does all of this for you. And,
block/001 is very similar to this test, just without the fio workload or
changing schedulers. Could you please rework this to be based on
block/001?
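For reference, a rough sketch of what such a rework might look like. The helper names here (_init_scsi_debug, _exit_scsi_debug, the SCSI_DEBUG_DEVICES array) are assumed to behave as they do in common/scsi_debug and block/001, and the details would need checking against the actual helpers:

```shell
# Hypothetical rework sketch: let the blktests helpers manage scsi_debug
# setup/teardown instead of the open-coded modprobe + sysfs walking.
# Helper behavior is assumed from block/001, not verified here.
scsi_debug_stress_remove() {
	local count=21
	# same CAN_QUEUE arithmetic as the posted test: 110 for count=21
	local can_queue=$(((count + 1) * (count / 2) / 2))

	_init_scsi_debug virtual_gb=128 max_luns=$count \
		ndelay=10000 max_queue=$can_queue || return 1

	local disks=() dev
	for dev in "${SCSI_DEBUG_DEVICES[@]}"; do
		disks+=("/dev/$dev")
	done

	# ... run fio against "${disks[@]}" and delete the devices via
	# /sys/block/<dev>/device/delete as in the v2 patch ...

	_exit_scsi_debug
}
```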
* Re: [PATCH blktests v2 2/2] blktests: add Ming Lei's scsi-stress-remove
2018-12-19 22:49 ` Omar Sandoval
@ 2018-12-19 22:57 ` Dennis Zhou
0 siblings, 0 replies; 10+ messages in thread
From: Dennis Zhou @ 2018-12-19 22:57 UTC (permalink / raw)
To: Omar Sandoval
Cc: Dennis Zhou, Omar Sandoval, Ming Lei, kernel-team, linux-block
On Wed, Dec 19, 2018 at 02:49:42PM -0800, Omar Sandoval wrote:
> On Thu, Dec 13, 2018 at 01:28:44PM -0500, Dennis Zhou wrote:
> > This test exposed a race condition with shutting down a request_queue
> > and the new blkg association. The issue ended up being that while the
> > request_queue will just start failing requests, blkg destruction sets
> > the q->root_blkg to %NULL. This caused a NPE when trying to reference
> > it. So to help prevent this from happening again, integrate Ming's test
> > into blktests so that it can more easily be ran.
> >
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > Cc: Ming Lei <ming.lei@redhat.com>
> > ---
> > v2:
> > - Change scheduler retrieving logic based on Ming's comment
> >
> > tests/block/022 | 90 +++++++++++++++++++++++++++++++++++++++++++++
> > tests/block/022.out | 2 +
> > 2 files changed, 92 insertions(+)
> > create mode 100755 tests/block/022
> > create mode 100644 tests/block/022.out
> >
> > diff --git a/tests/block/022 b/tests/block/022
> > new file mode 100755
> > index 0000000..84336e0
> > --- /dev/null
> > +++ b/tests/block/022
> > @@ -0,0 +1,90 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-3.0+
> > +# Copyright (C) 2018 Ming Lei
> > +#
> > +# Regression test for patch "blkcg: handle dying request_queue when associating
> > +# a blkg"
> > +#
> > +# This tries to expose the race condition between blkg association and
> > +# request_queue shutdown. When a request_queue is shutdown, the corresponding
> > +# blkgs are destroyed. Any further associations should fail gracefully and not
> > +# cause a kernel panic.
> > +
> > +. tests/block/rc
> > +. common/scsi_debug
> > +. common/cgroup
> > +
> > +DESCRIPTION="test graceful shutdown of scsi_debug devices with running fio jobs"
> > +QUICK=1
> > +
> > +requires() {
> > + _have_cgroup2_controller io && _have_scsi_debug && _have_fio
> > +}
> > +
> > +scsi_debug_stress_remove() {
> > + scsi_debug_path="/sys/bus/pseudo/drivers/scsi_debug"
> > + count=21
> > +
> > + runtime=12
> > + nr_fio_jobs=8
> > + scsi_dbg_ndelay=10000
> > +
> > + # set higher aio limit
> > + echo 524288 > /proc/sys/fs/aio-max-nr
> > +
> > + #figure out the CAN_QUEUE
> > + can_queue=$(((count + 1) * (count / 2) / 2))
> > +
> > + rmmod scsi_debug > /dev/null 2>&1
> > + modprobe scsi_debug virtual_gb=128 max_luns=$count \
> > + ndelay=$scsi_dbg_ndelay max_queue=$can_queue
> > +
> > + # figure out scsi_debug disks
> > + hosts=$(ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter0/host*)
> > + hostname=$(basename "$hosts")
> > + host=$(echo "$hostname" | grep -o -E '[0-9]+')
> > +
> > + sdisks=$(ls -d $scsi_debug_path/adapter*/"$hostname"/target*/*/block/*)
> > + disks=""
> > + for sd in $sdisks; do
> > + disks+="/dev/"$(basename "$sd")
> > + disks+=" "
> > + done
>
> blktests has _init_scsi_debug which does all of this for you. And,
> block/001 is very similar to this test, just without the fio workload or
> changing schedulers. Could you please rework this to be based on
> block/001?
Yeah I can do that. Sorry for not looking at block/001 more closely.
Thanks,
Dennis