linux-nvme.lists.infradead.org archive mirror
* [PATCH V3 0/7] nvmet: add target ns revalidate support
@ 2020-04-19 23:48 Chaitanya Kulkarni
  2020-04-19 23:48 ` [PATCH V3 1/7] nvmet: add ns revalidation support Chaitanya Kulkarni
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-04-19 23:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: sagi, Chaitanya Kulkarni, hch

Hi,

This patch series adds support for the NVMeOF target bdev-ns and 
file-ns size re-validation.

The first patch adds the bdev/file backend revalidation helpers. It
was originally posted by Anthony Iliopoulos; I've addressed the comments
posted on V1 while keeping the authorship of the patch.

The 2nd patch is needed since a size change detected on the target
should generate an AEN to the host. Right now there is no generic
mechanism that allows us to register callbacks with the block and file
backends so that we get notified of such a change (if anyone knows of
one, please let me know and I'll be happy to rework this series). So
this patch just adds a light-weight global maintenance thread that
checks for size changes and generates an AEN when needed.

When the nvmet module is loaded, it starts the global maintenance
thread with default parameters. Different users may require different
tunables based on their needs for ns size revalidation and on how their
system performs. Instead of forcing a particular policy, I've added
configfs tunables for the maintenance thread: the scheduling policy and
scheduling priority, the sleep timeout (i.e. the refresh interval
between scans), and a per-namespace flag that controls whether a
namespace participates in size revalidation. This gives the user
namespace-level granularity, so the user can decide which namespaces
take part in the revalidation process.

Regarding functional testing :-

I've tested this with a dedicated blktest which creates 10 subsystems
with 10 file-backed namespaces each and then generates async events
from the target by changing the file sizes with truncate over
nvme-loop. With the maintenance thread enabled, I've verified that the
new size of every namespace is propagated from the target to the host
block device with this patch series.

Regarding performance testing :-

I couldn't find a huge difference; still trying to see if I missed something.
Following are the collected numbers, for details of the methodology
please have a look at the end of the patch.

1. Fio IOPS/BW With Resize Monitor Turned off/on :-
---------------------------------------------------
1.1 Off :-
----------
# echo 0 > /sys/kernel/config/nvmet/subsystems/control_resize_refresh 
[ 1094.203726] nvmet: nvmet_ns_resize_monitor 573 Monitor Goodbye 
[ 1094.203883] nvmet: nvmet_disable_ns_resize_monitor 1577 DISABLED 
read: IOPS=168k, BW=658MiB/s (690MB/s)(38.5GiB/60007msec)
read: IOPS=167k, BW=654MiB/s (686MB/s)(38.3GiB/60004msec)
read: IOPS=168k, BW=658MiB/s (690MB/s)(38.5GiB/60004msec)
read: IOPS=168k, BW=658MiB/s (690MB/s)(38.6GiB/60005msec)
read: IOPS=166k, BW=648MiB/s (680MB/s)(37.0GiB/60009msec)
read: IOPS=168k, BW=658MiB/s (690MB/s)(38.5GiB/60003msec)
read: IOPS=168k, BW=658MiB/s (690MB/s)(38.5GiB/60003msec)
read: IOPS=169k, BW=658MiB/s (690MB/s)(38.6GiB/60006msec)
read: IOPS=168k, BW=656MiB/s (688MB/s)(38.4GiB/60004msec)
1.2 On :-
---------
# echo 1 > /sys/kernel/config/nvmet/subsystems/control_resize_refresh 
# echo 1 > /sys/kernel/config/nvmet/subsystems/control_resize_timeout 
[ 5336.754319] nvmet: nvmet_enable_ns_resize_monitor 1552 Monitor Hello 
[ 5336.754663] nvmet: nvmet_enable_ns_resize_monitor 1556 ENABLED 
[ 5336.754685] nvmet: nvmet_ns_resize_monitor 552 Monitor Hello 
read: IOPS=168k, BW=655MiB/s (687MB/s)(38.4GiB/60006msec)
read: IOPS=142k, BW=554MiB/s (580MB/s)(32.4GiB/60003msec)
read: IOPS=168k, BW=658MiB/s (690MB/s)(38.5GiB/60003msec)
read: IOPS=168k, BW=655MiB/s (687MB/s)(38.4GiB/60002msec)
read: IOPS=168k, BW=657MiB/s (689MB/s)(38.5GiB/60003msec)
read: IOPS=168k, BW=658MiB/s (690MB/s)(38.5GiB/60004msec)
read: IOPS=169k, BW=659MiB/s (691MB/s)(38.6GiB/60003msec)
read: IOPS=168k, BW=656MiB/s (688MB/s)(38.4GiB/60002msec)
read: IOPS=169k, BW=659MiB/s (691MB/s)(38.6GiB/60003msec)

2. Fio latency With Resize Monitor Turned off/on :-
---------------------------------------------------
2.1 Off :-
----------
# echo 0 > /sys/kernel/config/nvmet/subsystems/control_resize_refresh 
[ 1094.203726] nvmet: nvmet_ns_resize_monitor 573 Monitor Goodbye 
[ 1094.203883] nvmet: nvmet_disable_ns_resize_monitor 1577 DISABLED 
lat (usec): min=132, max=52678, avg=6079.19, stdev=1363.88
lat (usec): min=149, max=67465, avg=6112.28, stdev=1413.36
lat (usec): min=158, max=60909, avg=6081.65, stdev=1268.04
lat (usec): min=227, max=53860, avg=6077.59, stdev=1267.18
lat (usec): min=170, max=719682, avg=6167.66, stdev=1905.63
lat (usec): min=156, max=59967, avg=6078.86, stdev=1334.39
lat (usec): min=146, max=52313, avg=6079.96, stdev=1337.53
lat (usec): min=167, max=55577, avg=6074.54, stdev=1361.47
lat (usec): min=146, max=49626, avg=6095.29, stdev=1304.03
2.2. On :-
----------
# echo 1 > /sys/kernel/config/nvmet/subsystems/control_resize_refresh 
# echo 1 > /sys/kernel/config/nvmet/subsystems/control_resize_timeout 
[ 5336.754319] nvmet: nvmet_enable_ns_resize_monitor 1552 Monitor Hello 
[ 5336.754663] nvmet: nvmet_enable_ns_resize_monitor 1556 ENABLED 
[ 5336.754685] nvmet: nvmet_ns_resize_monitor 552 Monitor Hello 
lat (usec): min=141, max=63001, avg=6103.25, stdev=1383.30
lat (usec): min=266, max=64597, avg=7224.60, stdev=2373.59
lat (usec): min=151, max=64977, avg=6080.20, stdev=1286.61
lat (usec): min=188, max=47260, avg=6104.01, stdev=1365.08
lat (usec): min=188, max=57818, avg=6088.90, stdev=1335.74
lat (usec): min=100, max=47576, avg=6081.35, stdev=1312.35
lat (usec): min=171, max=55718, avg=6069.51, stdev=1353.18
lat (usec): min=168, max=49303, avg=6096.46, stdev=1321.67
lat (usec): min=138, max=48412, avg=6070.14, stdev=1314.59

Regards,
Chaitanya

Changes from V2 :-

1. Add a global maintenance thread with SCHED_IDLE as the default
   policy so that it has minimal impact on CPU utilization when the
   target is busy.
2. Add maintenance thread tunables so that the user can adjust this
   feature from configfs and apply a more fine-grained policy.
3. Add an async event generation tracepoint for target. This is
   needed in order to test the events across the transport.

Changes from V1 :-

1. Just use ns->size = i_size_read(ns->bdev->bd_inode) in the
   nvmet_bdev_ns_revalidate().
2. Remove the !file check and use the full line for the vfs_getattr()
   call in nvmet_file_ns_revalidate().
3. Add wrapper nvmet_ns_revalidate().
4. Add a 2nd patch to introduce a per-namespace thread that monitors
   the size by calling nvmet_ns_revalidate() and generates an AEN when
   a size change is detected.
5. Change the return type of nvmet_[bdev|file]_ns_revalidate() from
   void to bool.

Anthony Iliopoulos (1):
  nvmet: add ns revalidation support

Chaitanya Kulkarni (6):
  nvmet: add global thread for ns-resize AEN
  nvmet: export resize thread enable-disable attr
  nvmet: export resize thread scan interval
  nvmet: export resize thread sched attributes
  nvmet: export ns resize monitor attribute
  nvmet: add async event tracing support

 drivers/nvme/target/admin-cmd.c   |   4 +
 drivers/nvme/target/configfs.c    | 156 +++++++++++++++++++++++++++++-
 drivers/nvme/target/core.c        | 132 +++++++++++++++++++++++++
 drivers/nvme/target/io-cmd-bdev.c |  12 +++
 drivers/nvme/target/io-cmd-file.c |  16 +++
 drivers/nvme/target/nvmet.h       |  37 ++++++-
 drivers/nvme/target/trace.h       |  28 ++++++
 7 files changed, 383 insertions(+), 2 deletions(-)

1. Functional Testing :-

1.1 Create 10 subsys/10 ns per subsys and generate AEN for each namespace
    and verify size change reflected on the host for all 100 ns.
test() {
        echo "Running ${TEST_NAME}"

        local port
        local file_path
        local nr_ss=10
        local nr_ns=10
        local orig_size=10G
        local new_size=1G
        local subsys_name="blktests-subsystem"

        file_path="${TMPDIR}/img1"
        truncate -s ${orig_size} "${file_path}"
        _setup_nvmet
        port="$(_create_nvmet_port "loop")"
        for ((i = 1; i <= nr_ss; i++)); do
                _create_nvmet_subsystem "${subsys_name}${i}" "${file_path}" \
                        "91fdba0d-f87b-4c25-b80f-db7be1418b9e"
                for ((j = 2; j <= nr_ns; j++)); do
                        file_path="${TMPDIR}/img${i}${j}"
                        truncate -s ${orig_size} "${file_path}"
                        _create_nvmet_ns "${subsys_name}${i}" "${j}" "${file_path}"
                done
                _add_nvmet_subsys_to_port "${port}" "${subsys_name}${i}"
                nvme connect -t loop -n "${subsys_name}${i}"
        done

        sleep 1

        echo "Original Size of NVMeOF host device:-"
        for i in `nvme list | grep "Linux" | tr -s ' ' ' ' | cut -f 1 -d ' ' | sort`; do
                lsblk ${i} --output NAME,SIZE | grep -v NAME | sort
        done
        for ((i = nr_ss; i >= 1; i--)); do
                for ((j = nr_ns; j > 1; j--)); do
                        file_path="${TMPDIR}/img${i}${j}"
                        truncate -s ${new_size} ${file_path}
                        # Allow maintenance thread to generate AEN
                        sleep 1
                done
        done
        echo "New Size of NVMeOF host device:-"
        for i in `nvme list | grep "Linux" | tr -s ' ' ' ' | cut -f 1 -d ' ' | sort`; do
                lsblk ${i} --output NAME,SIZE | grep -v NAME
        done

        for ((i = nr_ss; i >= 1; i--)); do
                nvme disconnect -n "${subsys_name}${i}"
                _remove_nvmet_subsystem_from_port "${port}" "${subsys_name}${i}"
                for ((j = nr_ns; j > 1; j--)); do
                        _remove_nvmet_ns "${subsys_name}${i}" $j
                        file_path="${TMPDIR}/img${i}${j}"
                        rm ${file_path}
                done
                _remove_nvmet_subsystem "${subsys_name}${i}"
        done

        _remove_nvmet_port "${port}"

        file_path="${TMPDIR}/img1"
        rm "${file_path}"

        echo "Test complete"
}

1.2 Decrease the size of the namespace gradually by 100M and verify the
    AEN on host and target, with a graceful shutdown of the application
    running I/O traffic on the host.

1.2.1 Create an NVMeOF loop-based, file-backed target and connect it to the host.
[83222.015323] nvme1n1: detected capacity change from 0 to 986316800

1.2.2 Truncate backend file gradually :-
for i in 800 700 600 500 400 # 400 # <---- fio job size
do
	truncate -s ${i}m /mnt/backend0/backend0
	sleep 4
done

1.2.3 fio job to generate I/O traffic :-
# cat fio/randread.fio
[RANDREAD]
ioengine=libaio
direct=1
rw=randread
norandommap
randrepeat=0
runtime=30s
iodepth=8
numjobs=12
bs=4k
time_based
overwrite=0
size=500m  <--- fio job file size

fio fio/randread.fio --filename=/dev/nvme1n1
RANDREAD: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=8
...
fio-3.8-5-g464b
Starting 12 processes
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=502120448, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=506044416, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=505094144, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=434446336, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=436948992, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=431280128, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=491143168, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=507174912, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=514990080, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=524247040, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=478760960, buflen=4096
fio: pid=15123, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=466919424, buflen=4096
fio: pid=15131, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device
fio: pid=15129, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device
fio: pid=15122, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=524111872, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=479281152, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=450961408, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=469983232, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=513048576, buflen=4096
fio: pid=15127, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=420630528, buflen=4096
fio: pid=15121, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device
fio: pid=15130, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device
fio: pid=15124, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device
fio: pid=15132, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=449703936, buflen=4096
fio: pid=15126, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=523735040, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=454336512, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=480763904, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=452882432, buflen=4096
fio: pid=15128, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=472510464, buflen=4096
fio: io_u error on file /dev/nvme1n1: No space left on device: read offset=488837120, buflen=4096
fio: pid=15125, err=28/file:io_u.c:1744, func=io_u error, error=No space left on device

RANDREAD: (groupid=0, jobs=12): err=28 (file:io_u.c:1744, func=io_u error, error=No space left on device): pid=15121: Tue Apr 14 21:11:22 2020
   read: IOPS=530k, BW=2072MiB/s (2173MB/s)(50.3GiB/24836msec)
    slat (usec): min=7, max=19057, avg=11.69, stdev=21.57
    clat (usec): min=50, max=37133, avg=168.44, stdev=169.93
     lat (usec): min=72, max=37184, avg=180.21, stdev=172.74
    clat percentiles (usec):
     |  1.00th=[  143],  5.00th=[  145], 10.00th=[  147], 20.00th=[  151],
     | 30.00th=[  153], 40.00th=[  153], 50.00th=[  155], 60.00th=[  159],
     | 70.00th=[  163], 80.00th=[  172], 90.00th=[  221], 95.00th=[  239],
     | 99.00th=[  326], 99.50th=[  355], 99.90th=[  445], 99.95th=[  519],
     | 99.99th=[  734]
   bw (  KiB/s): min=151000, max=192448, per=8.34%, avg=176897.17, stdev=6830.17, samples=588
   iops        : min=37750, max=48112, avg=44224.25, stdev=1707.56, samples=588
  lat (usec)   : 100=0.01%, 250=96.38%, 500=3.56%, 750=0.05%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  cpu          : usr=4.74%, sys=61.53%, ctx=1647598, majf=0, minf=360
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=13173135,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
   READ: bw=2072MiB/s (2173MB/s), 2072MiB/s-2072MiB/s (2173MB/s-2173MB/s), io=50.3GiB (53.0GB), run=24836-24836msec

Disk stats (read/write):
  nvme1n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

1.2.4 Examine AEN in the trace for both host and target :-
# cat /sys/kernel/debug/tracing/events/nvme/nvme_async_event/enable
1
# cat /sys/kernel/debug/tracing/events/nvmet/nvmet_async_event/enable
1
# cat /sys/kernel/debug/tracing/trace_pipe 
nvmet_async_event: nvmet1: NVME_AEN=0x000000 [NVME_AER_NOTICE_NS_CHANGED]
nvme_async_event: nvme1: NVME_AEN=0x000000 [NVME_AER_NOTICE_NS_CHANGED]
nvmet_async_event: nvmet1: NVME_AEN=0x000000 [NVME_AER_NOTICE_NS_CHANGED]
nvme_async_event: nvme1: NVME_AEN=0x000000 [NVME_AER_NOTICE_NS_CHANGED]
nvmet_async_event: nvmet1: NVME_AEN=0x000000 [NVME_AER_NOTICE_NS_CHANGED]
nvme_async_event: nvme1: NVME_AEN=0x000000 [NVME_AER_NOTICE_NS_CHANGED]
nvmet_async_event: nvmet1: NVME_AEN=0x000000 [NVME_AER_NOTICE_NS_CHANGED]
nvme_async_event: nvme1: NVME_AEN=0x000000 [NVME_AER_NOTICE_NS_CHANGED]

2. Performance Testing :- 

2.1 Performance Set up Script :-

SS=fs
SSPATH=/sys/kernel/config/nvmet/subsystems/${SS}/
PORTS=/sys/kernel/config/nvmet/ports

load_modules()
{
	modprobe nvme
	modprobe nvme-fabrics
	modprobe nvmet
	modprobe nvme-loop
	sleep 3 
}

make_nullb()
{
	local src=drivers/block/
	local dest=/lib/modules/`uname -r`/kernel/drivers/block

	modprobe -r null_blk
	makej M=drivers/block/
	\cp ${src}/null_blk.ko ${dest}/
	modprobe null_blk nr_devices=32 gb=1
	sleep 1
}

make_target()
{
	tree /sys/kernel/config
	mkdir ${SSPATH}

	for i in `seq 1 32`; do
		mkdir ${SSPATH}/namespaces/${i}
		file=/dev/nullb$((i-1))
		echo -n ${file} > ${SSPATH}/namespaces/${i}/device_path
		cat ${SSPATH}/namespaces/${i}/device_path
		echo 0 > ${SSPATH}/namespaces/${i}/buffered_io
		cat ${SSPATH}/namespaces/${i}/buffered_io
		echo 1 > ${SSPATH}/namespaces/${i}/enable 
	done

	mkdir ${PORTS}/1/
	echo -n "loop" > ${PORTS}/1/addr_trtype 
	echo -n 1 > ${SSPATH}/attr_allow_any_host
	ln -s ${SSPATH} ${PORTS}/1/subsystems/
	sleep 1
}

connect()
{
	echo  "transport=loop,nqn=fs" > /dev/nvme-fabrics
	sleep 1
}

main()
{
	load_modules
	make_nullb
	make_target
	connect
	dmesg -c
}

2.2 Fio Job :-

[global]
ioengine=libaio
direct=1
rw=randread
norandommap
allrandrepeat=1
runtime=1m
iodepth=32
bs=4k
time_based
overwrite=0
size=900m
group_reporting

[job1]
name=test1
filename=/dev/nvme1n1
rw=randread

[job2]
name=test2
filename=/dev/nvme1n2
rw=randread

[job3]
name=test3
filename=/dev/nvme1n3
rw=randread
.
.
.

[job32]
name=test32
filename=/dev/nvme1n32
rw=randread
-- 
2.22.1



* [PATCH V3 1/7] nvmet: add ns revalidation support
  2020-04-19 23:48 [PATCH V3 0/7] nvmet: add target ns revalidate support Chaitanya Kulkarni
@ 2020-04-19 23:48 ` Chaitanya Kulkarni
  2020-04-19 23:48 ` [PATCH V3 2/7] nvmet: add global thread for ns-resize AEN Chaitanya Kulkarni
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-04-19 23:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: sagi, Chaitanya Kulkarni, hch

From: Anthony Iliopoulos <ailiop@suse.com>

Add support for detecting capacity changes on nvmet blockdev and file
backed namespaces. This allows for emulating and testing online resizing
of nvme devices and filesystems on top.

Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
[Fix comments posted on V1]
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c   |  5 +++++
 drivers/nvme/target/io-cmd-bdev.c |  5 +++++
 drivers/nvme/target/io-cmd-file.c | 11 +++++++++++
 drivers/nvme/target/nvmet.h       |  2 ++
 4 files changed, 23 insertions(+)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 9d6f75cfa77c..4c79aa804887 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -486,6 +486,11 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 	if (!ns)
 		goto done;
 
+	if (ns->bdev)
+		nvmet_bdev_ns_revalidate(ns);
+	else
+		nvmet_file_ns_revalidate(ns);
+
 	/*
 	 * nuse = ncap = nsze isn't always true, but we have no way to find
 	 * that out from the underlying device.
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index ea0e596be15d..0427e040e3dd 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -75,6 +75,11 @@ void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
 	}
 }
 
+void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
+{
+	ns->size = i_size_read(ns->bdev->bd_inode);
+}
+
 static u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
 {
 	u16 status = NVME_SC_SUCCESS;
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index cd5670b83118..14364383d2fe 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -80,6 +80,17 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
 	return ret;
 }
 
+void nvmet_file_ns_revalidate(struct nvmet_ns *ns)
+{
+	struct kstat stat;
+
+	if (vfs_getattr(&ns->file->f_path, &stat, STATX_SIZE,
+			AT_STATX_FORCE_SYNC))
+		return;
+
+	ns->size = stat.size;
+}
+
 static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
 {
 	bv->bv_page = sg_page(sg);
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 421dff3ea143..8b479d932a7b 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -498,6 +498,8 @@ void nvmet_file_ns_disable(struct nvmet_ns *ns);
 u16 nvmet_bdev_flush(struct nvmet_req *req);
 u16 nvmet_file_flush(struct nvmet_req *req);
 void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid);
+void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns);
+void nvmet_file_ns_revalidate(struct nvmet_ns *ns);
 
 static inline u32 nvmet_rw_len(struct nvmet_req *req)
 {
-- 
2.22.1



* [PATCH V3 2/7] nvmet: add global thread for ns-resize AEN
  2020-04-19 23:48 [PATCH V3 0/7] nvmet: add target ns revalidate support Chaitanya Kulkarni
  2020-04-19 23:48 ` [PATCH V3 1/7] nvmet: add ns revalidation support Chaitanya Kulkarni
@ 2020-04-19 23:48 ` Chaitanya Kulkarni
  2020-04-27 14:19   ` Hannes Reinecke
  2020-04-19 23:48 ` [PATCH V3 3/7] nvmet: export resize thread enable-disable attr Chaitanya Kulkarni
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-04-19 23:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: sagi, Chaitanya Kulkarni, hch

A size change detected on the target should generate an AEN to the
host. Right now there is no generic mechanism that allows us to
register callbacks with the block and file backends so that we get
notified when the size of a block device or file backend changes.
This patch adds a global maintenance thread that checks for size
changes and generates an AEN when needed. It also adds the lock needed
to protect the ns->size variable, which is updated under
nvmet_ns_enable().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/admin-cmd.c   |   7 +-
 drivers/nvme/target/configfs.c    |   2 +-
 drivers/nvme/target/core.c        | 129 ++++++++++++++++++++++++++++++
 drivers/nvme/target/io-cmd-bdev.c |   9 ++-
 drivers/nvme/target/io-cmd-file.c |  15 ++--
 drivers/nvme/target/nvmet.h       |  22 ++++-
 6 files changed, 170 insertions(+), 14 deletions(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 4c79aa804887..a3bc2987c72a 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -486,10 +486,9 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
 	if (!ns)
 		goto done;
 
-	if (ns->bdev)
-		nvmet_bdev_ns_revalidate(ns);
-	else
-		nvmet_file_ns_revalidate(ns);
+	mutex_lock(&ns->subsys->lock);
+	nvmet_ns_revalidate(ns);
+	mutex_unlock(&ns->subsys->lock);
 
 	/*
 	 * nuse = ncap = nsze isn't always true, but we have no way to find
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 58cabd7b6fc5..b0e84027b3bc 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -1469,7 +1469,7 @@ int __init nvmet_init_configfs(void)
 	return 0;
 }
 
-void __exit nvmet_exit_configfs(void)
+void nvmet_exit_configfs(void)
 {
 	configfs_unregister_subsystem(&nvmet_configfs_subsystem);
 }
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index b685f99d56a1..c42d24c1728e 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -9,12 +9,15 @@
 #include <linux/rculist.h>
 #include <linux/pci-p2pdma.h>
 #include <linux/scatterlist.h>
+#include <linux/delay.h>
+#include <uapi/linux/sched/types.h>
 
 #define CREATE_TRACE_POINTS
 #include "trace.h"
 
 #include "nvmet.h"
 
+struct nvmet_resize_monitor *monitor;
 struct workqueue_struct *buffered_io_wq;
 static const struct nvmet_fabrics_ops *nvmet_transports[NVMF_TRTYPE_MAX];
 static DEFINE_IDA(cntlid_ida);
@@ -514,6 +517,59 @@ static void nvmet_p2pmem_ns_add_p2p(struct nvmet_ctrl *ctrl,
 		ns->nsid);
 }
 
+bool nvmet_ns_revalidate(struct nvmet_ns *ns)
+{
+	if (ns->bdev)
+		return nvmet_bdev_ns_revalidate(ns);
+
+	return nvmet_file_ns_revalidate(ns);
+}
+
+static void __nvmet_handle_resize_ns(struct nvmet_subsys *s)
+{
+	struct nvmet_ns *ns;
+
+	mutex_lock(&s->lock);
+	list_for_each_entry_rcu(ns, &s->namespaces, dev_link, 1) {
+		if (nvmet_ns_revalidate(ns))
+			nvmet_ns_changed(ns->subsys, ns->nsid);
+	}
+	mutex_unlock(&s->lock);
+}
+
+static int nvmet_ns_resize_monitor(void *data)
+{
+	struct sched_param p = { .sched_priority = monitor->sched_priority };
+	struct nvmet_subsys_link *link;
+	struct nvmet_port *port;
+	u32 msec;
+
+	complete(&monitor->wait);
+	sched_setscheduler(current, monitor->sched_policy, &p);
+
+	while (!kthread_should_park()) {
+		down_read(&nvmet_config_sem);
+		list_for_each_entry(port, nvmet_ports, global_entry)
+			list_for_each_entry(link, &port->subsystems, entry)
+				__nvmet_handle_resize_ns(link->subsys);
+		up_read(&nvmet_config_sem);
+		schedule();
+		/* XXX: use better sleep wakeup mechanism */
+
+		/*
+		 * This allows the user to change the sleep timeout without
+		 * stopping the monitor thread.
+		 */
+		mutex_lock(&monitor->control_lock);
+		msec = monitor->msec;
+		mutex_unlock(&monitor->control_lock);
+		msleep(msec);
+	}
+
+	kthread_parkme();
+	return 0;
+}
+
 int nvmet_ns_enable(struct nvmet_ns *ns)
 {
 	struct nvmet_subsys *subsys = ns->subsys;
@@ -1480,6 +1536,71 @@ void nvmet_subsys_put(struct nvmet_subsys *subsys)
 	kref_put(&subsys->ref, nvmet_subsys_free);
 }
 
+bool nvmet_enable_ns_resize_monitor(void)
+{
+	lockdep_assert_held(&monitor->control_lock);
+
+	if (monitor->thread)
+		goto out;
+
+	monitor->thread = kthread_create(nvmet_ns_resize_monitor, NULL,
+				       "nvmet_ns_resize");
+	if (monitor->thread) {
+		wake_up_process(monitor->thread);
+		wait_for_completion(&monitor->wait);
+		pr_debug("ns monitor thread started successfully\n");
+	}
+
+out:
+	return monitor->thread ? true : false;
+}
+
+bool nvmet_disable_ns_resize_monitor(void)
+{
+	bool ret = false;
+
+	lockdep_assert_held(&monitor->control_lock);
+
+	if (monitor->thread) {
+		kthread_park(monitor->thread);
+		kthread_stop(monitor->thread);
+		monitor->thread = NULL;
+		ret = true;
+	}
+
+	kfree(monitor->thread);
+	return ret;
+}
+
+int nvmet_init_ns_resize_monitor(void)
+{
+	int ret;
+
+	monitor = kzalloc(sizeof(*monitor), GFP_KERNEL);
+	if (!monitor)
+		return -ENOMEM;
+
+	monitor->msec = NVMET_RESIZE_MON_MSEC;
+	monitor->sched_policy = SCHED_IDLE;
+	monitor->sched_priority = 0;
+	mutex_init(&monitor->control_lock);
+	init_completion(&monitor->wait);
+
+	mutex_lock(&monitor->control_lock);
+	ret = nvmet_enable_ns_resize_monitor() ? 0 : -ENOMEM;
+	mutex_unlock(&monitor->control_lock);
+
+	return ret;
+}
+
+void nvmet_exit_ns_resize_monitor(void)
+{
+	mutex_lock(&monitor->control_lock);
+	nvmet_disable_ns_resize_monitor();
+	mutex_unlock(&monitor->control_lock);
+	kfree(monitor);
+}
+
 static int __init nvmet_init(void)
 {
 	int error;
@@ -1500,8 +1621,15 @@ static int __init nvmet_init(void)
 	error = nvmet_init_configfs();
 	if (error)
 		goto out_exit_discovery;
+
+	error = nvmet_init_ns_resize_monitor();
+	if (error)
+		goto out_exit_configfs;
+
 	return 0;
 
+out_exit_configfs:
+	nvmet_exit_configfs();
 out_exit_discovery:
 	nvmet_exit_discovery();
 out_free_work_queue:
@@ -1512,6 +1640,7 @@ static int __init nvmet_init(void)
 
 static void __exit nvmet_exit(void)
 {
+	nvmet_exit_ns_resize_monitor();
 	nvmet_exit_configfs();
 	nvmet_exit_discovery();
 	ida_destroy(&cntlid_ida);
diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
index 0427e040e3dd..3cca08e9ad90 100644
--- a/drivers/nvme/target/io-cmd-bdev.c
+++ b/drivers/nvme/target/io-cmd-bdev.c
@@ -75,9 +75,16 @@ void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
 	}
 }
 
-void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
+bool nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
 {
+	bool change;
+
+	lockdep_assert_held(&ns->subsys->lock);
+
+	change = ns->size != i_size_read(ns->bdev->bd_inode) ? true : false;
 	ns->size = i_size_read(ns->bdev->bd_inode);
+
+	return change;
 }
 
 static u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
index 14364383d2fe..a2d82e55858b 100644
--- a/drivers/nvme/target/io-cmd-file.c
+++ b/drivers/nvme/target/io-cmd-file.c
@@ -80,15 +80,20 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
 	return ret;
 }
 
-void nvmet_file_ns_revalidate(struct nvmet_ns *ns)
+bool nvmet_file_ns_revalidate(struct nvmet_ns *ns)
 {
+	struct path *f_path = &ns->file->f_path;
+	bool change = false;
 	struct kstat stat;
 
-	if (vfs_getattr(&ns->file->f_path, &stat, STATX_SIZE,
-			AT_STATX_FORCE_SYNC))
-		return;
+	lockdep_assert_held(&ns->subsys->lock);
 
-	ns->size = stat.size;
+	if (vfs_getattr(f_path, &stat, STATX_SIZE, AT_STATX_FORCE_SYNC) == 0) {
+		change = ns->size != stat.size ? true : false;
+		ns->size = stat.size;
+	}
+
+	return change;
 }
 
 static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 8b479d932a7b..7fe6d705cbf1 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -23,6 +23,7 @@
 #define NVMET_ASYNC_EVENTS		4
 #define NVMET_ERROR_LOG_SLOTS		128
 #define NVMET_NO_ERROR_LOC		((u16)-1)
+#define NVMET_RESIZE_MON_MSEC		500
 #define NVMET_DEFAULT_CTRL_MODEL	"Linux"
 
 /*
@@ -50,6 +51,17 @@
 #define IPO_IATTR_CONNECT_SQE(x)	\
 	(cpu_to_le32(offsetof(struct nvmf_connect_command, x)))
 
+struct nvmet_resize_monitor {
+	u32			sched_priority;
+	u32			sched_policy;
+	struct mutex		control_lock;
+	struct task_struct	*thread;
+	struct completion	wait;
+	u32			msec;
+};
+
+extern struct nvmet_resize_monitor *monitor;
+
 struct nvmet_ns {
 	struct list_head	dev_link;
 	struct percpu_ref	ref;
@@ -477,7 +489,7 @@ void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
 #define NVMET_DISC_KATO_MS		120000
 
 int __init nvmet_init_configfs(void);
-void __exit nvmet_exit_configfs(void);
+void nvmet_exit_configfs(void);
 
 int __init nvmet_init_discovery(void);
 void nvmet_exit_discovery(void);
@@ -498,8 +510,12 @@ void nvmet_file_ns_disable(struct nvmet_ns *ns);
 u16 nvmet_bdev_flush(struct nvmet_req *req);
 u16 nvmet_file_flush(struct nvmet_req *req);
 void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid);
-void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns);
-void nvmet_file_ns_revalidate(struct nvmet_ns *ns);
+bool nvmet_bdev_ns_revalidate(struct nvmet_ns *ns);
+bool nvmet_file_ns_revalidate(struct nvmet_ns *ns);
+bool nvmet_ns_revalidate(struct nvmet_ns *ns);
+
+bool nvmet_enable_ns_resize_monitor(void);
+bool nvmet_disable_ns_resize_monitor(void);
 
 static inline u32 nvmet_rw_len(struct nvmet_req *req)
 {
-- 
2.22.1



* [PATCH V3 3/7] nvmet: export resize thread enable-disable attr
  2020-04-19 23:48 [PATCH V3 0/7] nvmet: add target ns revalidate support Chaitanya Kulkarni
  2020-04-19 23:48 ` [PATCH V3 1/7] nvmet: add ns revalidation support Chaitanya Kulkarni
  2020-04-19 23:48 ` [PATCH V3 2/7] nvmet: add global thread for ns-resize AEN Chaitanya Kulkarni
@ 2020-04-19 23:48 ` Chaitanya Kulkarni
  2020-04-19 23:48 ` [PATCH V3 4/7] nvmet: export resize thread scan interval Chaitanya Kulkarni
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-04-19 23:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: sagi, Chaitanya Kulkarni, hch

We implement the ns-revalidation feature as a mechanism, not a policy,
and let the user set the policy. This also gives the user more
flexibility.

This patch adds a new configfs attribute so that the user can enable or
disable the resize monitor thread as and when needed, leaving the
policy decision to the user.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/configfs.c | 34 ++++++++++++++++++++++++++++++++++
 drivers/nvme/target/nvmet.h    | 13 +++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index b0e84027b3bc..f6fbe59fc60c 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -1050,8 +1050,42 @@ static struct configfs_group_operations nvmet_subsystems_group_ops = {
 	.make_group		= nvmet_subsys_make,
 };
 
+static ssize_t nvmet_control_group_control_resize_refresh_show(
+		struct config_item *item, char *page)
+{
+	return snprintf(page, PAGE_SIZE, "%u\n",
+			to_nvmet_control(item)->enable);
+}
+
+static ssize_t nvmet_control_group_control_resize_refresh_store(
+		struct config_item *item, const char *page, size_t count)
+{
+	struct nvmet_control_configfs *control = to_nvmet_control(item);
+	bool enable;
+
+	if (kstrtobool(page, &enable))
+		return -EINVAL;
+
+	mutex_lock(&monitor->control_lock);
+	control->enable = enable;
+	if (control->enable)
+		nvmet_enable_ns_resize_monitor();
+	else
+		nvmet_disable_ns_resize_monitor();
+	mutex_unlock(&monitor->control_lock);
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_control_group_, control_resize_refresh);
+
+static struct configfs_attribute *nvmet_control_group_attrs[] = {
+	&nvmet_control_group_attr_control_resize_refresh,
+	NULL,
+};
+
 static const struct config_item_type nvmet_subsystems_type = {
 	.ct_group_ops		= &nvmet_subsystems_group_ops,
+	.ct_attrs		= nvmet_control_group_attrs,
 	.ct_owner		= THIS_MODULE,
 };
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 7fe6d705cbf1..796acd4226a1 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -62,6 +62,19 @@ struct nvmet_resize_monitor {
 
 extern struct nvmet_resize_monitor *monitor;
 
+struct nvmet_control_configfs {
+	bool                    enable;
+	struct config_group     group;
+};
+
+static inline struct nvmet_control_configfs *to_nvmet_control(
+		struct config_item *item)
+{
+	struct config_group *group = to_config_group(item);
+
+	return container_of(group, struct nvmet_control_configfs, group);
+}
+
 struct nvmet_ns {
 	struct list_head	dev_link;
 	struct percpu_ref	ref;
-- 
2.22.1



* [PATCH V3 4/7] nvmet: export resize thread scan interval
  2020-04-19 23:48 [PATCH V3 0/7] nvmet: add target ns revalidate support Chaitanya Kulkarni
                   ` (2 preceding siblings ...)
  2020-04-19 23:48 ` [PATCH V3 3/7] nvmet: export resize thread enable-disable attr Chaitanya Kulkarni
@ 2020-04-19 23:48 ` Chaitanya Kulkarni
  2020-04-19 23:48 ` [PATCH V3 5/7] nvmet: export resize thread sched attributes Chaitanya Kulkarni
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-04-19 23:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: sagi, Chaitanya Kulkarni, hch

We implement the ns-revalidation feature as a mechanism, not a policy,
and let the user set the policy. This also gives the user more
flexibility.

Currently, there are systems running without the resize monitor, which
means we should avoid any performance regression introduced by this
feature. On some systems, the frequency of the resize monitor execution
may result in a potential drop in performance.

In order to keep this feature backward compatible, this patch adds a
new configfs attribute so that the user can set how often the resize
monitor checks whether an AEN needs to be generated, by setting the
timeout value in msec as and when needed, leaving the policy decision
to the user.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/configfs.c | 26 ++++++++++++++++++++++++++
 drivers/nvme/target/nvmet.h    |  1 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index f6fbe59fc60c..57c973773180 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -1078,8 +1078,34 @@ static ssize_t nvmet_control_group_control_resize_refresh_store(
 
 CONFIGFS_ATTR(nvmet_control_group_, control_resize_refresh);
 
+static ssize_t nvmet_control_group_control_resize_timeout_show(
+		struct config_item *item, char *page)
+{
+	return snprintf(page, PAGE_SIZE, "%u\n",
+			to_nvmet_control(item)->msec);
+}
+
+static ssize_t nvmet_control_group_control_resize_timeout_store(
+		struct config_item *item, const char *page, size_t count)
+{
+	struct nvmet_control_configfs *control = to_nvmet_control(item);
+	u32 msec;
+
+	if (kstrtou32(page, 10, &msec))
+		return -EINVAL;
+
+	mutex_lock(&monitor->control_lock);
+	control->msec = msec;
+	monitor->msec = msec;
+	mutex_unlock(&monitor->control_lock);
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_control_group_, control_resize_timeout);
+
 static struct configfs_attribute *nvmet_control_group_attrs[] = {
 	&nvmet_control_group_attr_control_resize_refresh,
+	&nvmet_control_group_attr_control_resize_timeout,
 	NULL,
 };
 
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 796acd4226a1..d14f792f327e 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -65,6 +65,7 @@ extern struct nvmet_resize_monitor *monitor;
 struct nvmet_control_configfs {
 	bool                    enable;
 	struct config_group     group;
+	u32			msec;
 };
 
 static inline struct nvmet_control_configfs *to_nvmet_control(
-- 
2.22.1



* [PATCH V3 5/7] nvmet: export resize thread sched attributes
  2020-04-19 23:48 [PATCH V3 0/7] nvmet: add target ns revalidate support Chaitanya Kulkarni
                   ` (3 preceding siblings ...)
  2020-04-19 23:48 ` [PATCH V3 4/7] nvmet: export resize thread scan interval Chaitanya Kulkarni
@ 2020-04-19 23:48 ` Chaitanya Kulkarni
  2020-04-19 23:48 ` [PATCH V3 6/7] nvmet: export ns resize monitor attribute Chaitanya Kulkarni
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-04-19 23:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: sagi, Chaitanya Kulkarni, hch

We implement the ns-revalidation feature as a mechanism and let the
user decide the policy around it, which gives the user more
flexibility. This patch adds new configfs attributes so that the user
can set the scheduling parameters, i.e. the scheduling policy and
priority, of the monitor thread as and when needed, leaving the policy
decision to the user.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/configfs.c | 70 ++++++++++++++++++++++++++++++++++
 drivers/nvme/target/nvmet.h    |  2 +
 2 files changed, 72 insertions(+)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 57c973773180..be4fa7797a16 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -1050,6 +1050,74 @@ static struct configfs_group_operations nvmet_subsystems_group_ops = {
 	.make_group		= nvmet_subsys_make,
 };
 
+static ssize_t nvmet_control_group_control_resize_sched_priority_show(
+		struct config_item *item, char *page)
+{
+	return snprintf(page, PAGE_SIZE, "%u\n",
+			to_nvmet_control(item)->sched_priority);
+}
+
+static ssize_t nvmet_control_group_control_resize_sched_priority_store(
+		struct config_item *item, const char *page, size_t count)
+{
+	struct nvmet_control_configfs *control = to_nvmet_control(item);
+	u32 sched_priority;
+	int ret = 0;
+
+	if (kstrtou32(page, 10, &sched_priority)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	mutex_lock(&monitor->control_lock);
+	if (control->enable) {
+		mutex_unlock(&monitor->control_lock);
+		ret = -EINVAL;
+		goto out;
+	}
+	control->sched_priority = sched_priority;
+	monitor->sched_priority = sched_priority;
+	mutex_unlock(&monitor->control_lock);
+out:
+	return ret ? ret : count;
+}
+
+CONFIGFS_ATTR(nvmet_control_group_, control_resize_sched_priority);
+
+static ssize_t nvmet_control_group_control_resize_sched_policy_show(
+		struct config_item *item, char *page)
+{
+	return snprintf(page, PAGE_SIZE, "%u\n",
+			to_nvmet_control(item)->sched_policy);
+}
+
+static ssize_t nvmet_control_group_control_resize_sched_policy_store(
+		struct config_item *item, const char *page, size_t count)
+{
+	struct nvmet_control_configfs *control = to_nvmet_control(item);
+	u32 sched_policy;
+	int ret = 0;
+
+	if (kstrtou32(page, 10, &sched_policy)) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	mutex_lock(&monitor->control_lock);
+	if (control->enable) {
+		mutex_unlock(&monitor->control_lock);
+		ret = -EINVAL;
+		goto out;
+	}
+	control->sched_policy = sched_policy;
+	monitor->sched_policy = sched_policy;
+	mutex_unlock(&monitor->control_lock);
+out:
+	return ret ? ret : count;
+}
+
+CONFIGFS_ATTR(nvmet_control_group_, control_resize_sched_policy);
+
 static ssize_t nvmet_control_group_control_resize_refresh_show(
 		struct config_item *item, char *page)
 {
@@ -1104,6 +1172,8 @@ static ssize_t nvmet_control_group_control_resize_timeout_store(
 CONFIGFS_ATTR(nvmet_control_group_, control_resize_timeout);
 
 static struct configfs_attribute *nvmet_control_group_attrs[] = {
+	&nvmet_control_group_attr_control_resize_sched_priority,
+	&nvmet_control_group_attr_control_resize_sched_policy,
 	&nvmet_control_group_attr_control_resize_refresh,
 	&nvmet_control_group_attr_control_resize_timeout,
 	NULL,
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index d14f792f327e..15cb337cd9b8 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -63,6 +63,8 @@ struct nvmet_resize_monitor {
 extern struct nvmet_resize_monitor *monitor;
 
 struct nvmet_control_configfs {
+	u32			sched_priority;
+	u32			sched_policy;
 	bool                    enable;
 	struct config_group     group;
 	u32			msec;
-- 
2.22.1



* [PATCH V3 6/7] nvmet: export ns resize monitor attribute
  2020-04-19 23:48 [PATCH V3 0/7] nvmet: add target ns revalidate support Chaitanya Kulkarni
                   ` (4 preceding siblings ...)
  2020-04-19 23:48 ` [PATCH V3 5/7] nvmet: export resize thread sched attributes Chaitanya Kulkarni
@ 2020-04-19 23:48 ` Chaitanya Kulkarni
  2020-04-19 23:48 ` [PATCH V3 7/7] nvmet: add async event tracing support Chaitanya Kulkarni
  2020-04-22  8:19 ` [PATCH V3 0/7] nvmet: add target ns revalidate support Christoph Hellwig
  7 siblings, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-04-19 23:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: sagi, Chaitanya Kulkarni, hch

This patch adds a new per-namespace configfs attribute so that the
user can decide whether a namespace participates in the resize
monitor's namespace scan and AEN generation.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/configfs.c | 24 ++++++++++++++++++++++++
 drivers/nvme/target/core.c     |  3 ++-
 drivers/nvme/target/nvmet.h    |  1 +
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index be4fa7797a16..6d3d703aa907 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -543,6 +543,29 @@ static ssize_t nvmet_ns_buffered_io_store(struct config_item *item,
 
 CONFIGFS_ATTR(nvmet_ns_, buffered_io);
 
+static ssize_t nvmet_ns_resize_monitor_show(struct config_item *item,
+		char *page)
+{
+	return sprintf(page, "%d\n", to_nvmet_ns(item)->resize_monitor);
+}
+
+static ssize_t nvmet_ns_resize_monitor_store(struct config_item *item,
+		const char *page, size_t count)
+{
+	struct nvmet_ns *ns = to_nvmet_ns(item);
+	bool val;
+
+	if (strtobool(page, &val))
+		return -EINVAL;
+
+	mutex_lock(&ns->subsys->lock);
+	ns->resize_monitor = val;
+	mutex_unlock(&ns->subsys->lock);
+	return count;
+}
+
+CONFIGFS_ATTR(nvmet_ns_, resize_monitor);
+
 static struct configfs_attribute *nvmet_ns_attrs[] = {
 	&nvmet_ns_attr_device_path,
 	&nvmet_ns_attr_device_nguid,
@@ -550,6 +573,7 @@ static struct configfs_attribute *nvmet_ns_attrs[] = {
 	&nvmet_ns_attr_ana_grpid,
 	&nvmet_ns_attr_enable,
 	&nvmet_ns_attr_buffered_io,
+	&nvmet_ns_attr_resize_monitor,
 #ifdef CONFIG_PCI_P2PDMA
 	&nvmet_ns_attr_p2pmem,
 #endif
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index c42d24c1728e..63fbd30f0ec1 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -531,7 +531,7 @@ static void __nvmet_handle_resize_ns(struct nvmet_subsys *s)
 
 	mutex_lock(&s->lock);
 	list_for_each_entry_rcu(ns, &s->namespaces, dev_link, 1) {
-		if (nvmet_ns_revalidate(ns))
+		if (ns->resize_monitor && nvmet_ns_revalidate(ns))
 			nvmet_ns_changed(ns->subsys, ns->nsid);
 	}
 	mutex_unlock(&s->lock);
@@ -714,6 +714,7 @@ struct nvmet_ns *nvmet_ns_alloc(struct nvmet_subsys *subsys, u32 nsid)
 
 	uuid_gen(&ns->uuid);
 	ns->buffered_io = false;
+	ns->resize_monitor = true;
 
 	return ns;
 }
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 15cb337cd9b8..55a51c9e9316 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -84,6 +84,7 @@ struct nvmet_ns {
 	struct block_device	*bdev;
 	struct file		*file;
 	bool			readonly;
+	bool			resize_monitor;
 	u32			nsid;
 	u32			blksize_shift;
 	loff_t			size;
-- 
2.22.1



* [PATCH V3 7/7] nvmet: add async event tracing support
  2020-04-19 23:48 [PATCH V3 0/7] nvmet: add target ns revalidate support Chaitanya Kulkarni
                   ` (5 preceding siblings ...)
  2020-04-19 23:48 ` [PATCH V3 6/7] nvmet: export ns resize monitor attribute Chaitanya Kulkarni
@ 2020-04-19 23:48 ` Chaitanya Kulkarni
  2020-04-22  8:19 ` [PATCH V3 0/7] nvmet: add target ns revalidate support Christoph Hellwig
  7 siblings, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-04-19 23:48 UTC (permalink / raw)
  To: linux-nvme; +Cc: sagi, Chaitanya Kulkarni, hch

This adds a new tracepoint for the target to trace async events. It is
helpful for debugging and for comparing host- and target-side async
events.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 drivers/nvme/target/core.c  |  2 ++
 drivers/nvme/target/trace.h | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 63fbd30f0ec1..749c2d1ec8ec 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -154,6 +154,8 @@ static void nvmet_async_events_process(struct nvmet_ctrl *ctrl, u16 status)
 		kfree(aen);
 
 		mutex_unlock(&ctrl->lock);
+		trace_nvmet_async_event(ctrl,
+				(req->cqe->result.u32 & 0xff00) >> 8);
 		nvmet_req_complete(req, status);
 	}
 }
diff --git a/drivers/nvme/target/trace.h b/drivers/nvme/target/trace.h
index e645caa882dd..d0267226c9a0 100644
--- a/drivers/nvme/target/trace.h
+++ b/drivers/nvme/target/trace.h
@@ -130,6 +130,34 @@ TRACE_EVENT(nvmet_req_complete,
 
 );
 
+#define aer_name(aer) { aer, #aer }
+
+TRACE_EVENT(nvmet_async_event,
+	TP_PROTO(struct nvmet_ctrl *ctrl, u32 result),
+	TP_ARGS(ctrl, result),
+	TP_STRUCT__entry(
+		__field(int, ctrl_id)
+		__field(u32, result)
+	),
+	TP_fast_assign(
+		__entry->ctrl_id = ctrl->cntlid;
+		__entry->result = result;
+	),
+	TP_printk("nvmet%d: NVME_AEN=%#08x [%s]",
+		__entry->ctrl_id, __entry->result,
+		__print_symbolic(__entry->result,
+		aer_name(NVME_AER_NOTICE_NS_CHANGED),
+		aer_name(NVME_AER_NOTICE_ANA),
+		aer_name(NVME_AER_NOTICE_FW_ACT_STARTING),
+		aer_name(NVME_AER_NOTICE_DISC_CHANGED),
+		aer_name(NVME_AER_ERROR),
+		aer_name(NVME_AER_SMART),
+		aer_name(NVME_AER_CSS),
+		aer_name(NVME_AER_VS))
+	)
+);
+#undef aer_name
+
 #endif /* _TRACE_NVMET_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.22.1



* Re: [PATCH V3 0/7] nvmet: add target ns revalidate support
  2020-04-19 23:48 [PATCH V3 0/7] nvmet: add target ns revalidate support Chaitanya Kulkarni
                   ` (6 preceding siblings ...)
  2020-04-19 23:48 ` [PATCH V3 7/7] nvmet: add async event tracing support Chaitanya Kulkarni
@ 2020-04-22  8:19 ` Christoph Hellwig
  2020-04-23  6:03   ` Chaitanya Kulkarni
  7 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2020-04-22  8:19 UTC (permalink / raw)
  To: Chaitanya Kulkarni; +Cc: sagi, linux-nvme, hch

Patch one looks obviously sane to me, but the whole polling thread
thing just seems like a lot of overhead, and I don't really see
the point.  In the end the revalidate only really matters when
the host queries the information.

So I'm tempted to just apply patch 1 for now and skip the rest.


* Re: [PATCH V3 0/7] nvmet: add target ns revalidate support
  2020-04-22  8:19 ` [PATCH V3 0/7] nvmet: add target ns revalidate support Christoph Hellwig
@ 2020-04-23  6:03   ` Chaitanya Kulkarni
  2020-04-23  8:20     ` Sagi Grimberg
  0 siblings, 1 reply; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-04-23  6:03 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: sagi, linux-nvme

On 04/22/2020 01:19 AM, Christoph Hellwig wrote:
> Patch one looks obviously sane to me, but the whole polling thread
> thing just seems like a lot of overhead, and I don't really see
> the point.  In the end the revalidate only really matters when
> either the host queries the information.
>

A. Regarding host querying information :-

That means the host needs to query id-ns periodically to get the
correct size for the block device; until then it will have the wrong
size (unless it triggers a ns-rescan), which breaks the assumptions
host software makes about the block device mapped onto the namespace.
See [1].

Doesn't calling id-ns for size validation also defeat the purpose of
the whole AEN mechanism that we have implemented for host/target and
that is present in the spec?

B. Regarding Overhead :-

Can you please explain what exactly the overhead is ?

With my understanding :-

As mentioned in the cover letter, this design provides a mechanism
(since it is a driver feature) and not a policy, so it doesn't force
any particular behavior onto the user, i.e. users can adjust the
tunables provided for this feature and use it the way they want, or
even disable it and rely on a host-based periodic
id-ns/size-change/rescan sequence.
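
(For example, the host-based fallback can be as simple as a periodic
rescan loop; the controller name below is just a placeholder:)

# host side: periodically rescan the controller's namespaces so that
# a size change on the target is picked up even without an AEN
while true; do
        nvme ns-rescan /dev/nvme1
        sleep 60
done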

Overhead in terms of design :-
We can change the current polling so that instead of scanning all the
subsys/ns and then sleeping, we scan a batch (N namespaces) on each
poll, e.g. as sketched below.
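
A rough sketch of that batching idea (illustrative only; it reuses the
helpers added in this series, assumes the per-subsystem namespaces list
stays sorted by nsid, and the function name is not final):

static void nvmet_handle_resize_ns_batch(struct nvmet_subsys *s,
					 unsigned int batch)
{
	struct nvmet_ns *ns;
	u32 last_nsid = 0;
	unsigned int scanned;

	do {
		scanned = 0;
		mutex_lock(&s->lock);
		list_for_each_entry_rcu(ns, &s->namespaces, dev_link, 1) {
			/* resume after the last nsid of the previous batch */
			if (ns->nsid <= last_nsid)
				continue;
			if (ns->resize_monitor && nvmet_ns_revalidate(ns))
				nvmet_ns_changed(ns->subsys, ns->nsid);
			last_nsid = ns->nsid;
			if (++scanned == batch)
				break;
		}
		mutex_unlock(&s->lock);
		/* give the CPU back between batches */
		cond_resched();
	} while (scanned == batch);
}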

If the overhead is in terms of code, then we can :-
1. Move the maintenance thread related code out of core.c into a
    separate file (name suggestions are welcome).
2. Add a Kconfig option to enable/disable this feature for the target.

In case of performance :-
1. Users should apply a policy based on their requirements through the
    tunables, since different users can have different
    machines/platforms/environments.

> So I'm tempted to just apply patch 1 for now and skip the rest.
>

[1] Inconsistencies with lsblk and id-ns execution with only
     1st patch/no AER:-

# nvme id-ns /dev/nvme1n1 | grep -e nsze -e ncap -e nuse
nsze    : 0x280000 <---
ncap    : 0x280000
nuse    : 0x280000
# lsblk | grep nvme1n1
lsblk: nvme1c1n1: unknown device name
nvme1n1           259:3    0   10G  0 disk
# truncate -s 5G /mnt/backend/nvme1n1
# dmesg  -c
# nvme id-ns /dev/nvme1n1 | grep -e nsze -e ncap -e nuse
nsze    : 0x140000 <---
ncap    : 0x140000
nuse    : 0x140000
# lsblk | grep nvme1n1
nvme1n1           259:3    0   10G  0 disk <----
# nvme ns-rescan /dev/nvme1
# nvme id-ns /dev/nvme1n1 | grep -e nsze -e ncap -e nuse
nsze    : 0x140000 <----
ncap    : 0x140000
nuse    : 0x140000
# lsblk | grep nvme1n1
nvme1n1           259:3    0    5G  0 disk <----


* Re: [PATCH V3 0/7] nvmet: add target ns revalidate support
  2020-04-23  6:03   ` Chaitanya Kulkarni
@ 2020-04-23  8:20     ` Sagi Grimberg
  2020-04-24  7:05       ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2020-04-23  8:20 UTC (permalink / raw)
  To: Chaitanya Kulkarni, Christoph Hellwig; +Cc: linux-nvme


> A. Regarding host querying information :-
> 
> That means host needs to query id-ns periodically to have correct
> size for the block device, until then it will have wrong size
> (unless it calls ns-scan) which will break the host software assumption
> about block device mapped on the namespace. See [1].
> 
> Isn't calling id-ns for size validation also defeats the purpose of
> whole AEN mechanism that we have implemented for host/target and
> present in the spec ?
> 
> B. Regarding Overhead :-
> 
> Can you please explain what exactly the overhead is ?
> 
> With my understanding :-
> 
> As mentioned in the cover letter, this design provides
> a mechanism (since it is a driver feature) and not the policy,
> so it doesn't force any particular behavior onto the user, i.e.
> users can adjust the tunables provided for this feature and use it
> the way they want, or even disable it and rely on a host-based
> periodic id-ns size-change rescan sequence.
> 
> Overhead in terms of design :-
> We can change the current polling so that instead of scanning all the
> subsys/ns and then sleeping, we scan in batches (N namespaces) per poll.
> 
> If the overhead is in terms of code, then we can :-
> 1. Move the core.c maintenance-thread-related code into a separate file
>      (name suggestions are welcome).
> 2. Add a Kconfig option to enable/disable this feature for the target.
> 
> In case of performance :-
> 1. Users should apply a policy based on their requirements through the
>      tunables, since different users can have different
>      machines/platforms/environments.

This is cumbersome in my mind... and the polling part is
kinda bothering me...

I still think that having this sit in userspace is so much more
elegant really.

A simple service that watches with inotify on the device_paths (files or
bdevs - which are also files) and trigger revalidate via configfs when
it gets an attrib event.

We can even have it watch configfs and automatically add watchers
when new namespaces are enabled and remove watchers when namespaces are
disabled, so it can be completely zero touch.

This can sit as a simple systemd service that nvmetcli installs.

I'd very much prefer this over the proposed approach...
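
For the file-backed case, a minimal sketch of such a watcher could look
like the following.  It is an illustration only: the per-namespace
configfs attribute name "revalidate_size" and both command-line paths
are assumptions here, not something this series defines.

/*
 * Watch a namespace's backing path with inotify and poke a
 * per-namespace configfs attribute whenever its attributes change.
 */
#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char buf[4096], attr[512];
	ssize_t len;
	int fd, wd;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <backing-path> <ns-configfs-dir>\n",
			argv[0]);
		return 1;
	}

	fd = inotify_init1(0);
	if (fd < 0) {
		perror("inotify_init1");
		return 1;
	}

	/* IN_ATTRIB/IN_MODIFY fire when the backing file is truncated or grown */
	wd = inotify_add_watch(fd, argv[1], IN_ATTRIB | IN_MODIFY);
	if (wd < 0) {
		perror("inotify_add_watch");
		return 1;
	}

	snprintf(attr, sizeof(attr), "%s/revalidate_size", argv[2]);

	while ((len = read(fd, buf, sizeof(buf))) > 0) {
		/* any event on the backing path -> ask nvmet to revalidate */
		FILE *f = fopen(attr, "w");

		if (f) {
			fputs("1\n", f);
			fclose(f);
		}
	}
	return 0;
}

A real service would add one watch per enabled namespace and, for
bdev-backed namespaces, a uevent watch as discussed below.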


* Re: [PATCH V3 0/7] nvmet: add target ns revalidate support
  2020-04-23  8:20     ` Sagi Grimberg
@ 2020-04-24  7:05       ` Christoph Hellwig
  2020-04-24  8:34         ` Chaitanya Kulkarni
  2020-04-24 19:08         ` Sagi Grimberg
  0 siblings, 2 replies; 16+ messages in thread
From: Christoph Hellwig @ 2020-04-24  7:05 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-nvme, Christoph Hellwig, Chaitanya Kulkarni

On Thu, Apr 23, 2020 at 01:20:25AM -0700, Sagi Grimberg wrote:
> This is cumbersome in my mind... and the polling part is
> kinda bothering me...
>
> I still think that having this sit in userspace is so much more
> elegant really.
>
> A simple service that watches with inotify on the device_paths (files or
> bdevs - which are also files) and trigger revalidate via configfs when
> it gets an attrib event.

Exactly - plus a udev watch for the block device KOBJ_CHANGE notifications.
>
> We can even have it watch configfs and automatically add watchers
> when new namespaces are enabled and remove watchers when namespaces are
> disabled, so it can be completely zero touch.
>
> This can sit as a simple systemd service that nvmetcli installs.
>
> I'd very much prefer this over the proposed approach...

Same here.  The idea of having a kernel thread poll things for which
we have notification, and an easy userspace way to handle them just
seems like a whole lot of bloat.

Also remember the use case:  shrinking a volume is a pretty destructive
operation and not really practically relevant for a live volume.  So
the interesting case is growing, and having a little delay or even
a manual interaction isn't really the end of the world there.


* Re: [PATCH V3 0/7] nvmet: add target ns revalidate support
  2020-04-24  7:05       ` Christoph Hellwig
@ 2020-04-24  8:34         ` Chaitanya Kulkarni
  2020-04-24 19:08         ` Sagi Grimberg
  1 sibling, 0 replies; 16+ messages in thread
From: Chaitanya Kulkarni @ 2020-04-24  8:34 UTC (permalink / raw)
  To: Christoph Hellwig, Sagi Grimberg; +Cc: linux-nvme

On 4/24/20 12:05 AM, Christoph Hellwig wrote:
> On Thu, Apr 23, 2020 at 01:20:25AM -0700, Sagi Grimberg wrote:
>> This is cumbersome in my mind... and the polling part is
>> kinda bothering me...
>>
>> I still think that having this sit in userspace is so much more
>> elegant really.
>>
>> A simple service that watches with inotify on the device_paths (files or
>> bdevs - which are also files) and trigger revalidate via configfs when
>> it gets an attrib event.
> Exactly - plus a udev watch for the block device KOBJ_CHANGE notifications.
>> We can even have it watch configfs and automatically add watchers
>> when new namespaces are enabled and remove watchers when namespaces are
>> disabled, so it can be completely zero touch.
>>
>> This can sit as a simple systemd service that nvmetcli installs.
>>
>> I'd very much prefer this over the proposed approach...
> Same here.  The idea of having a kernel thread poll things for which
> we have notification, and an easy userspace way to handle them just
> seems like a whole lot of bloat.
>

I see, I wanted to avoid any userspace dependency if we could. It is
better to have it in userspace so we do not bloat the code base.

> Also remember the use case:  shrinking a volume is a pretty destructive
> operation and not really practically relevant for a live volume.  So
> the interesting case is growing, and having a little delay or even
> a manual interaction isn't really the end of the world there.
>
Will send out V4 with configfs per-ns attribute for size check and
AEN generation.


* Re: [PATCH V3 0/7] nvmet: add target ns revalidate support
  2020-04-24  7:05       ` Christoph Hellwig
  2020-04-24  8:34         ` Chaitanya Kulkarni
@ 2020-04-24 19:08         ` Sagi Grimberg
  2020-04-24 21:02           ` Sagi Grimberg
  1 sibling, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2020-04-24 19:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Chaitanya Kulkarni


>> This is cumbersome in my mind... and the polling part is
>> kinda bothering me...
>>
>> I still think that having this sit in userspace is so much more
>> elegant really.
>>
>> A simple service that watches with inotify on the device_paths (files or
>> bdevs - which are also files) and trigger revalidate via configfs when
>> it gets an attrib event.
> 
> Exactly - plus a udev watch for the block device KOBJ_CHANGE notifications.

udev isn't needed actually, blockdevs work just fine with inotify.


* Re: [PATCH V3 0/7] nvmet: add target ns revalidate support
  2020-04-24 19:08         ` Sagi Grimberg
@ 2020-04-24 21:02           ` Sagi Grimberg
  0 siblings, 0 replies; 16+ messages in thread
From: Sagi Grimberg @ 2020-04-24 21:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, Chaitanya Kulkarni


>>> This is cumbersome in my mind... and the polling part is
>>> kinda bothering me...
>>>
>>> I still think that having this sit in userspace is so much more
>>> elegant really.
>>>
>>> A simple service that watches with inotify on the device_paths (files or
>>> bdevs - which are also files) and trigger revalidate via configfs when
>>> it gets an attrib event.
>>
>> Exactly - plus a udev watch for the block device KOBJ_CHANGE notifications.
> 
> udev isn't needed actually, blockdevs work just fine with inotify.

Actually you're right, we'll probably need a udev watch as fsnotify won't
catch it...
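
A minimal sketch of that uevent side: listen on the kernel uevent
netlink group and react to "change" (KOBJ_CHANGE) events.  Illustration
only; matching the event against configured namespaces and triggering
revalidation via configfs is left out.

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void)
{
	struct sockaddr_nl addr = {
		.nl_family = AF_NETLINK,
		.nl_groups = 1,		/* kernel uevent multicast group */
	};
	char buf[8192];
	int fd;

	fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
	if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("uevent socket");
		return 1;
	}

	for (;;) {
		ssize_t len = recv(fd, buf, sizeof(buf) - 1, 0);

		if (len <= 0)
			break;
		buf[len] = '\0';

		/*
		 * Each message is "ACTION@devpath" followed by NUL-separated
		 * KEY=VALUE strings; a block device resize arrives as a
		 * "change" event.
		 */
		if (!strncmp(buf, "change@", 7))
			printf("change event on %s\n", buf + 7);
	}
	return 0;
}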


* Re: [PATCH V3 2/7] nvmet: add global thread for ns-resize AEN
  2020-04-19 23:48 ` [PATCH V3 2/7] nvmet: add global thread for ns-resize AEN Chaitanya Kulkarni
@ 2020-04-27 14:19   ` Hannes Reinecke
  0 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2020-04-27 14:19 UTC (permalink / raw)
  To: Chaitanya Kulkarni, linux-nvme; +Cc: sagi, hch

On 4/20/20 1:48 AM, Chaitanya Kulkarni wrote:
> The change of size detection on the target should generate an AEN to
> the host. Right now there is no generic mechanism that allows us to add
> callbacks for the block and file backend so that we will get the
> notification for change of the size for block device and file backend.
> This patch adds global maintenance thread that checks for the size
> change and generates AEN when needed. This also adds a required lock
> needed to protect the ns->size variable which is updated under
> nvmet_ns_enable().
> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>   drivers/nvme/target/admin-cmd.c   |   7 +-
>   drivers/nvme/target/configfs.c    |   2 +-
>   drivers/nvme/target/core.c        | 129 ++++++++++++++++++++++++++++++
>   drivers/nvme/target/io-cmd-bdev.c |   9 ++-
>   drivers/nvme/target/io-cmd-file.c |  15 ++--
>   drivers/nvme/target/nvmet.h       |  22 ++++-
>   6 files changed, 170 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
> index 4c79aa804887..a3bc2987c72a 100644
> --- a/drivers/nvme/target/admin-cmd.c
> +++ b/drivers/nvme/target/admin-cmd.c
> @@ -486,10 +486,9 @@ static void nvmet_execute_identify_ns(struct nvmet_req *req)
>   	if (!ns)
>   		goto done;
>   
> -	if (ns->bdev)
> -		nvmet_bdev_ns_revalidate(ns);
> -	else
> -		nvmet_file_ns_revalidate(ns);
> +	mutex_lock(&ns->subsys->lock);
> +	nvmet_ns_revalidate(ns);
> +	mutex_unlock(&ns->subsys->lock);
>   
>   	/*
>   	 * nuse = ncap = nsze isn't always true, but we have no way to find
> diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
> index 58cabd7b6fc5..b0e84027b3bc 100644
> --- a/drivers/nvme/target/configfs.c
> +++ b/drivers/nvme/target/configfs.c
> @@ -1469,7 +1469,7 @@ int __init nvmet_init_configfs(void)
>   	return 0;
>   }
>   
> -void __exit nvmet_exit_configfs(void)
> +void nvmet_exit_configfs(void)
>   {
>   	configfs_unregister_subsystem(&nvmet_configfs_subsystem);
>   }
> diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
> index b685f99d56a1..c42d24c1728e 100644
> --- a/drivers/nvme/target/core.c
> +++ b/drivers/nvme/target/core.c
> @@ -9,12 +9,15 @@
>   #include <linux/rculist.h>
>   #include <linux/pci-p2pdma.h>
>   #include <linux/scatterlist.h>
> +#include <linux/delay.h>
> +#include <uapi/linux/sched/types.h>
>   
>   #define CREATE_TRACE_POINTS
>   #include "trace.h"
>   
>   #include "nvmet.h"
>   
> +struct nvmet_resize_monitor *monitor;
>   struct workqueue_struct *buffered_io_wq;
>   static const struct nvmet_fabrics_ops *nvmet_transports[NVMF_TRTYPE_MAX];
>   static DEFINE_IDA(cntlid_ida);
> @@ -514,6 +517,59 @@ static void nvmet_p2pmem_ns_add_p2p(struct nvmet_ctrl *ctrl,
>   		ns->nsid);
>   }
>   
> +bool nvmet_ns_revalidate(struct nvmet_ns *ns)
> +{
> +	if (ns->bdev)
> +		return nvmet_bdev_ns_revalidate(ns);
> +
> +	return nvmet_file_ns_revalidate(ns);
> +}
> +
> +static void __nvmet_handle_resize_ns(struct nvmet_subsys *s)
> +{
> +	struct nvmet_ns *ns;
> +
> +	mutex_lock(&s->lock);
> +	list_for_each_entry_rcu(ns, &s->namespaces, dev_link, 1) {
> +		if (nvmet_ns_revalidate(ns))
> +			nvmet_ns_changed(ns->subsys, ns->nsid);
> +	}
> +	mutex_unlock(&s->lock);
> +}
> +
> +static int nvmet_ns_resize_monitor(void *data)
> +{
> +	struct sched_param p = { .sched_priority = monitor->sched_priority };
> +	struct nvmet_subsys_link *link;
> +	struct nvmet_port *port;
> +	u32 msec;
> +
> +	complete(&monitor->wait);
> +	sched_setscheduler(current, monitor->sched_policy, &p);
> +
> +	while (!kthread_should_park()) {
> +		down_read(&nvmet_config_sem);
> +		list_for_each_entry(port, nvmet_ports, global_entry)
> +			list_for_each_entry(link, &port->subsystems, entry)
> +				__nvmet_handle_resize_ns(link->subsys);
> +		up_read(&nvmet_config_sem);
> +		schedule();
> +		/* XXX: use better sleep wakeup mechanism */
> +
> +		/*
> +		 * This allows the user to change the sleep timeout without
> +		 * stopping the monitor thread.
> +		 */
> +		mutex_lock(&monitor->control_lock);
> +		msec = monitor->msec;
> +		mutex_unlock(&monitor->control_lock);
> +		msleep(msec);
> +	}
> +
> +	kthread_parkme();
> +	return 0;
> +}
> +
>   int nvmet_ns_enable(struct nvmet_ns *ns)
>   {
>   	struct nvmet_subsys *subsys = ns->subsys;
> @@ -1480,6 +1536,71 @@ void nvmet_subsys_put(struct nvmet_subsys *subsys)
>   	kref_put(&subsys->ref, nvmet_subsys_free);
>   }
>   
> +bool nvmet_enable_ns_resize_monitor(void)
> +{
> +	lockdep_assert_held(&monitor->control_lock);
> +
> +	if (monitor->thread)
> +		goto out;
> +
> +	monitor->thread = kthread_create(nvmet_ns_resize_monitor, NULL,
> +				       "nvmet_ns_resize");
> +	if (monitor->thread) {
> +		wake_up_process(monitor->thread);
> +		wait_for_completion(&monitor->wait);
> +		pr_debug("ns monitor thread started successfully\n");
> +	}
> +
> +out:
> +	return monitor->thread ? true : false;
> +}
> +
> +bool nvmet_disable_ns_resize_monitor(void)
> +{
> +	bool ret = false;
> +
> +	lockdep_assert_held(&monitor->control_lock);
> +
> +	if (monitor->thread) {
> +		kthread_park(monitor->thread);
> +		kthread_stop(monitor->thread);
> +		monitor->thread = NULL;
> +		ret = true;
> +	}
> +
> +	kfree(monitor->thread);
> +	return ret;
> +}
> +
> +int nvmet_init_ns_resize_monitor(void)
> +{
> +	int ret;
> +
> +	monitor = kzalloc(sizeof(*monitor), GFP_KERNEL);
> +	if (!monitor)
> +		return -ENOMEM;
> +
> +	monitor->msec = NVMET_RESIZE_MON_MSEC;
> +	monitor->sched_policy = SCHED_IDLE;
> +	monitor->sched_priority = 0;
> +	mutex_init(&monitor->control_lock);
> +	init_completion(&monitor->wait);
> +
> +	mutex_lock(&monitor->control_lock);
> +	ret = nvmet_enable_ns_resize_monitor() ? 0 : -ENOMEM;
> +	mutex_unlock(&monitor->control_lock);
> +
> +	return ret;
> +}
> +
> +void nvmet_exit_ns_resize_monitor(void)
> +{
> +	mutex_lock(&monitor->control_lock);
> +	nvmet_disable_ns_resize_monitor();
> +	mutex_unlock(&monitor->control_lock);
> +	kfree(monitor);
> +}
> +
>   static int __init nvmet_init(void)
>   {
>   	int error;
> @@ -1500,8 +1621,15 @@ static int __init nvmet_init(void)
>   	error = nvmet_init_configfs();
>   	if (error)
>   		goto out_exit_discovery;
> +
> +	error = nvmet_init_ns_resize_monitor();
> +	if (error)
> +		goto out_exit_configfs;
> +
>   	return 0;
>   
> +out_exit_configfs:
> +	nvmet_exit_configfs();
>   out_exit_discovery:
>   	nvmet_exit_discovery();
>   out_free_work_queue:
> @@ -1512,6 +1640,7 @@ static int __init nvmet_init(void)
>   
>   static void __exit nvmet_exit(void)
>   {
> +	nvmet_exit_ns_resize_monitor();
>   	nvmet_exit_configfs();
>   	nvmet_exit_discovery();
>   	ida_destroy(&cntlid_ida);
> diff --git a/drivers/nvme/target/io-cmd-bdev.c b/drivers/nvme/target/io-cmd-bdev.c
> index 0427e040e3dd..3cca08e9ad90 100644
> --- a/drivers/nvme/target/io-cmd-bdev.c
> +++ b/drivers/nvme/target/io-cmd-bdev.c
> @@ -75,9 +75,16 @@ void nvmet_bdev_ns_disable(struct nvmet_ns *ns)
>   	}
>   }
>   
> -void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
> +bool nvmet_bdev_ns_revalidate(struct nvmet_ns *ns)
>   {
> +	bool change;
> +
> +	lockdep_assert_held(&ns->subsys->lock);
> +
> +	change = ns->size != i_size_read(ns->bdev->bd_inode) ? true : false;
>   	ns->size = i_size_read(ns->bdev->bd_inode);
> +
> +	return change;
>   }
>   
>   static u16 blk_to_nvme_status(struct nvmet_req *req, blk_status_t blk_sts)
> diff --git a/drivers/nvme/target/io-cmd-file.c b/drivers/nvme/target/io-cmd-file.c
> index 14364383d2fe..a2d82e55858b 100644
> --- a/drivers/nvme/target/io-cmd-file.c
> +++ b/drivers/nvme/target/io-cmd-file.c
> @@ -80,15 +80,20 @@ int nvmet_file_ns_enable(struct nvmet_ns *ns)
>   	return ret;
>   }
>   
> -void nvmet_file_ns_revalidate(struct nvmet_ns *ns)
> +bool nvmet_file_ns_revalidate(struct nvmet_ns *ns)
>   {
> +	struct path *f_path = &ns->file->f_path;
> +	bool change = false;
>   	struct kstat stat;
>   
> -	if (vfs_getattr(&ns->file->f_path, &stat, STATX_SIZE,
> -			AT_STATX_FORCE_SYNC))
> -		return;
> +	lockdep_assert_held(&ns->subsys->lock);
>   
> -	ns->size = stat.size;
> +	if (vfs_getattr(f_path, &stat, STATX_SIZE, AT_STATX_FORCE_SYNC) == 0) {
> +		change = ns->size != stat.size ? true : false;
> +		ns->size = stat.size;
> +	}
> +
> +	return change;
>   }
>   
>   static void nvmet_file_init_bvec(struct bio_vec *bv, struct scatterlist *sg)
> diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
> index 8b479d932a7b..7fe6d705cbf1 100644
> --- a/drivers/nvme/target/nvmet.h
> +++ b/drivers/nvme/target/nvmet.h
> @@ -23,6 +23,7 @@
>   #define NVMET_ASYNC_EVENTS		4
>   #define NVMET_ERROR_LOG_SLOTS		128
>   #define NVMET_NO_ERROR_LOC		((u16)-1)
> +#define NVMET_RESIZE_MON_MSEC		500
>   #define NVMET_DEFAULT_CTRL_MODEL	"Linux"
>   
>   /*
> @@ -50,6 +51,17 @@
>   #define IPO_IATTR_CONNECT_SQE(x)	\
>   	(cpu_to_le32(offsetof(struct nvmf_connect_command, x)))
>   
> +struct nvmet_resize_monitor {
> +	u32			sched_priority;
> +	u32			sched_policy;
> +	struct mutex		control_lock;
> +	struct task_struct	*thread;
> +	struct completion	wait;
> +	u32			msec;
> +};
> +
> +extern struct nvmet_resize_monitor *monitor;
> +
>   struct nvmet_ns {
>   	struct list_head	dev_link;
>   	struct percpu_ref	ref;
> @@ -477,7 +489,7 @@ void nvmet_add_async_event(struct nvmet_ctrl *ctrl, u8 event_type,
>   #define NVMET_DISC_KATO_MS		120000
>   
>   int __init nvmet_init_configfs(void);
> -void __exit nvmet_exit_configfs(void);
> +void nvmet_exit_configfs(void);
>   
>   int __init nvmet_init_discovery(void);
>   void nvmet_exit_discovery(void);
> @@ -498,8 +510,12 @@ void nvmet_file_ns_disable(struct nvmet_ns *ns);
>   u16 nvmet_bdev_flush(struct nvmet_req *req);
>   u16 nvmet_file_flush(struct nvmet_req *req);
>   void nvmet_ns_changed(struct nvmet_subsys *subsys, u32 nsid);
> -void nvmet_bdev_ns_revalidate(struct nvmet_ns *ns);
> -void nvmet_file_ns_revalidate(struct nvmet_ns *ns);
> +bool nvmet_bdev_ns_revalidate(struct nvmet_ns *ns);
> +bool nvmet_file_ns_revalidate(struct nvmet_ns *ns);
> +bool nvmet_ns_revalidate(struct nvmet_ns *ns);
> +
> +bool nvmet_enable_ns_resize_monitor(void);
> +bool nvmet_disable_ns_resize_monitor(void);
>   
>   static inline u32 nvmet_rw_len(struct nvmet_req *req)
>   {
> 
I'm not a big fan of having single-purpose kthreads; this always smells 
like a waste of resources.
Any particular reason why you didn't use workqueues here?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare@suse.de                               +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
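
For reference, the workqueue alternative Hannes asks about could be a
self-rearming delayed_work instead of a dedicated kthread, roughly as
sketched below.  This reuses nvmet_config_sem, nvmet_ports and
__nvmet_handle_resize_ns() from the patch above; the nvmet_resize_*
names are hypothetical, and the sketch assumes it sits in core.c.

static unsigned int nvmet_resize_delay_ms = NVMET_RESIZE_MON_MSEC;
static struct delayed_work nvmet_resize_work;

static void nvmet_resize_work_fn(struct work_struct *work)
{
	struct nvmet_subsys_link *link;
	struct nvmet_port *port;

	down_read(&nvmet_config_sem);
	list_for_each_entry(port, nvmet_ports, global_entry)
		list_for_each_entry(link, &port->subsystems, entry)
			__nvmet_handle_resize_ns(link->subsys);
	up_read(&nvmet_config_sem);

	/* re-arm; system_long_wq avoids burning a dedicated thread */
	queue_delayed_work(system_long_wq, &nvmet_resize_work,
			   msecs_to_jiffies(nvmet_resize_delay_ms));
}

static void nvmet_resize_monitor_start(void)
{
	INIT_DELAYED_WORK(&nvmet_resize_work, nvmet_resize_work_fn);
	queue_delayed_work(system_long_wq, &nvmet_resize_work,
			   msecs_to_jiffies(nvmet_resize_delay_ms));
}

static void nvmet_resize_monitor_stop(void)
{
	/* safe even though the work re-queues itself */
	cancel_delayed_work_sync(&nvmet_resize_work);
}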


Thread overview: 16+ messages
2020-04-19 23:48 [PATCH V3 0/7] nvmet: add target ns revalidate support Chaitanya Kulkarni
2020-04-19 23:48 ` [PATCH V3 1/7] nvmet: add ns revalidation support Chaitanya Kulkarni
2020-04-19 23:48 ` [PATCH V3 2/7] nvmet: add global thread for ns-resize AEN Chaitanya Kulkarni
2020-04-27 14:19   ` Hannes Reinecke
2020-04-19 23:48 ` [PATCH V3 3/7] nvmet: export resize thread enable-disable attr Chaitanya Kulkarni
2020-04-19 23:48 ` [PATCH V3 4/7] nvmet: export resize thread scan interval Chaitanya Kulkarni
2020-04-19 23:48 ` [PATCH V3 5/7] nvmet: export resize thread sched attributes Chaitanya Kulkarni
2020-04-19 23:48 ` [PATCH V3 6/7] nvmet: export ns resize monitor attribute Chaitanya Kulkarni
2020-04-19 23:48 ` [PATCH V3 7/7] nvmet: add async event tracing support Chaitanya Kulkarni
2020-04-22  8:19 ` [PATCH V3 0/7] nvmet: add target ns revalidate support Christoph Hellwig
2020-04-23  6:03   ` Chaitanya Kulkarni
2020-04-23  8:20     ` Sagi Grimberg
2020-04-24  7:05       ` Christoph Hellwig
2020-04-24  8:34         ` Chaitanya Kulkarni
2020-04-24 19:08         ` Sagi Grimberg
2020-04-24 21:02           ` Sagi Grimberg
