* [PATCH 0/4] block: Add support for REQ_OP_ASSIGN_RANGE
@ 2020-03-29 17:47 Chaitanya Kulkarni
       [not found] ` <(Chaitanya>
                   ` (6 more replies)
  0 siblings, 7 replies; 122+ messages in thread
From: Chaitanya Kulkarni @ 2020-03-29 17:47 UTC (permalink / raw)
  To: hch, martin.petersen
  Cc: darrick.wong, axboe, tytso, adilger.kernel, ming.lei, jthumshirn,
	minwoo.im.dev, chaitanya.kulkarni, damien.lemoal, andrea.parri,
	hare, tj, hannes, khlebnikov, ajay.joshi, bvanassche, arnd,
	houtao1, asml.silence, linux-block, linux-ext4

Hi,

This patch series is based on the original RFC patch series:
https://www.spinics.net/lists/linux-block/msg47933.html

I've designed a rough test case based on the information present
in the mailing list archive for the original RFC; it may need
some corrections from the author.

If anyone is interested, test results are at the end of this message.

Following is the original cover-letter :-

Information about continuous extent placement may be useful
for some block devices. For example, distributed network filesystems
that provide a block device interface may use this information
for better block placement across the nodes in their cluster,
and for better performance. Block devices that map a file
on another filesystem (loop) may request an extent of the same length
on the underlying filesystem to reduce fragmentation and to batch
allocation requests. Also, hypervisors like QEMU may use this
information to optimize cluster allocations.

This patchset introduces REQ_OP_ASSIGN_RANGE, which is going
to be used for forwarding the user's fallocate(0) requests into
block device internals. It is rather similar to the existing
REQ_OP_DISCARD, REQ_OP_WRITE_ZEROES, etc. The corresponding
exported primitive is called blkdev_issue_assign_range().
See [1/3] for the details.

Patch [2/3] teaches loop driver to handle REQ_OP_ASSIGN_RANGE
requests by calling fallocate(0).

Patch [3/3] makes ext4 notify a block device about fallocate(0).
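
For readers less familiar with what a mode-0 fallocate request asks of a
filesystem, here is a minimal userspace sketch. It is not part of the
series. The helper name preallocate_range() is hypothetical, and
posix_fallocate() is used as a portable stand-in: with glibc on Linux it
is normally carried out through fallocate(fd, 0, ...), but it may fall
back to writing zeroes where the filesystem lacks support.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: ask the filesystem to back the byte range
 * [offset, offset + len) of fd with allocated blocks.
 * Returns 0 on success or a positive errno value on failure
 * (posix_fallocate() reports errors through its return value).
 */
int preallocate_range(int fd, off_t offset, off_t len)
{
	return posix_fallocate(fd, offset, len);
}

/* Return the file size in bytes as seen by fstat(), or -1 on error. */
off_t file_size(int fd)
{
	struct stat st;

	if (fstat(fd, &st) != 0)
		return -1;
	return st.st_size;
}
```

After a successful preallocate_range(fd, 0, len) on an empty file,
file_size(fd) is expected to report len, since the allocation also
extends the file.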

Here is a simple test I did:
https://gist.github.com/tkhai/5b788651cdb74c1dbff3500745878856

I attached a file on ext4 to a loop device. Then I created an ext4
partition on the loop device and started the test in that partition.
Direct I/O is enabled on loop.

The test fallocates a 4G file and writes from some offset with a
given step, then it chooses another offset and repeats. After
the test, all the blocks in the file have been written.

The results show that batching extent-assignment requests improves
the performance:

Before patchset: real ~ 1min 27sec
After patchset:  real ~ 1min 16sec (18% better)

Ordinary fallocate() before writes improves the performance
by batching the requests. These results simply show that the same
holds when extent information is forwarded to the underlying
filesystem.

Regards,
Chaitanya

Changes from RFC:-

1. Add missing plumbing for REQ_OP_ASSIGN_RANGE similar to write-zeroes.
2. Add a prep patch to create a helper to submit payloadless bios.
3. Design a test case around the description present in the
   cover letter.

Chaitanya Kulkarni (1):
  block: create payloadless issue bio helper

Kirill Tkhai (3):
  block: Add support for REQ_OP_ASSIGN_RANGE
  loop: Forward REQ_OP_ASSIGN_RANGE into fallocate(0)
  ext4: Notify block device about alloc-assigned blk

 block/blk-core.c          |   5 ++
 block/blk-lib.c           | 115 +++++++++++++++++++++++++++++++-------
 block/blk-merge.c         |  21 +++++++
 block/blk-settings.c      |  19 +++++++
 block/blk-zoned.c         |   1 +
 block/bounce.c            |   1 +
 drivers/block/loop.c      |   5 ++
 fs/ext4/ext4.h            |   2 +
 fs/ext4/extents.c         |  12 +++-
 include/linux/bio.h       |   9 ++-
 include/linux/blk_types.h |   2 +
 include/linux/blkdev.h    |  34 +++++++++++
 12 files changed, 201 insertions(+), 25 deletions(-)

1. Setup :-
-----------
# git log --oneline -5 
c64a4c781915 (HEAD -> req-op-assign-range) ext4: Notify block device about alloc-assigned blk
000cbc6720a4 loop: Forward REQ_OP_ASSIGN_RANGE into fallocate(0)
89ceed8cac80 block: Add support for REQ_OP_ASSIGN_RANGE
a798743e87e7 block: create payloadless issue bio helper
b53df2e7442c (tag: block-5.6-2020-03-13) block: Fix partition support for host aware zoned block devices

# cat /proc/kallsyms | grep -i blkdev_issue_assign_range
ffffffffa3264a80 T blkdev_issue_assign_range
ffffffffa4027184 r __ksymtab_blkdev_issue_assign_range
ffffffffa40524be r __kstrtabns_blkdev_issue_assign_range
ffffffffa405a8eb r __kstrtab_blkdev_issue_assign_range

2. Test program, will be moved to blktests once code is upstream :-
-------------------------------------------------------------------
#define _GNU_SOURCE
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>

#define BLOCK_SIZE 4096
#define STEP (BLOCK_SIZE * 16)
#define SIZE (1024 * 1024 * 1024ULL)

int main(int argc, char *argv[])
{
	int fd, step, ret = 0;
	unsigned long i;
	void *buf;

	if (posix_memalign(&buf, BLOCK_SIZE, SIZE)) {
		perror("alloc");
		exit(1);
	}

	/* O_CREAT requires an explicit mode argument. */
	fd = open("/mnt/loop0/file.img", O_RDWR | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		exit(1);
	}

	if (ftruncate(fd, SIZE)) {
		perror("ftruncate");
		exit(1);
	}

	ret = fallocate(fd, 0, 0, SIZE);
	if (ret) {
		perror("fallocate");
		exit(1);
	}
	
	for (step = STEP - BLOCK_SIZE; step >= 0; step -= BLOCK_SIZE) {
		printf("step=%d\n", step);
		for (i = step; i < SIZE; i += STEP) {
			errno = 0;
			if (pwrite(fd, buf, BLOCK_SIZE, i) != BLOCK_SIZE) {
				perror("pwrite");
				exit(1);
			}
		}

		if (fsync(fd)) {
			perror("fsync");
			exit(1);
		}
	}
	return 0;
}
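
The write pattern above can be checked on a small block-granular model.
The sketch below is not part of the test; the function name
strided_pattern_miscovered() is hypothetical. It replays the same loop
structure (one pass per step value, from the highest offset down to 0)
and counts blocks that were not written exactly once. For the test's
parameters the model is 262144 blocks (1 GiB / 4096) with a stride of
16 blocks.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Replay the strided write pattern on a model of nblocks blocks:
 * for step = stride-1 .. 0, touch blocks step, step+stride, ...
 * Returns the number of blocks not written exactly once, or
 * (size_t)-1 on invalid arguments or allocation failure.
 */
size_t strided_pattern_miscovered(size_t nblocks, size_t stride)
{
	unsigned int *hits;
	size_t bad = 0;

	if (nblocks == 0 || stride == 0)
		return (size_t)-1;

	hits = calloc(nblocks, sizeof(*hits));
	if (!hits)
		return (size_t)-1;

	/* One pass per starting offset, highest offset first. */
	for (size_t step = stride; step-- > 0; )
		for (size_t i = step; i < nblocks; i += stride)
			hits[i]++;

	for (size_t i = 0; i < nblocks; i++)
		if (hits[i] != 1)
			bad++;

	free(hits);
	return bad;
}
```

A return value of 0 means every block in the model was written exactly
once, which is what the cover letter's description of the test implies.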

3. Test script, will be moved to blktests once code is upstream :-
------------------------------------------------------------------
# cat req_op_assign_test.sh 
#!/bin/bash -x

NULLB_FILE="/mnt/backend/data"
NULLB_MNT="/mnt/backend"
LOOP_MNT="/mnt/loop0"

delete_loop()
{
	umount ${LOOP_MNT}
	losetup -D
	sleep 3
}

delete_nullb()
{
	umount ${NULLB_MNT}
	echo 1 > config/nullb/nullb0/power
	rmdir config/nullb/nullb0
	sleep 3
}

unload_modules()
{
	rmmod drivers/block/loop.ko
	rmmod fs/ext4/ext4.ko
	rmmod drivers/block/null_blk.ko
	lsmod | grep -e ext4 -e loop -e null_blk
}

unload()
{
	delete_loop
	delete_nullb
	unload_modules
}

load_ext4()
{
	make -j $(nproc) M=fs/ext4 modules
	local src=fs/ext4/
	local dest=/lib/modules/`uname -r`/kernel/fs/ext4
	\cp ${src}/ext4.ko ${dest}/

	modprobe mbcache
	modprobe jbd2
	sleep 1
	insmod fs/ext4/ext4.ko
	sleep 1
}

load_nullb()
{
	local src=drivers/block/
	local dest=/lib/modules/`uname -r`/kernel/drivers/block
	\cp ${src}/null_blk.ko ${dest}/

	modprobe null_blk nr_devices=0
	sleep 1

	mkdir config/nullb/nullb0
	tree config/nullb/nullb0

	echo 1 > config/nullb/nullb0/memory_backed
	echo 512 > config/nullb/nullb0/blocksize 

	# 20 GB
	echo 20480 > config/nullb/nullb0/size 
	echo 1 > config/nullb/nullb0/power
	sleep 2
	IDX=`cat config/nullb/nullb0/index`
	lsblk | grep null${IDX}
	sleep 1

	mkfs.ext4 /dev/nullb0 
	mount /dev/nullb0 ${NULLB_MNT}
	sleep 1
	mount | grep nullb

	# 10 GB
	dd if=/dev/zero of=${NULLB_FILE} count=2621440 bs=4096
}

load_loop()
{
	local src=drivers/block/
	local dest=/lib/modules/`uname -r`/kernel/drivers/block
	\cp ${src}/loop.ko ${dest}/

	insmod drivers/block/loop.ko max_loop=1
	sleep 3
	/root/util-linux/losetup --direct-io=off /dev/loop0 ${NULLB_FILE}
	sleep 3
	/root/util-linux/losetup
	ls -l /dev/loop*
	dmesg -c 
	mkfs.ext4 /dev/loop0
	mount /dev/loop0 ${LOOP_MNT}
	mount | grep loop0
}

load()
{
	make -j $(nproc) M=drivers/block modules

	load_ext4
	load_nullb
	load_loop
	sleep 1
	sync
	sync
	sync
}

unload
load
time ./test

4. Test Results :-
------------------

# ./req_op_assign_test.sh 
+ NULLB_FILE=/mnt/backend/data
+ NULLB_MNT=/mnt/backend
+ LOOP_MNT=/mnt/loop0
+ unload
+ delete_loop
+ umount /mnt/loop0
+ losetup -D
+ sleep 3
+ delete_nullb
+ umount /mnt/backend
+ echo 1
+ rmdir config/nullb/nullb0
+ sleep 3
+ unload_modules
+ rmmod drivers/block/loop.ko
+ rmmod fs/ext4/ext4.ko
+ rmmod drivers/block/null_blk.ko
+ lsmod
+ grep -e ext4 -e loop -e null_blk
+ load
++ nproc
+ make -j 32 M=drivers/block modules
  CC [M]  drivers/block/loop.o
  MODPOST 11 modules
  CC [M]  drivers/block/loop.mod.o
  LD [M]  drivers/block/loop.ko
+ load_ext4
++ nproc
+ make -j 32 M=fs/ext4 modules
  CC [M]  fs/ext4/balloc.o
  CC [M]  fs/ext4/bitmap.o
  CC [M]  fs/ext4/block_validity.o
  CC [M]  fs/ext4/dir.o
  CC [M]  fs/ext4/ext4_jbd2.o
  CC [M]  fs/ext4/extents.o
  CC [M]  fs/ext4/extents_status.o
  CC [M]  fs/ext4/file.o
  CC [M]  fs/ext4/fsmap.o
  CC [M]  fs/ext4/fsync.o
  CC [M]  fs/ext4/hash.o
  CC [M]  fs/ext4/ialloc.o
  CC [M]  fs/ext4/indirect.o
  CC [M]  fs/ext4/inline.o
  CC [M]  fs/ext4/inode.o
  CC [M]  fs/ext4/ioctl.o
  CC [M]  fs/ext4/mballoc.o
  CC [M]  fs/ext4/migrate.o
  CC [M]  fs/ext4/mmp.o
  CC [M]  fs/ext4/move_extent.o
  CC [M]  fs/ext4/namei.o
  CC [M]  fs/ext4/page-io.o
  CC [M]  fs/ext4/readpage.o
  CC [M]  fs/ext4/resize.o
  CC [M]  fs/ext4/super.o
  CC [M]  fs/ext4/symlink.o
  CC [M]  fs/ext4/sysfs.o
  CC [M]  fs/ext4/xattr.o
  CC [M]  fs/ext4/xattr_trusted.o
  CC [M]  fs/ext4/xattr_user.o
  CC [M]  fs/ext4/acl.o
  CC [M]  fs/ext4/xattr_security.o
  LD [M]  fs/ext4/ext4.o
  MODPOST 1 modules
  LD [M]  fs/ext4/ext4.ko
+ local src=fs/ext4/
++ uname -r
+ local dest=/lib/modules/5.6.0-rc3lbk+/kernel/fs/ext4
+ cp fs/ext4//ext4.ko /lib/modules/5.6.0-rc3lbk+/kernel/fs/ext4/
+ modprobe mbcache
+ modprobe jbd2
+ sleep 1
+ insmod fs/ext4/ext4.ko
+ sleep 1
+ load_nullb
+ local src=drivers/block/
++ uname -r
+ local dest=/lib/modules/5.6.0-rc3lbk+/kernel/drivers/block
+ cp drivers/block//null_blk.ko /lib/modules/5.6.0-rc3lbk+/kernel/drivers/block/
+ modprobe null_blk nr_devices=0
+ sleep 1
+ mkdir config/nullb/nullb0
+ tree config/nullb/nullb0
config/nullb/nullb0
├── badblocks
├── blocking
├── blocksize
├── cache_size
├── completion_nsec
├── discard
├── home_node
├── hw_queue_depth
├── index
├── irqmode
├── mbps
├── memory_backed
├── power
├── queue_mode
├── size
├── submit_queues
├── use_per_node_hctx
├── zoned
├── zone_nr_conv
└── zone_size

0 directories, 20 files
+ echo 1
+ echo 512
+ echo 20480
+ echo 1
+ sleep 2
++ cat config/nullb/nullb0/index
+ IDX=0
+ lsblk
+ grep null0
+ sleep 1
+ mkfs.ext4 /dev/nullb0
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
1310720 inodes, 5242880 blocks
262144 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2153775104
160 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
	4096000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done   

+ mount /dev/nullb0 /mnt/backend
+ sleep 1
+ mount
+ grep nullb
/dev/nullb0 on /mnt/backend type ext4 (rw,relatime,seclabel)
+ dd if=/dev/zero of=/mnt/backend/data count=2621440 bs=4096
2621440+0 records in
2621440+0 records out
10737418240 bytes (11 GB) copied, 27.4579 s, 391 MB/s
+ load_loop
+ local src=drivers/block/
++ uname -r
+ local dest=/lib/modules/5.6.0-rc3lbk+/kernel/drivers/block
+ cp drivers/block//loop.ko /lib/modules/5.6.0-rc3lbk+/kernel/drivers/block/
+ insmod drivers/block/loop.ko max_loop=1
+ sleep 3
+ /root/util-linux/losetup --direct-io=off /dev/loop0 /mnt/backend/data
+ sleep 3
+ /root/util-linux/losetup
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE         DIO LOG-SEC
/dev/loop0         0      0         0  0 /mnt/backend/data   0     512
+ ls -l /dev/loop0 /dev/loop-control
brw-rw----. 1 root disk  7,   0 Mar 29 10:28 /dev/loop0
crw-rw----. 1 root disk 10, 237 Mar 29 10:28 /dev/loop-control
+ dmesg -c
[42963.967060] null_blk: module loaded
[42968.419481] EXT4-fs (nullb0): mounted filesystem with ordered data mode. Opts: (null)
[42996.928141] loop: module loaded
+ mkfs.ext4 /dev/loop0
mke2fs 1.42.9 (28-Dec-2013)
Discarding device blocks: done                            
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
655360 inodes, 2621440 blocks
131072 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2151677952
80 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done 

+ mount /dev/loop0 /mnt/loop0
+ mount
+ grep loop0
/dev/loop0 on /mnt/loop0 type ext4 (rw,relatime,seclabel)
+ sleep 1
+ sync
+ sync
+ sync
+ ./test
step=61440
step=57344
step=53248
step=49152
step=45056
step=40960
step=36864
step=32768
step=28672
step=24576
step=20480
step=16384
step=12288
step=8192
step=4096
step=0

real	9m34.472s
user	0m0.062s
sys	0m5.783s

-- 
2.22.0


^ permalink raw reply	[flat|nested] 122+ messages in thread
[parent not found: <CGME20201207190149epcas5p2d877f4e3f6d31548d97f9b486d243a05@epcas5p2.samsung.com>]
* [PATCH] lpfc: Correct null ndlp reference on routine exit
@ 2020-11-30 18:12 James Smart
       [not found] ` <(James>
  2020-12-08  4:52 ` Martin K. Petersen
  0 siblings, 2 replies; 122+ messages in thread
From: James Smart @ 2020-11-30 18:12 UTC (permalink / raw)
  To: linux-scsi; +Cc: James Smart, Dan Carpenter

smatch correctly called out a logic error with accessing a pointer after
checking it for null.
 drivers/scsi/lpfc/lpfc_els.c:2043 lpfc_cmpl_els_plogi()
 error: we previously assumed 'ndlp' could be null (see line 1942)

Adjust the exit point to avoid the trace printf ndlp reference. A trace
entry was already generated when the ndlp was checked for null.

Fixes: 4430f7fd09ec ("scsi: lpfc: Rework locations of ndlp reference taking")
Signed-off-by: James Smart <james.smart@broadcom.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
---
 drivers/scsi/lpfc/lpfc_els.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/lpfc/lpfc_els.c b/drivers/scsi/lpfc/lpfc_els.c
index fd5c581cd67a..96c087b8b474 100644
--- a/drivers/scsi/lpfc/lpfc_els.c
+++ b/drivers/scsi/lpfc/lpfc_els.c
@@ -1946,7 +1946,7 @@ lpfc_cmpl_els_plogi(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
 				 irsp->un.elsreq64.remoteID,
 				 irsp->ulpStatus, irsp->un.ulpWord[4],
 				 irsp->ulpIoTag);
-		goto out;
+		goto out_freeiocb;
 	}
 
 	/* Since ndlp can be freed in the disc state machine, note if this node
@@ -2042,6 +2042,7 @@ lpfc_cmpl_els_plogi(struct lpfc_hba *phba, struct lpfc_iocbq *cmdiocb,
 			      "PLOGI Cmpl PUT:     did:x%x refcnt %d",
 			      ndlp->nlp_DID, kref_read(&ndlp->kref), 0);
 
+out_freeiocb:
 	/* Release the reference on the original I/O request. */
 	free_ndlp = (struct lpfc_nodelist *)cmdiocb->context1;
 
-- 
2.26.2
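
The shape of the fix can be illustrated outside the driver. The sketch
below is not lpfc code: struct node, complete_request() and its
parameters are hypothetical stand-ins. It only shows the pattern of
adding a second exit label placed after the statement that dereferences
the pointer, so the NULL path skips that statement but still runs the
common release step.

```c
#include <stddef.h>

struct node {			/* hypothetical stand-in for ndlp */
	int did;
};

/*
 * Returns 0 on success, negative on error. *released is always set to 1;
 * *traced_did is only updated when node is valid.
 */
int complete_request(const struct node *node, int *traced_did, int *released)
{
	int ret = 0;

	if (!node) {
		ret = -1;
		/* Not "out": the code at that label reads node. */
		goto out_release;
	}

	if (node->did < 0) {
		ret = -2;
		goto out;	/* node is valid, the exit trace is safe */
	}

	/* ... normal completion work would go here ... */

out:
	/* Exit trace: dereferences node, must not run with node == NULL. */
	*traced_did = node->did;

out_release:
	/* Common cleanup that every path must reach. */
	*released = 1;
	return ret;
}
```

Calling complete_request(NULL, ...) returns -1 with the release step
done and the trace output left untouched.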


^ permalink raw reply related	[flat|nested] 122+ messages in thread
* [PATCH 0/2] two generic block layer fixes for 5.9
@ 2020-07-13 12:35 Coly Li
  2020-07-13 12:35 ` [PATCH 1/2] block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers Coly Li
  2020-07-13 12:35 ` [PATCH 2/2] block: improve discard bio alignment in __blkdev_issue_discard() Coly Li
  0 siblings, 2 replies; 122+ messages in thread
From: Coly Li @ 2020-07-13 12:35 UTC (permalink / raw)
  To: axboe, linux-block; +Cc: Coly Li

Hi Jens,

These two patches have been posted for a while and have been reviewed by
several other developers. Could you please take them for Linux v5.9?

Thanks in advance.

Coly Li
---

Coly Li (2):
  block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd
    numbers
  block: improve discard bio alignment in __blkdev_issue_discard()

 block/blk-lib.c           | 25 +++++++++++++++++++++++--
 block/blk.h               | 14 ++++++++++++++
 include/linux/blk_types.h |  8 ++++----
 3 files changed, 41 insertions(+), 6 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 122+ messages in thread
* [PATCH v11 00/10] Introduce Zone Append for writing to zoned block devices
@ 2020-05-12  8:55 Johannes Thumshirn
  2020-05-12  8:55 ` [PATCH v11 01/10] block: provide fallbacks for blk_queue_zone_is_seq and blk_queue_zone_no Johannes Thumshirn
                   ` (11 more replies)
  0 siblings, 12 replies; 122+ messages in thread
From: Johannes Thumshirn @ 2020-05-12  8:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, linux-block, Damien Le Moal, Keith Busch,
	linux-scsi @ vger . kernel . org, Martin K . Petersen,
	linux-fsdevel @ vger . kernel . org, Johannes Thumshirn

The upcoming NVMe ZNS Specification will define a new type of write
command for zoned block devices, zone append.

When writing to a zoned block device using zone append, the start
sector of the write points at the start LBA of the zone to write to.
Upon completion the block device will respond with the position the data
has been placed in the zone. From a high-level perspective, this can be
seen like a file system's block allocator, where the user writes to a
file and the file system takes care of the data placement on the device.

In order to fully exploit the new zone append command in file-systems and
other interfaces above the block layer, we choose to emulate zone append
in SCSI and null_blk. This way we can have a single write path for both
file-systems and other interfaces above the block-layer, like io_uring on
zoned block devices, without having to care too much about the underlying
characteristics of the device itself.

The emulation works by providing a cache of each zone's write pointer, so
a zone append issued to the disk can be translated to a write with a
starting LBA of the write pointer. The start LBA of the zone append is
used to derive the zone number for the lookup in the zone write pointer
offset cache, and the cached offset is then added to that LBA to get the
actual position to write the data. In SCSI we then turn the
REQ_OP_ZONE_APPEND request into a WRITE(16) command. Upon successful
completion of the WRITE(16), the cache will be updated to the new write
pointer location and the written sector will be noted in the request. On
error the cache entry will be marked as invalid and on the next write an
update of the write pointer will be scheduled, before issuing the actual
write.
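
As a rough illustration of the translation described above, here is a
small self-contained model. It is a sketch under stated assumptions, not
the sd_zbc implementation: struct zone_wp_cache, WP_INVALID and both
function names are hypothetical, locking is ignored, and refreshing an
invalid entry is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define WP_INVALID UINT32_MAX	/* hypothetical "needs refresh" marker */

struct zone_wp_cache {
	uint64_t zone_sectors;	/* zone size in sectors */
	uint32_t nr_zones;
	uint32_t *wp_offset;	/* per-zone write pointer offset */
};

/*
 * Translate a zone append aimed at zone_start into the LBA of a
 * regular write. Returns false if the target is not a zone start,
 * the cache entry is invalid, or the data would not fit in the zone.
 */
bool emulate_zone_append(const struct zone_wp_cache *c, uint64_t zone_start,
			 uint32_t nr_sectors, uint64_t *write_lba)
{
	uint64_t zno = zone_start / c->zone_sectors;
	uint32_t off;

	if (zno >= c->nr_zones || zone_start % c->zone_sectors)
		return false;

	off = c->wp_offset[zno];
	if (off == WP_INVALID)
		return false;	/* caller must refresh the write pointer */
	if ((uint64_t)off + nr_sectors > c->zone_sectors)
		return false;

	*write_lba = zone_start + off;
	return true;
}

/* Advance the cached offset on success, invalidate it on error. */
void zone_append_complete(struct zone_wp_cache *c, uint64_t zone_start,
			  uint32_t nr_sectors, bool ok)
{
	uint64_t zno = zone_start / c->zone_sectors;

	if (ok)
		c->wp_offset[zno] += nr_sectors;
	else
		c->wp_offset[zno] = WP_INVALID;
}
```

Two back-to-back appends of 8 sectors to an empty zone starting at
sector 1024 would be placed at sectors 1024 and 1032 in this model.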

In order to reduce memory consumption, the only cached item is the offset
of the write pointer from the start of the zone; everything else can be
calculated. On an example drive with 52156 zones, the additional memory
consumption of the cache is thus 52156 * 4 = 208624 bytes, or 51 4 KiB
pages. The performance impact is negligible for a spinning drive.
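
The memory figure can be reproduced with a few lines of arithmetic. The
helper names below are hypothetical, and the sketch assumes one 32-bit
offset per zone, as the 52156 * 4 calculation implies.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for one 32-bit write pointer offset per zone. */
size_t wp_cache_bytes(size_t nr_zones)
{
	return nr_zones * sizeof(uint32_t);
}

/* Number of pages needed to hold bytes, rounding up. */
size_t pages_needed(size_t bytes, size_t page_size)
{
	return (bytes + page_size - 1) / page_size;
}
```

With 52156 zones this gives 208624 bytes, which rounds up to 51 pages
of 4096 bytes.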

For null_blk the emulation is way simpler, as null_blk's zoned block
device emulation support already caches the write pointer position, so we
only need to report the position back to the upper layers. Additional
caching is not needed here.

Furthermore we have converted zonefs to use ZONE_APPEND for synchronous
direct I/Os. Asynchronous I/O still uses the normal path via iomap.

Performance testing with zonefs sync writes on a 14 TB SMR drive and nullblk
shows good results. On the SMR drive we're not regressing (the performance
improvement is within noise), on nullblk we could drastically improve specific
workloads:

* nullblk:

Single Thread Multiple Zones
				kIOPS	MiB/s	MB/s	% delta
mq-deadline REQ_OP_WRITE	10.1	631	662
mq-deadline REQ_OP_ZONE_APPEND	13.2	828	868	+31.12
none REQ_OP_ZONE_APPEND		15.6	978	1026	+54.98


Multiple Threads Multiple Zones
				kIOPS	MiB/s	MB/s	% delta
mq-deadline REQ_OP_WRITE	10.2	640	671
mq-deadline REQ_OP_ZONE_APPEND	10.4	650	681	+1.49
none REQ_OP_ZONE_APPEND		16.9	1058	1109	+65.28

* 14 TB SMR drive

Single Thread Multiple Zones
				IOPS	MiB/s	MB/s	% delta
mq-deadline REQ_OP_WRITE	797	49.9	52.3
mq-deadline REQ_OP_ZONE_APPEND	806	50.4	52.9	+1.15

Multiple Threads Multiple Zones
				IOPS	MiB/s	MB/s	% delta
mq-deadline REQ_OP_WRITE	745	46.6	48.9
mq-deadline REQ_OP_ZONE_APPEND	768	48	50.3	+2.86

The %-delta is against the baseline of REQ_OP_WRITE using mq-deadline as I/O
scheduler.
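
For anyone re-deriving the table columns, the sketch below shows the two
conversions involved. The function names are hypothetical, and it is an
assumption on my part that the deltas were computed from the MB/s
column; the published values are consistent with that reading.

```c
/* Convert MiB/s (2^20 bytes) to MB/s (10^6 bytes). */
double mib_to_mb(double mib_per_s)
{
	return mib_per_s * 1048576.0 / 1000000.0;
}

/* Relative change of value against baseline, in percent. */
double percent_delta(double baseline, double value)
{
	return (value / baseline - 1.0) * 100.0;
}
```

For example, percent_delta(662, 868) evaluates to roughly 31.12, which
corresponds to the first mq-deadline REQ_OP_ZONE_APPEND row for nullblk.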

The series is based on Jens' for-5.8/block branch with HEAD:
ae979182ebb3 ("bdi: fix up for "remove the name field in struct backing_dev_info"")

As Christoph asked for a branch I pushed it to a git repo at:
git://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git zone-append.v11
https://git.kernel.org/pub/scm/linux/kernel/git/jth/linux.git/log/?h=zone-append.v11

Changes to v10:
- Added Reviews from Hannes
- Added Performance Numbers to cover letter

Changes to v9:
- Renamed zone_wp_ofst to zone_wp_offset (Hannes/Martin)
- Collected Reviews
- Dropped already merged patches

Changes to v8:
- Added kerneldoc for bio_add_hw_page (Hannes)
- Simplified calculation of zone-boundary cross checking (Bart)
- Added safety nets for max_appen_sectors setting
- Added Reviews from Hannes
- Added Damien's Ack on the zonefs change

Changes to v7:
- Rebased on Jens' for-5.8/block
- Fixed up stray whitespace change (Bart)
- Added Reviews from Bart and Christoph

Changes to v6:
- Added Daniel's Reviewed-by's
- Addressed Christoph's comment on whitespace changes in 4/11
- Renamed driver_cb in 6/11
- Fixed lines over 80 characters in 8/11
- Damien simplified sd_zbc_revalidate_zones() in 8/11

Changes to v5:
- Added patch to fix the memleak on failed scsi command setup
- Added prep patch from Christoph for bio_add_hw_page
- Added Christoph's suggestions for adding append pages to bios
- Fixed compile warning with !CONFIG_BLK_DEV_ZONED
- Damien re-worked revalidate zone
- Added Christoph's suggestions for rescanning write pointers to update cache

Changes to v4:
- Added page merging for zone-append bios (Christoph)
- Removed different locking schemes for zone management operations (Christoph)
- Changed wp_ofst assignment from blk_revalidate_zones (Christoph)
- Smaller nitpicks (Christoph)
- Documented my changes to Keith's patch so it's clear where I messed up so he
  doesn't get blamed
- Added Damien as a Co-developer to the sd emulation patch as he wrote as much
  code for it as I did (if not more)

Changes since v3:
- Remove impact of zone-append from bio_full() and bio_add_page()
  fast-path (Christoph)
- All of the zone write pointer offset caching is handled in SCSI now
  (Christoph)
- Drop null_blk patches that Damien sent separately (Christoph)
- Use EXPORT_SYMBOL_GPL for new exports (Christoph)

Changes since v2:
- Remove iomap implementation and directly issue zone-appends from within
  zonefs (Christoph)
- Drop already merged patch
- Rebase onto new for-next branch

Changes since v1:
- Too much to mention, treat as a completely new series.


Christoph Hellwig (1):
  block: rename __bio_add_pc_page to bio_add_hw_page

Damien Le Moal (2):
  block: Modify revalidate zones
  null_blk: Support REQ_OP_ZONE_APPEND

Johannes Thumshirn (6):
  block: provide fallbacks for blk_queue_zone_is_seq and
    blk_queue_zone_no
  block: introduce blk_req_zone_write_trylock
  scsi: sd_zbc: factor out sanity checks for zoned commands
  scsi: sd_zbc: emulate ZONE_APPEND commands
  block: export bio_release_pages and bio_iov_iter_get_pages
  zonefs: use REQ_OP_ZONE_APPEND for sync DIO

Keith Busch (1):
  block: Introduce REQ_OP_ZONE_APPEND

 block/bio.c                    | 129 ++++++++---
 block/blk-core.c               |  52 +++++
 block/blk-map.c                |   5 +-
 block/blk-mq.c                 |  27 +++
 block/blk-settings.c           |  31 +++
 block/blk-sysfs.c              |  13 ++
 block/blk-zoned.c              |  23 +-
 block/blk.h                    |   4 +-
 drivers/block/null_blk_zoned.c |  37 ++-
 drivers/scsi/scsi_lib.c        |   1 +
 drivers/scsi/sd.c              |  16 +-
 drivers/scsi/sd.h              |  43 +++-
 drivers/scsi/sd_zbc.c          | 399 ++++++++++++++++++++++++++++++---
 fs/zonefs/super.c              |  80 ++++++-
 include/linux/blk_types.h      |  14 ++
 include/linux/blkdev.h         |  25 ++-
 16 files changed, 807 insertions(+), 92 deletions(-)

-- 
2.24.1


^ permalink raw reply	[flat|nested] 122+ messages in thread
* linux-next: Signed-off-by missing for commit in the scsi-fixes tree
@ 2018-11-06 19:48 Stephen Rothwell
       [not found] ` <(Stephen>
  0 siblings, 1 reply; 122+ messages in thread
From: Stephen Rothwell @ 2018-11-06 19:48 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Linux-Next Mailing List, Linux Kernel Mailing List

Hi Martin,

Commit

  85ee0a7b2d53 ("Revert "scsi: ufs: Disable blk-mq for now"")

is missing a Signed-off-by from its author and committer.

Reverts are commits, too.

-- 
Cheers,
Stephen Rothwell


^ permalink raw reply	[flat|nested] 122+ messages in thread
* [PATCH] scsi_dh_alua: Remove stale variables
@ 2015-12-03  6:57 Hannes Reinecke
       [not found] ` <(Hannes>
                   ` (2 more replies)
  0 siblings, 3 replies; 122+ messages in thread
From: Hannes Reinecke @ 2015-12-03  6:57 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, linux-scsi, Hannes Reinecke

With commit 83ea0e5e3501 these variables became obsolete,
but weren't removed.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index f100cbb..5a328bf 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -320,8 +320,6 @@ static int alua_check_tpgs(struct scsi_device *sdev)
  */
 static int alua_check_vpd(struct scsi_device *sdev, struct alua_dh_data *h)
 {
-	unsigned char *d;
-	unsigned char __rcu *vpd_pg83;
 	int rel_port = -1, group_id;
 
 	group_id = scsi_vpd_tpg_id(sdev, &rel_port);
-- 
1.8.5.6


^ permalink raw reply related	[flat|nested] 122+ messages in thread
* RAID-10 keeps aborting
@ 2013-06-03  3:57 H. Peter Anvin
  2013-06-03  4:05 ` H. Peter Anvin
                   ` (2 more replies)
  0 siblings, 3 replies; 122+ messages in thread
From: H. Peter Anvin @ 2013-06-03  3:57 UTC (permalink / raw)
  To: linux-raid

Hello,

I have a brand new server with a RAID-10 array.  The drives are a SAS
JBOD (mptsas) which I'm driving using Linux mdraid raid10.

Unfortunately, although the server did burn-in fine, once put in
production I have so far had multiple cases (about once every 24 hours)
of the raid10 failing, with a mirror pair dropping out in very short
succession:

Jun  2 20:23:05 terminus kernel: [83595.614689] md/raid10:md4: Disk failure on sdb6, disabling device.
Jun  2 20:23:05 terminus kernel: [83595.614689] md/raid10:md4: Operation continuing on 3 devices.
Jun  2 20:23:05 terminus kernel: [83595.703106] md/raid10:md4: Disk failure on sdc6, disabling device.
Jun  2 20:23:05 terminus kernel: [83595.703106] md/raid10:md4: Operation continuing on 2 devices.
Jun  2 20:23:05 terminus kernel: [83595.789234] md4: WRITE SAME failed. Manually zeroing.

Unfortunately, those two devices that just dropped out are of course the
mirrors of each other, causing filesystem corruption and shutdown in
very short order.

There are no other kernel messages from the same time, and given the
timing (less than 90 ms apart) it would appear that this is a timeout of
some kind and not an actual disk failure.

Are there any tunables I can tweak, or do I have a $4000 paperweight?

	-hpa

^ permalink raw reply	[flat|nested] 122+ messages in thread

end of thread, other threads:[~2020-12-09  2:20 UTC | newest]

Thread overview: 122+ messages
2020-03-29 17:47 [PATCH 0/4] block: Add support for REQ_OP_ASSIGN_RANGE Chaitanya Kulkarni
     [not found] ` <(Chaitanya>
2020-03-29 17:47 ` [PATCH 1/4] block: create payloadless issue bio helper Chaitanya Kulkarni
2020-03-29 17:47 ` [PATCH 2/4] block: Add support for REQ_OP_ASSIGN_RANGE Chaitanya Kulkarni
2020-03-29 17:47 ` [PATCH 3/4] loop: Forward REQ_OP_ASSIGN_RANGE into fallocate(0) Chaitanya Kulkarni
2020-03-29 17:47 ` [PATCH 4/4] ext4: Notify block device about alloc-assigned blk Chaitanya Kulkarni
2020-04-01  6:22 ` [PATCH 0/4] block: Add support for REQ_OP_ASSIGN_RANGE Konstantin Khlebnikov
2020-04-02  2:29   ` Martin K. Petersen
2020-04-02  9:49     ` Konstantin Khlebnikov
2020-04-02 22:41 ` Dave Chinner
2020-04-03  1:34   ` Martin K. Petersen
2020-04-03  2:57     ` Dave Chinner
     [not found]       ` <(Dave>
     [not found] <CGME20201207190149epcas5p2d877f4e3f6d31548d97f9b486d243a05@epcas5p2.samsung.com>
2020-12-07 19:01 ` [PATCH 0/2] two UFS changes Bean Huo
     [not found]   ` <(Bean>
2020-12-07 19:01   ` [PATCH 1/2] scsi: ufs: Remove an unused macro definition POWER_DESC_MAX_SIZE Bean Huo
2020-12-08  7:52     ` Avri Altman
2020-12-07 19:01   ` [PATCH 2/2] scsi: ufs: Fix wrong print message in dev_err() Bean Huo
2020-12-08  7:53     ` Avri Altman
2020-12-08  2:57   ` [PATCH 0/2] two UFS changes Alim Akhtar
  -- strict thread matches above, loose matches on Subject: below --
2020-11-30 18:12 [PATCH] lpfc: Correct null ndlp reference on routine exit James Smart
     [not found] ` <(James>
2020-12-08  4:52 ` Martin K. Petersen
2020-07-13 12:35 [PATCH 0/2] two generic block layer fixes for 5.9 Coly Li
2020-07-13 12:35 ` [PATCH 1/2] block: change REQ_OP_ZONE_RESET and REQ_OP_ZONE_RESET_ALL to be odd numbers Coly Li
2020-07-13 23:12   ` Damien Le Moal
2020-07-13 12:35 ` [PATCH 2/2] block: improve discard bio alignment in __blkdev_issue_discard() Coly Li
     [not found]   ` <(Coly>
2020-05-12  8:55 [PATCH v11 00/10] Introduce Zone Append for writing to zoned block devices Johannes Thumshirn
2020-05-12  8:55 ` [PATCH v11 01/10] block: provide fallbacks for blk_queue_zone_is_seq and blk_queue_zone_no Johannes Thumshirn
2020-05-12  8:55 ` [PATCH v11 02/10] block: rename __bio_add_pc_page to bio_add_hw_page Johannes Thumshirn
2020-05-12  8:55 ` [PATCH v11 03/10] block: Introduce REQ_OP_ZONE_APPEND Johannes Thumshirn
2020-05-12  8:55 ` [PATCH v11 04/10] block: introduce blk_req_zone_write_trylock Johannes Thumshirn
2020-05-12  8:55 ` [PATCH v11 05/10] block: Modify revalidate zones Johannes Thumshirn
2020-05-12  8:55 ` [PATCH v11 06/10] scsi: sd_zbc: factor out sanity checks for zoned commands Johannes Thumshirn
2020-05-12  8:55 ` [PATCH v11 07/10] scsi: sd_zbc: emulate ZONE_APPEND commands Johannes Thumshirn
2020-05-12  8:55 ` [PATCH v11 08/10] null_blk: Support REQ_OP_ZONE_APPEND Johannes Thumshirn
2020-05-12  8:55 ` [PATCH v11 09/10] block: export bio_release_pages and bio_iov_iter_get_pages Johannes Thumshirn
2020-05-12  8:55 ` [PATCH v11 10/10] zonefs: use REQ_OP_ZONE_APPEND for sync DIO Johannes Thumshirn
2020-05-12 13:17 ` [PATCH v11 00/10] Introduce Zone Append for writing to zoned block devices Christoph Hellwig
     [not found]   ` <(Christoph>
2020-05-13  2:37 ` Jens Axboe
2018-11-06 19:48 linux-next: Signed-off-by missing for commit in the scsi-fixes tree Stephen Rothwell
     [not found] ` <(Stephen>
2015-12-03  6:57 [PATCH] scsi_dh_alua: Remove stale variables Hannes Reinecke
     [not found] ` <(Hannes>
2015-12-03  9:23 ` Johannes Thumshirn
2015-12-03 16:43 ` Christoph Hellwig
2013-06-03  3:57 RAID-10 keeps aborting H. Peter Anvin
2013-06-03  4:05 ` H. Peter Anvin
2013-06-03  5:47 ` Dan Williams
2013-06-03  6:06   ` H. Peter Anvin
2013-06-03  6:14     ` Dan Williams
2013-06-03  6:30       ` H. Peter Anvin
2013-06-03 14:39       ` H. Peter Anvin
2013-06-11 16:47         ` Joe Lawrence
2013-06-11 17:12           ` H. Peter Anvin
2013-06-03 15:47       ` H. Peter Anvin
2013-06-03 16:09         ` Joe Lawrence
2013-06-03 17:22         ` Dan Williams
2013-06-03 17:40           ` H. Peter Anvin
2013-06-03 18:35             ` Martin K. Petersen
2013-06-03 18:38               ` H. Peter Anvin
2013-06-03 18:40               ` H. Peter Anvin
2013-06-03 22:20                 ` H. Peter Anvin
2013-06-03 22:34                   ` H. Peter Anvin
2013-06-04 15:56                     ` Martin K. Petersen
2013-06-03 23:19               ` H. Peter Anvin
2013-06-04 15:39                 ` Joe Lawrence
2013-06-04 15:46                   ` H. Peter Anvin
2013-06-04 15:54                     ` Martin K. Petersen
2013-06-05 10:02                   ` Bernd Schubert
2013-06-05 11:38                     ` Bernd Schubert
2013-06-05 12:53                       ` [PATCH] scsi: Check if the device support WRITE_SAME_10 Bernd Schubert
2013-06-05 19:14                         ` Martin K. Petersen
2013-06-05 20:09                           ` Bernd Schubert
2013-06-07  2:15                             ` Martin K. Petersen
2013-06-12 19:34                               ` Bernd Schubert
2013-06-05 19:11                       ` RAID-10 keeps aborting Martin K. Petersen
2013-06-04 17:36               ` Dan Williams
2013-06-04 17:54                 ` Martin K. Petersen
2013-06-04 17:57                   ` H. Peter Anvin
2013-06-04 18:04                     ` Martin K. Petersen
2013-06-04 18:32                       ` Dan Williams
2013-06-04 18:38                         ` H. Peter Anvin
2013-06-04 18:56                           ` Dan Williams
2013-06-05  2:39                             ` H. Peter Anvin
     [not found]                               ` <(H.>
     [not found]                                 ` <Peter>
     [not found]                                   ` <Anvin's>
     [not found]                                     ` <message>
     [not found]                                       ` <of>
     [not found]                                         ` <"Mon>
     [not found]                                           ` <13>
     [not found]                                         ` <"Wed>
     [not found]                                           ` <12>
     [not found]                                           ` <7>
     [not found]                                             ` <Nov>
     [not found]                                               ` <2018>
     [not found]                                                 ` <06:48:34>
     [not found]                                                   ` <+1100")>
2018-11-07  1:52                                                     ` linux-next: Signed-off-by missing for commit in the scsi-fixes tree Martin K. Petersen
2020-04-03  3:45                                                     ` [PATCH 0/4] block: Add support for REQ_OP_ASSIGN_RANGE Martin K. Petersen
2020-04-07  2:27                                                       ` Dave Chinner
2020-04-08  4:10                                                         ` Martin K. Petersen
2020-04-19 22:36                                                           ` Dave Chinner
2020-04-23  0:40                                                             ` Martin K. Petersen
     [not found]                                         ` <"Tue>
     [not found]                                           ` <04>
     [not found]                                             ` <Jun>
     [not found]                                               ` <2013>
     [not found]                                                 ` <14:27:47>
     [not found]                                                   ` <-0400")>
2013-06-07  2:19                                                     ` RAID-10 keeps aborting Martin K. Petersen
2013-06-10 14:15                                                       ` Joe Lawrence
2013-06-12  3:15                                                         ` NeilBrown
2013-06-12  4:07                                                           ` H. Peter Anvin
2013-06-12  6:29                                                             ` Bernd Schubert
2013-06-12 10:22                                                               ` Joe Lawrence
2013-06-12 14:28                                                               ` Martin K. Petersen
2013-06-12 14:25                                                             ` Martin K. Petersen
2013-06-12 14:29                                                               ` H. Peter Anvin
2013-06-12 14:34                                                                 ` Martin K. Petersen
2013-06-12 14:37                                                                   ` H. Peter Anvin
2013-06-12 14:45                                                                   ` H. Peter Anvin
     [not found]                                                                     ` <5AA430FFE4486C448003201AC83BC85E0360CE3F@EXHQ.corp.stratus! .com>
     [not found]                                                                       ` <5AA430FFE4486C448003201AC83BC85E0360CE3F@EXHQ.corp.stratus.com>
2013-06-12 15:58                                                                         ` H. Peter Anvin
2013-06-13  3:10                                                                     ` NeilBrown
2013-06-13  3:13                                                                       ` H. Peter Anvin
2013-06-13  3:31                                                                         ` NeilBrown
2013-06-13 21:40                                                                       ` Martin K. Petersen
2013-06-13  2:45                                                           ` Joe Lawrence
2013-06-13  3:11                                                             ` NeilBrown
     [not found]                                                 ` <19:39:58>
     [not found]                                                   ` <-0700")>
2013-06-05 19:29                                                     ` Martin K. Petersen
2013-06-06 18:27                                                       ` Joe Lawrence
     [not found]                                                         ` <(Joe>
2013-06-06 18:36                                                         ` H. Peter Anvin
2013-06-12 14:43                                                     ` Martin K. Petersen
2020-04-01  2:29                                                     ` [PATCH 0/4] block: Add support for REQ_OP_ASSIGN_RANGE Martin K. Petersen
2020-04-01  4:53                                                       ` Chaitanya Kulkarni
2020-05-12 16:01                                                     ` [PATCH v11 00/10] Introduce Zone Append for writing to zoned block devices Martin K. Petersen
2020-05-12 16:04                                                       ` Christoph Hellwig
2020-05-12 16:12                                                         ` Martin K. Petersen
2020-05-12 16:18                                                           ` Johannes Thumshirn
2020-05-12 16:24                                                             ` Martin K. Petersen
     [not found]                                         ` <"Sun>
     [not found]                                           ` <29>
     [not found]                                             ` <Mar>
     [not found]                                               ` <2020>
     [not found]                                                 ` <20:35:11>
     [not found]                                                   ` <+0800")>
2020-07-13 16:47                                                     ` [PATCH 2/2] block: improve discard bio alignment in __blkdev_issue_discard() Martin K. Petersen
2020-07-13 17:50                                                       ` Coly Li
     [not found]                                                 ` <10:12:26>
     [not found]                                                   ` <-0800")>
2020-12-01  5:19                                                     ` [PATCH] lpfc: Correct null ndlp reference on routine exit Martin K. Petersen
     [not found]                                         ` <"Thu>
     [not found]                                           ` <3>
     [not found]                                             ` <Dec>
     [not found]                                               ` <2015>
     [not found]                                                 ` <07:57:35>
     [not found]                                                   ` <+0100")>
2015-12-08  1:12                                                     ` [PATCH] scsi_dh_alua: Remove stale variables Martin K. Petersen
2020-12-09  2:17                                                     ` [PATCH 0/2] two UFS changes Martin K. Petersen
2013-06-11 21:50 ` RAID-10 keeps aborting Joe Lawrence
2013-06-11 21:53   ` H. Peter Anvin