* [Question] IO is split by block layer when size is larger than 4k
@ 2020-03-12 11:13 Feng Li
  2020-03-12 11:51 ` Hannes Reinecke
  2020-03-12 12:34 ` Ming Lei
  0 siblings, 2 replies; 13+ messages in thread
From: Feng Li @ 2020-03-12 11:13 UTC (permalink / raw)
To: linux-block, ming.lei

Hi experts,

May I ask a question about the block layer?

When running fio in the guest OS, I find that a 256k IO is split page
by page in the bio, saved in bvecs. And virtio-blk just puts the
bio_vecs one by one into the available descriptor table.

So if my backend device does not support iovector operations
(preadv/pwritev), the IO is issued to the lower layer page by page.
My question is: why doesn't the bio save multiple pages in one bio_vec?

/dev/vdb is a vhost-user-blk-pci device from SPDK, or a virtio-blk-pci
device.

The fio config is:

[global]
name=fio-rand-read
rw=randread
ioengine=libaio
direct=1
numjobs=1
iodepth=1
bs=256K

[file1]
filename=/dev/vdb

The tracing result looks like this:

/usr/share/bcc/tools/stackcount -K -T '__blockdev_direct_IO' returns 378048
/usr/share/bcc/tools/stackcount -K -T 'bio_add_page' returns 5878

I can get: 378048/5878 = 64, and 256k/4k = 64, so __blockdev_direct_IO
splits 256k into 64 parts.

The /dev/vdb queue properties are as follows:

[root@t1 00:10:42 queue]$ find . | while read f; do echo "$f = $(cat $f)"; done
./nomerges = 0
./logical_block_size = 512
./rq_affinity = 1
./discard_zeroes_data = 0
./max_segments = 126
./unpriv_sgio = 0
./max_segment_size = 4294967295
./rotational = 1
./scheduler = none
./read_ahead_kb = 128
./max_hw_sectors_kb = 2147483647
./discard_granularity = 0
./discard_max_bytes = 0
./write_same_max_bytes = 0
./max_integrity_segments = 0
./max_sectors_kb = 512
./physical_block_size = 512
./add_random = 0
./nr_requests = 128
./minimum_io_size = 512
./hw_sector_size = 512
./optimal_io_size

Sometimes the size of one part is bigger than 4k. Some logs:

id: 0 size: 4096
...
id: 57 size: 4096
id: 58 size: 24576

Why does this happen?

Kernel versions:
1. 3.10.0-1062.1.2.el7
2. 5.3

Thanks in advance.

^ permalink raw reply	[flat|nested] 13+ messages in thread
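The arithmetic in the trace above can be sanity-checked with a few lines (a sketch; the two totals are simply the stackcount figures quoted in the message):

```python
# A 256k request submitted page by page should contribute 64
# page-sized additions per direct IO.
bs = 256 * 1024
page_size = 4 * 1024
pages_per_io = bs // page_size
print(pages_per_io)  # 64

# The two stackcount totals from the trace differ by roughly the
# same factor of 64.
ratio = 378048 / 5878
print(round(ratio))  # 64
```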
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-12 11:13 [Question] IO is split by block layer when size is larger than 4k Feng Li
@ 2020-03-12 11:51 ` Hannes Reinecke
  2020-03-12 12:09   ` Feng Li
  2020-03-12 12:34 ` Ming Lei
  1 sibling, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2020-03-12 11:51 UTC (permalink / raw)
To: Feng Li, linux-block, ming.lei

On 3/12/20 12:13 PM, Feng Li wrote:
> Hi experts,
>
> May I ask a question about the block layer?
> When running fio in the guest OS, I find that a 256k IO is split page
> by page in the bio, saved in bvecs.
> And virtio-blk just puts the bio_vecs one by one into the available
> descriptor table.
>
It isn't 'split'; it's using _one_ bio containing bvecs, where each bvec
consists of one page.

'split' for the block layer means that a single I/O is split into
several bios, which I don't think is the case here. Or?

Cheers,

Hannes
--
Dr. Hannes Reinecke            Kernel Storage Architect
hare@suse.de                          +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
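The distinction Hannes draws can be sketched with a toy model (hypothetical helper names, not kernel API): the trace shows one bio carrying 64 single-page bvecs, which is different from a genuine block-layer split into several bios.

```python
PAGE = 4096

def build_bio(io_bytes):
    """Toy model of what the trace showed: ONE bio whose bvecs each
    carry a single 4 KiB page."""
    return [{"len": PAGE} for _ in range(io_bytes // PAGE)]

def split_bio(bvecs, max_segments):
    """Toy model of a real block-layer split: one bio becomes several
    bios only when it exceeds a queue limit such as max_segments."""
    return [bvecs[i:i + max_segments]
            for i in range(0, len(bvecs), max_segments)]

bio = build_bio(256 * 1024)
print(len(bio))                   # 64 single-page bvecs in one bio
print(len(split_bio(bio, 126)))   # 1: no split with max_segments=126
```

With the queue's max_segments of 126, the 64-segment bio stays a single bio; only the bvec granularity is page-sized.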
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-12 11:51 ` Hannes Reinecke
@ 2020-03-12 12:09   ` Feng Li
  0 siblings, 0 replies; 13+ messages in thread
From: Feng Li @ 2020-03-12 12:09 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-block, ming.lei

Ok, thanks for the clarification. I meant that each bvec consists of
one page. That is not good for a virtio backend.

Hannes Reinecke <hare@suse.de> wrote on Thu, Mar 12, 2020 at 7:51 PM:
>
> On 3/12/20 12:13 PM, Feng Li wrote:
> > Hi experts,
> >
> > May I ask a question about the block layer?
> > When running fio in the guest OS, I find that a 256k IO is split page
> > by page in the bio, saved in bvecs.
> > And virtio-blk just puts the bio_vecs one by one into the available
> > descriptor table.
> >
> It isn't 'split'; it's using _one_ bio containing bvecs, where each bvec
> consists of one page.
>
> 'split' for the block layer means that a single I/O is split into
> several bios, which I don't think is the case here. Or?
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke            Kernel Storage Architect
> hare@suse.de                          +49 911 74053 688
> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-12 11:13 [Question] IO is split by block layer when size is larger than 4k Feng Li
  2020-03-12 11:51 ` Hannes Reinecke
@ 2020-03-12 12:34 ` Ming Lei
  2020-03-12 13:21   ` Feng Li
  1 sibling, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-03-12 12:34 UTC (permalink / raw)
To: Feng Li; +Cc: linux-block

On Thu, Mar 12, 2020 at 07:13:28PM +0800, Feng Li wrote:
> Hi experts,
>
> May I ask a question about the block layer?
> When running fio in the guest OS, I find that a 256k IO is split page
> by page in the bio, saved in bvecs.
> And virtio-blk just puts the bio_vecs one by one into the available
> descriptor table.
>
> So if my backend device does not support iovector operations
> (preadv/pwritev), the IO is issued to the lower layer page by page.
> My question is: why doesn't the bio save multiple pages in one bio_vec?

We have supported multipage bvecs since v5.1, specifically since commit
07173c3ec276 ("block: enable multipage bvecs").

Thanks,
Ming
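The effect of multipage bvecs can be illustrated with a small simulation of the merge rule (a sketch, not the kernel code): a new page is folded into the previous bvec only when it is physically contiguous with the end of that bvec.

```python
PAGE = 4096

def pack_bvecs(phys_addrs):
    """Pack 4 KiB pages into bvecs, merging a page into the previous
    bvec when it starts exactly where that bvec ends (physically
    contiguous) -- the essence of multipage bvecs (v5.1+)."""
    bvecs = []
    for addr in phys_addrs:
        if bvecs and bvecs[-1]["addr"] + bvecs[-1]["len"] == addr:
            bvecs[-1]["len"] += PAGE        # extend the existing bvec
        else:
            bvecs.append({"addr": addr, "len": PAGE})
    return bvecs

# 64 physically contiguous pages -> a single 256 KiB bvec
contig = [0x100000 + i * PAGE for i in range(64)]
print(len(pack_bvecs(contig)))      # 1

# scattered pages (as in the dumps later in this thread) -> one
# bvec per page
scattered = [i * 2 * PAGE for i in range(64)]
print(len(pack_bvecs(scattered)))   # 64
```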
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-12 12:34 ` Ming Lei
@ 2020-03-12 13:21   ` Feng Li
  2020-03-13  2:31     ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Feng Li @ 2020-03-12 13:21 UTC (permalink / raw)
To: Ming Lei; +Cc: linux-block

Hi Ming,
Thanks.
I have tested kernel '5.4.0-rc6+', which includes 07173c3ec276,
but the virtio descriptors are still filled with single pages.

Ming Lei <ming.lei@redhat.com> wrote on Thu, Mar 12, 2020 at 8:34 PM:
>
> On Thu, Mar 12, 2020 at 07:13:28PM +0800, Feng Li wrote:
> > Hi experts,
> >
> > May I ask a question about the block layer?
> > When running fio in the guest OS, I find that a 256k IO is split page
> > by page in the bio, saved in bvecs.
> > And virtio-blk just puts the bio_vecs one by one into the available
> > descriptor table.
> >
> > So if my backend device does not support iovector operations
> > (preadv/pwritev), the IO is issued to the lower layer page by page.
> > My question is: why doesn't the bio save multiple pages in one bio_vec?
>
> We have supported multipage bvecs since v5.1, specifically since commit
> 07173c3ec276 ("block: enable multipage bvecs").
>
> Thanks,
> Ming
>
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-12 13:21 ` Feng Li
@ 2020-03-13  2:31   ` Ming Lei
  2020-03-14 17:43     ` Feng Li
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-03-13  2:31 UTC (permalink / raw)
To: Feng Li; +Cc: linux-block

On Thu, Mar 12, 2020 at 09:21:11PM +0800, Feng Li wrote:
> Hi Ming,
> Thanks.
> I have tested kernel '5.4.0-rc6+', which includes 07173c3ec276,
> but the virtio descriptors are still filled with single pages.

Hello,

Could you share your test script?

BTW, it depends on whether the fs layer passes contiguous pages to the
block layer.

You can dump each bvec of the bio and see whether they are physically
contiguous.

Thanks,
Ming

>
> Ming Lei <ming.lei@redhat.com> wrote on Thu, Mar 12, 2020 at 8:34 PM:
> >
> > On Thu, Mar 12, 2020 at 07:13:28PM +0800, Feng Li wrote:
> > > Hi experts,
> > >
> > > May I ask a question about the block layer?
> > > When running fio in the guest OS, I find that a 256k IO is split page
> > > by page in the bio, saved in bvecs.
> > > And virtio-blk just puts the bio_vecs one by one into the available
> > > descriptor table.
> > >
> > > So if my backend device does not support iovector operations
> > > (preadv/pwritev), the IO is issued to the lower layer page by page.
> > > My question is: why doesn't the bio save multiple pages in one bio_vec?
> >
> > We have supported multipage bvecs since v5.1, specifically since commit
> > 07173c3ec276 ("block: enable multipage bvecs").
> >
> > Thanks,
> > Ming
> >

--
Ming
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-13  2:31 ` Ming Lei
@ 2020-03-14 17:43   ` Feng Li
  2020-03-16 15:22     ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Feng Li @ 2020-03-14 17:43 UTC (permalink / raw)
To: Ming Lei; +Cc: linux-block

Hi Ming,
This is my cmd to run qemu:

qemu-2.12.0/x86_64-softmmu/qemu-system-x86_64 -enable-kvm
-device virtio-balloon -cpu host -smp 4 -m 2G
-drive file=/root/html/fedora-10g.img,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk1
-device virtio-blk-pci,scsi=off,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1
-drive file=/dev/sdb,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk2
-device virtio-blk-pci,scsi=off,drive=drive-virtio-disk2,id=virtio-disk2,bootindex=2
-device virtio-net,netdev=nw1,mac=00:11:22:EE:EE:10
-netdev tap,id=nw1,script=no,downscript=no,ifname=tap0
-serial mon:stdio -nographic
-object memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on
-numa node,memdev=mem0 -vnc 0.0.0.0:100
-machine usb=on,nvdimm -device usb-tablet
-monitor unix:///tmp/a.socket,server,nowait
-qmp tcp:0.0.0.0:2234,server,nowait

OS image is Fedora 31. Kernel is 5.3.7-301.fc31.x86_64.

The addresses from virtio in qemu look like this:

========= size: 262144, iovcnt: 64
0: size: 4096 addr: 0x7fffc83f1000
1: size: 4096 addr: 0x7fffc8037000
2: size: 4096 addr: 0x7fffd3710000
3: size: 4096 addr: 0x7fffd5624000
4: size: 4096 addr: 0x7fffc766c000
5: size: 4096 addr: 0x7fffc7c21000
6: size: 4096 addr: 0x7fffc8d54000
7: size: 4096 addr: 0x7fffc8fc6000
8: size: 4096 addr: 0x7fffd5659000
9: size: 4096 addr: 0x7fffc7f88000
10: size: 4096 addr: 0x7fffc767b000
11: size: 4096 addr: 0x7fffc8332000
12: size: 4096 addr: 0x7fffb4297000
13: size: 4096 addr: 0x7fffc8888000
14: size: 4096 addr: 0x7fffc93d7000
15: size: 4096 addr: 0x7fffc9f1f000

They are not contiguous pages, so the pages in the bvecs are not
contiguous physical pages.

I don't know how to dump the bvec addresses in a bio without
recompiling the kernel.

The IO pattern in the guest is:

root@192.168.19.239 02:39:29 ~ $ cat 256k-randread.fio
[global]
ioengine=libaio
invalidate=1
ramp_time=5
iodepth=1
runtime=120000
time_based
direct=1

[randread-vdb-256k-para]
bs=256k
stonewall
filename=/dev/vdb
rw=randread

Thanks.

Ming Lei <ming.lei@redhat.com> wrote on Fri, Mar 13, 2020 at 10:32 AM:
>
> On Thu, Mar 12, 2020 at 09:21:11PM +0800, Feng Li wrote:
> > Hi Ming,
> > Thanks.
> > I have tested kernel '5.4.0-rc6+', which includes 07173c3ec276,
> > but the virtio descriptors are still filled with single pages.
>
> Hello,
>
> Could you share your test script?
>
> BTW, it depends on whether the fs layer passes contiguous pages to the
> block layer.
>
> You can dump each bvec of the bio and see whether they are physically
> contiguous.
>
> Thanks,
> Ming
>
> >
> > Ming Lei <ming.lei@redhat.com> wrote on Thu, Mar 12, 2020 at 8:34 PM:
> > >
> > > On Thu, Mar 12, 2020 at 07:13:28PM +0800, Feng Li wrote:
> > > > Hi experts,
> > > >
> > > > May I ask a question about the block layer?
> > > > When running fio in the guest OS, I find that a 256k IO is split page
> > > > by page in the bio, saved in bvecs.
> > > > And virtio-blk just puts the bio_vecs one by one into the available
> > > > descriptor table.
> > > >
> > > > So if my backend device does not support iovector operations
> > > > (preadv/pwritev), the IO is issued to the lower layer page by page.
> > > > My question is: why doesn't the bio save multiple pages in one bio_vec?
> > >
> > > We have supported multipage bvecs since v5.1, specifically since commit
> > > 07173c3ec276 ("block: enable multipage bvecs").
> > >
> > > Thanks,
> > > Ming
> > >
>
> --
> Ming
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-14 17:43 ` Feng Li
@ 2020-03-16 15:22   ` Ming Lei
  2020-03-17  8:19     ` Feng Li
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-03-16 15:22 UTC (permalink / raw)
To: Feng Li; +Cc: Ming Lei, linux-block

[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]

On Sun, Mar 15, 2020 at 9:34 AM Feng Li <lifeng1519@gmail.com> wrote:
>
> Hi Ming,
> This is my cmd to run qemu:
>
> qemu-2.12.0/x86_64-softmmu/qemu-system-x86_64 -enable-kvm
> -device virtio-balloon -cpu host -smp 4 -m 2G
> -drive file=/root/html/fedora-10g.img,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk1
> -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1
> -drive file=/dev/sdb,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk2
> -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk2,id=virtio-disk2,bootindex=2
> -device virtio-net,netdev=nw1,mac=00:11:22:EE:EE:10
> -netdev tap,id=nw1,script=no,downscript=no,ifname=tap0
> -serial mon:stdio -nographic
> -object memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on
> -numa node,memdev=mem0 -vnc 0.0.0.0:100
> -machine usb=on,nvdimm -device usb-tablet
> -monitor unix:///tmp/a.socket,server,nowait
> -qmp tcp:0.0.0.0:2234,server,nowait
>
> OS image is Fedora 31. Kernel is 5.3.7-301.fc31.x86_64.
>
> The addresses from virtio in qemu look like this:
> ========= size: 262144, iovcnt: 64
> 0: size: 4096 addr: 0x7fffc83f1000
> 1: size: 4096 addr: 0x7fffc8037000
> 2: size: 4096 addr: 0x7fffd3710000
> 3: size: 4096 addr: 0x7fffd5624000
> 4: size: 4096 addr: 0x7fffc766c000
> 5: size: 4096 addr: 0x7fffc7c21000
> 6: size: 4096 addr: 0x7fffc8d54000
> 7: size: 4096 addr: 0x7fffc8fc6000
> 8: size: 4096 addr: 0x7fffd5659000
> 9: size: 4096 addr: 0x7fffc7f88000
> 10: size: 4096 addr: 0x7fffc767b000
> 11: size: 4096 addr: 0x7fffc8332000
> 12: size: 4096 addr: 0x7fffb4297000
> 13: size: 4096 addr: 0x7fffc8888000
> 14: size: 4096 addr: 0x7fffc93d7000
> 15: size: 4096 addr: 0x7fffc9f1f000
>
> They are not contiguous pages, so the pages in the bvecs are not
> contiguous physical pages.
>
> I don't know how to dump the bvec addresses in a bio without
> recompiling the kernel.

I just ran a similar test on 5.3.11-100.fc29.x86_64, and the
observation is similar to yours.

However, I don't observe a similar problem with a 5.6-rc kernel in a
VM; maybe the kernel config causes the difference.

BTW, I usually use the attached bcc script to observe bvec pages, and
you may try that on an upstream kernel.

Thanks,
Ming

[-- Attachment #2: bvec_avg_pages.py --]
[-- Type: text/x-python, Size: 2144 bytes --]

#!/usr/bin/python3
#
# bvec_pages.py
#
# Written as a basic example of a function pages per bvec distribution histogram.
#
# USAGE: bvec_pages
#
# The default interval is 5 seconds. A Ctrl-C will print the partially
# gathered histogram then exit.
#
# Copyright (c) 2016 Ming Lei
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 15-Aug-2015   Ming Lei   Created this.

from bcc import BPF
from ctypes import c_ushort, c_int, c_ulonglong
from time import sleep
from sys import argv
import os

# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

struct key_t {
    unsigned dev_no;
};

struct val_t {
    u64 bvec_cnt;
    u64 size;
    u64 bio_cnt;
};

BPF_HASH(bvec, struct key_t, struct val_t);

// time block I/O
int trace_submit_bio(struct pt_regs *ctx, struct bio *bio)
{
    unsigned short vcnt;
    unsigned size;

    size = bio->bi_iter.bi_size;
    vcnt = bio->bi_vcnt;

    if (vcnt) {
        struct val_t *valp;
        struct key_t key;
        struct val_t zero = {0};
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
        int maj, min;

        maj = bio->bi_disk->major;
        min = bio->bi_disk->first_minor;
        key.dev_no = (unsigned)MKDEV(maj, min);
#else
        key.dev_no = (unsigned)bio->bi_bdev->bd_dev;
#endif
        valp = bvec.lookup_or_init(&key, &zero);
        valp->bvec_cnt += vcnt;
        valp->size += size;
        valp->bio_cnt += 1;
    }

    //bpf_trace_printk("pages %d, vcnt: %d\\n", size>>12, vcnt);
    return 0;
}
"""

# load BPF program
b = BPF(text=bpf_text)
b.attach_kprobe(event="submit_bio", fn_name="trace_submit_bio")

# header
print("Tracing... Hit Ctrl-C to end.")

# output
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass

page_size = os.sysconf("SC_PAGE_SIZE")
print("\n%-7s %-12s %12s %12s" % ("DEVICE", "PAGES_PER_BVEC",
                                  "SIZE_PER_BIO", "VCNT_PER_BIO"))
counts = b.get_table("bvec")
for k, v in counts.items():
    pgs = v.size / page_size
    print("%-3d:%-3d %-12d %12dKB %12d" % (k.dev_no >> 20,
          k.dev_no & ((1 << 20) - 1), pgs / v.bvec_cnt,
          (v.size >> 10) / v.bio_cnt, v.bvec_cnt / v.bio_cnt))
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-16 15:22 ` Ming Lei
@ 2020-03-17  8:19   ` Feng Li
  2020-03-17 10:26     ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Feng Li @ 2020-03-17  8:19 UTC (permalink / raw)
To: Ming Lei; +Cc: Ming Lei, linux-block

Thanks.
Sometimes I do observe multipage bvecs on 5.3.7-301.fc31.x86_64.
This log is from Qemu virtio-blk:

========= size: 262144, iovcnt: 2
0: size: 229376 addr: 0x7fff6a7c8000
1: size: 32768 addr: 0x7fff64c00000
========= size: 262144, iovcnt: 2
0: size: 229376 addr: 0x7fff6a7c8000
1: size: 32768 addr: 0x7fff64c00000

I also tested on 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64 and
observed iovcnt 64:

========= size: 262144, iovcnt: 64
0: size: 4096 addr: 0x7fffb5ece000
1: size: 4096 addr: 0x7fffb5ecd000
...
63: size: 4096 addr: 0x7fff8baec000

So I think this is a common issue of the upstream kernel, from 5.3 to 5.6.

BTW, I have used your script on 5.3.7-301.fc31.x86_64 and it works well.
However, after updating to kernel
5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64, it complains:

root@192.168.19.239 16:57:23 ~ $ ./bvec_avg_pages.py
In file included from /virtual/main.c:2:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/uapi/linux/ptrace.h:142:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/ptrace.h:5:
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/segment.h:266:2: error: expected '(' after 'asm'
        alternative_io ("lsl %[seg],%[p]",
        ^
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/alternative.h:240:2: note: expanded from macro 'alternative_io'
        asm_inline volatile (ALTERNATIVE(oldinstr, newinstr, feature) \
        ^
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/compiler_types.h:210:24: note: expanded from macro 'asm_inline'
#define asm_inline asm __inline
                       ^
In file included from /virtual/main.c:3:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/blkdev.h:5:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/sched.h:14:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/pid.h:5:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/rculist.h:11:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/rcupdate.h:27:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/preempt.h:78:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/preempt.h:7:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/thread_info.h:38:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/thread_info.h:12:
In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/page.h:12:
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/page_64.h:49:2: error: expected '(' after 'asm'
        alternative_call_2(clear_page_orig,

Ming Lei <tom.leiming@gmail.com> wrote on Mon, Mar 16, 2020 at 11:22 PM:
>
> On Sun, Mar 15, 2020 at 9:34 AM Feng Li <lifeng1519@gmail.com> wrote:
> >
> > Hi Ming,
> > This is my cmd to run qemu:
> >
> > qemu-2.12.0/x86_64-softmmu/qemu-system-x86_64 -enable-kvm
> > -device virtio-balloon -cpu host -smp 4 -m 2G
> > -drive file=/root/html/fedora-10g.img,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk1
> > -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1
> > -drive file=/dev/sdb,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk2
> > -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk2,id=virtio-disk2,bootindex=2
> > -device virtio-net,netdev=nw1,mac=00:11:22:EE:EE:10
> > -netdev tap,id=nw1,script=no,downscript=no,ifname=tap0
> > -serial mon:stdio -nographic
> > -object memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on
> > -numa node,memdev=mem0 -vnc 0.0.0.0:100
> > -machine usb=on,nvdimm -device usb-tablet
> > -monitor unix:///tmp/a.socket,server,nowait
> > -qmp tcp:0.0.0.0:2234,server,nowait
> >
> > OS image is Fedora 31. Kernel is 5.3.7-301.fc31.x86_64.
> >
> > The addresses from virtio in qemu look like this:
> > ========= size: 262144, iovcnt: 64
> > 0: size: 4096 addr: 0x7fffc83f1000
> > 1: size: 4096 addr: 0x7fffc8037000
> > 2: size: 4096 addr: 0x7fffd3710000
> > 3: size: 4096 addr: 0x7fffd5624000
> > 4: size: 4096 addr: 0x7fffc766c000
> > 5: size: 4096 addr: 0x7fffc7c21000
> > 6: size: 4096 addr: 0x7fffc8d54000
> > 7: size: 4096 addr: 0x7fffc8fc6000
> > 8: size: 4096 addr: 0x7fffd5659000
> > 9: size: 4096 addr: 0x7fffc7f88000
> > 10: size: 4096 addr: 0x7fffc767b000
> > 11: size: 4096 addr: 0x7fffc8332000
> > 12: size: 4096 addr: 0x7fffb4297000
> > 13: size: 4096 addr: 0x7fffc8888000
> > 14: size: 4096 addr: 0x7fffc93d7000
> > 15: size: 4096 addr: 0x7fffc9f1f000
> >
> > They are not contiguous pages, so the pages in the bvecs are not
> > contiguous physical pages.
> >
> > I don't know how to dump the bvec addresses in a bio without
> > recompiling the kernel.
>
> I just ran a similar test on 5.3.11-100.fc29.x86_64, and the
> observation is similar to yours.
>
> However, I don't observe a similar problem with a 5.6-rc kernel in a
> VM; maybe the kernel config causes the difference.
>
> BTW, I usually use the attached bcc script to observe bvec pages, and
> you may try that on an upstream kernel.
>
> Thanks,
> Ming
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-17  8:19 ` Feng Li
@ 2020-03-17 10:26   ` Ming Lei
  2020-03-18  6:29     ` Feng Li
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-03-17 10:26 UTC (permalink / raw)
To: Feng Li; +Cc: Ming Lei, linux-block

On Tue, Mar 17, 2020 at 04:19:44PM +0800, Feng Li wrote:
> Thanks.
> Sometimes I do observe multipage bvecs on 5.3.7-301.fc31.x86_64.
> This log is from Qemu virtio-blk:
>
> ========= size: 262144, iovcnt: 2
> 0: size: 229376 addr: 0x7fff6a7c8000
> 1: size: 32768 addr: 0x7fff64c00000
> ========= size: 262144, iovcnt: 2
> 0: size: 229376 addr: 0x7fff6a7c8000
> 1: size: 32768 addr: 0x7fff64c00000

Then it is working.

> I also tested on 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64 and
> observed iovcnt 64:
> ========= size: 262144, iovcnt: 64
> 0: size: 4096 addr: 0x7fffb5ece000
> 1: size: 4096 addr: 0x7fffb5ecd000
> ...
> 63: size: 4096 addr: 0x7fff8baec000
>
> So I think this is a common issue of the upstream kernel, from 5.3 to 5.6.

As I mentioned before, it is because the pages aren't physically
contiguous.

If you enable hugepages, you will see lots of pages in one single bvec.

> BTW, I have used your script on 5.3.7-301.fc31.x86_64 and it works well.
> However, after updating to kernel
> 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64, it complains:
>
> root@192.168.19.239 16:57:23 ~ $ ./bvec_avg_pages.py
> In file included from /virtual/main.c:2:
> In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/uapi/linux/ptrace.h:142:
> In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/ptrace.h:5:
> /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/segment.h:266:2: error: expected '(' after 'asm'
>         alternative_io ("lsl %[seg],%[p]",

It can be worked around by commenting out the following line in
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/generated/autoconf.h:

#define CONFIG_CC_HAS_ASM_INLINE 1

Thanks,
Ming
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-17 10:26 ` Ming Lei
@ 2020-03-18  6:29   ` Feng Li
  2020-03-18  7:36     ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Feng Li @ 2020-03-18  6:29 UTC (permalink / raw)
To: Ming Lei; +Cc: Ming Lei, linux-block

Hi Ming,
What I need is to always get contiguous pages in one bvec.
Maybe it is currently hard to satisfy this requirement.

About huge pages: I know that userspace processes can use the huge
pages the kernel reserves. Could the bio/block layer support using
huge pages as well?

Thanks again for your help.

Ming Lei <ming.lei@redhat.com> wrote on Tue, Mar 17, 2020 at 6:27 PM:
>
> On Tue, Mar 17, 2020 at 04:19:44PM +0800, Feng Li wrote:
> > Thanks.
> > Sometimes I do observe multipage bvecs on 5.3.7-301.fc31.x86_64.
> > This log is from Qemu virtio-blk:
> >
> > ========= size: 262144, iovcnt: 2
> > 0: size: 229376 addr: 0x7fff6a7c8000
> > 1: size: 32768 addr: 0x7fff64c00000
> > ========= size: 262144, iovcnt: 2
> > 0: size: 229376 addr: 0x7fff6a7c8000
> > 1: size: 32768 addr: 0x7fff64c00000
>
> Then it is working.
>
> > I also tested on 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64 and
> > observed iovcnt 64:
> > ========= size: 262144, iovcnt: 64
> > 0: size: 4096 addr: 0x7fffb5ece000
> > 1: size: 4096 addr: 0x7fffb5ecd000
> > ...
> > 63: size: 4096 addr: 0x7fff8baec000
> >
> > So I think this is a common issue of the upstream kernel, from 5.3 to 5.6.
>
> As I mentioned before, it is because the pages aren't physically
> contiguous.
>
> If you enable hugepages, you will see lots of pages in one single bvec.
>
> > BTW, I have used your script on 5.3.7-301.fc31.x86_64 and it works well.
> > However, after updating to kernel
> > 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64, it complains:
> >
> > root@192.168.19.239 16:57:23 ~ $ ./bvec_avg_pages.py
> > In file included from /virtual/main.c:2:
> > In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/uapi/linux/ptrace.h:142:
> > In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/ptrace.h:5:
> > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/segment.h:266:2: error: expected '(' after 'asm'
> >         alternative_io ("lsl %[seg],%[p]",
>
> It can be worked around by commenting out the following line in
> /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/generated/autoconf.h:
>
> #define CONFIG_CC_HAS_ASM_INLINE 1
>
> Thanks,
> Ming
>
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-18  6:29 ` Feng Li
@ 2020-03-18  7:36   ` Ming Lei
  2020-03-19  6:38     ` Feng Li
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-03-18  7:36 UTC (permalink / raw)
To: Feng Li; +Cc: Ming Lei, linux-block

On Wed, Mar 18, 2020 at 02:29:17PM +0800, Feng Li wrote:
> Hi Ming,
> What I need is to always get contiguous pages in one bvec.
> Maybe it is currently hard to satisfy this requirement.
>
> About huge pages: I know that userspace processes can use the huge
> pages the kernel reserves. Could the bio/block layer support using
> huge pages as well?

Yes, you will see that all the pages in one huge page are stored in
one single bvec.

>
> Thanks again for your help.
>
> Ming Lei <ming.lei@redhat.com> wrote on Tue, Mar 17, 2020 at 6:27 PM:
> >
> > On Tue, Mar 17, 2020 at 04:19:44PM +0800, Feng Li wrote:
> > > Thanks.
> > > Sometimes I do observe multipage bvecs on 5.3.7-301.fc31.x86_64.
> > > This log is from Qemu virtio-blk:
> > >
> > > ========= size: 262144, iovcnt: 2
> > > 0: size: 229376 addr: 0x7fff6a7c8000
> > > 1: size: 32768 addr: 0x7fff64c00000
> > > ========= size: 262144, iovcnt: 2
> > > 0: size: 229376 addr: 0x7fff6a7c8000
> > > 1: size: 32768 addr: 0x7fff64c00000
> >
> > Then it is working.
> >
> > > I also tested on 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64 and
> > > observed iovcnt 64:
> > > ========= size: 262144, iovcnt: 64
> > > 0: size: 4096 addr: 0x7fffb5ece000
> > > 1: size: 4096 addr: 0x7fffb5ecd000
> > > ...
> > > 63: size: 4096 addr: 0x7fff8baec000
> > >
> > > So I think this is a common issue of the upstream kernel, from 5.3 to 5.6.
> >
> > As I mentioned before, it is because the pages aren't physically
> > contiguous.
> >
> > If you enable hugepages, you will see lots of pages in one single bvec.
> >
> > > BTW, I have used your script on 5.3.7-301.fc31.x86_64 and it works well.
> > > However, after updating to kernel
> > > 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64, it complains:
> > >
> > > root@192.168.19.239 16:57:23 ~ $ ./bvec_avg_pages.py
> > > In file included from /virtual/main.c:2:
> > > In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/uapi/linux/ptrace.h:142:
> > > In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/ptrace.h:5:
> > > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/segment.h:266:2: error: expected '(' after 'asm'
> > >         alternative_io ("lsl %[seg],%[p]",
> >
> > It can be worked around by commenting out the following line in
> > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/generated/autoconf.h:
> >
> > #define CONFIG_CC_HAS_ASM_INLINE 1
> >
> > Thanks,
> > Ming
> >

--
Ming
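The descriptor-count implication of this answer can be sketched with a toy calculation (an illustration, not kernel code): each physically contiguous run of guest memory becomes one bvec, and hence one virtio descriptor.

```python
PAGE = 4 * 1024
HUGEPAGE = 2 * 1024 * 1024

def worst_case_iovcnt(io_bytes, contig_run):
    """Descriptor count when every physically contiguous run of the
    buffer is `contig_run` bytes long: one run -> one bvec -> one
    virtio descriptor."""
    return -(-io_bytes // contig_run)  # ceiling division

# Scattered 4 KiB pages: 64 descriptors for a 256 KiB request.
print(worst_case_iovcnt(256 * 1024, PAGE))      # 64

# Buffer inside one 2 MiB hugepage: a single descriptor.
print(worst_case_iovcnt(256 * 1024, HUGEPAGE))  # 1
```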
* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-18  7:36 ` Ming Lei
@ 2020-03-19  6:38   ` Feng Li
  0 siblings, 0 replies; 13+ messages in thread
From: Feng Li @ 2020-03-19  6:38 UTC (permalink / raw)
To: Ming Lei; +Cc: Ming Lei, linux-block

Ok, thanks.

Ming Lei <ming.lei@redhat.com> wrote on Wed, Mar 18, 2020 at 3:36 PM:
>
> On Wed, Mar 18, 2020 at 02:29:17PM +0800, Feng Li wrote:
> > Hi Ming,
> > What I need is to always get contiguous pages in one bvec.
> > Maybe it is currently hard to satisfy this requirement.
> >
> > About huge pages: I know that userspace processes can use the huge
> > pages the kernel reserves. Could the bio/block layer support using
> > huge pages as well?
>
> Yes, you will see that all the pages in one huge page are stored in
> one single bvec.
>
> >
> > Thanks again for your help.
> >
> > Ming Lei <ming.lei@redhat.com> wrote on Tue, Mar 17, 2020 at 6:27 PM:
> > >
> > > On Tue, Mar 17, 2020 at 04:19:44PM +0800, Feng Li wrote:
> > > > Thanks.
> > > > Sometimes I do observe multipage bvecs on 5.3.7-301.fc31.x86_64.
> > > > This log is from Qemu virtio-blk:
> > > >
> > > > ========= size: 262144, iovcnt: 2
> > > > 0: size: 229376 addr: 0x7fff6a7c8000
> > > > 1: size: 32768 addr: 0x7fff64c00000
> > > > ========= size: 262144, iovcnt: 2
> > > > 0: size: 229376 addr: 0x7fff6a7c8000
> > > > 1: size: 32768 addr: 0x7fff64c00000
> > >
> > > Then it is working.
> > >
> > > > I also tested on 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64 and
> > > > observed iovcnt 64:
> > > > ========= size: 262144, iovcnt: 64
> > > > 0: size: 4096 addr: 0x7fffb5ece000
> > > > 1: size: 4096 addr: 0x7fffb5ecd000
> > > > ...
> > > > 63: size: 4096 addr: 0x7fff8baec000
> > > >
> > > > So I think this is a common issue of the upstream kernel, from 5.3 to 5.6.
> > >
> > > As I mentioned before, it is because the pages aren't physically
> > > contiguous.
> > >
> > > If you enable hugepages, you will see lots of pages in one single bvec.
> > >
> > > > BTW, I have used your script on 5.3.7-301.fc31.x86_64 and it works well.
> > > > However, after updating to kernel
> > > > 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64, it complains:
> > > >
> > > > root@192.168.19.239 16:57:23 ~ $ ./bvec_avg_pages.py
> > > > In file included from /virtual/main.c:2:
> > > > In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/uapi/linux/ptrace.h:142:
> > > > In file included from /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/ptrace.h:5:
> > > > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/segment.h:266:2: error: expected '(' after 'asm'
> > > >         alternative_io ("lsl %[seg],%[p]",
> > >
> > > It can be worked around by commenting out the following line in
> > > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/generated/autoconf.h:
> > >
> > > #define CONFIG_CC_HAS_ASM_INLINE 1
> > >
> > > Thanks,
> > > Ming
> > >
>
> --
> Ming
>