linux-block.vger.kernel.org archive mirror
* [Question] IO is split by block layer when size is larger than 4k
@ 2020-03-12 11:13 Feng Li
  2020-03-12 11:51 ` Hannes Reinecke
  2020-03-12 12:34 ` Ming Lei
  0 siblings, 2 replies; 13+ messages in thread
From: Feng Li @ 2020-03-12 11:13 UTC (permalink / raw)
  To: linux-block, ming.lei

Hi experts,

May I ask a question about the block layer?
When running fio in the guest OS, I find that a 256k IO is split page by
page in the bio, saved in single-page bvecs.
And virtio-blk just puts the bio_vecs one by one into the available
descriptor table.

So if my backend device does not support iovector
operations (preadv/pwritev), the IO is issued to the lower layer page by
page.
My question is: why doesn't the bio save multiple pages in one bio_vec?
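
Just to illustrate the cost I am worried about, here is a rough userspace
sketch (not my real backend; the fd and element sizes are made up): without
preadv, every iovec element becomes its own syscall.

import os

def read_request(fd, offset, iov_sizes, have_preadv=True):
    # One guest request described as a list of element sizes (bytes).
    if have_preadv:
        bufs = [bytearray(n) for n in iov_sizes]
        os.preadv(fd, bufs, offset)                # one syscall for the whole request
        return b"".join(bufs)
    data = bytearray()
    for n in iov_sizes:                            # one syscall per element
        data += os.pread(fd, n, offset + len(data))
    return bytes(data)

# 256k request as 64 single-page elements -> 64 pread() calls:
#   read_request(fd, 0, [4096] * 64, have_preadv=False)
# the same 256k as two multi-page elements -> 2 pread() calls:
#   read_request(fd, 0, [229376, 32768], have_preadv=False)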

The /dev/vdb is a vhost-user-blk-pci device from SPDK, or a virtio-blk-pci device.
The fio config is:
[global]
name=fio-rand-read
rw=randread
ioengine=libaio
direct=1
numjobs=1
iodepth=1
bs=256K
[file1]
filename=/dev/vdb

Tracing results look like this:

/usr/share/bcc/tools/stackcount -K  -T  '__blockdev_direct_IO'
returns 378048
/usr/share/bcc/tools/stackcount -K  -T 'bio_add_page'
returns 5878
From that I get:
378048/5878 ≈ 64
256k/4k = 64

__blockdev_direct_IO splits the 256k into 64 parts.

The /dev/vdb queue properties are as follows:

[root@t1 00:10:42 queue]$find . | while read f;do echo "$f = $(cat $f)";done
./nomerges = 0
./logical_block_size = 512
./rq_affinity = 1
./discard_zeroes_data = 0
./max_segments = 126
./unpriv_sgio = 0
./max_segment_size = 4294967295
./rotational = 1
./scheduler = none
./read_ahead_kb = 128
./max_hw_sectors_kb = 2147483647
./discard_granularity = 0
./discard_max_bytes = 0
./write_same_max_bytes = 0
./max_integrity_segments = 0
./max_sectors_kb = 512
./physical_block_size = 512
./add_random = 0
./nr_requests = 128
./minimum_io_size = 512
./hw_sector_size = 512
./optimal_io_size

Sometimes a part's IO size is bigger than 4k.
Some logs:
id: 0 size: 4096
...
id: 57 size: 4096
id: 58 size: 24576

Why does this happen?

kernel version:
1. 3.10.0-1062.1.2.el7
2. 5.3

Thanks in advance.


* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-12 11:13 [Question] IO is split by block layer when size is larger than 4k Feng Li
@ 2020-03-12 11:51 ` Hannes Reinecke
  2020-03-12 12:09   ` Feng Li
  2020-03-12 12:34 ` Ming Lei
  1 sibling, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2020-03-12 11:51 UTC (permalink / raw)
  To: Feng Li, linux-block, ming.lei

On 3/12/20 12:13 PM, Feng Li wrote:
> Hi experts,
> 
> May I ask a question about block layer?
> When running fio in guest os, I find a 256k IO is split into the page
> by page in bio, saved in bvecs.
> And virtio-blk just put the bio_vec one by one in the available
> descriptor table.
> 
It isn't 'split', it's using _one_ bio containing bvecs, where each bvec
consists of one page.

'split' for the block layer means that a single I/O is split into several
bios, which I don't think is the case here. Or?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-12 11:51 ` Hannes Reinecke
@ 2020-03-12 12:09   ` Feng Li
  0 siblings, 0 replies; 13+ messages in thread
From: Feng Li @ 2020-03-12 12:09 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-block, ming.lei

OK, thanks for the clarification.
I mean each bvec consists of only one page.
That's not good for the virtio backend.

Hannes Reinecke <hare@suse.de> 于2020年3月12日周四 下午7:51写道:
>
> On 3/12/20 12:13 PM, Feng Li wrote:
> > Hi experts,
> >
> > May I ask a question about block layer?
> > When running fio in guest os, I find a 256k IO is split into the page
> > by page in bio, saved in bvecs.
> > And virtio-blk just put the bio_vec one by one in the available
> > descriptor table.
> >
> It isn't 'split', it's using _one_ bio containing bvecs, where each bvec
> consists of one page.
>
> 'split' for the blocklayer means that a single I/O is split into several
> bios, which I dont' think is the case here. Or?
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                        Kernel Storage Architect
> hare@suse.de                                      +49 911 74053 688
> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-12 11:13 [Question] IO is split by block layer when size is larger than 4k Feng Li
  2020-03-12 11:51 ` Hannes Reinecke
@ 2020-03-12 12:34 ` Ming Lei
  2020-03-12 13:21   ` Feng Li
  1 sibling, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-03-12 12:34 UTC (permalink / raw)
  To: Feng Li; +Cc: linux-block

On Thu, Mar 12, 2020 at 07:13:28PM +0800, Feng Li wrote:
> Hi experts,
> 
> May I ask a question about block layer?
> When running fio in guest os, I find a 256k IO is split into the page
> by page in bio, saved in bvecs.
> And virtio-blk just put the bio_vec one by one in the available
> descriptor table.
> 
> So if my backend device does not support iovector
> opertion(preadv/pwritev), then IO is issued to a low layer page by
> page.
> My question is: why doesn't the bio save multi-pages in one bio_vec?

We have supported multi-page bvecs since v5.1, specifically since
07173c3ec276 ("block: enable multipage bvecs").

Thanks, 
Ming



* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-12 12:34 ` Ming Lei
@ 2020-03-12 13:21   ` Feng Li
  2020-03-13  2:31     ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Feng Li @ 2020-03-12 13:21 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block

Hi Ming,
Thanks.
I have tested kernel '5.4.0-rc6+', which includes 07173c3ec276.
But the virtio descriptor table is still filled with single pages, one by one.

Ming Lei <ming.lei@redhat.com> 于2020年3月12日周四 下午8:34写道:
>
> On Thu, Mar 12, 2020 at 07:13:28PM +0800, Feng Li wrote:
> > Hi experts,
> >
> > May I ask a question about block layer?
> > When running fio in guest os, I find a 256k IO is split into the page
> > by page in bio, saved in bvecs.
> > And virtio-blk just put the bio_vec one by one in the available
> > descriptor table.
> >
> > So if my backend device does not support iovector
> > opertion(preadv/pwritev), then IO is issued to a low layer page by
> > page.
> > My question is: why doesn't the bio save multi-pages in one bio_vec?
>
> We start multipage bvec since v5.1, especially since 07173c3ec276
> ("block: enable multipage bvecs").
>
> Thanks,
> Ming
>


* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-12 13:21   ` Feng Li
@ 2020-03-13  2:31     ` Ming Lei
  2020-03-14 17:43       ` Feng Li
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-03-13  2:31 UTC (permalink / raw)
  To: Feng Li; +Cc: linux-block

On Thu, Mar 12, 2020 at 09:21:11PM +0800, Feng Li wrote:
> Hi Ming,
> Thanks.
> I have tested kernel '5.4.0-rc6+', which includes 07173c3ec276.
> But the virtio is still be filled with single page by page.

Hello,

Could you share your test script?

BTW, it depends on whether the fs layer passes contiguous pages to the block layer.

You can dump each bvec of the bio and see whether they are physically
contiguous.

Thanks,
Ming

> 
> Ming Lei <ming.lei@redhat.com> 于2020年3月12日周四 下午8:34写道:
> >
> > On Thu, Mar 12, 2020 at 07:13:28PM +0800, Feng Li wrote:
> > > Hi experts,
> > >
> > > May I ask a question about block layer?
> > > When running fio in guest os, I find a 256k IO is split into the page
> > > by page in bio, saved in bvecs.
> > > And virtio-blk just put the bio_vec one by one in the available
> > > descriptor table.
> > >
> > > So if my backend device does not support iovector
> > > opertion(preadv/pwritev), then IO is issued to a low layer page by
> > > page.
> > > My question is: why doesn't the bio save multi-pages in one bio_vec?
> >
> > We start multipage bvec since v5.1, especially since 07173c3ec276
> > ("block: enable multipage bvecs").
> >
> > Thanks,
> > Ming
> >
> 

-- 
Ming



* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-13  2:31     ` Ming Lei
@ 2020-03-14 17:43       ` Feng Li
  2020-03-16 15:22         ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Feng Li @ 2020-03-14 17:43 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block

Hi Ming,
This is my command to run QEMU:
qemu-2.12.0/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -device
virtio-balloon -cpu host -smp 4 -m 2G -drive
file=/root/html/fedora-10g.img,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk1
-device virtio-blk-pci,scsi=off,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1
-drive file=/dev/sdb,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk2
-device virtio-blk-pci,scsi=off,drive=drive-virtio-disk2,id=virtio-disk2,bootindex=2
-device virtio-net,netdev=nw1,mac=00:11:22:EE:EE:10 -netdev
tap,id=nw1,script=no,downscript=no,ifname=tap0 -serial mon:stdio
-nographic -object
memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on
-numa node,memdev=mem0 -vnc 0.0.0.0:100 -machine usb=on,nvdimm -device
usb-tablet -monitor unix:///tmp/a.socket,server,nowait -qmp
tcp:0.0.0.0:2234,server,nowait

OS image is Fedora 31. Kernel is 5.3.7-301.fc31.x86_64.

The addresses from virtio in QEMU look like this:
========= size: 262144, iovcnt: 64
      0: size: 4096 addr: 0x7fffc83f1000
      1: size: 4096 addr: 0x7fffc8037000
      2: size: 4096 addr: 0x7fffd3710000
      3: size: 4096 addr: 0x7fffd5624000
      4: size: 4096 addr: 0x7fffc766c000
      5: size: 4096 addr: 0x7fffc7c21000
      6: size: 4096 addr: 0x7fffc8d54000
      7: size: 4096 addr: 0x7fffc8fc6000
      8: size: 4096 addr: 0x7fffd5659000
      9: size: 4096 addr: 0x7fffc7f88000
      10: size: 4096 addr: 0x7fffc767b000
      11: size: 4096 addr: 0x7fffc8332000
      12: size: 4096 addr: 0x7fffb4297000
      13: size: 4096 addr: 0x7fffc8888000
      14: size: 4096 addr: 0x7fffc93d7000
      15: size: 4096 addr: 0x7fffc9f1f000

They are not contiguous, so the pages in the bvecs are not contiguous
physical pages.
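
A quick check of the first few addresses quoted above confirms it (just
arithmetic on those values; consecutive elements would have to differ by
exactly one page):

addrs = [0x7fffc83f1000, 0x7fffc8037000, 0x7fffd3710000, 0x7fffd5624000]
print(all(a + 0x1000 == b for a, b in zip(addrs, addrs[1:])))   # False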

I don't know how to dump the bvec addresses in the bio without recompiling the kernel.
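
Maybe something like the following bcc sketch could print the first bvec of
each bio without a rebuild; it is untested, and it assumes submit_bio()
still takes the bio as its first argument (a bv_len larger than 4096 would
mean a multi-page bvec):

from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

int kprobe__submit_bio(struct pt_regs *ctx, struct bio *bio)
{
    struct bio_vec *bv = bio->bi_io_vec;

    /* print vcnt plus the page pointer and length of the first bvec */
    bpf_trace_printk("vcnt=%d bv0_page=%llx bv0_len=%u\n",
                     bio->bi_vcnt, (u64)bv->bv_page, bv->bv_len);
    return 0;
}
"""

BPF(text=prog).trace_print()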

The IO pattern in the guest is:
root@192.168.19.239 02:39:29 ~ $ cat 256k-randread.fio
[global]
ioengine=libaio
invalidate=1
ramp_time=5
iodepth=1
runtime=120000
time_based
direct=1

[randread-vdb-256k-para]
bs=256k
stonewall
filename=/dev/vdb
rw=randread

Thanks.


Ming Lei <ming.lei@redhat.com> 于2020年3月13日周五 上午10:32写道:
>
> On Thu, Mar 12, 2020 at 09:21:11PM +0800, Feng Li wrote:
> > Hi Ming,
> > Thanks.
> > I have tested kernel '5.4.0-rc6+', which includes 07173c3ec276.
> > But the virtio is still be filled with single page by page.
>
> Hello,
>
> Could you share your test script?
>
> BTW, it depends if fs layer passes contiguous pages to block layer.
>
> You can dump each bvec of the bio, and see if they are contiguous
> physically.
>
> Thanks,
> Ming
>
> >
> > Ming Lei <ming.lei@redhat.com> 于2020年3月12日周四 下午8:34写道:
> > >
> > > On Thu, Mar 12, 2020 at 07:13:28PM +0800, Feng Li wrote:
> > > > Hi experts,
> > > >
> > > > May I ask a question about block layer?
> > > > When running fio in guest os, I find a 256k IO is split into the page
> > > > by page in bio, saved in bvecs.
> > > > And virtio-blk just put the bio_vec one by one in the available
> > > > descriptor table.
> > > >
> > > > So if my backend device does not support iovector
> > > > opertion(preadv/pwritev), then IO is issued to a low layer page by
> > > > page.
> > > > My question is: why doesn't the bio save multi-pages in one bio_vec?
> > >
> > > We start multipage bvec since v5.1, especially since 07173c3ec276
> > > ("block: enable multipage bvecs").
> > >
> > > Thanks,
> > > Ming
> > >
> >
>
> --
> Ming
>


* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-14 17:43       ` Feng Li
@ 2020-03-16 15:22         ` Ming Lei
  2020-03-17  8:19           ` Feng Li
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-03-16 15:22 UTC (permalink / raw)
  To: Feng Li; +Cc: Ming Lei, linux-block

[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]

On Sun, Mar 15, 2020 at 9:34 AM Feng Li <lifeng1519@gmail.com> wrote:
>
> Hi Ming,
> This is my cmd to run qemu:
> qemu-2.12.0/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -device
> virtio-balloon -cpu host -smp 4 -m 2G -drive
> file=/root/html/fedora-10g.img,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk1
> -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1
> -drive file=/dev/sdb,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk2
> -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk2,id=virtio-disk2,bootindex=2
> -device virtio-net,netdev=nw1,mac=00:11:22:EE:EE:10 -netdev
> tap,id=nw1,script=no,downscript=no,ifname=tap0 -serial mon:stdio
> -nographic -object
> memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on
> -numa node,memdev=mem0 -vnc 0.0.0.0:100 -machine usb=on,nvdimm -device
> usb-tablet -monitor unix:///tmp/a.socket,server,nowait -qmp
> tcp:0.0.0.0:2234,server,nowait
>
> OS image is Fedora 31. Kernel is 5.3.7-301.fc31.x86_64.
>
> The address from virio in qemu like this:
> ========= size: 262144, iovcnt: 64
>       0: size: 4096 addr: 0x7fffc83f1000
>       1: size: 4096 addr: 0x7fffc8037000
>       2: size: 4096 addr: 0x7fffd3710000
>       3: size: 4096 addr: 0x7fffd5624000
>       4: size: 4096 addr: 0x7fffc766c000
>       5: size: 4096 addr: 0x7fffc7c21000
>       6: size: 4096 addr: 0x7fffc8d54000
>       7: size: 4096 addr: 0x7fffc8fc6000
>       8: size: 4096 addr: 0x7fffd5659000
>       9: size: 4096 addr: 0x7fffc7f88000
>       10: size: 4096 addr: 0x7fffc767b000
>       11: size: 4096 addr: 0x7fffc8332000
>       12: size: 4096 addr: 0x7fffb4297000
>       13: size: 4096 addr: 0x7fffc8888000
>       14: size: 4096 addr: 0x7fffc93d7000
>       15: size: 4096 addr: 0x7fffc9f1f000
>
> They are not contiguous pages, so the pages in bvec are not continus
> physical pages.
>
> I don't know how to dump the bvec address in bio without recompiling the kernel.

I just ran a similar test on 5.3.11-100.fc29.x86_64, and my observation
is similar to yours.

However, I did not observe a similar problem with a 5.6-rc kernel in a VM;
maybe the kernel config causes the difference.

BTW, I usually use the attached bcc script to observe bvec pages; you may
try it on an upstream kernel.

Thanks,
Ming

[-- Attachment #2: bvec_avg_pages.py --]
[-- Type: text/x-python, Size: 2144 bytes --]

#!/usr/bin/python3
#
# bvec_avg_pages.py
#
# A basic example that reports the average number of pages per bvec
# (plus size and vcnt per bio) for each block device.
#
# USAGE: bvec_avg_pages.py
#
# The script runs until interrupted; a Ctrl-C prints the gathered
# averages and exits.
#
# Copyright (c) 2016 Ming Lei
# Licensed under the Apache License, Version 2.0 (the "License")
#
# 15-Aug-2015	Ming Lei	Created this.

from bcc import BPF
from ctypes import c_ushort, c_int, c_ulonglong
from time import sleep
from sys import argv
import os

# define BPF program
bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

struct key_t {
    unsigned dev_no;
};

struct val_t {
    u64 bvec_cnt;
    u64 size;
    u64 bio_cnt;
};

BPF_HASH(bvec, struct key_t, struct val_t);

// time block I/O
int trace_submit_bio(struct pt_regs *ctx, struct bio *bio)
{
    unsigned short vcnt;
    unsigned size;

    size = bio->bi_iter.bi_size;
    vcnt = bio->bi_vcnt;

    if (vcnt) {
        struct val_t *valp;
        struct key_t key;
        struct val_t zero = {0};

#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 14, 0)
        int maj, min;

        maj = bio->bi_disk->major;
        min = bio->bi_disk->first_minor;
        key.dev_no = (unsigned)MKDEV(maj, min);
#else
        key.dev_no = (unsigned)bio->bi_bdev->bd_dev;
#endif
        valp = bvec.lookup_or_init(&key, &zero);
        valp->bvec_cnt += vcnt;
        valp->size += size;
        valp->bio_cnt += 1;
    }

    //bpf_trace_printk("pages %d, vcnt: %d\\n", size>>12, vcnt);

    return 0;
}

"""

# load BPF program
b = BPF(text=bpf_text);
b.attach_kprobe(event="submit_bio", fn_name="trace_submit_bio")

# header
print("Tracing... Hit Ctrl-C to end.")

# output
try:
    sleep(99999999)
except KeyboardInterrupt:
    pass

page_size = os.sysconf("SC_PAGE_SIZE")
print("\n%-7s %-12s %12s %12s" % ("DEVICE", "PAGES_PER_BVEC", "SIZE_PER_BIO", "VCNT_PER_BIO"))
counts = b.get_table("bvec")
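# Each printed row is an average over the whole run: a PAGES_PER_BVEC
# noticeably above 1 means multi-page bvecs are actually being built.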
for k, v in counts.items():
    pgs = v.size / page_size
    print("%-3d:%-3d %-12d %12dKB %12d" % (k.dev_no >> 20, k.dev_no & ((1 << 20) - 1), pgs / v.bvec_cnt, (v.size >> 10) / v.bio_cnt, v.bvec_cnt / v.bio_cnt))



* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-16 15:22         ` Ming Lei
@ 2020-03-17  8:19           ` Feng Li
  2020-03-17 10:26             ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Feng Li @ 2020-03-17  8:19 UTC (permalink / raw)
  To: Ming Lei; +Cc: Ming Lei, linux-block

Thanks.
Sometimes I do observe multi-page bvecs on 5.3.7-301.fc31.x86_64.
This log is from QEMU virtio-blk.

========= size: 262144, iovcnt: 2
      0: size: 229376 addr: 0x7fff6a7c8000
      1: size: 32768 addr: 0x7fff64c00000
========= size: 262144, iovcnt: 2
      0: size: 229376 addr: 0x7fff6a7c8000
      1: size: 32768 addr: 0x7fff64c00000

I also tested on 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64
and observed an iovcnt of 64.
========= size: 262144, iovcnt: 64
      0: size: 4096 addr: 0x7fffb5ece000
      1: size: 4096 addr: 0x7fffb5ecd000
...
      63: size: 4096 addr: 0x7fff8baec000

So I think this is a common issue of the upstream kernel, from 5.3 to 5.6.

BTW, I have used your script on 5.3.7-301.fc31.x86_64 and it works well.
However, after updating to kernel 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64,
it complains:

root@192.168.19.239 16:57:23 ~ $ ./bvec_avg_pages.py
In file included from /virtual/main.c:2:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/uapi/linux/ptrace.h:142:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/ptrace.h:5:
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/segment.h:266:2:
error: expected '(' after 'asm'
        alternative_io ("lsl %[seg],%[p]",
        ^
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/alternative.h:240:2:
note: expanded from macro 'alternative_io'
        asm_inline volatile (ALTERNATIVE(oldinstr, newinstr, feature)   \
        ^
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/compiler_types.h:210:24:
note: expanded from macro 'asm_inline'
#define asm_inline asm __inline
                       ^
In file included from /virtual/main.c:3:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/blkdev.h:5:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/sched.h:14:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/pid.h:5:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/rculist.h:11:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/rcupdate.h:27:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/preempt.h:78:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/preempt.h:7:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/linux/thread_info.h:38:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/thread_info.h:12:
In file included from
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/page.h:12:
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/page_64.h:49:2:
error: expected '(' after 'asm'
        alternative_call_2(clear_page_orig,

Ming Lei <tom.leiming@gmail.com> 于2020年3月16日周一 下午11:22写道:
>
> On Sun, Mar 15, 2020 at 9:34 AM Feng Li <lifeng1519@gmail.com> wrote:
> >
> > Hi Ming,
> > This is my cmd to run qemu:
> > qemu-2.12.0/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -device
> > virtio-balloon -cpu host -smp 4 -m 2G -drive
> > file=/root/html/fedora-10g.img,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk1
> > -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1
> > -drive file=/dev/sdb,format=raw,cache=none,aio=native,if=none,id=drive-virtio-disk2
> > -device virtio-blk-pci,scsi=off,drive=drive-virtio-disk2,id=virtio-disk2,bootindex=2
> > -device virtio-net,netdev=nw1,mac=00:11:22:EE:EE:10 -netdev
> > tap,id=nw1,script=no,downscript=no,ifname=tap0 -serial mon:stdio
> > -nographic -object
> > memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on
> > -numa node,memdev=mem0 -vnc 0.0.0.0:100 -machine usb=on,nvdimm -device
> > usb-tablet -monitor unix:///tmp/a.socket,server,nowait -qmp
> > tcp:0.0.0.0:2234,server,nowait
> >
> > OS image is Fedora 31. Kernel is 5.3.7-301.fc31.x86_64.
> >
> > The address from virio in qemu like this:
> > ========= size: 262144, iovcnt: 64
> >       0: size: 4096 addr: 0x7fffc83f1000
> >       1: size: 4096 addr: 0x7fffc8037000
> >       2: size: 4096 addr: 0x7fffd3710000
> >       3: size: 4096 addr: 0x7fffd5624000
> >       4: size: 4096 addr: 0x7fffc766c000
> >       5: size: 4096 addr: 0x7fffc7c21000
> >       6: size: 4096 addr: 0x7fffc8d54000
> >       7: size: 4096 addr: 0x7fffc8fc6000
> >       8: size: 4096 addr: 0x7fffd5659000
> >       9: size: 4096 addr: 0x7fffc7f88000
> >       10: size: 4096 addr: 0x7fffc767b000
> >       11: size: 4096 addr: 0x7fffc8332000
> >       12: size: 4096 addr: 0x7fffb4297000
> >       13: size: 4096 addr: 0x7fffc8888000
> >       14: size: 4096 addr: 0x7fffc93d7000
> >       15: size: 4096 addr: 0x7fffc9f1f000
> >
> > They are not contiguous pages, so the pages in bvec are not continus
> > physical pages.
> >
> > I don't know how to dump the bvec address in bio without recompiling the kernel.
>
> I just run similar test on 5.3.11-100.fc29.x86_64, and the observation
> is similar with
> yours.
>
> However, not observe similar problem in 5.6-rc kernel in VM, maybe kernel config
> causes the difference.
>
> BTW, I usually use the attached bcc script to observe bvec pages, and you may
> try that on upstream kernel.
>
> Thanks,
> Ming


* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-17  8:19           ` Feng Li
@ 2020-03-17 10:26             ` Ming Lei
  2020-03-18  6:29               ` Feng Li
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-03-17 10:26 UTC (permalink / raw)
  To: Feng Li; +Cc: Ming Lei, linux-block

On Tue, Mar 17, 2020 at 04:19:44PM +0800, Feng Li wrote:
> Thanks.
> Sometimes when I observe multipage bvec on 5.3.7-301.fc31.x86_64.
> This log is from Qemu virtio-blk.
> 
> ========= size: 262144, iovcnt: 2
>       0: size: 229376 addr: 0x7fff6a7c8000
>       1: size: 32768 addr: 0x7fff64c00000
> ========= size: 262144, iovcnt: 2
>       0: size: 229376 addr: 0x7fff6a7c8000
>       1: size: 32768 addr: 0x7fff64c00000

Then it is working.

> 
> I also tested on 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64.
> And observe 64 iovcnt.
> ========= size: 262144, iovcnt: 64
>       0: size: 4096 addr: 0x7fffb5ece000
>       1: size: 4096 addr: 0x7fffb5ecd000
> ...
>       63: size: 4096 addr: 0x7fff8baec000
> 
> So I think this is a common issue of the upstream kernel, from 5.3 to 5.6.

As I mentioned before, it is because the pages aren't physically
contiguous.

If you enable hugepages, you will see a lot of pages in one single bvec.
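
For example, a rough and untested userspace sketch of an O_DIRECT read into
a hugepage-backed buffer, run inside the guest, so the block layer sees
physically contiguous pages to merge. The MAP_HUGETLB value is hard-coded
from <linux/mman.h> since older Python doesn't export it, and it assumes
hugepages have been reserved via vm.nr_hugepages; if I remember correctly,
fio's mem=mmaphuge option should achieve the same.

import mmap, os

MAP_HUGETLB = 0x40000                    # Linux MAP_HUGETLB flag
HUGE_2M = 2 * 1024 * 1024

buf = mmap.mmap(-1, HUGE_2M,
                flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_HUGETLB)

fd = os.open("/dev/vdb", os.O_RDONLY | os.O_DIRECT)
os.preadv(fd, [memoryview(buf)[:256 * 1024]], 0)   # one 256k direct read
os.close(fd)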

> 
> BTW, I have used your script on 5.3.7-301.fc31.x86_64, it works well.
> However, when updating to kernel 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64.
> It complains:
> 
> root@192.168.19.239 16:57:23 ~ $ ./bvec_avg_pages.py
> In file included from /virtual/main.c:2:
> In file included from
> /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/uapi/linux/ptrace.h:142:
> In file included from
> /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/ptrace.h:5:
> /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/segment.h:266:2:
> error: expected '(' after 'asm'
>         alternative_io ("lsl %[seg],%[p]",

It can be worked around by commenting out the following line in
/lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/generated/autoconf.h:

#define CONFIG_CC_HAS_ASM_INLINE 1


Thanks, 
Ming



* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-17 10:26             ` Ming Lei
@ 2020-03-18  6:29               ` Feng Li
  2020-03-18  7:36                 ` Ming Lei
  0 siblings, 1 reply; 13+ messages in thread
From: Feng Li @ 2020-03-18  6:29 UTC (permalink / raw)
  To: Ming Lei; +Cc: Ming Lei, linux-block

Hi Ming,
What I need is to always get contiguous pages in one bvec.
Maybe it's currently hard to satisfy this requirement.
About huge pages, I know that userspace processes can use huge pages
that the kernel has reserved.
Could the bio/block layer support using huge pages?

Thanks again for your help.

Ming Lei <ming.lei@redhat.com> 于2020年3月17日周二 下午6:27写道:

>
> On Tue, Mar 17, 2020 at 04:19:44PM +0800, Feng Li wrote:
> > Thanks.
> > Sometimes when I observe multipage bvec on 5.3.7-301.fc31.x86_64.
> > This log is from Qemu virtio-blk.
> >
> > ========= size: 262144, iovcnt: 2
> >       0: size: 229376 addr: 0x7fff6a7c8000
> >       1: size: 32768 addr: 0x7fff64c00000
> > ========= size: 262144, iovcnt: 2
> >       0: size: 229376 addr: 0x7fff6a7c8000
> >       1: size: 32768 addr: 0x7fff64c00000
>
> Then it is working.
>
> >
> > I also tested on 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64.
> > And observe 64 iovcnt.
> > ========= size: 262144, iovcnt: 64
> >       0: size: 4096 addr: 0x7fffb5ece000
> >       1: size: 4096 addr: 0x7fffb5ecd000
> > ...
> >       63: size: 4096 addr: 0x7fff8baec000
> >
> > So I think this is a common issue of the upstream kernel, from 5.3 to 5.6.
>
> As I mentioned before, it is because the pages aren't contiguous
> physically.
>
> If you enable hugepage, you will see lot of pages in one single bvec.
>
> >
> > BTW, I have used your script on 5.3.7-301.fc31.x86_64, it works well.
> > However, when updating to kernel 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64.
> > It complains:
> >
> > root@192.168.19.239 16:57:23 ~ $ ./bvec_avg_pages.py
> > In file included from /virtual/main.c:2:
> > In file included from
> > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/uapi/linux/ptrace.h:142:
> > In file included from
> > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/ptrace.h:5:
> > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/segment.h:266:2:
> > error: expected '(' after 'asm'
> >         alternative_io ("lsl %[seg],%[p]",
>
> It can be workaround by commenting the following line in
> /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/generated/autoconf.h:
>
> #define CONFIG_CC_HAS_ASM_INLINE 1
>
>
> Thanks,
> Ming
>


* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-18  6:29               ` Feng Li
@ 2020-03-18  7:36                 ` Ming Lei
  2020-03-19  6:38                   ` Feng Li
  0 siblings, 1 reply; 13+ messages in thread
From: Ming Lei @ 2020-03-18  7:36 UTC (permalink / raw)
  To: Feng Li; +Cc: Ming Lei, linux-block

On Wed, Mar 18, 2020 at 02:29:17PM +0800, Feng Li wrote:
> Hi Ming,
> What I need is that always get contiguous pages in one bvec.
> Maybe currently it's hard to satisfy this requirement.
> About huge pages, I know the userspace processes could use huge pages
> that kernel reserved.
> Could bio/block layer support use huge pages?

Yes, you will see that all the pages of one huge page are stored in one
single bvec.

> 
> Thanks again for your help.
> 
> Ming Lei <ming.lei@redhat.com> 于2020年3月17日周二 下午6:27写道:
> 
> >
> > On Tue, Mar 17, 2020 at 04:19:44PM +0800, Feng Li wrote:
> > > Thanks.
> > > Sometimes when I observe multipage bvec on 5.3.7-301.fc31.x86_64.
> > > This log is from Qemu virtio-blk.
> > >
> > > ========= size: 262144, iovcnt: 2
> > >       0: size: 229376 addr: 0x7fff6a7c8000
> > >       1: size: 32768 addr: 0x7fff64c00000
> > > ========= size: 262144, iovcnt: 2
> > >       0: size: 229376 addr: 0x7fff6a7c8000
> > >       1: size: 32768 addr: 0x7fff64c00000
> >
> > Then it is working.
> >
> > >
> > > I also tested on 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64.
> > > And observe 64 iovcnt.
> > > ========= size: 262144, iovcnt: 64
> > >       0: size: 4096 addr: 0x7fffb5ece000
> > >       1: size: 4096 addr: 0x7fffb5ecd000
> > > ...
> > >       63: size: 4096 addr: 0x7fff8baec000
> > >
> > > So I think this is a common issue of the upstream kernel, from 5.3 to 5.6.
> >
> > As I mentioned before, it is because the pages aren't contiguous
> > physically.
> >
> > If you enable hugepage, you will see lot of pages in one single bvec.
> >
> > >
> > > BTW, I have used your script on 5.3.7-301.fc31.x86_64, it works well.
> > > However, when updating to kernel 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64.
> > > It complains:
> > >
> > > root@192.168.19.239 16:57:23 ~ $ ./bvec_avg_pages.py
> > > In file included from /virtual/main.c:2:
> > > In file included from
> > > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/uapi/linux/ptrace.h:142:
> > > In file included from
> > > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/ptrace.h:5:
> > > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/segment.h:266:2:
> > > error: expected '(' after 'asm'
> > >         alternative_io ("lsl %[seg],%[p]",
> >
> > It can be workaround by commenting the following line in
> > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/generated/autoconf.h:
> >
> > #define CONFIG_CC_HAS_ASM_INLINE 1
> >
> >
> > Thanks,
> > Ming
> >
> 

-- 
Ming



* Re: [Question] IO is split by block layer when size is larger than 4k
  2020-03-18  7:36                 ` Ming Lei
@ 2020-03-19  6:38                   ` Feng Li
  0 siblings, 0 replies; 13+ messages in thread
From: Feng Li @ 2020-03-19  6:38 UTC (permalink / raw)
  To: Ming Lei; +Cc: Ming Lei, linux-block

Ok, thanks.

Ming Lei <ming.lei@redhat.com> 于2020年3月18日周三 下午3:36写道:
>
> On Wed, Mar 18, 2020 at 02:29:17PM +0800, Feng Li wrote:
> > Hi Ming,
> > What I need is that always get contiguous pages in one bvec.
> > Maybe currently it's hard to satisfy this requirement.
> > About huge pages, I know the userspace processes could use huge pages
> > that kernel reserved.
> > Could bio/block layer support use huge pages?
>
> Yes, you will see all pages in one huge page are stored in one single
> bvec.
>
> >
> > Thanks again for your help.
> >
> > Ming Lei <ming.lei@redhat.com> 于2020年3月17日周二 下午6:27写道:
> >
> > >
> > > On Tue, Mar 17, 2020 at 04:19:44PM +0800, Feng Li wrote:
> > > > Thanks.
> > > > Sometimes when I observe multipage bvec on 5.3.7-301.fc31.x86_64.
> > > > This log is from Qemu virtio-blk.
> > > >
> > > > ========= size: 262144, iovcnt: 2
> > > >       0: size: 229376 addr: 0x7fff6a7c8000
> > > >       1: size: 32768 addr: 0x7fff64c00000
> > > > ========= size: 262144, iovcnt: 2
> > > >       0: size: 229376 addr: 0x7fff6a7c8000
> > > >       1: size: 32768 addr: 0x7fff64c00000
> > >
> > > Then it is working.
> > >
> > > >
> > > > I also tested on 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64.
> > > > And observe 64 iovcnt.
> > > > ========= size: 262144, iovcnt: 64
> > > >       0: size: 4096 addr: 0x7fffb5ece000
> > > >       1: size: 4096 addr: 0x7fffb5ecd000
> > > > ...
> > > >       63: size: 4096 addr: 0x7fff8baec000
> > > >
> > > > So I think this is a common issue of the upstream kernel, from 5.3 to 5.6.
> > >
> > > As I mentioned before, it is because the pages aren't contiguous
> > > physically.
> > >
> > > If you enable hugepage, you will see lot of pages in one single bvec.
> > >
> > > >
> > > > BTW, I have used your script on 5.3.7-301.fc31.x86_64, it works well.
> > > > However, when updating to kernel 5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64.
> > > > It complains:
> > > >
> > > > root@192.168.19.239 16:57:23 ~ $ ./bvec_avg_pages.py
> > > > In file included from /virtual/main.c:2:
> > > > In file included from
> > > > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/uapi/linux/ptrace.h:142:
> > > > In file included from
> > > > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/ptrace.h:5:
> > > > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/arch/x86/include/asm/segment.h:266:2:
> > > > error: expected '(' after 'asm'
> > > >         alternative_io ("lsl %[seg],%[p]",
> > >
> > > It can be workaround by commenting the following line in
> > > /lib/modules/5.6.0-0.rc6.git0.1.vanilla.knurd.1.fc31.x86_64/build/include/generated/autoconf.h:
> > >
> > > #define CONFIG_CC_HAS_ASM_INLINE 1
> > >
> > >
> > > Thanks,
> > > Ming
> > >
> >
>
> --
> Ming
>

