From: Dennis Zhou <dennis@kernel.org>
To: Mike Snitzer <snitzer@kernel.org>
Cc: tj@kernel.org, axboe@kernel.dk, linux-block@vger.kernel.org, dm-devel@redhat.com
Subject: Re: can we reduce bio_set_dev overhead due to bio_associate_blkg?
Date: Wed, 30 Mar 2022 08:28:28 -0400
Message-ID: <YkRM7Iyp8m6A1BCl@fedora>
In-Reply-To: <YkSK6mU1fja2OykG@redhat.com>

Hi Mike,

On Wed, Mar 30, 2022 at 12:52:58PM -0400, Mike Snitzer wrote:
> Hey Tejun and Dennis,
>
> I recently found that due to bio_set_dev()'s call to
> bio_associate_blkg(), bio_set_dev() needs much more cpu than ideal;
> especially when doing 4K IOs via io_uring's HIPRI bio-polling.
>
> I'm very naive about blk-cgroups.. so I'm hopeful you or others can
> help me cut through this to understand what the ideal outcome should
> be for DM's bio clone + remap heavy use-case as it relates to
> bio_associate_blkg.
>
> If I hack dm-linear with a local __bio_set_dev that simply removes
> the call to bio_associate_blkg() my IOPS go from ~980K to 995K.
>
> Looking at what is happening a bit, relative to this DM bio cloning
> usecase, it seems __bio_clone() calls bio_clone_blkg_association() to
> clone the blkg from DM device, then dm-linear.c:linear_map's call
> to bio_set_dev() will cause bio_associate_blkg(bio) to reuse the css
> but then it triggers an update because the bdev is being remapped in
> the bio (due to linear_map sending the IO to the real underlying
> device). End result _seems_ like collective wasteful effort to get the
> blk-cgroup resources setup properly in the face of a simple remap.
>
> Seems the current DM pattern is causing repeat blkg work for _every_
> remapped bio? Do you see a way to speed up repeat calls to
> bio_associate_blkg()?
>

I must admit I wrote this with limited knowledge of bio cloning at the
time. I can fill in the thought process here.

The idea was every bio should have a blkg associated with it for io
accounting and things like blk-iolatency and blk-iocost. The device
abstraction I believe means we can set limits here as well on submission
rate to the md device.

I think cloning is a special case that I might have gotten wrong. If
there is a bio_set_dev() call after each clone(), then the
bio_clone_blkg_association() is excess work. We'd need to audit how
bio_alloc_clone() is being used to be safe. Alternatively, we could opt
for a bio_alloc_clone_noblkg(), but that's a little bit uglier.

1. bio_set_dev() above md <- needed so we can do throttling on the md.
2. bio_alloc_clone() <- doesn't need to clone the blkg() info.
3. bio_set_dev() in md <- sets the right underlying device association.

Thanks,
Dennis

> Test kernel is my latest dm-5.19 branch (though latest Linus 5.18-rc0
> kernel should be fine too):
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19
>
> I'm using dm-linear ontop on a 16G blk-mq null_blk device:
>
> modprobe null_blk queue_mode=2 poll_queues=2 bs=4096 gb=16
> SIZE=`blockdev --getsz /dev/nullb0`
> echo "0 $SIZE linear /dev/nullb0 0" | dmsetup create linear
>
> And running the workload with fio using this wrapper script:
> io_uring.sh 20 1 /dev/mapper/linear 4096
>
> #!/bin/bash
>
> RTIME=$1
> JOBS=$2
> DEV=$3
> BS=$4
>
> QD=64
> BATCH=16
> HI=1
>
> fio --bs=$BS --ioengine=io_uring --fixedbufs --registerfiles --hipri=$HI \
>     --iodepth=$QD \
>     --iodepth_batch_submit=$BATCH \
>     --iodepth_batch_complete_min=$BATCH \
>     --filename=$DEV \
>     --direct=1 --runtime=$RTIME --numjobs=$JOBS --rw=randread \
>     --name=test --group_reporting