* Re: can we reduce bio_set_dev overhead due to bio_associate_blkg?
  2022-03-30 16:52 ` [dm-devel] " Mike Snitzer
@ 2022-03-30 12:28   ` Dennis Zhou
  -1 siblings, 0 replies; 28+ messages in thread
From: Dennis Zhou @ 2022-03-30 12:28 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: tj, axboe, linux-block, dm-devel

Hi Mike,

On Wed, Mar 30, 2022 at 12:52:58PM -0400, Mike Snitzer wrote:
> Hey Tejun and Dennis,
> 
> I recently found that due to bio_set_dev()'s call to
> bio_associate_blkg(), bio_set_dev() needs much more CPU than ideal,
> especially when doing 4K IOs via io_uring's HIPRI bio-polling.
> 
> I'm very naive about blk-cgroups, so I'm hopeful you or others can
> help me cut through this to understand what the ideal outcome should
> be for DM's bio clone + remap heavy use-case as it relates to
> bio_associate_blkg.
> 
> If I hack dm-linear with a local __bio_set_dev that simply removes
> the call to bio_associate_blkg(), my IOPS go from ~980K to 995K.
> 
> Looking at what is happening a bit, relative to this DM bio cloning
> use-case, it seems __bio_clone() calls bio_clone_blkg_association() to
> clone the blkg from the DM device, then dm-linear.c:linear_map's call
> to bio_set_dev() will cause bio_associate_blkg(bio) to reuse the css,
> but then trigger an update because the bdev is being remapped in
> the bio (due to linear_map sending the IO to the real underlying
> device). The end result _seems_ like collectively wasteful effort to get
> the blk-cgroup resources set up properly in the face of a simple remap.
> 
> Seems the current DM pattern is causing repeat blkg work for _every_
> remapped bio?  Do you see a way to speed up repeat calls to
> bio_associate_blkg()?
> 

I must admit I wrote this with limited knowledge of bio cloning at the
time. I can fill in the thought process here.

The idea was that every bio should have a blkg associated with it for IO
accounting and things like blk-iolatency and blk-iocost. The device
abstraction, I believe, also means we can set limits on the submission
rate to the md device itself.

I think cloning is a special case that I might have gotten wrong. If
there is a bio_set_dev() call after each clone(), then the
bio_clone_blkg_association() is excess work. We'd need to audit how
bio_alloc_clone() is being used to be safe. Alternatively, we could opt
for a bio_alloc_clone_noblkg(), but that's a little bit uglier.

The intended flow is roughly (sketched below):

1. bio_set_dev() above md <- needed so we can do throttling on the md.
2. bio_alloc_clone() <- doesn't need to clone the blkg info.
3. bio_set_dev() in md <- sets the right underlying device association.
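
As a rough sketch of that flow in a bio-based stacking driver (illustrative
only; the struct and helper names below are made up, not actual DM code):

#include <linux/bio.h>
#include <linux/blkdev.h>

struct stacked_dev {
        struct block_device *lower_bdev;        /* real underlying device */
        sector_t start;                         /* linear offset, for illustration */
        struct bio_set bs;
};

static void stacked_submit(struct stacked_dev *sd, struct bio *parent)
{
        struct bio *clone;

        /* Step 1 already happened in the caller: the parent bio was
         * bio_set_dev()'d to the stacked (md/dm) device, which is what
         * lets throttling be applied at that level. */

        /* Step 2: clone for the lower layer; ideally this would not need
         * to copy the parent's blkg association at all. */
        clone = bio_alloc_clone(parent->bi_bdev, parent, GFP_NOIO, &sd->bs);
        if (!clone)
                return;

        /* Step 3: remap to the real underlying device; this bio_set_dev()
         * is what (re)does the blkg association. */
        bio_set_dev(clone, sd->lower_bdev);
        clone->bi_iter.bi_sector = sd->start + parent->bi_iter.bi_sector;
        submit_bio_noacct(clone);
}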

Thanks,
Dennis

> Test kernel is my latest dm-5.19 branch (though latest Linus 5.18-rc0
> kernel should be fine too):
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19
> 
> I'm using dm-linear on top of a 16G blk-mq null_blk device:
> 
> modprobe null_blk queue_mode=2 poll_queues=2 bs=4096 gb=16
> SIZE=`blockdev --getsz /dev/nullb0`
> echo "0 $SIZE linear /dev/nullb0 0" | dmsetup create linear
> 
> And running the workload with fio using this wrapper script:
> io_uring.sh 20 1 /dev/mapper/linear 4096
> 
> #!/bin/bash
> 
> RTIME=$1
> JOBS=$2
> DEV=$3
> BS=$4
> 
> QD=64
> BATCH=16
> HI=1
> 
> fio --bs=$BS --ioengine=io_uring --fixedbufs --registerfiles --hipri=$HI \
>         --iodepth=$QD \
>         --iodepth_batch_submit=$BATCH \
>         --iodepth_batch_complete_min=$BATCH \
>         --filename=$DEV \
>         --direct=1 --runtime=$RTIME --numjobs=$JOBS --rw=randread \
>         --name=test --group_reporting

* can we reduce bio_set_dev overhead due to bio_associate_blkg?
@ 2022-03-30 16:52 ` Mike Snitzer
  0 siblings, 0 replies; 28+ messages in thread
From: Mike Snitzer @ 2022-03-30 16:52 UTC (permalink / raw)
  To: tj, dennis; +Cc: axboe, linux-block, dm-devel

Hey Tejun and Dennis,

I recently found that due to bio_set_dev()'s call to
bio_associate_blkg(), bio_set_dev() needs much more CPU than ideal,
especially when doing 4K IOs via io_uring's HIPRI bio-polling.

I'm very naive about blk-cgroups, so I'm hopeful you or others can
help me cut through this to understand what the ideal outcome should
be for DM's bio clone + remap heavy use-case as it relates to
bio_associate_blkg.

If I hack dm-linear with a local __bio_set_dev that simply removes
the call to bio_associate_blkg(), my IOPS go from ~980K to 995K.

Looking at what is happening a bit, relative to this DM bio cloning
use-case, it seems __bio_clone() calls bio_clone_blkg_association() to
clone the blkg from the DM device, then dm-linear.c:linear_map's call
to bio_set_dev() will cause bio_associate_blkg(bio) to reuse the css,
but then trigger an update because the bdev is being remapped in
the bio (due to linear_map sending the IO to the real underlying
device). The end result _seems_ like collectively wasteful effort to get
the blk-cgroup resources set up properly in the face of a simple remap.
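
For reference, the remap in question is basically this (abridged from
drivers/md/dm-linear.c and only approximate, but it shows where the repeat
association happens):

static int linear_map(struct dm_target *ti, struct bio *bio)
{
        struct linear_c *lc = ti->private;

        /* the bio here is already a clone carrying the DM device's blkg */
        bio_set_dev(bio, lc->dev->bdev);        /* -> bio_associate_blkg(bio) */
        if (bio_sectors(bio))
                bio->bi_iter.bi_sector =
                        linear_map_sector(ti, bio->bi_iter.bi_sector);

        return DM_MAPIO_REMAPPED;
}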

Seems the current DM pattern is causing repeat blkg work for _every_
remapped bio?  Do you see a way to speed up repeat calls to
bio_associate_blkg()?

Test kernel is my latest dm-5.19 branch (though latest Linus 5.18-rc0
kernel should be fine too):
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/log/?h=dm-5.19

I'm using dm-linear on top of a 16G blk-mq null_blk device:

modprobe null_blk queue_mode=2 poll_queues=2 bs=4096 gb=16
SIZE=`blockdev --getsz /dev/nullb0`
echo "0 $SIZE linear /dev/nullb0 0" | dmsetup create linear

And running the workload with fio using this wrapper script:
io_uring.sh 20 1 /dev/mapper/linear 4096

#!/bin/bash

RTIME=$1
JOBS=$2
DEV=$3
BS=$4

QD=64
BATCH=16
HI=1

fio --bs=$BS --ioengine=io_uring --fixedbufs --registerfiles --hipri=$HI \
        --iodepth=$QD \
        --iodepth_batch_submit=$BATCH \
        --iodepth_batch_complete_min=$BATCH \
        --filename=$DEV \
        --direct=1 --runtime=$RTIME --numjobs=$JOBS --rw=randread \
        --name=test --group_reporting

* Re: can we reduce bio_set_dev overhead due to bio_associate_blkg?
  2022-03-30 12:28   ` [dm-devel] " Dennis Zhou
@ 2022-03-31  4:39     ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2022-03-31  4:39 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: Mike Snitzer, tj, axboe, linux-block, dm-devel

On Wed, Mar 30, 2022 at 08:28:28AM -0400, Dennis Zhou wrote:
> I think cloning is a special case that I might have gotten wrong. If
> there is a bio_set_dev() call after each clone(), then the
> bio_clone_blkg_association() is excess work. We'd need to audit how
> bio_alloc_clone() is being used to be safe. Alternatively, we could opt
> for a bio_alloc_clone_noblkg(), but that's a little bit uglier.

As of Linux 5.18, the cloning interfaces have changed and take
a block device that the clone is intended to be used for, and bio_set_dev
is mostly (there are a few more spots to be cleaned up in
dm/md/bcache/btrfs) only used for remapping to a new device.
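
In other words the shift is roughly this (sketch only; bio_clone_fast is
the pre-5.18 helper and is gone now, shown just for contrast):

/* old pattern: clone first, associate the device afterwards */
static struct bio *clone_old_style(struct bio *bio, struct block_device *dest,
                                   struct bio_set *bs)
{
        struct bio *clone = bio_clone_fast(bio, GFP_NOIO, bs);

        if (clone)
                bio_set_dev(clone, dest);
        return clone;
}

/* new pattern: the destination bdev is part of the allocation */
static struct bio *clone_new_style(struct bio *bio, struct block_device *dest,
                                   struct bio_set *bs)
{
        return bio_alloc_clone(dest, bio, GFP_NOIO, bs);
}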

That being said I've eyed the code in bio_associate_blkg a bit and
I've been wondering about some of how it is implemented as well.

Is recursive throttling really a thing?  i.e. we can have cgroup
policies on the upper (e.g. dm) device and then again on the lower
(e.g. nvme device)?  I think the code currently supports that, and
if we want to keep that I don't really see much of a way to avoid
the lookup, but maybe we can make it faster.

* Re: can we reduce bio_set_dev overhead due to bio_associate_blkg?
  2022-03-31  4:39     ` [dm-devel] " Christoph Hellwig
@ 2022-03-31  5:52       ` Dennis Zhou
  -1 siblings, 0 replies; 28+ messages in thread
From: Dennis Zhou @ 2022-03-31  5:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Mike Snitzer, tj, axboe, linux-block, dm-devel

Hello,

On Wed, Mar 30, 2022 at 09:39:55PM -0700, Christoph Hellwig wrote:
> On Wed, Mar 30, 2022 at 08:28:28AM -0400, Dennis Zhou wrote:
> > I think cloning is a special case that I might have gotten wrong. If
> > there is a bio_set_dev() call after each clone(), then the
> > bio_clone_blkg_association() is excess work. We'd need to audit how
> > bio_alloc_clone() is being used to be safe. Alternatively, we could opt
> > for a bio_alloc_clone_noblkg(), but that's a little bit uglier.
> 
> As of Linux 5.18, the cloning interfaces have changed and take
> a block device that the clone is intended to be used for, and bio_set_dev
> is mostly (there are a few more spots to be cleaned up in
> dm/md/bcache/btrfs) only used for remapping to a new device.
> 

I took a quick look. It seems with the new interface,
bio_clone_blkg_association() is unnecessary, given that the correct
association should be derived from the bio_alloc*() calls with the
passed-in bdev. Also, blkcg_bio_issue_init() in the clone path seems wrong.

Maybe the right thing to do here for md-linear and btrfs (what I've
looked at) is to delay cloning until the map occurs and the right device
is already in hand?

> That being said I've eyed the code in bio_associate_blkg a bit and
> I've been wondering about some of how it is implemented as well.
> 

I'm sure stuff has evolved since I've last been involved, but here is a
brief explanation of the initial story. I suspect most of it holds true.
Apologies if this isn't helpful.

For others, a blkcg is a block cgroup. A blkcg_gq, blkg for short, is
the marrying of a blkcg and a request_queue. It takes a reference on
both so IO associated with the cgroup is tracked to the appropriate
cgroup and prevents the request_queue from going away. Punted IOs go
here and writeback is managed here as well. On the hot path, this is the
tagging that blk-rq-qos stuff might depend on.

The lookup itself is handled by blkg_lookup(), which is a radix tree
lookup keyed on the request_queue. There is also a hint caching the last
lookup, which helps. blkgs are percpu-refcounted.
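
Roughly (from memory of block/blk-cgroup.c, so the details may have
shifted; RCU/locking omitted):

static struct blkcg_gq *sketch_blkg_lookup(struct blkcg *blkcg,
                                           struct request_queue *q)
{
        struct blkcg_gq *blkg;

        /* fast path: hint caching the last blkg this blkcg resolved */
        blkg = rcu_dereference(blkcg->blkg_hint);
        if (blkg && blkg->q == q)
                return blkg;

        /* slow path: per-blkcg radix tree indexed by the queue id */
        return radix_tree_lookup(&blkcg->blkg_tree, q->id);
}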

In terms of lifetimes and pinning: child blkcgs pin their parent blkcgs in
a tree hierarchy up to the root blkcg. blkgs pin the blkcg they're
associated with, the request_queue, and the parent blkg (parent blkcg and
request_queue). They die in hierarchical order, staying alive until all
children have passed.

If there's anything else I can try to help answer please let me know.

> Is recursive throttling really a thing?  i.e. we can have cgroup
> policies on the upper (e.g. dm) device and then again on the lower
> (e.g. nvme device)?  I think the code currently supports that, and
> if we want to keep that I don't really see much of a way to avoid
> the lookup, but maybe we can make it faster.

I'm not sure. I've primarily dealt with physical devices. However, I'm
sure there are more complex setups that use it. Whether it is a good idea
is probably debatable.

Backing up though, I feel like the abstraction naturally lends itself to
this multiple association, because you don't necessarily know when you hit
physical devices until you finally submit all the way down.

Thanks,
Dennis

* Re: can we reduce bio_set_dev overhead due to bio_associate_blkg?
  2022-03-31  5:52       ` [dm-devel] " Dennis Zhou
@ 2022-03-31  9:15         ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2022-03-31  9:15 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Christoph Hellwig, Mike Snitzer, tj, axboe, linux-block, dm-devel

On Wed, Mar 30, 2022 at 10:52:13PM -0700, Dennis Zhou wrote:
> I took a quick look. It seems with the new interface,
> bio_clone_blkg_association() is unnecessary given the correct
> association should be derived from the bio_alloc*() calls with the
> passed in bdev. Also, blkcg_bio_issue_init() in clone seems wrong.

Yes, I think you are right.

> Maybe the right thing to do here for md-linear and btrfs (what I've
> looked at) is to delay cloning until the map occurs and the right device
> is already in hand?

That would in general be the preferred option where possible.

* Re: can we reduce bio_set_dev overhead due to bio_associate_blkg?
  2022-03-31  9:15         ` [dm-devel] " Christoph Hellwig
@ 2022-04-08 15:42           ` Mike Snitzer
  -1 siblings, 0 replies; 28+ messages in thread
From: Mike Snitzer @ 2022-04-08 15:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dennis Zhou, Mike Snitzer, tj, axboe, linux-block, dm-devel

On Thu, Mar 31 2022 at  5:15P -0400,
Christoph Hellwig <hch@infradead.org> wrote:

> On Wed, Mar 30, 2022 at 10:52:13PM -0700, Dennis Zhou wrote:
> > I took a quick look. It seems with the new interface,
> > bio_clone_blkg_association() is unnecessary given the correct
> > association should be derived from the bio_alloc*() calls with the
> > passed in bdev. Also, blkcg_bio_issue_init() in clone seems wrong.
> 
> Yes, I think you are right.
> 
> > Maybe the right thing to do here for md-linear and btrfs (what I've
> > looked at) is to delay cloning until the map occurs and the right device
> > is already in hand?
> 
> That would in general be the preferred option where possible.

Delaying cloning until remap is a problem for DM, given that the
target_type ->map interface for all DM targets assumes the passed bio is
already a clone that needs to be remapped accordingly.

I think we can achieve the goal of efficient cloning/remapping for
both use cases simply by splitting out the bio_set_dev() and leaving it
to the caller to pick which interface to use (e.g. clone vs
clone_and_remap).
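
Purely illustrative (neither interface exists today), but the split I
have in mind looks something like:

/* clone only: no bdev/blkg work, caller will remap later */
struct bio *bio_clone_only(struct bio *bio_src, gfp_t gfp,
                           struct bio_set *bs);

/* clone + remap: associate the clone with its final bdev in one step */
struct bio *bio_clone_and_remap(struct block_device *bdev,
                                struct bio *bio_src, gfp_t gfp,
                                struct bio_set *bs);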

Christoph, is this something you're open to doing as a continuation of
your bio alloc/clone related audit/changes?  Or would you prefer that
someone else deal with it?  I could take a closer look next week if
needed.

Mike


* Re: can we reduce bio_set_dev overhead due to bio_associate_blkg?
  2022-04-08 15:42           ` [dm-devel] " Mike Snitzer
@ 2022-04-09  5:15             ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2022-04-09  5:15 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Dennis Zhou, Mike Snitzer, tj, axboe,
	linux-block, dm-devel

On Fri, Apr 08, 2022 at 11:42:51AM -0400, Mike Snitzer wrote:
> I think we can achieve the goal of efficient cloning/remapping for
> both usecases simply by splitting out the bio_set_dev() and leaving it
> to the caller to pick which interface to use (e.g. clone vs
> clone_and_remap).

You can just pass a NULL bdev to bio_alloc_clone/bio_init_clone.
I've been hoping to get rid of that, but if we have a clear use case
it will have to stay.
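
I.e. something like this (illustrative only, placeholder names):

static struct bio *clone_for_remap(struct bio *bio,
                                   struct block_device *lower_bdev,
                                   struct bio_set *bs)
{
        /* no bdev at allocation time, so no up-front blkg association */
        struct bio *clone = bio_alloc_clone(NULL, bio, GFP_NOIO, bs);

        if (clone)
                bio_set_dev(clone, lower_bdev); /* association happens here */
        return clone;
}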

* Re: can we reduce bio_set_dev overhead due to bio_associate_blkg?
  2022-04-09  5:15             ` [dm-devel] " Christoph Hellwig
@ 2022-04-11 16:58               ` Mike Snitzer
  -1 siblings, 0 replies; 28+ messages in thread
From: Mike Snitzer @ 2022-04-11 16:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dennis Zhou, tj, axboe, linux-block, dm-devel

On Sat, Apr 09 2022 at  1:15P -0400,
Christoph Hellwig <hch@infradead.org> wrote:

> On Fri, Apr 08, 2022 at 11:42:51AM -0400, Mike Snitzer wrote:
> > I think we can achieve the goal of efficient cloning/remapping for
> > both usecases simply by splitting out the bio_set_dev() and leaving it
> > to the caller to pick which interface to use (e.g. clone vs
> > clone_and_remap).
> 
> You can just pass a NULL bdev to bio_alloc_clone/bio_init_clone.
> I've been hoping to get rid of that, but if we have a clear use case
> it will have to stay.

DM core is just using bio_alloc_clone. And bio_alloc_bioset() allows
bdev to be NULL -- so you're likely referring to that (which will skip
bio_init's bio_associate_blkg).

Circling back to earlier in this thread, Dennis and you agreed that it
doesn't make sense to have __bio_clone() do blkcg work if the clone
bio will be remapped (via bio_set_dev).  Given that, and the fact that
bio_clone_blkg_association() assumes both bios are from the same bdev,
this change makes sense:

diff --git a/block/bio.c b/block/bio.c
index 7892f1108ca6..0340acc283a0 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -772,14 +772,16 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
	bio_set_flag(bio, BIO_CLONED);
	if (bio_flagged(bio_src, BIO_THROTTLED))
		bio_set_flag(bio, BIO_THROTTLED);
-	if (bio->bi_bdev == bio_src->bi_bdev &&
-	    bio_flagged(bio_src, BIO_REMAPPED))
-		bio_set_flag(bio, BIO_REMAPPED);
	bio->bi_ioprio = bio_src->bi_ioprio;
	bio->bi_iter = bio_src->bi_iter;

-	bio_clone_blkg_association(bio, bio_src);
-	blkcg_bio_issue_init(bio);
+	if (bio->bi_bdev == bio_src->bi_bdev) {
+		if (bio_flagged(bio_src, BIO_REMAPPED))
+			bio_set_flag(bio, BIO_REMAPPED);
+
+		bio_clone_blkg_association(bio, bio_src);
+		blkcg_bio_issue_init(bio);
+	}

	if (bio_crypt_clone(bio, bio_src, gfp) < 0)
		return -ENOMEM;

I think this will fix some of the performance penalty of redundant blkcg
initialization that I reported (though, as was also discussed, more work
is likely needed to further optimize bio_associate_blkg).

I'll audit DM targets and test to verify my changes and will post
proper patch(es) once done.

Mike

* Re: can we reduce bio_set_dev overhead due to bio_associate_blkg?
  2022-04-11 16:58               ` [dm-devel] " Mike Snitzer
@ 2022-04-11 17:16                 ` Mike Snitzer
  -1 siblings, 0 replies; 28+ messages in thread
From: Mike Snitzer @ 2022-04-11 17:16 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Dennis Zhou, tj, axboe, linux-block, dm-devel

On Mon, Apr 11 2022 at 12:58P -0400,
Mike Snitzer <snitzer@kernel.org> wrote:

> On Sat, Apr 09 2022 at  1:15P -0400,
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Fri, Apr 08, 2022 at 11:42:51AM -0400, Mike Snitzer wrote:
> > > I think we can achieve the goal of efficient cloning/remapping for
> > > both usecases simply by splitting out the bio_set_dev() and leaving it
> > > to the caller to pick which interface to use (e.g. clone vs
> > > clone_and_remap).
> > 
> > You can just pass a NULL bdev to bio_alloc_clone/bio_init_clone.
> > I've been hoping to get rid of that, but if we have a clear use case
> > it will have to stay.
> 
> DM core is just using bio_alloc_clone. And bio_alloc_bioset() allows
> bdev to be NULL -- so you're likely referring to that (which will skip
> bio_init's bio_associate_blkg).
...

> diff --git a/block/bio.c b/block/bio.c
> index 7892f1108ca6..0340acc283a0 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -772,14 +772,16 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
> 	bio_set_flag(bio, BIO_CLONED);
> 	if (bio_flagged(bio_src, BIO_THROTTLED))
> 		bio_set_flag(bio, BIO_THROTTLED);
> -	if (bio->bi_bdev == bio_src->bi_bdev &&
> -	    bio_flagged(bio_src, BIO_REMAPPED))
> -		bio_set_flag(bio, BIO_REMAPPED);
> 	bio->bi_ioprio = bio_src->bi_ioprio;
> 	bio->bi_iter = bio_src->bi_iter;
> 
> -	bio_clone_blkg_association(bio, bio_src);
> -	blkcg_bio_issue_init(bio);
> +	if (bio->bi_bdev == bio_src->bi_bdev) {
> +		if (bio_flagged(bio_src, BIO_REMAPPED))
> +			bio_set_flag(bio, BIO_REMAPPED);
> +
> +		bio_clone_blkg_association(bio, bio_src);
> +		blkcg_bio_issue_init(bio);
> +	}
> 
> 	if (bio_crypt_clone(bio, bio_src, gfp) < 0)
> 		return -ENOMEM;
> 
> Think this will fix some of the performance penalty of redundant blkcg
> initialization that I reported (though like was also discussed: more
> work likely needed to further optimize bio_associate_blkg).

Looking closer at the case where bio_{alloc,init}_clone are passed a
bdev, bio_init() will call bio_associate_blkg(), so the __bio_clone()
work to do anything with the blkg isn't needed at all. So this patch is
best:

diff --git a/block/bio.c b/block/bio.c
index 7892f1108ca6..6980f1b4b0f4 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -778,9 +778,6 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
	bio->bi_ioprio = bio_src->bi_ioprio;
	bio->bi_iter = bio_src->bi_iter;

-	bio_clone_blkg_association(bio, bio_src);
-	blkcg_bio_issue_init(bio);
-
	if (bio_crypt_clone(bio, bio_src, gfp) < 0)
		return -ENOMEM;
	if (bio_integrity(bio_src) &&


* [PATCH] block: remove redundant blk-cgroup init from __bio_clone
  2022-04-11 17:16                 ` [dm-devel] " Mike Snitzer
@ 2022-04-11 17:33                   ` Mike Snitzer
  -1 siblings, 0 replies; 28+ messages in thread
From: Mike Snitzer @ 2022-04-11 17:33 UTC (permalink / raw)
  To: linux-block; +Cc: Christoph Hellwig, Dennis Zhou, tj, axboe, dm-devel

When bio_{alloc,init}_clone are passed a bdev, bio_init() will call
bio_associate_blkg() so the __bio_clone() work to initialize blkcg
isn't needed.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 block/bio.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/bio.c b/block/bio.c
index 7892f1108ca6..6980f1b4b0f4 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -778,9 +778,6 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_iter = bio_src->bi_iter;
 
-	bio_clone_blkg_association(bio, bio_src);
-	blkcg_bio_issue_init(bio);
-
 	if (bio_crypt_clone(bio, bio_src, gfp) < 0)
 		return -ENOMEM;
 	if (bio_integrity(bio_src) &&
-- 
2.30.0


* Re: [PATCH] block: remove redundant blk-cgroup init from __bio_clone
  2022-04-11 17:33                   ` [dm-devel] " Mike Snitzer
@ 2022-04-12  5:27                     ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2022-04-12  5:27 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: linux-block, Christoph Hellwig, Dennis Zhou, tj, axboe, dm-devel

On Mon, Apr 11, 2022 at 01:33:58PM -0400, Mike Snitzer wrote:
> When bio_{alloc,init}_clone are passed a bdev, bio_init() will call
> bio_associate_blkg() so the __bio_clone() work to initialize blkcg
> isn't needed.

No, unfortunately it isn't as simple as that.  There are bios that do
not use the default cgroup and thus blkg, e.g. those that come from
cgroup writeback.
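
Rough illustration of that case (the writeback path associates the bio
with the inode's owning cgroup, not the submitting task's):

static void writeback_submit_one(struct writeback_control *wbc,
                                 struct block_device *bdev, struct bio *bio)
{
        bio_set_dev(bio, bdev);
        /* may pick wbc->wb->blkcg_css rather than the current task's */
        wbc_init_bio(wbc, bio);
        submit_bio(bio);
}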

* Re: [PATCH] block: remove redundant blk-cgroup init from __bio_clone
@ 2022-04-12  7:52                       ` Dennis Zhou
  0 siblings, 0 replies; 28+ messages in thread
From: Dennis Zhou @ 2022-04-12  7:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Mike Snitzer, linux-block, tj, axboe, dm-devel

On Mon, Apr 11, 2022 at 10:27:54PM -0700, Christoph Hellwig wrote:
> On Mon, Apr 11, 2022 at 01:33:58PM -0400, Mike Snitzer wrote:
> > When bio_{alloc,init}_clone are passed a bdev, bio_init() will call
> > bio_associate_blkg() so the __bio_clone() work to initialize blkcg
> > isn't needed.
> 
> No, unfortunately it isn't as simple as that.  There are bios that do
> not use the default cgroup and thus blkg, e.g. those that come from
> cgroup writeback.

Yeah, I wasn't quite right earlier. But the new API isn't in line with
the original semantics. Cloning the blkg preserves the original bio's
request_queue, which likely differs from the bdev passed into the clone.
This means an IO might be charged to the wrong device.

So, the blkg combines the who (the blkcg) and the where (the
corresponding request_queue). Before, bios were initialized in two phases:
    bio_alloc();
    bio_set_dev();

This meant that at clone time we didn't have the where, but the who was
encased in the blkg. So bio_clone_blkg_association() expected a subsequent
bio_set_dev() call, which called bio_associate_blkg(). When the bio
already has a blkg, that call attempts to reuse the blkcg while using the
new bdev to find the correct blkg.
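
Roughly, paraphrasing bio_associate_blkg() from memory (RCU and the
kthread/writeback special cases omitted, helper names approximate):

void sketch_associate_blkg(struct bio *bio)
{
        struct cgroup_subsys_state *css;

        if (bio->bi_blkg)
                /* keep the "who": reuse the css of the existing blkg */
                css = &bio->bi_blkg->blkcg->css;
        else
                /* fresh bio: fall back to the submitting task's blkcg */
                css = task_css(current, io_cgrp_id);

        /* the "where" comes from bio->bi_bdev's request_queue */
        bio_associate_blkg_from_css(bio, css);
}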

The tricky part seems to be how to seamlessly expose the appropriate
blkcg without being intrusive to the bio_alloc*() APIs.

Regarding the NULL bdev, I think that works as long as we keep the
bio_clone_blkg_association() call to carry the correct blkcg to the
bio_set_dev() call.

Thanks,
Dennis

* Re: [PATCH] block: remove redundant blk-cgroup init from __bio_clone
  2022-04-11 17:33                   ` [dm-devel] " Mike Snitzer
@ 2022-04-23 16:55                     ` Christoph Hellwig
  -1 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2022-04-23 16:55 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: linux-block, Christoph Hellwig, Dennis Zhou, tj, axboe, dm-devel

So while I'm still diving into the somewhat invasive changes to
optimize some of the cloning, all we might really need for your
use case should be this:

http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-clone-no-bdev

Can you check if this helps you?


* Re: block: remove redundant blk-cgroup init from __bio_clone
  2022-04-23 16:55                     ` [dm-devel] " Christoph Hellwig
@ 2022-04-26 17:30                       ` Mike Snitzer
  -1 siblings, 0 replies; 28+ messages in thread
From: Mike Snitzer @ 2022-04-26 17:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: axboe, linux-block, dm-devel, tj, Dennis Zhou

On Sat, Apr 23 2022 at 12:55P -0400,
Christoph Hellwig <hch@infradead.org> wrote:

> So while I'm still diving into the somewhat invasive changes to
> optimize some of the cloning, all we might really need for your
> use case should be this:
> 
> http://git.infradead.org/users/hch/block.git/shortlog/refs/heads/blk-clone-no-bdev
> 
> Can you check if this helps you?

Passing NULL will be useful for targets that do call bio_set_dev(), but
there are some that inherit the original bdev.  I'll audit all targets
and sort out how best to handle the conditional initialization to
avoid needless work.
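
I.e. per-target, something like this (illustrative only, not actual DM
code):

static struct bio *alloc_clone_for_target(struct bio *parent,
                                          bool target_remaps,
                                          struct bio_set *bs)
{
        /*
         * Targets that will bio_set_dev() in ->map can clone against a
         * NULL bdev and skip the up-front association; targets that
         * inherit the original bdev keep the parent's association.
         */
        return bio_alloc_clone(target_remaps ? NULL : parent->bi_bdev,
                               parent, GFP_NOIO, bs);
}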

This change is useful and a requirement for relevant follow-on DM
changes:

Reviewed-by: Mike Snitzer <snitzer@kernel.org>

Thanks.
