From: Dennis Zhou <dennis@kernel.org>
To: Christoph Hellwig <hch@infradead.org>
Cc: Mike Snitzer <snitzer@kernel.org>,
	tj@kernel.org, axboe@kernel.dk, linux-block@vger.kernel.org,
	dm-devel@redhat.com
Subject: Re: can we reduce bio_set_dev overhead due to bio_associate_blkg?
Date: Wed, 30 Mar 2022 22:52:13 -0700
Message-ID: <YkVBjUy9GeSMbh5Q@fedora>
In-Reply-To: <YkUwmyrIqnRGIOHm@infradead.org>

Hello,

On Wed, Mar 30, 2022 at 09:39:55PM -0700, Christoph Hellwig wrote:
> On Wed, Mar 30, 2022 at 08:28:28AM -0400, Dennis Zhou wrote:
> > I think cloning is a special case that I might have gotten wrong. If
> > there is a bio_set_dev() call after each clone(), then the
> > bio_clone_blkg_association() is excess work. We'd need to audit how
> > bio_alloc_clone() is being used to be safe. Alternatively, we could opt
> > for a bio_alloc_clone_noblkg(), but that's a little bit uglier.
> 
> As of Linux 5.18, the cloning interfaces have changed and take
> a block device that the clone is intended to be used for, and bio_set_dev
> is mostly (there are a few more spots to be cleaned up in
> dm/md/bcache/btrfs) only used for remapping to a new device.
> 

I took a quick look. It seems that with the new interface,
bio_clone_blkg_association() is unnecessary, given that the correct
association should be derived from the bio_alloc*() calls with the
passed-in bdev. Also, blkcg_bio_issue_init() in clone seems wrong.

Maybe the right thing to do here for md-linear and btrfs (what I've
looked at) is to delay cloning until the map occurs and the right device
is already in hand?

> That being said I've eyed the code in bio_associate_blkg a bit and
> I've been wondering about some of how it is implemented as well.
> 

I'm sure stuff has evolved since I was last involved, but here is a
brief explanation of the initial story. I suspect most of it holds true.
Apologies if this isn't helpful.

For others, a blkcg is a block cgroup. A blkcg_gq, blkg for short, is
the marrying of a blkcg and a request_queue. It takes a reference on
both so that IO associated with the cgroup is tracked to the appropriate
cgroup and the request_queue is prevented from going away. Punted IOs go
here and writeback is managed here as well. On the hot path, this is the
tagging that blk-rq-qos stuff might depend on.

The lookup itself is handled by blkg_lookup(), which is a radix tree
lookup keyed by the request_queue. There is also a last-hit hint which
helps. blkgs are percpu-refcounted.
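
To make the hint-then-tree lookup described above concrete, here is a
userspace toy model, not kernel code. All toy_* names are hypothetical.
It assumes the table hangs off the blkcg and is keyed by a queue id, and
it uses a flat array as a stand-in for the radix tree. RCU and locking
are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_QUEUES 64	/* hypothetical bound, only for this sketch */

struct toy_blkg {
	int queue_id;		/* the request_queue this blkg is tied to */
};

struct toy_blkcg {
	/* stand-in for the radix tree, indexed by queue id */
	struct toy_blkg *by_queue[TOY_MAX_QUEUES];
	struct toy_blkg *hint;		/* last blkg found via the slow path */
	unsigned long slow_lookups;	/* counts slow-path trips */
};

/* Register a blkg for its queue in this blkcg's table. */
static void toy_blkcg_insert(struct toy_blkcg *cg, struct toy_blkg *blkg)
{
	assert(blkg->queue_id >= 0 && blkg->queue_id < TOY_MAX_QUEUES);
	cg->by_queue[blkg->queue_id] = blkg;
}

/* Check the hint first, then fall back to the table and refresh the hint. */
static struct toy_blkg *toy_blkg_lookup(struct toy_blkcg *cg, int queue_id)
{
	struct toy_blkg *blkg;

	if (queue_id < 0 || queue_id >= TOY_MAX_QUEUES)
		return NULL;

	/* fast path: same queue as the previous successful lookup */
	blkg = cg->hint;
	if (blkg && blkg->queue_id == queue_id)
		return blkg;

	/* slow path: consult the table, remember the result on success */
	cg->slow_lookups++;
	blkg = cg->by_queue[queue_id];
	if (blkg)
		cg->hint = blkg;
	return blkg;
}
```

Repeated lookups for the same queue only touch the hint, which is why
the hint helps when a cgroup keeps issuing IO to one device.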

In terms of lifetimes and pinning, child_blkcg pins parent_blkcgs in a
tree hierarchy up to the root_blkcg. A blkg pins the blkcg it's
associated with, the request_queue, and the blkg_parent (parent_blkcg
and request_queue). They die in hierarchical order, staying alive until
all children have passed.
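
The pinning rule above can be sketched with a userspace toy model, not
kernel code. The toy_node names are hypothetical. It assumes a plain int
count where the real blkgs use percpu refcounts, and it models only the
child-pins-parent chain, not the extra references a blkg holds on its
blkcg and request_queue.

```c
#include <assert.h>
#include <stddef.h>

struct toy_node {
	struct toy_node *parent;	/* pinned for as long as we are alive */
	int refs;			/* plain counter, not percpu */
	int alive;
};

/* Create a node holding one reference for its creator; pin the parent. */
static void toy_node_init(struct toy_node *n, struct toy_node *parent)
{
	n->parent = parent;
	n->refs = 1;
	n->alive = 1;
	if (parent) {
		assert(parent->alive);
		parent->refs++;		/* child pins parent */
	}
}

/*
 * Drop one reference. When a node hits zero it dies and releases the
 * pin it held on its parent, which may cascade up towards the root.
 */
static void toy_node_put(struct toy_node *n)
{
	while (n) {
		struct toy_node *parent = n->parent;

		assert(n->alive && n->refs > 0);
		if (--n->refs > 0)
			return;
		n->alive = 0;
		n = parent;
	}
}
```

With a root, a middle node and a leaf, dropping the creator references
on the root and the middle node leaves both alive, because each is still
pinned by its child. Only the final put on the leaf tears the chain down,
leaf first and root last.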

If there's anything else I can try to help answer please let me know.

> Is recursive throttling really a thing?  i.e. we can have cgroup
> policies on the upper (e.g. dm) device and then again on the lower
> (e.g. nvme device)?  I think the code currently supports that, and
> if we want to keep that I don't really see much of a way to avoid
> the lookup, but maybe we can make it faster.

I'm not sure. I've primarily dealt with physical devices. However, I'm
sure there are more complex setups that use it. Whether it is a good
idea is probably debatable.

Backing up though, I feel like the abstraction naturally lends itself to
this multiple association because you don't necessarily know when you
hit physical devices until you finally submit through.

Thanks,
Dennis
