From: Mike Snitzer <snitzer@redhat.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, martin.petersen@oracle.com,
	Hans de Goede <hdegoede@redhat.com>, Song Liu <song@kernel.org>,
	Richard Weinberger <richard@nod.at>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-raid@vger.kernel.org, Minchan Kim <minchan@kernel.org>,
	dm-devel@redhat.com, linux-mtd@lists.infradead.org,
	linux-mm@kvack.org, drbd-dev@tron.linbit.com,
	cgroups@vger.kernel.org
Subject: Re: [PATCH 06/14] block: lift setting the readahead size into the block layer
Date: Thu, 10 Sep 2020 13:15:41 -0400	[thread overview]
Message-ID: <20200910171541.GB21919@redhat.com> (raw)
In-Reply-To: <20200910092813.GA27229@lst.de>

On Thu, Sep 10 2020 at  5:28am -0400,
Christoph Hellwig <hch@lst.de> wrote:

> On Wed, Sep 02, 2020 at 12:20:07PM -0400, Mike Snitzer wrote:
> > On Wed, Sep 02 2020 at 11:11am -0400,
> > Christoph Hellwig <hch@lst.de> wrote:
> > 
> > > On Wed, Aug 26, 2020 at 06:07:38PM -0400, Mike Snitzer wrote:
> > > > On Sun, Jul 26 2020 at 11:03am -0400,
> > > > Christoph Hellwig <hch@lst.de> wrote:
> > > > 
> > > > > Drivers shouldn't really mess with the readahead size, as that is a VM
> > > > > concept.  Instead set it based on the optimal I/O size by lifting the
> > > > > algorithm from the md driver when registering the disk.  Also set
> > > > > bdi->io_pages there as well by applying the same scheme based on
> > > > > max_sectors.
> > > > > 
> > > > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > > > ---
> > > > >  block/blk-settings.c         |  5 ++---
> > > > >  block/blk-sysfs.c            |  1 -
> > > > >  block/genhd.c                | 13 +++++++++++--
> > > > >  drivers/block/aoe/aoeblk.c   |  2 --
> > > > >  drivers/block/drbd/drbd_nl.c | 12 +-----------
> > > > >  drivers/md/bcache/super.c    |  4 ----
> > > > >  drivers/md/dm-table.c        |  3 ---
> > > > >  drivers/md/raid0.c           | 16 ----------------
> > > > >  drivers/md/raid10.c          | 24 +-----------------------
> > > > >  drivers/md/raid5.c           | 13 +------------
> > > > >  10 files changed, 16 insertions(+), 77 deletions(-)
> > > > 
> > > > 
> > > > In general these changes need a solid audit relative to stacking
> > > > drivers.  That is, the limits stacking methods (blk_stack_limits)
> > > > vs. the lower-level allocation methods (__device_add_disk).
> > > > 
> > > > You optimized for the low-level __device_add_disk establishing the
> > > > bdi's ra_pages and io_pages.  That happens at the beginning of disk
> > > > allocation, well before any build-up of a stacking driver's
> > > > queue_io_opt() -- which was previously handled in disk_stack_limits
> > > > or in driver-specific methods (e.g. dm_table_set_restrictions) that
> > > > are called _after_ all the limits stacking occurs.
> > > > 
> > > > By moving the setting of the bdi's ra_pages and io_pages that early
> > > > into __device_add_disk, it'll break properly setting these values
> > > > for at least DM afaict.
> > > 
> > > ra_pages never got inherited by stacking drivers; check it by modifying
> > > it on an underlying device and then creating a trivial dm or md one.
> > 
> > Sure, not saying that it did.  But if the goal is to set ra_pages based
> > on io_opt, then to do that correctly for stacking drivers it must be done
> > in terms of limits stacking, right?  Or at least done at a location that
> > is after the limits stacking has occurred?  So should DM just open-code
> > setting ra_pages like it did for io_pages?
> > 
> > Because setting ra_pages in __device_add_disk() is way too early for DM
> > -- given it uses device_add_disk_no_queue_reg via add_disk_no_queue_reg
> > at DM device creation (before stacking all underlying devices' limits).
> 
> I'll move it to blk_register_queue, which should work just fine.

That'll work for initial DM table load as part of DM device creation
(dm_setup_md_queue).  But it won't account for DM table reloads that
might change underlying devices on a live DM device (done using
__bind).

Both dm_setup_md_queue() and __bind() call dm_table_set_restrictions()
to set/update queue_limits.  It feels like __bind() will need to call a
new block helper that updates the bdi settings derived from the final
queue_limits (e.g. ra_pages and io_pages).
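
Roughly what I have in mind -- just a sketch, with the helper name
invented here -- reusing the calculation the patch lifts from md:

/*
 * Recompute the bdi readahead settings from the queue limits that are
 * current right now.  Exported so it can be called both from
 * blk_register_queue() and from code like DM's __bind() after a table
 * reload has restacked the limits.
 */
void blk_queue_update_readahead(struct request_queue *q)
{
	/* read ahead at least twice the optimal I/O size */
	q->backing_dev_info->ra_pages =
		max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
	/* and derive io_pages from the max hardware I/O size */
	q->backing_dev_info->io_pages =
		queue_max_sectors(q) >> (PAGE_SHIFT - 9);
}
EXPORT_SYMBOL_GPL(blk_queue_update_readahead);

dm_table_set_restrictions() could then call that helper once the stacked
limits are final, covering both initial device creation and table
reloads.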

Any chance you're open to factoring out that block function as an
exported symbol for use by blk_register_queue() and code like DM's
__bind()?

Thanks,
Mike

