From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"linux-ide@vger.kernel.org" <linux-ide@vger.kernel.org>,
	Hannes Reinecke <hare@suse.de>
Subject: Re: [PATCH v3 0/4] Initial support for multi-actuator HDDs
Date: Fri, 6 Aug 2021 04:05:30 +0000
Message-ID: <DM6PR04MB7081398426CA28606DC39491E7F39@DM6PR04MB7081.namprd04.prod.outlook.com>
In-Reply-To: yq18s1ffdz7.fsf@ca-mkp.ca.oracle.com

On 2021/08/06 12:42, Martin K. Petersen wrote:
> 
> Damien,
> 
>> Single LUN multi-actuator hard-disks are capable of seeking and executing
>> multiple commands in parallel. This capability is exposed to the host
>> using the Concurrent Positioning Ranges VPD page (SCSI) and Log (ATA).
>> Each positioning range describes the contiguous set of LBAs that an
>> actuator serves.
> 
> I have to say that I prefer the multi-LUN model.

It is certainly easier: nothing to do :)
SATA, as usual, makes things harder...

> 
>> The first patch adds the block layer plumbing to expose concurrent
>> sector ranges of the device through sysfs as a sub-directory of the
>> device sysfs queue directory.
> 
> So how do you envision this range reporting should work when putting
> DM/MD on top of a multi-actuator disk?

The ranges are attached to the device request queue. So the DM/MD target driver
can use that information from the underlying devices for whatever possible
optimization. For the logical device exposed by the target driver, the ranges
are not limits so they are not inherited. As is, right now, DM target devices
will not show any range information for the logical devices they create, even if
the underlying devices have multiple ranges.

The DM/MD target driver is free to set any range information pertinent to the
target. E.g. dm-linear could set the range information corresponding to sector
chunks from different devices used to build the dm-linear device.

> And even without multi-actuator drives, how would you express concurrent
> ranges on a DM/MD device sitting on top of several single-actuator
> devices?

Similar comment as above: it is up to the DM/MD target driver to decide if range
information can be useful. For dm-linear, there are obvious cases where it is.
Ex: 2 single actuator drives concatenated together can generate 2 ranges
similarly to a real split-actuator disk. Expressing the chunks of a dm-linear
setup as ranges may not always be possible though, that is, if we keep the
assumption that a range is independent from others in terms of command
execution. Ex: a dm-linear setup that shuffles a drive LBA mapping (high to low
and low to high) has no business showing sector ranges.
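
To illustrate the concatenation case only, here is a minimal sketch of how
per-chunk ranges could be derived for two or more single-actuator drives
mapped whole and in order. It is not code from this series: the function name
and the (first sector, number of sectors) tuple layout are assumptions made
for the example.

```python
from typing import List, Tuple

# Hypothetical representation: (first sector, number of sectors).
Range = Tuple[int, int]


def linear_concat_ranges(device_sizes: List[int]) -> List[Range]:
    """Return one range per concatenated device, in logical sector order.

    Assumption: every underlying device is a single-actuator drive that is
    mapped whole and in order, so each chunk executes commands independently
    of the others. A mapping that shuffles LBAs does not satisfy this.
    """
    ranges: List[Range] = []
    start = 0
    for nr_sectors in device_sizes:
        if nr_sectors <= 0:
            raise ValueError("device size must be positive")
        ranges.append((start, nr_sectors))
        start += nr_sectors
    return ranges
```

For example, linear_concat_ranges([1000, 2000]) yields two ranges, one
starting at sector 0 and one starting at sector 1000.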

> While I appreciate that it is easy to just export what the hardware
> reports in sysfs, I also think we should consider how filesystems would
> use that information. And how things would work outside of the simple
> fs-on-top-of-multi-actuator-drive case.

Without any change anywhere in existing code (kernel and applications using raw
disk accesses), things will just work as is. The multi/split actuator drive will
behave as a single actuator drive, even for commands spanning range boundaries.
Your guess on potential IOPS gains is as good as mine in this case. Performance
will totally depend on the workload but will not be worse than an equivalent
single actuator disk.
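
As a rough illustration of what "spanning range boundaries" means, the sketch
below lists which ranges a command overlaps. The helper name and the
(first sector, number of sectors) tuples are assumptions for the example, not
an interface provided by this series.

```python
from typing import List, Tuple


def ranges_touched(ranges: List[Tuple[int, int]],
                   sector: int, nr_sectors: int) -> List[int]:
    """Return the indices of the ranges overlapped by a command.

    The command covers sectors [sector, sector + nr_sectors). Each range is
    a hypothetical (first sector, number of sectors) tuple.
    """
    if nr_sectors <= 0:
        raise ValueError("command must cover at least one sector")
    end = sector + nr_sectors
    return [
        i for i, (start, length) in enumerate(ranges)
        # Two half-open intervals overlap when each starts before the
        # other one ends.
        if sector < start + length and start < end
    ]
```

A command that returns more than one index here crosses a boundary. The drive
still executes it, it just cannot be attributed to a single actuator.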

FS block allocators can definitely use the range information to distribute
writes among actuators. For reads, well, gains will depend on the workload,
obviously, but optimizations at the block IO scheduler level can improve things
too, especially if the drive is being used at a QD beyond its capability (that
is, requests are accumulated in the IO scheduler).

Similar write optimization can be achieved by applications using block device
files directly. This series is intended for this case for now. FS and block IO
scheduler optimization can be added later.
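
One possible way for an application to spread writes, sketched under stated
assumptions: the ranges have already been obtained (for example from the sysfs
attributes added by this series) as (first sector, number of sectors) tuples,
and a simple round-robin policy is good enough. The class and method names are
hypothetical.

```python
from typing import List, Tuple


class RangeRoundRobin:
    """Hand out write extents, rotating across ranges.

    Assumption: each range is served by its own actuator, so alternating
    between ranges lets consecutive writes proceed in parallel.
    """

    def __init__(self, ranges: List[Tuple[int, int]]) -> None:
        # Per range: [next free sector, first sector past the range].
        self._cursors = [[start, start + length] for start, length in ranges]
        self._next = 0

    def allocate(self, nr_sectors: int) -> Tuple[int, int]:
        """Return (range index, first sector) of an extent inside one range."""
        if nr_sectors <= 0:
            raise ValueError("extent must cover at least one sector")
        count = len(self._cursors)
        for step in range(count):
            idx = (self._next + step) % count
            cur, end = self._cursors[idx]
            # Never let an extent cross a range boundary.
            if cur + nr_sectors <= end:
                self._cursors[idx][0] = cur + nr_sectors
                self._next = (idx + 1) % count
                return idx, cur
        raise RuntimeError("no range has enough free space")
```

Extents are only appended here and never freed. A real allocator would also
need to track released space and weigh locality against parallelism.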


-- 
Damien Le Moal
Western Digital Research
