From: Tom Yan <tom.ty89@gmail.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Tejun Heo <tj@kernel.org>,
	jmoyer@redhat.com, axboe@fb.com, linux-ide@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-block@vger.kernel.org,
	Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Subject: Re: [PATCH v2 2/2] libata-core: do not set dev->max_sectors for LBA48 devices
Date: Thu, 11 Aug 2016 17:30:59 +0800
Message-ID: <CAGnHSEn3pXR33Ju+mr0DYdE0G7kNhxNCbvocX1pZXvYfxx5+3A@mail.gmail.com>
In-Reply-To: <yq1twesq8lm.fsf@sermon.lab.mkp.net>

On 11 August 2016 at 11:37, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
>
> I don't agree with conflating the optimal transfer size and the maximum
> supported ditto. Submitting the largest possible I/O to a device does
> not guarantee that you get the best overall performance.
>
>  - max_hw_sectors is gated by controller DMA constraints.
>
>  - max_dev_sectors is set for devices that explicitly report a transfer
>    length limit.
>
>  - max_sectors, the soft limit for filesystem read/write requests,
>    should be left at BLK_DEF_MAX_SECTORS unless the device explicitly
>    requests transfers to be aligned multiples of a different value
>    (typically the internal stripe size in large arrays).
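
To make the distinction above concrete, here is a minimal userspace C
model of how the three limits might combine. It assumes
BLK_DEF_MAX_SECTORS is 2560 (its value around the time of this thread)
and counts in 512-byte sectors. struct model_limits, model_set_limits()
and MODEL_DEF_MAX_SECTORS are hypothetical names, not kernel APIs; the
real derivation is spread across the block layer and the individual
drivers.

```c
#include <assert.h>

/* Hypothetical, simplified model of the three request-size limits.
 * Units are 512-byte sectors. This is NOT the kernel's
 * struct queue_limits; the field names merely mirror it. */
struct model_limits {
	unsigned int max_hw_sectors;  /* controller / DMA constraint */
	unsigned int max_dev_sectors; /* device-reported limit, 0 if none */
	unsigned int max_sectors;     /* soft limit for fs read/write */
};

/* Assumption: BLK_DEF_MAX_SECTORS was 2560 at the time. */
#define MODEL_DEF_MAX_SECTORS 2560u

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Same idea as the kernel's min_not_zero(): 0 means "no limit". */
static unsigned int min_not_zero_u(unsigned int a, unsigned int b)
{
	if (a == 0)
		return b;
	if (b == 0)
		return a;
	return min_u(a, b);
}

/* The soft limit never exceeds the hard limits and stays at the
 * default unless the device asks for another value (pref != 0). */
static void model_set_limits(struct model_limits *l, unsigned int hw,
			     unsigned int dev, unsigned int pref)
{
	unsigned int hard;

	assert(hw > 0);
	l->max_hw_sectors = hw;
	l->max_dev_sectors = dev;
	hard = min_not_zero_u(hw, dev);
	l->max_sectors = min_u(hard, pref ? pref : MODEL_DEF_MAX_SECTORS);
}
```

With a 65535-sector controller limit and no device preference, the soft
limit stays at the assumed default of 2560; a device-reported limit of
2048 or a controller limit of 1024 pulls it lower.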

Shouldn't we use Maximum Transfer Length to derive max_sectors (and
get rid of the almost useless max_dev_sectors)? Honestly, it looks
pretty nonsensical to me that the SCSI disk driver uses Optimal
Transfer Length for max_sectors. Also, in libata's case, this makes it
impossible to set the effective max_sectors (e.g. see
ATA_HORKAGE_MAX_SEC_LBA48) without also touching io_opt.

It looks to me like our block layer simply has a flawed design if we
really need to derive both io_opt and max_sectors from the same field.
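
The two derivations being contrasted here can be sketched as follows.
This is a simplified model, not the sd driver's actual code: Block
Limits VPD values are taken to be in 512-byte sectors (0 meaning "not
reported"), the 2560 fallback is an assumed BLK_DEF_MAX_SECTORS, and
the 65535-sector figure used below mirrors the LBA48 cap libata applies
for ATA_HORKAGE_MAX_SEC_LBA48. All struct and function names are
hypothetical.

```c
#include <assert.h>

/* Hypothetical model of two Block Limits VPD fields, expressed in
 * 512-byte sectors for simplicity (0 = not reported). */
struct model_vpd {
	unsigned int max_xfer; /* Maximum Transfer Length */
	unsigned int opt_xfer; /* Optimal Transfer Length */
};

struct model_queue {
	unsigned int max_hw_sectors;
	unsigned int max_sectors;
	unsigned int io_opt_sectors;
};

#define MODEL_DEF_MAX_SECTORS 2560u /* assumed BLK_DEF_MAX_SECTORS */

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Behaviour as described above: one field (OTL) feeds both io_opt
 * and max_sectors. */
static void model_current(struct model_queue *q, const struct model_vpd *v)
{
	unsigned int rw_max = v->opt_xfer ? v->opt_xfer : MODEL_DEF_MAX_SECTORS;

	q->io_opt_sectors = v->opt_xfer;
	q->max_sectors = min_u(rw_max, q->max_hw_sectors);
}

/* Proposal: MTL bounds max_sectors; OTL only sets io_opt. */
static void model_proposed(struct model_queue *q, const struct model_vpd *v)
{
	unsigned int rw_max = v->max_xfer ? v->max_xfer : MODEL_DEF_MAX_SECTORS;

	q->io_opt_sectors = v->opt_xfer;
	q->max_sectors = min_u(rw_max, q->max_hw_sectors);
}
```

With Maximum Transfer Length = 65535 and Optimal Transfer Length = 256,
the first model caps max_sectors at 256 and sets io_opt to 256; the
second lets max_sectors reach 65535 while io_opt stays at 256, so each
limit is controlled by its own field.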

>
> The point of BLK_DEF_MAX_SECTORS is to offer a reasonable default for
> common workloads unless otherwise instructed by the storage device.
>
> We can have a discussion about what the right value for
> BLK_DEF_MAX_SECTORS should be. It has gone up over time but it used to
> be the case that permitting large transfers significantly impacted
> interactive I/O performance. And finding a sweet spot that works for a
> wide variety of hardware, interconnects and workloads is obviously
> non-trivial.
>

If BLK_DEF_MAX_SECTORS is supposed to be used as a fallback, then it
should be a safe value, especially since max_sectors_kb can be
adjusted through sysfs.
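
The sysfs adjustment mentioned above can be sketched like this. It is
loosely modeled on the validation applied when max_sectors_kb is
written (the value may not exceed the hardware or device-reported
ceiling, nor drop below one page), but it assumes 4 KiB pages, and
model_store_max_sectors_kb() is a hypothetical helper rather than the
kernel's store function.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_PAGE_KB 4u /* assumption: 4 KiB pages */

/* Hypothetical sketch of validating a max_sectors_kb write.
 * Returns 0 and updates *max_sectors, or -EINVAL and leaves it alone.
 * Sector counts are in 512-byte units, so 1 KiB = 2 sectors. */
static int model_store_max_sectors_kb(unsigned int *max_sectors,
				      unsigned int max_hw_sectors,
				      unsigned int max_dev_sectors,
				      unsigned int kb)
{
	unsigned int hw_kb = max_hw_sectors >> 1;
	unsigned int dev_kb = max_dev_sectors >> 1;

	/* A device-reported limit, if any, lowers the ceiling. */
	if (dev_kb && dev_kb < hw_kb)
		hw_kb = dev_kb;
	if (kb > hw_kb || kb < MODEL_PAGE_KB)
		return -EINVAL;
	*max_sectors = kb << 1;
	return 0;
}
```

A conservative default can always be raised this way, up to the
hardware ceiling, which is why a safe fallback costs little.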

But the biggest problem isn't bumping it; it's that the value picked is
totally irrational for a general default. I mean, given that it was
1024 (512k), try to double it? Fine. Try to quadruple it? Alright.
We'll need to deal with some alignment / boundary issue (like the
typical 65535 vs 65536 case)? Okay, let's do it. But what's the sense
in picking a random RAID configuration as the basis for the default?
Also, if max_sectors needs to account for the number of disks and the
chunk size of a RAID configuration, it should be calculated in the
device-mapper layer or in a specific driver. Changing a block layer
default won't help anyway. Say 2560 accommodates a 10-disk 128k-chunk
RAID. What about a 12-disk 128k-chunk RAID, then? Why not just base
the value on an 8-disk 128k-chunk RAID, which HAPPENS to be double
1024 as well?
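
The arithmetic behind these figures is easy to check. The sketch below
treats every disk as a data disk (parity is ignored, an assumption) and
uses 512-byte sectors, so 1 KiB is 2 sectors; kib_to_sectors() and
full_stripe_sectors() are hypothetical helpers.

```c
#include <assert.h>

/* 512-byte sectors: 1 KiB = 2 sectors. */
static unsigned int kib_to_sectors(unsigned int kib)
{
	return kib * 2;
}

/* Full-stripe size in sectors for `data_disks` disks with
 * `chunk_kib` KiB chunks (illustrative only; parity ignored). */
static unsigned int full_stripe_sectors(unsigned int data_disks,
					unsigned int chunk_kib)
{
	assert(data_disks > 0 && chunk_kib > 0);
	return kib_to_sectors(data_disks * chunk_kib);
}
```

1024 sectors is 512 KiB. A 10-disk 128k-chunk stripe is 1280 KiB, or
exactly 2560 sectors; a 12-disk stripe is 3072 sectors, which 2560 no
longer covers; an 8-disk stripe is 2048 sectors, double 1024.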

It does not make sense for the SCSI disk driver to use it as the
fallback either. SCSI host templates that do not set max_sectors (as
well as some specific drivers) fall back to SCSI_DEFAULT_MAX_SECTORS.
For such hosts max_hw_sectors will be 1024, so the current
BLK_DEF_MAX_SECTORS cannot take effect as max_sectors anyway. So we
should also use SCSI_DEFAULT_MAX_SECTORS as the max_sectors fallback
in the SCSI disk driver. If that value is considered too low even as a
safe fallback, then it should be bumped appropriately. (Or we might
want to replace it with BLK_DEF_MAX_SECTORS everywhere in the SCSI
layer, but only after that value is fixed.)
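
The clipping described here can be sketched as below. The value 1024
for SCSI_DEFAULT_MAX_SECTORS follows the text, 2560 for
BLK_DEF_MAX_SECTORS is an assumption, and both helpers are
hypothetical simplifications of what the SCSI midlayer and disk driver
do.

```c
#include <assert.h>

#define MODEL_BLK_DEF_MAX_SECTORS 2560u      /* assumed BLK_DEF_MAX_SECTORS */
#define MODEL_SCSI_DEFAULT_MAX_SECTORS 1024u /* SCSI_DEFAULT_MAX_SECTORS */

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Hypothetical: a host template that leaves max_sectors at 0 gets
 * the SCSI default as its hardware limit. */
static unsigned int model_host_max_hw_sectors(unsigned int tmpl_max_sectors)
{
	return tmpl_max_sectors ? tmpl_max_sectors
				: MODEL_SCSI_DEFAULT_MAX_SECTORS;
}

/* Hypothetical: soft limit when the device reports no preference,
 * using the given fallback, clipped to the hardware limit. */
static unsigned int model_sd_max_sectors(unsigned int max_hw_sectors,
					 unsigned int fallback)
{
	return min_u(fallback, max_hw_sectors);
}
```

For a host whose template leaves max_sectors unset, both fallbacks end
up at 1024, so the larger default only has an effect on hosts that
advertise a higher limit.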

> --
> Martin K. Petersen      Oracle Linux Engineering


Thread overview: 14+ messages
2016-08-09 14:45 [PATCH v2 1/2] libata-scsi: set max_hw_sectors again only when dev->max_sectors is set tom.ty89
2016-08-09 14:45 ` [PATCH v2 2/2] libata-core: do not set dev->max_sectors for LBA48 devices tom.ty89
2016-08-09 16:50   ` Sergei Shtylyov
2016-08-10  4:10   ` Tejun Heo
2016-08-10  8:32     ` Tom Yan
2016-08-10 15:22       ` Tejun Heo
2016-08-11  3:37       ` Martin K. Petersen
2016-08-11  9:30         ` Tom Yan [this message]
2016-08-12  2:01           ` Martin K. Petersen
2016-08-12  5:18             ` Tom Yan
2016-08-12  8:17               ` Tom Yan
2016-08-12 21:06               ` Martin K. Petersen
2016-08-12  9:16             ` One Thousand Gnomes
2016-08-12 21:17               ` Martin K. Petersen