* optimal io size / custom alignment
@ 2015-06-13 14:52 Tom Yan
  2015-06-15 13:31 ` Karel Zak
  0 siblings, 1 reply; 22+ messages in thread
From: Tom Yan @ 2015-06-13 14:52 UTC (permalink / raw)
  To: util-linux

As I have mentioned in previous mails, I have a SATA/USB3 adapter
which can work in UAS mode, and when it does, it reports a weird
optimal I/O size:

Disk /dev/sdb: 74.5 GiB, 80026361856 bytes, 156301488 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 33553920 bytes

http://www.linuxquestions.org/questions/linux-newbie-8/how-to-foramt-2tb-external-hard-drive-4175529792/

The above link shows another similar case, an external drive with
4k physical sectors.

I am not sure whether there's anything wrong with the device(s) or the
kernel, but in any case I doubt that fdisk should determine alignment
from this size. As you can calculate, it is not necessarily a multiple
of the physical sector size, or of the common erase block sizes of SSDs
(which are not reported anywhere AFAIK).

Perhaps this I/O size does matter for alignment in certain cases, but
shouldn't the physical sector or erase block size at least take higher
priority when it comes to alignment?

In any case, it would be nice if fdisk allowed custom alignment
(like gdisk does), so that users can at least decide how partitions
should be aligned in weird cases like this. With that, the
long-deprecated "dos compatibility" mode might be able to go as well.

* Re: optimal io size / custom alignment
  2015-06-13 14:52 optimal io size / custom alignment Tom Yan
@ 2015-06-15 13:31 ` Karel Zak
  2015-06-16  5:20   ` Tom Yan
  2015-07-12  4:19   ` optimal io size / custom alignment -- caution on custom aligns Linda Walsh
  0 siblings, 2 replies; 22+ messages in thread
From: Karel Zak @ 2015-06-15 13:31 UTC (permalink / raw)
  To: Tom Yan; +Cc: util-linux, Martin K. Petersen

On Sat, Jun 13, 2015 at 10:52:04PM +0800, Tom Yan wrote:
> As I have mentioned in previous mails, I have an sata/usb3 adapter
> which could work in uas mode, and when it does, it has a weird optimal
> i/o size:
> 
> Disk /dev/sdb: 74.5 GiB, 80026361856 bytes, 156301488 sectors
> Units: sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 512 bytes
> I/O size (minimum/optimal): 512 bytes / 33553920 bytes

This is no problem (33553920 % 512 = 0) with the current kernel and
the current util-linux git tree, where we support non-power-of-2
alignment.

> http://www.linuxquestions.org/questions/linux-newbie-8/how-to-foramt-2tb-external-hard-drive-4175529792/
> 
> In the above link, there shows another similar case of an external
> drive with 4k physical sector.

From the link:

    Sector size (logical/physical): 512 bytes / 4096 bytes
    I/O size (minimum/optimal): 4096 bytes / 33553920 bytes

This is a problem (33553920 % 4096 != 0), and frankly it seems like a
pretty strange thing; maybe the kernel guys can comment on it (CC:
Martin).

> I am not sure if there's anything wrong with the device(s) or the
> kernel, but anyway I doubt if fdisk should determine alignment with
> this size. As you can calculate, it may not necessarily be a multiple
> of the size of physical sectors, or that of common erase block of SSDs
> (which is not reported anywhere AFAIK).
> 
> Perhaps this I/O size does matter on alignment for certain cases, but
> shouldn't physical sector or erase block be at least of higher
> priority when it comes to alignment?

I think we can test "optimal_io_size % physical_sector_size" and use the
physical sector size as the granularity if the optimal_io_size is a
strange number.
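
A minimal sketch of that test, in Python rather than the C that fdisk is
written in. The function name choose_granularity and its argument names
are made up for illustration and are not taken from util-linux; the two
calls use the values quoted above.

```python
def choose_granularity(optimal_io_size: int, physical_sector_size: int) -> int:
    """Return an alignment granularity in bytes.

    Hypothetical helper (not a util-linux function): fall back to the
    physical sector size when the reported optimal I/O size is missing
    (zero) or is not a whole multiple of the physical sector size.
    """
    if physical_sector_size <= 0:
        raise ValueError("physical_sector_size must be positive")
    if optimal_io_size <= 0 or optimal_io_size % physical_sector_size != 0:
        return physical_sector_size
    return optimal_io_size


# 512-byte physical sectors: 33553920 is a multiple, so it is kept.
print(choose_granularity(33553920, 512))   # 33553920
# 4096-byte physical sectors: 33553920 % 4096 != 0, so fall back.
print(choose_granularity(33553920, 4096))  # 4096
```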

> In any case, it would be nice if fdisk can allow customize alignment
> (like gdisk does), so that users can at least decide how partitions
> should be aligned in weird cases like this. With that, the long-time
> deprecated "dos compatibility" might be able to go as well.

I'll think about it...

    Karel


-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

* Re: optimal io size / custom alignment
  2015-06-15 13:31 ` Karel Zak
@ 2015-06-16  5:20   ` Tom Yan
  2015-06-16  5:37     ` Tom Yan
                       ` (2 more replies)
  2015-07-12  4:19   ` optimal io size / custom alignment -- caution on custom aligns Linda Walsh
  1 sibling, 3 replies; 22+ messages in thread
From: Tom Yan @ 2015-06-16  5:20 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux, Martin K. Petersen

http://www.spinics.net/lists/linux-usb/msg125988.html

This optimal I/O size is derived from an "Optimal transfer length"
provided by the hardware through "VPD". The issue might not have
seemed common because not all drives provide VPDs and not all drivers
read them.

For the adapter/drive I have, it is the same as the "Maximum transfer
length", and both seem to be simply the limits of the SCSI
"WRITE SAME (10/16)" command:

[tom@localhost ~]$ sudo sg_inq -p 0xb0 /dev/sdb
VPD INQUIRY: Block limits page (SBC)
  Maximum compare and write length: 0 blocks
  Optimal transfer length granularity: 1 blocks
  Maximum transfer length: 65535 blocks
  Optimal transfer length: 65535 blocks
  Maximum prefetch, xdread, xdwrite transfer length: 65535 blocks
  Maximum unmap LBA count: 0
  Maximum unmap block descriptor count: 0
  Optimal unmap granularity: 0
  Unmap granularity alignment valid: 0
  Unmap granularity alignment: 0
  Maximum write same length: 0x0 blocks
  Maximum atomic transfer length: 0
  Atomic alignment: 0
  Atomic transfer length granularity: 0

[tom@localhost ~]$ sudo sg_inq -p 0xb0 /dev/sdc
VPD INQUIRY: Block limits page (SBC)
  Maximum compare and write length: 0 blocks
  Optimal transfer length granularity: 0 blocks
  Maximum transfer length: 8388607 blocks
  Optimal transfer length: 8388607 blocks
  Maximum prefetch, xdread, xdwrite transfer length: 0 blocks

The thing is, why should any I/O or transfer size/length be considered
when it comes to partition alignment? From what I understand,
partition alignment is only to make sure partitions start at physical
boundaries of the disk, because of the mismatch between logical sectors
(512 bytes) and physical sectors (4096 bytes) or pages/erase blocks of
SSDs.
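
As a rough check of the arithmetic, the sketch below converts the two
"Optimal transfer length" values from the sg_inq output into bytes,
assuming 512-byte logical blocks for both devices, and tests each
against a 4096-byte boundary. The helper name transfer_length_bytes is
invented for this example.

```python
LOGICAL_BLOCK_SIZE = 512  # bytes; assumed logical block size for both devices


def transfer_length_bytes(blocks: int, block_size: int = LOGICAL_BLOCK_SIZE) -> int:
    """Convert a transfer length given in logical blocks into bytes."""
    return blocks * block_size


# "Optimal transfer length" values from the two Block Limits VPD pages.
for dev, blocks in (("/dev/sdb", 65535), ("/dev/sdc", 8388607)):
    size = transfer_length_bytes(blocks)
    # The remainder shows how far the size is from a 4096-byte multiple.
    print(dev, size, size % 4096)
# /dev/sdb 33553920 3584
# /dev/sdc 4294966784 3584
```

Both values are one logical block short of a power of two (32 MiB and
4 GiB respectively), which is why neither is a multiple of 4096.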

* Re: optimal io size / custom alignment
  2015-06-16  5:20   ` Tom Yan
@ 2015-06-16  5:37     ` Tom Yan
  2015-06-16  9:43     ` Karel Zak
  2015-06-16 17:08     ` Martin K. Petersen
  2 siblings, 0 replies; 22+ messages in thread
From: Tom Yan @ 2015-06-16  5:37 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux, Martin K. Petersen

I forgot to highlight that it might be a very bad idea to simply "mod
check" the physical sector size and the optimal I/O size. For one,
physical sector size doesn't reflect anything about SSDs. Also, as you
can see in my last mail, the optimal I/O size could be huge. (And since
the numbers seem to be "SCSI standards", I'd say that shows they simply
mean nothing for partition alignment.)

IMHO we should find out in which cases (if any) optimal I/O size REALLY
matters for partition alignment, and only use it to derive alignment
for those cases (and only if they can be rationally differentiated).

* Re: optimal io size / custom alignment
  2015-06-16  5:20   ` Tom Yan
  2015-06-16  5:37     ` Tom Yan
@ 2015-06-16  9:43     ` Karel Zak
  2015-06-16 10:22         ` Tom Yan
  2015-06-16 17:08     ` Martin K. Petersen
  2 siblings, 1 reply; 22+ messages in thread
From: Karel Zak @ 2015-06-16  9:43 UTC (permalink / raw)
  To: Tom Yan; +Cc: util-linux, Martin K. Petersen

On Tue, Jun 16, 2015 at 01:20:37PM +0800, Tom Yan wrote:
> The thing is, why any io/transfer size/length should be considered
> when it comes to partition alignment? From what I understand,
> partition alignment is only to make sure partition starts at physical
> boundaries of the disk because of the mismatch between logicial sector
> (512 bytes) and physical sectors (4096 bytes) or pages/erase blocks of
> SSDs.

It's more complicated. The I/O limits matter most for RAIDs, where the
optimal I/O size is usually the stripe size and you want to use it for
partition alignment for better performance (if you align only to the
sector size, a read/write on RAID may be performed on more disks on
unaligned partitions). And it's not only fdisk that cares; it's also
important for mkfs.<type> (for example, XFS aligns according to I/O limits).
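
To illustrate the RAID point, here is a small sketch that counts how
many stripe-sized units a single I/O overlaps. The 64 KiB stripe size
and the 63-sector partition offset are assumptions picked for the
example, and chunks_touched is a hypothetical helper name.

```python
def chunks_touched(start: int, length: int, chunk_size: int) -> int:
    """Count the chunk_size-sized units overlapped by [start, start + length)."""
    if length <= 0:
        return 0
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    return last - first + 1


STRIPE = 64 * 1024  # assumed stripe size in bytes

# Partition starts on a stripe boundary: a stripe-sized I/O stays in one unit.
print(chunks_touched(0, STRIPE, STRIPE))         # 1
# Partition starts at sector 63 (63 * 512 bytes): the same I/O spans two units.
print(chunks_touched(63 * 512, STRIPE, STRIPE))  # 2
```

With the unaligned start, every stripe-sized request touches two units,
so more disks may be involved than necessary.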

And because all this is a mess, and sometimes the HW does not provide
relevant information, and because people use dd(1) to copy partition
tables, we have decided to use 1MiB granularity if possible. If 1MiB is
useless then we use optimal_io_size; if that is undefined, then
minimal_io_size; and if that is undefined, then sector_size.
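
The fallback order can be sketched roughly as follows. This is not the
libfdisk code: pick_grain is a made-up name, and "1MiB is useless" is
interpreted here, as an assumption, to mean that 1 MiB is not a whole
multiple of the reported I/O size.

```python
MIB = 1024 * 1024


def pick_grain(optimal_io_size: int, minimal_io_size: int, sector_size: int) -> int:
    """Pick an alignment grain in bytes (hypothetical sketch, not libfdisk)."""
    if sector_size <= 0:
        raise ValueError("sector_size must be positive")
    # First defined (non-zero) hint, in the order given in the message.
    io_size = optimal_io_size or minimal_io_size or sector_size
    # Assumption: 1 MiB is usable when it is a whole multiple of the hint.
    if MIB % io_size == 0:
        return MIB
    return io_size


print(pick_grain(0, 4096, 512))        # 1048576
print(pick_grain(4 * MIB, 4096, 512))  # 4194304
print(pick_grain(33553920, 512, 512))  # 33553920
```

Note that the last call accepts 33553920 as it stands, because nothing
in this order checks the hint against the physical sector size.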

http://people.redhat.com/msnitzer/docs/io-limits.txt


Unfortunately the current code does not check whether optimal_io_size
makes sense, so a thing like 33553920 for a 4k device is blindly
accepted ;-(

    Karel


-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

* Re: optimal io size / custom alignment
@ 2015-06-16 10:22         ` Tom Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Tom Yan @ 2015-06-16 10:22 UTC (permalink / raw)
  To: Karel Zak, linux-scsi; +Cc: util-linux, Martin K. Petersen

I have heard that it matters for RAID, but since I don't really know
about RAID, I can't comment.

I do wonder whether the SCSI disk driver should derive minimum/optimal
I/O size from VPD at all, then. It might still be "tolerable" if it's
the limit of WRITE SAME (10), but definitely not if it's that of WRITE
SAME (16):

[tom@localhost ~]$ sudo fdisk /dev/sdc

Welcome to fdisk (util-linux 2.26.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0xccb261a9.

Command (m for help): p
Disk /dev/sdc: 29.2 GiB, 31376707072 bytes, 61282631 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 4294966784 bytes
Disklabel type: dos
Disk identifier: 0xccb261a9

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p):

Using default response p.
Partition number (1-4, default 1):
First sector (8388607-61282630, default 8388607):
Last sector, +sectors or +size{K,M,G,T,P} (8388607-61282630, default 61282630):

Created a new partition 1 of type 'Linux' and of size 25.2 GiB.
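
The numbers in this session can be reproduced with a little arithmetic.
The sketch below assumes the default first sector is simply the first
multiple of the optimal I/O size (expressed in sectors) after sector 0;
that is an inference from the output above, not a statement about the
fdisk source. The function name is hypothetical.

```python
def first_partition_layout(optimal_io_size: int, total_sectors: int,
                           sector_size: int = 512) -> tuple[int, int, int]:
    """Return (first_sector, last_sector, size_in_bytes) for one partition
    that starts at the first grain boundary after sector 0 and runs to the
    end of the disk. Hypothetical helper for illustration only."""
    grain_sectors = optimal_io_size // sector_size
    first = grain_sectors
    last = total_sectors - 1
    return first, last, (last - first + 1) * sector_size


first, last, size = first_partition_layout(4294966784, 61282631)
print(first, last)                      # 8388607 61282630
print(round(size / 2**30, 1))           # 25.2  (GiB in the partition)
print(round(first * 512 / 2**30, 1))    # 4.0   (GiB skipped before it)
```

So roughly 4 GiB of a 29.2 GiB device is left unused in front of the
first partition.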

* Re: optimal io size / custom alignment
  2015-06-16  5:20   ` Tom Yan
  2015-06-16  5:37     ` Tom Yan
  2015-06-16  9:43     ` Karel Zak
@ 2015-06-16 17:08     ` Martin K. Petersen
  2015-06-16 19:26         ` Tom Yan
  2 siblings, 1 reply; 22+ messages in thread
From: Martin K. Petersen @ 2015-06-16 17:08 UTC (permalink / raw)
  To: Tom Yan; +Cc: Karel Zak, util-linux, Martin K. Petersen

>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom> From the adapter/drive I have, it is the same as the "Maximum
Tom> transfer length" and they seem to be simply limits of SCSI "WRITE
Tom> SAME (10/16)" command:

The two values have nothing to do with each other. They just happen to
be the same in your case (65535 is the maximum block count for the WRITE
SAME(10) command).

Tom> [tom@localhost ~]$ sudo sg_inq -p 0xb0 /dev/sdb
Tom> VPD INQUIRY: Block limits page (SBC)
Tom>   Maximum compare and write length: 0 blocks
Tom>   Optimal transfer length granularity: 1 blocks
Tom>   Maximum transfer length: 65535 blocks
Tom>   Optimal transfer length: 65535 blocks

Your device sets the transfer length granularity to 1 logical block and
the optimal transfer length to 65535 logical blocks. If it then reports
a 4096-byte physical block size in response to READ CAPACITY(16) then
it's clearly on crack.

There's only so much we can do about devices that report garbage.

Also, the kernel only reports things. It is up to Karel to decide
whether to sanity check the values before he uses them.

I would probably err on the side of trusting the physical block size
reporting more than anything seeded from the Block Limits VPD. And in
this case, assuming the alignment offset is reported to be 0, I guess
one could entertain aligning to the nearest 4K boundary. But on the
other hand it'll quickly get hairy to have to maintain this kind of
heuristic.
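
Aligning to the nearest 4K boundary could look like the sketch below,
under the same assumption that the alignment offset is 0. The helper
name align_up_to_physical is invented here; a non-zero alignment offset
is not handled.

```python
def align_up_to_physical(lba: int, logical_size: int = 512,
                         physical_size: int = 4096) -> int:
    """Round an LBA up to the next LBA that starts a physical block.

    Hypothetical helper; assumes the device's alignment offset is 0.
    """
    if logical_size <= 0 or physical_size % logical_size != 0:
        raise ValueError("physical_size must be a multiple of logical_size")
    per_block = physical_size // logical_size  # logical blocks per physical block
    return -(-lba // per_block) * per_block    # ceiling division, then scale back


print(align_up_to_physical(63))    # 64
print(align_up_to_physical(2048))  # 2048
```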

The best fix, of course, is to complain to the manufacturer of your
broken widget and hope for a firmware upgrade. Failing that, adjust your
partitions manually.

Tom> The thing is, why any io/transfer size/length should be considered
Tom> when it comes to partition alignment? From what I understand,
Tom> partition alignment is only to make sure partition starts at
Tom> physical boundaries of the disk because of the mismatch between
Tom> logicial sector (512 bytes) and physical sectors (4096 bytes) or
Tom> pages/erase blocks of SSDs.

For RAID it makes a big difference to ensure the partition is aligned on
a stripe boundary.

-- 
Martin K. Petersen	Oracle Linux Engineering

* Re: optimal io size / custom alignment
@ 2015-06-16 19:26         ` Tom Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Tom Yan @ 2015-06-16 19:26 UTC (permalink / raw)
  To: Martin K. Petersen, linux-scsi; +Cc: Karel Zak, util-linux, linux-usb

On 17 June 2015 at 01:08, Martin K. Petersen <martin.petersen@oracle.com> wrote:
> The two values have nothing to do with each other. They just happen to
> be the same in your case (65535 is the maximum block count for the WRITE
> SAME(10) command).
>
> Your device sets the transfer length granularity to 1 logical block and
> the optimal transfer length to 65535 logical blocks. If it then reports
> a 4096-byte physical block size in response to READ CAPACITY(16) then
> it's clearly on crack.
>
> There's only so much we can do about devices that report garbage.

All drives I have are flash drives, so none of them reports 4k physical
sectors. But it does seem possible in the case I linked. The thing is,
these VPDs/transfer lengths are probably provided by the USB to
ATA(/SCSI?) bridges. I can't judge whether they are wrong to set the
lengths that way, but it seems to be a common practice. I have two USB
devices that provide the SBC-2 (Block Limits) VPD: one is a SanDisk
Extreme USB (SDCZ80), the other an Intel X25-M Gen1 on an ASMedia SATA
adapter, and both of them set the Optimal transfer length. The
usb-storage driver does not read VPD, so it won't be a thing there, but
the uas driver does.

> Also, the kernel only reports things. It is up to Karel to decide
> whether to sanity check the values before he uses them.

I just feel like the kernel shouldn't bind values from totally
different sources (RAID stripe vs. VPD limit) to the same variable. I
don't know what else makes use of this variable, but considering only
the fdisk case, it seems the SCSI disk driver is the one that should
stop the binding.

> The best fix, of course, is to complain to the manufacturer of your
> broken widget and hope for a firmware upgrade.

This is simply too idealistic, especially when it seems that this issue
mostly happens on USB bridges. I am not even sure whether the SCSI
standards have anything to say about this practice.

> Failing that, adjust your partitions manually.

Yeah, that's why I said fdisk should allow custom alignment.

* Re: optimal io size / custom alignment
@ 2015-06-16 19:26         ` Tom Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Tom Yan @ 2015-06-16 19:26 UTC (permalink / raw)
  To: Martin K. Petersen, linux-scsi-u79uwXL29TY76Z2rM5mHXA
  Cc: Karel Zak, util-linux-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA

On 17 June 2015 at 01:08, Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> The two values have nothing to do with each other. They just happen to
> be the same in your case (65535 is the maximum block count for the WRITE
> SAME(10) command).
>
> Your device sets the transfer length granularity to 1 logical block and
> the optimal transfer length to 65535 logical blocks. If it then reports
> a 4096-byte physical block size in response to READ CAPACITY(16) then
> it's clearly on crack.
>
> There's only so much we can do about devices that report garbage.

All drives I have are flash drives so none of them reports 4k physical
sectors. But it does seems possible in the case I linked. The thing is
these VPDs/transfer lengths are probably provided by the USB to
ATA(/SCSI?) bridges. I can't judge if they are wrong to set the
lengths that way but it seem to be a common practice. I have two USB
devices provide the SBC-2 (Block limit VPD), one is a SanDisk Extreme
USB (SDCZ80), another an Intel X25-M Gen1 on an ASMedia SATA adapter,
and both of them set the Optimal transfer length. The usb-storage
driver does not read vpd so it won't be a thing, but the the uas
driver does.

> Also, the kernel only reports things. It is up to Karel to decide
> whether to sanity check the values before he uses them.

I just feel like the kernel shouldn't bind values from totally
different source (raid stripe vs vpd limit) to the same variable. I
don't know if what else would make use of this variable but by only
considering the fdisk case, it seems the scsi disk driver should be
the one who should stop binding.

> The best fix, of course, is to complain to the manufacturer of your
> broken widget and hope for a firmware upgrade.

This is simply too idealistic especially when it seems that this issue
mostly happens on USB bridges. I am not even sure if the SCSI
standards has anything to say about this practice.

> Failing that, adjust your partitions manually.

Yeah that's why I said fdisk should allow custom alignment.

On 17 June 2015 at 01:08, Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>>>>> "Tom" == Tom Yan <tom.ty89-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:
>
> Tom> From the adapter/drive I have, it is the same as the "Maximum
> Tom> transfer length" and they seem to be simply limits of SCSI "WRITE
> Tom> SAME (10/16)" command:
>
> The two values have nothing to do with each other. They just happen to
> be the same in your case (65535 is the maximum block count for the WRITE
> SAME(10) command).
>
> Tom> [tom@localhost ~]$ sudo sg_inq -p 0xb0 /dev/sdb VPD INQUIRY: Block
> Tom> limits page (SBC) Maximum compare and write length: 0 blocks
> Tom> Optimal transfer length granularity: 1 blocks Maximum transfer
> Tom> length: 65535 blocks Optimal transfer length: 65535 blocks
>
> Your device sets the transfer length granularity to 1 logical block and
> the optimal transfer length to 65535 logical blocks. If it then reports
> a 4096-byte physical block size in response to READ CAPACITY(16) then
> it's clearly on crack.
>
> There's only so much we can do about devices that report garbage.
>
> Also, the kernel only reports things. It is up to Karel to decide
> whether to sanity check the values before he uses them.
>
> I would probably err on the side of trusting the physical block size
> reporting more than anything seeded from the Block Limits VPD. And in
> this case, assuming the alignment offset is reported to be 0, I guess
> one could entertain aligning to the nearest 4K boundary. But on the
> other hand it'll quickly get hairy to have to maintain this kind of
> heuristics.
>
> The best fix, of course, is to complain to the manufacturer of your
> broken widget and hope for a firmware upgrade. Failing that, adjust your
> partitions manually.
>
> Tom> The thing is, why any io/transfer size/length should be considered
> Tom> when it comes to partition alignment? From what I understand,
> Tom> partition alignment is only to make sure partition starts at
> Tom> physical boundaries of the disk because of the mismatch between
> Tom> logical sector (512 bytes) and physical sectors (4096 bytes) or
> Tom> pages/erase blocks of SSDs.
>
> For RAID it makes a big difference to ensure the partition is aligned on
> a stripe boundary.
>
> --
> Martin K. Petersen      Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe util-linux" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: optimal io size / custom alignment
  2015-06-16 19:26         ` Tom Yan
  (?)
@ 2015-06-16 21:28         ` Martin K. Petersen
  2015-06-17  9:49             ` Tom Yan
  -1 siblings, 1 reply; 22+ messages in thread
From: Martin K. Petersen @ 2015-06-16 21:28 UTC (permalink / raw)
  To: Tom Yan; +Cc: Martin K. Petersen, linux-scsi, Karel Zak, util-linux, linux-usb

>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom> All drives I have are flash drives so none of them reports 4k
Tom> physical sectors.

There are plenty of SSDs that report 4K physical sectors, fwiw.

Tom> The usb-storage driver does not read vpd so it won't be a thing,
Tom> but the uas driver does.

We gave up on USB-SATA bridges long ago. Their designers appear to have
a pretty comprehensive misunderstanding of both the ATA and SCSI
protocols.

We had higher hopes for UAS since it provided a clean slate. So far,
however, the results are equally discouraging.

Tom> I just feel like the kernel shouldn't bind values from totally
Tom> different source (raid stripe vs vpd limit) to the same variable.

RAID devices communicate the stripe width through the Block Limits VPD.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: optimal io size / custom alignment
@ 2015-06-17  9:49             ` Tom Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Tom Yan @ 2015-06-17  9:49 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, Karel Zak, util-linux, linux-usb

On 17 June 2015 at 05:28, Martin K. Petersen <martin.petersen@oracle.com> wrote:
> There are plenty of SSDs that report 4K physical sectors, fwiw.

Oh, I didn't know that. I wonder if it's yet another piece of garbage
info. Though 4k is often a nice value to make use of.

> We gave up on USB-SATA bridges long ago. Their designers appear to have
> a pretty comprehensive misunderstanding of both the ATA and SCSI
> protocols.

Aren't there tons of thumb drives that make use of them anyway?

> Tom> I just feel like the kernel shouldn't bind values from totally
> Tom> different source (raid stripe vs vpd limit) to the same variable.
>
> RAID devices communicate the stripe width through the Block Limits VPD.

No, I put it the wrong way. What I meant was "sd vs md". For
example, couldn't the scsi disk driver bind the value it reads from
the VPD to another variable instead of "optimal i/o size", so that
this value would be exclusively for RAID (and other virtual devices)?
Is it even necessary for sd to report it? It seems only to make
this variable ambiguous.

If it HAS TO BE ambiguous, I see no reason why fdisk should use it to
derive the alignment. It should simply let users exercise their own
judgement and provide a way for them to adjust it manually.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: optimal io size / custom alignment
@ 2015-06-18 21:01               ` Martin K. Petersen
  0 siblings, 0 replies; 22+ messages in thread
From: Martin K. Petersen @ 2015-06-18 21:01 UTC (permalink / raw)
  To: Tom Yan; +Cc: Martin K. Petersen, linux-scsi, Karel Zak, util-linux, linux-usb

>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom> No, I put it the wrong way. What I meant was "sd vs md". For
Tom> example, couldn't the scsi disk driver bind the value it reads from
Tom> the VPD to another variable instead of "optimal i/o size", so that
Tom> this value would be exclusively for RAID (and other virtual
Tom> devices)?

Who says that RAID is a virtual device? Hardware RAID controllers as
well as SAS, iSCSI and Fibre Channel disk arrays all use the Block
Limits VPD to communicate their preferred I/O size and alignment to
us. As do enterprise disk drives.

We deal with broken devices by blacklisting them. I suggest you try to
find a way we can reliably identify your UAS devices. If there is a
common pattern, we can entertain adding a workaround.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: optimal io size / custom alignment
@ 2015-06-20 16:01                 ` Tom Yan
  0 siblings, 0 replies; 22+ messages in thread
From: Tom Yan @ 2015-06-20 16:01 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, Karel Zak, util-linux, linux-usb

I was not saying RAIDs are virtual devices. I just mentioned it
because I saw things like virtio-blk or zram use blk_queue_io_opt().

I know they all use VPDs, but the main point is whether those hardware
RAIDs and the like are handled by sd_mod, and whether that "transfer
length" info is still important when it's just a simple drive. To me
they look to be of a different nature. That's why I think it's
inappropriate that they use the same "variable" / "file" to report,
because that makes it hard for tools like fdisk to determine when
those values really matter.

In fact (maybe I am just unlucky :P), the VPDs of all my devices are to
some extent broken. I just found out today that my Intel 530 SSD,
connected directly to SATA, also reports totally garbage values for
TRIM : (

To be honest, the UAS thing doesn't really affect me a lot; I mostly
use gdisk now (which doesn't care about i/o size AFAIK). I can also
disable uas with the quirk so that VPDs are skipped when I really need
fdisk for msdos/mbr. I just think it reveals a problem that has to be
dealt with sooner or later, though you can optimistically think that
vendors will do better on VPDs in the future.

On 19 June 2015 at 05:01, Martin K. Petersen <martin.petersen@oracle.com> wrote:
>>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
>
> Tom> No, I put it the wrong way. What I meant was "sd vs md". For
> Tom> example, couldn't the scsi disk driver bind the value it reads from
> Tom> the VPD to another variable instead of "optimal i/o size", so that
> Tom> this value would be exclusively for RAID (and other virtual
> Tom> devices)?
>
> Who says that RAID is a virtual device? Hardware RAID controllers as
> well as SAS, iSCSI and Fibre Channel disk arrays all use the Block
> Limits VPD to communicate their preferred I/O size and alignment to
> us. As do enterprise disk drives.
>
> We deal with broken devices by blacklisting them. I suggest you try to
> find a way we can reliably identify your UAS devices. If there is a
> common pattern, we can entertain adding a workaround.
>
> --
> Martin K. Petersen      Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: optimal io size / custom alignment
@ 2015-06-21  0:12                   ` Martin K. Petersen
  0 siblings, 0 replies; 22+ messages in thread
From: Martin K. Petersen @ 2015-06-21  0:12 UTC (permalink / raw)
  To: Tom Yan; +Cc: Martin K. Petersen, linux-scsi, Karel Zak, util-linux, linux-usb

>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom> I know they all use VPDs, but the main point is whether those
Tom> hardware RAIDs or so are handled by sd_mod, and whether those
Tom> "transfer lengths" info are still important when it's just a simple
Tom> drive. To me they look to be of a different nature.

We don't know whether a discovered device is "a simple drive".

And once again: The whole point of the queue limit is to have a common
abstraction for all block devices.
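
For readers who want to see what that abstraction exposes, here is a
minimal Python sketch that reads the queue limits from sysfs. It
assumes the usual /sys/block/<dev>/queue layout; the helper name and
the sysfs_root parameter (there only so the function can be pointed at
a test directory) are made up for illustration.

```python
from pathlib import Path

# Queue-limit files read by this sketch; all values are in bytes.
LIMITS = (
    "logical_block_size",
    "physical_block_size",
    "minimum_io_size",
    "optimal_io_size",
)


def read_queue_limits(dev: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the I/O topology limits for a block device such as "sdb".

    A limit whose file is missing is reported as None.
    """
    queue = Path(sysfs_root) / dev / "queue"
    limits = {}
    for name in LIMITS:
        path = queue / name
        limits[name] = int(path.read_text().strip()) if path.exists() else None
    return limits
```

Calling read_queue_limits("sdb") on the adapter discussed here would be
expected to show the same optimal_io_size that fdisk prints, whatever
kind of device sits behind the queue.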

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: optimal io size / custom alignment
@ 2015-06-22 14:32                   ` Alan Stern
  0 siblings, 0 replies; 22+ messages in thread
From: Alan Stern @ 2015-06-22 14:32 UTC (permalink / raw)
  To: Tom Yan; +Cc: Martin K. Petersen, linux-scsi, Karel Zak, util-linux, linux-usb

On Sun, 21 Jun 2015, Tom Yan wrote:

> I was not saying RAIDs are virtual devices. I just mentioned it
> because I saw things like virtio-blk or zram use blk_queue_io_opt().
> 
> I know they all use VPDs, but the main point is whether those hardware
> RAIDs or so are handled by sd_mod, and whether those "transfer
> lengths" info are still important when it's just a simple drive. To me
> they look to be of a different nature. That's why I think it's
> inappropriate that they use the same "variable" / "file" to report
> because that makes tools like fdisk have trouble determining when
> those values really matter.
> 
> In fact, (maybe I am just unlucky :P) VPDs of all my devices are to
> some extent broken. I just found out today my Intel 530 SSD connecting
> directly to SATA also reports totally garbage values for TRIM : (
> 
> To be honest the UAS thing doesn't really affect me a lot, I mostly
> use gdisk now (which doesn't care about i/o size AFAIK). I can also
> disable uas with the quirk so that VPDs are skipped when I really need
> fdisk for msdos/mbr. It's just I think that it kind of reveals a
> problem that has to be dealt with sooner or later, though you can
> optimistically think that vendors would do better on VPDs in the
> future.

Regardless of all these issues, it is clear that a lot of devices don't 
implement the VPD data correctly.  Therefore the information in the 
kernel will often be wrong.

And consequently, fdisk needs to offer the user an option to override 
the default partition-alignment setting.
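
What such an override amounts to can be sketched in a few lines of
Python. This is not fdisk's code: the function names and the idea of
passing the override as a byte count are assumptions made purely for
illustration.

```python
def choose_grain(reported_bytes: int, override_bytes: int = 0) -> int:
    """Use the user's alignment if one was given, else the reported one."""
    return override_bytes if override_bytes else reported_bytes


def align_up(sector: int, grain_bytes: int, logical_block_size: int = 512) -> int:
    """Round a start sector up to the next multiple of the grain."""
    if grain_bytes % logical_block_size:
        raise ValueError("grain must be a multiple of the logical block size")
    grain = grain_bytes // logical_block_size
    return -(-sector // grain) * grain  # ceiling division, then scale back


# With the 33553920-byte value from this thread, the first aligned sector
# after sector 1 is 65535; with a 1 MiB override it is 2048.
print(align_up(1, choose_grain(33553920)))            # 65535
print(align_up(1, choose_grain(33553920, 1 << 20)))   # 2048
```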

Alan Stern

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: optimal io size / custom alignment -- caution on custom aligns
  2015-06-15 13:31 ` Karel Zak
  2015-06-16  5:20   ` Tom Yan
@ 2015-07-12  4:19   ` Linda Walsh
  1 sibling, 0 replies; 22+ messages in thread
From: Linda Walsh @ 2015-07-12  4:19 UTC (permalink / raw)
  To: Karel Zak; +Cc: Tom Yan, util-linux, Martin K. Petersen

Karel Zak wrote:
> On Sat, Jun 13, 2015 at 10:52:04PM +0800, Tom Yan wrote
>
> I think we can test "optimal_io_size % physical_sector_size" and use physical
> sector size as the granularity if the optimal_io_size is a strange number.
>
>> In any case, it would be nice if fdisk can allow customize alignment
>> (like gdisk does), so that users can at least decide how partitions
>> should be aligned in weird cases like this. With that, the long-time
>> deprecated "dos compatibility" might be able to go as well.
>
> I'll think about it...
>
>     Karel
> ---------------------
>   
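
The check Karel describes in the quoted text can be sketched roughly as
follows. This is only an illustration of the idea in Python, not the
libfdisk implementation; the function name is made up.

```python
def alignment_granularity(optimal_io_size: int, physical_sector_size: int) -> int:
    """Pick an alignment grain in bytes.

    Trust optimal_io_size only when it is a non-zero multiple of the
    physical sector size; otherwise fall back to the physical sector size.
    """
    if optimal_io_size and optimal_io_size % physical_sector_size == 0:
        return optimal_io_size
    return physical_sector_size


print(alignment_granularity(33553920, 4096))  # 4096, strange value rejected
print(alignment_granularity(33553920, 512))   # 33553920, check still passes
```

Note that 33553920 is a multiple of 512, so on a device reporting
512-byte physical sectors this test alone would not flag the value.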
I know it's been a while since the above note was written, but I just
saw it while reviewing old messages and thought I'd pass on a warning.


Warning: if your stripe size (the strip on one disk x the number of
data disks) is not a power of 2, don't bother trying to build perl.

It uses the GNU DB libraries, which choke on non-power-of-two
"optimal" I/O sizes. I had a RAID 50 made by striping three
4-data-spindle RAIDs: strip=64k, stripe=256k, and the optimal size
across the 3 stripes was listed as 768k. Several of the GNU libs
assumed that the optimal size would be a power of 2. If it wasn't, the
DB would become corrupt. Of course it only happened on my machine, and
I didn't understand why until I read the code and saw the power-of-2
assumption....ARG! That took almost a year, since the first version of
perl I found it in was 5.14. It wasn't fixed for 5.16 or 5.18... I
don't know about now.
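
The arithmetic behind that 768k figure, and the kind of power-of-2 test
a library could apply before relying on the value, can be sketched in
Python as below. The numbers are the ones given in this message; the
helper name is made up, and this is not the gdbm/ndbm code.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


strip = 64 * 1024      # bytes written to one disk before moving to the next
stripe = strip * 4     # 4 data spindles per RAID set: 256k
optimal = stripe * 3   # 3 sets striped together: 768k

print(stripe, is_power_of_two(stripe))    # 262144 True
print(optimal, is_power_of_two(optimal))  # 786432 False
```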

Just a random caution... the bug was in the gdbm/ndbm code. Some time
later I needed a disk replacement and went to a RAID10 (2 mirrors,
striped). The old bug hasn't cropped up since. I recommended to
the perl folk that they test for that case. They didn't think
it would be a problem, given I was the only reporter... Others
may have run into it, or may not have used DBs... but the p5p team
didn't want the extra work, so they closed the bug as invalid....
invalid?!




^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-07-12  4:40 UTC | newest]

Thread overview: 22+ messages
2015-06-13 14:52 optimal io size / custom alignment Tom Yan
2015-06-15 13:31 ` Karel Zak
2015-06-16  5:20   ` Tom Yan
2015-06-16  5:37     ` Tom Yan
2015-06-16  9:43     ` Karel Zak
2015-06-16 10:22       ` Tom Yan
2015-06-16 10:22         ` Tom Yan
2015-06-16 17:08     ` Martin K. Petersen
2015-06-16 19:26       ` Tom Yan
2015-06-16 19:26         ` Tom Yan
2015-06-16 21:28         ` Martin K. Petersen
2015-06-17  9:49           ` Tom Yan
2015-06-17  9:49             ` Tom Yan
2015-06-18 21:01             ` Martin K. Petersen
2015-06-18 21:01               ` Martin K. Petersen
2015-06-20 16:01               ` Tom Yan
2015-06-20 16:01                 ` Tom Yan
2015-06-21  0:12                 ` Martin K. Petersen
2015-06-21  0:12                   ` Martin K. Petersen
2015-06-22 14:32                 ` Alan Stern
2015-06-22 14:32                   ` Alan Stern
2015-07-12  4:19   ` optimal io size / custom alignment -- caution on custom aligns Linda Walsh
