All of lore.kernel.org
 help / color / mirror / Atom feed
* LSF/MM Schedule and improving discard support
@ 2016-04-07 15:39 Bart Van Assche
  2016-04-07 15:51 ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2016-04-07 15:39 UTC (permalink / raw)
  To: James Bottomley, Jens Axboe; +Cc: linux-scsi, linux-block

Hello James,

Some time ago I proposed to discuss how to improve discard support 
during the LSF/MM (http://thread.gmane.org/gmane.linux.scsi/110048). I 
would appreciate it if this would be added to the LSF/MM agenda since 
there has been no progress yet for the patch series I posted in December 
2015.

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: LSF/MM Schedule and improving discard support
  2016-04-07 15:39 LSF/MM Schedule and improving discard support Bart Van Assche
@ 2016-04-07 15:51 ` James Bottomley
  2016-04-13 15:57   ` Bart Van Assche
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2016-04-07 15:51 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe; +Cc: linux-scsi, linux-block

On Thu, 2016-04-07 at 08:39 -0700, Bart Van Assche wrote:
> Hello James,
> 
> Some time ago I proposed to discuss how to improve discard support 
> during the LSF/MM (http://thread.gmane.org/gmane.linux.scsi/110048). 
> I would appreciate it if this would be added to the LSF/MM agenda 
> since there has been no progress yet for the patch series I posted in
> December 2015.

Well, adding a cc to the lsf@ list to interest the attendees might have
been a good idea.

The basic problem with this topic is that it didn't really garner any
interest when you proposed it.  It also really just looks like there's
nothing to discuss: you just propose the unifying patch and people
discuss and modify that and it either gets accepted or not depending on
the level of the objections.

James



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: LSF/MM Schedule and improving discard support
  2016-04-07 15:51 ` James Bottomley
@ 2016-04-13 15:57   ` Bart Van Assche
  2016-04-13 16:21     ` Martin K. Petersen
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2016-04-13 15:57 UTC (permalink / raw)
  To: James Bottomley, Bart Van Assche, Jens Axboe; +Cc: linux-scsi, linux-block, lsf

(+ LSF mailing list)

On 04/07/2016 08:51 AM, James Bottomley wrote:
> On Thu, 2016-04-07 at 08:39 -0700, Bart Van Assche wrote:
>> Some time ago I proposed to discuss how to improve discard support
>> during the LSF/MM (http://thread.gmane.org/gmane.linux.scsi/110048).
>> I would appreciate it if this would be added to the LSF/MM agenda
>> since there has been no progress yet for the patch series I posted in
>> December 2015.
>
> Well, adding a cc to the lsf@ list to interest the attendees might have
> been a good idea.
>
> The basic problem with this topic is that it didn't really garner any
> interest when you proposed it.  It also really just looks like there's
> nothing to discuss: you just propose the unifying patch and people
> discuss and modify that and it either gets accepted or not depending on
> the level of the objections.

Hello James,

There is something that should be discussed further, namely what the 
behavior of the BLKDISCARD and BLKSECDISCARD ioctls should be if the 
start and/or end sectors are not aligned on a discard boundary. Should 
such requests fail with an error code, should the non-aligned head and 
tail be zeroed or should the non-aligned parts be left unmodified?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: LSF/MM Schedule and improving discard support
  2016-04-13 15:57   ` Bart Van Assche
@ 2016-04-13 16:21     ` Martin K. Petersen
  2016-04-13 16:29       ` Bart Van Assche
  0 siblings, 1 reply; 11+ messages in thread
From: Martin K. Petersen @ 2016-04-13 16:21 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Jens Axboe, linux-scsi, linux-block, lsf,
	Darrick J. Wong

>>>>> "Bart" == Bart Van Assche <bart.vanassche@sandisk.com> writes:

Bart,

Bart> There is something that should be discussed further, namely what
Bart> the behavior of the BLKDISCARD and BLKSECDISCARD ioctls should be
Bart> if the start and/or end sectors are not aligned on a discard
Bart> boundary. Should such requests fail with an error code, should the
Bart> non-aligned head and tail be zeroed or should the non-aligned
Bart> parts be left unmodified?

I think the problem with your changes is that they mix upward and
downward-facing behaviors.

>From a filesystem/ioctl perspective, BLKDISCARD is a hint. We should not be
rounding off or aligning anything. DM or the thin provisioned device may
or may not ignore parts of the request. Their business. And since
discard is there to alleviate write amplification, it absolutely should
never cause any media writing to happen.

BLKZEROOUT, on the other hand, needs to provide hard guarantees. It is
up to the filesystem or ioctl caller to indicate whether allocation or
deallocation is preferred in this case. And then that should get turned
into a mix of WRITE, WRITE SAME or DISCARD commands based on alignment,
granularity and discard_zeroes_data reported by the device.

Darrick has been trying to clean all this up and I would like to see the
explicit distinction between allocate(anchor), deallocate(discard/unmap)
and zeroout (write same, unmap) in the upward-facing interfaces. And
then have those interfaces use WRITE SAME, DISCARD and ANCHOR to
implement their functionality based on the parameters reported by the
underlying device.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: LSF/MM Schedule and improving discard support
  2016-04-13 16:21     ` Martin K. Petersen
@ 2016-04-13 16:29       ` Bart Van Assche
  2016-04-13 16:43         ` [Lsf] " Martin K. Petersen
  2016-04-13 16:51         ` James Bottomley
  0 siblings, 2 replies; 11+ messages in thread
From: Bart Van Assche @ 2016-04-13 16:29 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Jens Axboe, linux-scsi, linux-block, lsf,
	Darrick J. Wong

On 04/13/2016 09:21 AM, Martin K. Petersen wrote:
> From a filesystem/ioctl perspective, BLKDISCARD is a hint. We should not be
> rounding off or aligning anything.

Hello Martin,

Today if a BLKDISCARD ioctl passes a non-aligned start and/or end sector 
to the kernel then the block layer will submit invalid (non-aligned) 
REQ_DISCARD requests to the block driver the ioctl applies to. This is 
not acceptable. Does the above mean that you are proposing to fail such 
BLKDISCARD ioctls with an error code?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Lsf] LSF/MM Schedule and improving discard support
  2016-04-13 16:29       ` Bart Van Assche
@ 2016-04-13 16:43         ` Martin K. Petersen
  2016-04-13 16:57           ` Bart Van Assche
  2016-04-13 16:51         ` James Bottomley
  1 sibling, 1 reply; 11+ messages in thread
From: Martin K. Petersen @ 2016-04-13 16:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K. Petersen, linux-block, James Bottomley, linux-scsi,
	Darrick J. Wong, Jens Axboe, lsf

>>>>> "Bart" == Bart Van Assche <bart.vanassche@sandisk.com> writes:

Bart> Today if a BLKDISCARD ioctl passes a non-aligned start and/or end
Bart> sector to the kernel then the block layer will submit invalid
Bart> (non-aligned) REQ_DISCARD requests to the block driver the ioctl
Bart> applies to.

I do not know what you mean by "invalid".

A device that implements UNMAP can freely ignore any parts of a request
block range that are not aligned to its internal allocation units.

>From SBC: "An unmap request with a number of logical blocks that is not
a multiple of this value (OPTIMAL UNMAP GRANULARITY) may result in unmap
operations on fewer LBAs than requested." and "An unmap request with a
starting LBA that is not optimal may result in unmap operations on fewer
LBAs than requested."

The same is true for SATA which has no way report granularity and
alignment at all.

Nowhere does it say that a request that is not an aligned multiple of
any reported granularity should be considered "invalid" or rejected by
the storage.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Lsf] LSF/MM Schedule and improving discard support
  2016-04-13 16:29       ` Bart Van Assche
  2016-04-13 16:43         ` [Lsf] " Martin K. Petersen
@ 2016-04-13 16:51         ` James Bottomley
  2016-04-13 17:30           ` Darrick J. Wong
  1 sibling, 1 reply; 11+ messages in thread
From: James Bottomley @ 2016-04-13 16:51 UTC (permalink / raw)
  To: Bart Van Assche, Martin K. Petersen
  Cc: linux-block, linux-scsi, Darrick J. Wong, Jens Axboe, lsf

On Wed, 2016-04-13 at 09:29 -0700, Bart Van Assche wrote:
> On 04/13/2016 09:21 AM, Martin K. Petersen wrote:
> > From a filesystem/ioctl perspective, BLKDISCARD is a hint. We
> > should not be
> > rounding off or aligning anything.
> 
> Hello Martin,
> 
> Today if a BLKDISCARD ioctl passes a non-aligned start and/or end 
> sector to the kernel then the block layer will submit invalid (non
> -aligned) REQ_DISCARD requests to the block driver the ioctl applies 
> to. This is not acceptable. Does the above mean that you are 
> proposing to fail such BLKDISCARD ioctls with an error code?

The answer would be of course not.  discard is a hint so malformed
discard gets ignored by the device and success is returned because you
can't oblige devices to obey hints (that's why they're called hints).

However, the problem of needing a mandatory discard for scrubbing
blocks is part of the fallocate discussion, I think.

James



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Lsf] LSF/MM Schedule and improving discard support
  2016-04-13 16:43         ` [Lsf] " Martin K. Petersen
@ 2016-04-13 16:57           ` Bart Van Assche
  2016-04-13 17:13             ` Martin K. Petersen
  0 siblings, 1 reply; 11+ messages in thread
From: Bart Van Assche @ 2016-04-13 16:57 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-block, James Bottomley, linux-scsi, Darrick J. Wong,
	Jens Axboe, lsf

On 04/13/2016 09:43 AM, Martin K. Petersen wrote:
>>>>>> "Bart" == Bart Van Assche <bart.vanassche@sandisk.com> writes:
>
> Bart> Today if a BLKDISCARD ioctl passes a non-aligned start and/or end
> Bart> sector to the kernel then the block layer will submit invalid
> Bart> (non-aligned) REQ_DISCARD requests to the block driver the ioctl
> Bart> applies to.
>
> I do not know what you mean by "invalid".
>
> A device that implements UNMAP can freely ignore any parts of a request
> block range that are not aligned to its internal allocation units.
>
>  From SBC: "An unmap request with a number of logical blocks that is not
> a multiple of this value (OPTIMAL UNMAP GRANULARITY) may result in unmap
> operations on fewer LBAs than requested." and "An unmap request with a
> starting LBA that is not optimal may result in unmap operations on fewer
> LBAs than requested."
>
> The same is true for SATA which has no way report granularity and
> alignment at all.
>
> Nowhere does it say that a request that is not an aligned multiple of
> any reported granularity should be considered "invalid" or rejected by
> the storage.

Hello Martin,

Sorry if I wasn't clear enough. Regarding the SCSI UNMAP command, 
consider e.g. the following code from drivers/scsi/sd.c:

static int sd_setup_discard_cmnd(struct scsi_cmnd *cmd)
{
	[ ... ]
	sector >>= ilog2(sdp->sector_size) - 9;
	nr_sectors >>= ilog2(sdp->sector_size) - 9;
	[ ... ]
	case SD_LBP_UNMAP:
		[ ... ]
		cmd->cmd_len = 10;
		cmd->cmnd[0] = UNMAP;
		cmd->cmnd[8] = 24;
		put_unaligned_be16(6 + 16, &buf[0]);
		put_unaligned_be16(16, &buf[2]);
		put_unaligned_be64(sector, &buf[8]);
		put_unaligned_be32(nr_sectors, &buf[16]);
		[ ... ]
}

For sdp->sector_size > 512, should we allow the block layer to pass 
sector and nr_sector values to this function that are not a multiple of 
sdp->sector_size? And if so, how should this code behave and if sector 
and/or nr_sectors is not a multiple of sdp->sector_size? As one can see 
the above code rounds down sector and nr_sectors while converting from 
sectors to logical blocks. This means that if a non-aligned BLKDISCARD 
is submitted from user space that one or more sectors *before* the start 
of the range specified in the ioctl will be discarded. Isn't that a bug?

Bart.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Lsf] LSF/MM Schedule and improving discard support
  2016-04-13 16:57           ` Bart Van Assche
@ 2016-04-13 17:13             ` Martin K. Petersen
  0 siblings, 0 replies; 11+ messages in thread
From: Martin K. Petersen @ 2016-04-13 17:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K. Petersen, linux-block, James Bottomley, linux-scsi,
	Darrick J. Wong, Jens Axboe, lsf

>>>>> "Bart" == Bart Van Assche <bart.vanassche@sandisk.com> writes:

Bart> And if so, how should this code behave and if sector and/or
Bart> nr_sectors is not a multiple of sdp->sector_size? As one can see
Bart> the above code rounds down sector and nr_sectors while converting
Bart> from sectors to logical blocks. This means that if a non-aligned
Bart> BLKDISCARD is submitted from user space that one or more sectors
Bart> *before* the start of the range specified in the ioctl will be
Bart> discarded. Isn't that a bug?

Yes. But Darrick already addressed that, I believe.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Lsf] LSF/MM Schedule and improving discard support
  2016-04-13 16:51         ` James Bottomley
@ 2016-04-13 17:30           ` Darrick J. Wong
  2016-04-13 22:04             ` Douglas Gilbert
  0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2016-04-13 17:30 UTC (permalink / raw)
  To: James Bottomley
  Cc: Bart Van Assche, Martin K. Petersen, linux-block, linux-scsi,
	Jens Axboe, lsf

On Wed, Apr 13, 2016 at 09:51:04AM -0700, James Bottomley wrote:
> On Wed, 2016-04-13 at 09:29 -0700, Bart Van Assche wrote:
> > On 04/13/2016 09:21 AM, Martin K. Petersen wrote:
> > > From a filesystem/ioctl perspective, BLKDISCARD is a hint. We
> > > should not be
> > > rounding off or aligning anything.
> > 
> > Hello Martin,
> > 
> > Today if a BLKDISCARD ioctl passes a non-aligned start and/or end 
> > sector to the kernel then the block layer will submit invalid (non
> > -aligned) REQ_DISCARD requests to the block driver the ioctl applies 
> > to. This is not acceptable. Does the above mean that you are 
> > proposing to fail such BLKDISCARD ioctls with an error code?
> 
> The answer would be of course not.  discard is a hint so malformed
> discard gets ignored by the device and success is returned because you
> can't oblige devices to obey hints (that's why they're called hints).

Agree.  For blockdev FALLOC_FL_PUNCH_HOLE I think we can simply check for
logical block size ("lbs") alignment and then pass the request to the
device with the understanding that it can do as it pleases.  We asked the
device to try to deallocate blocks, and perhaps it cannot.

Just to be clear, this only applies to zeroing discard; the "discard and who
knows what you can now read back" thing that nobody likes has been temporarily
wired up to FALLOC_FL_PUNCH_HOLE | FALLOC_FL_NO_HIDE_STALE. :)

> However, the problem of needing a mandatory discard for scrubbing
> blocks is part of the fallocate discussion, I think.

The third fallocate mode (FALLOC_FL_ZERO_RANGE) doesn't fit with the phrase
"mandatory discard for scrubbing blocks", though if one removed "discard" from
that phrase then it would.  The only thing that ZERO_RANGE guarantees is that
subsequent reads return zeroes.  XFS punches the entire range and reallocates
it with unwritten extents; ext4 fills the holes in the range with unwritten
extents and converts real extents to unwritten.  Both also write zeroes to any
part of the range that doesn't align to an FS block.

Yes, I think there are several questions to resolve here for mandatory zeroing
with FALLOC_FL_ZERO_RANGE (summarizing the issues I've come up with so far):

a) Should blockdev fallocate accept byte-granular offset/length arguments, even
if it has to use the page cache to write zeroes to the device?  This is what
file fallocate does today.

b) If blockdev fallocate does impose alignment requirements, should it return
EINVAL to a request that isn't aligned to the logical block size?

c) If a device really really prefers that its requests are aligned to
min_io_size (which can be much larger than the logical block size), should it
reject requests that aren't aligned to min_io?  Or perhaps it should take care
of the alignment problems on its own somehow?

For allocate mode (the thing Mike Snitzer brought up in another thread
yesterday), the alignment problems are much easier because we're allowed to
round the start down and the end up to fit whatever alignment we require.

Should we promote this to a storage track session at LSF next week?

--D

> 
> James
> 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Lsf] LSF/MM Schedule and improving discard support
  2016-04-13 17:30           ` Darrick J. Wong
@ 2016-04-13 22:04             ` Douglas Gilbert
  0 siblings, 0 replies; 11+ messages in thread
From: Douglas Gilbert @ 2016-04-13 22:04 UTC (permalink / raw)
  To: Darrick J. Wong, James Bottomley
  Cc: Bart Van Assche, Martin K. Petersen, linux-block, linux-scsi,
	Jens Axboe, lsf

On 16-04-13 01:30 PM, Darrick J. Wong wrote:
> On Wed, Apr 13, 2016 at 09:51:04AM -0700, James Bottomley wrote:
>> On Wed, 2016-04-13 at 09:29 -0700, Bart Van Assche wrote:
>>> On 04/13/2016 09:21 AM, Martin K. Petersen wrote:
>>>>  From a filesystem/ioctl perspective, BLKDISCARD is a hint. We
>>>> should not be
>>>> rounding off or aligning anything.
>>>
>>> Hello Martin,
>>>
>>> Today if a BLKDISCARD ioctl passes a non-aligned start and/or end
>>> sector to the kernel then the block layer will submit invalid (non
>>> -aligned) REQ_DISCARD requests to the block driver the ioctl applies
>>> to. This is not acceptable. Does the above mean that you are
>>> proposing to fail such BLKDISCARD ioctls with an error code?
>>
>> The answer would be of course not.  discard is a hint so malformed
>> discard gets ignored by the device and success is returned because you
>> can't oblige devices to obey hints (that's why they're called hints).
>
> Agree.  For blockdev FALLOC_FL_PUNCH_HOLE I think we can simply check for
> logical block size ("lbs") alignment and then pass the request to the
> device with the understanding that it can do as it pleases.  We asked the
> device to try to deallocate blocks, and perhaps it cannot.
>
> Just to be clear, this only applies to zeroing discard; the "discard and who
> knows what you can now read back" thing that nobody likes has been temporarily
> wired up to FALLOC_FL_PUNCH_HOLE | FALLOC_FL_NO_HIDE_STALE. :)

In May last year, T10 added another wrinkle when they expanded the LBPRZ
field from 1 to 3 bits (in the LBP VPD page but _not_ in the READ
CAPACITY(16) response). The expansion is to allow a new response when
an unmapped logical block is read: return a "provisioning initialization
pattern". That new piece of jargon is defined as a "non-zero pattern that
is the length of one logical block".

It seems that the "provisioning initialization pattern" is the same for
every unmapped logical block and is chosen by the manufacturer. It can
be read with the new REPORT PROVISIONING INITIALIZATION PATTERN command.
If LBPRZ=2 and FORMAT UNIT is called with an "initialization pattern"
equal to the disk's "provisioning initialization pattern" then all
logical blocks are unmapped. Clear?

Doug Gilbert

>> However, the problem of needing a mandatory discard for scrubbing
>> blocks is part of the fallocate discussion, I think.
>
> The third fallocate mode (FALLOC_FL_ZERO_RANGE) doesn't fit with the phrase
> "mandatory discard for scrubbing blocks", though if one removed "discard" from
> that phrase then it would.  The only thing that ZERO_RANGE guarantees is that
> subsequent reads return zeroes.  XFS punches the entire range and reallocates
> it with unwritten extents; ext4 fills the holes in the range with unwritten
> extents and converts real extents to unwritten.  Both also write zeroes to any
> part of the range that doesn't align to an FS block.
>
> Yes, I think there are several questions to resolve here for mandatory zeroing
> with FALLOC_FL_ZERO_RANGE (summarizing the issues I've come up with so far):
>
> a) Should blockdev fallocate accept byte-granular offset/length arguments, even
> if it has to use the page cache to write zeroes to the device?  This is what
> file fallocate does today.
>
> b) If blockdev fallocate does impose alignment requirements, should it return
> EINVAL to a request that isn't aligned to the logical block size?
>
> c) If a device really really prefers that its requests are aligned to
> min_io_size (which can be much larger than the logical block size), should it
> reject requests that aren't aligned to min_io?  Or perhaps it should take care
> of the alignment problems on its own somehow?
>
> For allocate mode (the thing Mike Snitzer brought up in another thread
> yesterday), the alignment problems are much easier because we're allowed to
> round the start down and the end up to fit whatever alignment we require.
>
> Should we promote this to a storage track session at LSF next week?
>
> --D
>
>>
>> James
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-04-13 22:04 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-07 15:39 LSF/MM Schedule and improving discard support Bart Van Assche
2016-04-07 15:51 ` James Bottomley
2016-04-13 15:57   ` Bart Van Assche
2016-04-13 16:21     ` Martin K. Petersen
2016-04-13 16:29       ` Bart Van Assche
2016-04-13 16:43         ` [Lsf] " Martin K. Petersen
2016-04-13 16:57           ` Bart Van Assche
2016-04-13 17:13             ` Martin K. Petersen
2016-04-13 16:51         ` James Bottomley
2016-04-13 17:30           ` Darrick J. Wong
2016-04-13 22:04             ` Douglas Gilbert

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.