From: Lars Ellenberg <lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA@public.gmane.org>
To: Mike Snitzer <snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org,
	linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA@public.gmane.org,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	shli-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>,
	agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ@public.gmane.org
Subject: Re: RFC: always use REQ_OP_WRITE_ZEROES for zeroing offload
Date: Thu, 23 Mar 2017 23:53:09 +0100
Message-ID: <20170323225256.GK1138@soda.linbit>
In-Reply-To: <20170323170221.GA20854-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thu, Mar 23, 2017 at 01:02:22PM -0400, Mike Snitzer wrote:
> On Thu, Mar 23 2017 at 11:54am -0400,
> Lars Ellenberg <lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA@public.gmane.org> wrote:
> 
> > On Thu, Mar 23, 2017 at 10:33:18AM -0400, Christoph Hellwig wrote:
> > > This series makes REQ_OP_WRITE_ZEROES the only zeroing offload
> > > supported by the block layer, and switches existing implementations
> > > of REQ_OP_DISCARD that correctly set discard_zeroes_data to it,
> > > removes incorrect discard_zeroes_data, and also switches WRITE SAME
> > > based zeroing in SCSI to this new method.
> > > 
> > > I've done testing with ATA, SCSI and NVMe setups, but there are
> > > a few things that will need more attention:
> > > 
> > 
> > >  - The DRBD code in this area was very odd,
> > 
> > DRBD wants all replicas to give back identical data.
> > If what comes back after a discard is "undefined",
> > we cannot really use that.
> > 
> > We used to "stack" discard only if our local backend claimed
> > "discard_zeroes_data". We replicate that IO request to the peer
> > as discard, and if the peer cannot do discards itself, or has
> > discard_zeroes_data == 0, the peer will use zeroout instead.
> > 
> > One use-case for this is the device mapper "thin provisioning".
> > At the time I wrote those "odd" hacks, dm thin targets
> > would set discard_zeroes_data=0, NOT change discard granularity,
> > but only actually discard (drop from the tree) whole "chunks",
> > leaving partial start/end chunks in the mapping tree unchanged.
> > 
> > The logic of "only stack discard, if backend discard_zeroes_data"

That is DRBD's logic, which I just explained above.
And the "backend" (to DRBD) in that sentence would be thin, and not
the "real" hardware below thin, which may not even support discard.

> > would mean that we would not be able to accept and pass down discards
> > to dm-thin targets. But with data on dm-thin, you would really like
> > to do the occasional fstrim.
> 
> Are you sure you aren't thinking of MD raid?

Yes.

> To this day, dm-thin.c has: ti->discard_zeroes_data_unsupported = true

That is exactly what I was saying.

Thin does not claim to zero data on discard, which is ok, and correct,
because it only punches holes for full chunks (or whatever you call
them), and leaves the rest in the mapping tree as is.

And that behaviour would prevent DRBD from exposing discards if
configured on top of thin. (see above)

But thin *could* easily guarantee zeroing, by simply punching holes
where it can, and zeroing out the not fully-aligned partial start and
end of the range.

Which is what I added as an option between DRBD and whatever is below,
with the use-case of dm-thin in mind.
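
A minimal sketch of that splitting, not the actual DRBD or dm-thin code.
The function name, the tuple layout and the idea of passing the chunk
size as a plain integer (same unit as the offsets) are assumptions made
for illustration only:

```python
def split_discard(start, length, chunk):
    """Split [start, start + length) into operations that leave the whole
    range reading back as zeroes on a chunk-based thin device.

    Returns a list of (op, offset, length) tuples, where op is
    "discard" for whole chunks and "zeroout" for partial chunks.
    All names here are hypothetical.
    """
    if length <= 0:
        return []
    end = start + length
    first_aligned = -(-start // chunk) * chunk   # round start up
    last_aligned = (end // chunk) * chunk        # round end down

    if first_aligned >= last_aligned:
        # The range does not cover a single whole chunk:
        # nothing can be punched out, so zero everything explicitly.
        return [("zeroout", start, length)]

    ops = []
    if start < first_aligned:
        # Partial chunk at the start of the range.
        ops.append(("zeroout", start, first_aligned - start))
    # Whole chunks in the middle can be dropped from the mapping tree.
    ops.append(("discard", first_aligned, last_aligned - first_aligned))
    if last_aligned < end:
        # Partial chunk at the end of the range.
        ops.append(("zeroout", last_aligned, end - last_aligned))
    return ops
```

For example, with a chunk size of 256 a request for offset 100 and
length 1000 would zero out 156 units, discard 768 units (three chunks)
and zero out the remaining 76 units.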

And that made it possible for DRBD to
 a) expose "discard" to upper layers, even though we would usually do
    so only if the DRBD Primary sits on top of a device that guarantees
    that discard zeroes data,
 b) still use discards on a secondary, without falling back to zero-out,
    which would unexpectedly fully allocate, instead of trim, a thinly
    provisioned device-mapper target.
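
The two cases above can be sketched as a pair of small decision
functions. This is only an illustration of the policy described in this
mail, not DRBD's implementation; the function names and the
`zero_partial_chunks` parameter (standing in for the option mentioned
above) are hypothetical:

```python
def expose_discard(backend_can_discard, backend_zeroes_data,
                   zero_partial_chunks=False):
    """Case a): should the Primary advertise discard to upper layers?"""
    if not backend_can_discard:
        return False
    # Without the option, only a backend that guarantees zeroes qualifies.
    return backend_zeroes_data or zero_partial_chunks


def peer_action(backend_can_discard, backend_zeroes_data,
                zero_partial_chunks=False):
    """Case b): what does a peer do with a replicated discard?"""
    if not backend_can_discard:
        return "zeroout"
    if backend_zeroes_data:
        return "discard"
    if zero_partial_chunks:
        # Discard whole chunks, explicitly zero the unaligned ends.
        return "discard+zero-partial"
    # Fallback: zero-out, which fully allocates a thin device.
    return "zeroout"
```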


Thanks,

    Lars
