From: Eric Wheeler <drbd-dev@lists.ewheeler.net>
To: Christoph Hellwig <hch@lst.de>
Cc: axboe@kernel.dk, martin.petersen@oracle.com, agk@redhat.com,
	snitzer@redhat.com, shli@kernel.org, philipp.reisner@linbit.com,
	lars.ellenberg@linbit.com, linux-block@vger.kernel.org,
	linux-raid@vger.kernel.org, dm-devel@redhat.com,
	linux-scsi@vger.kernel.org, drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] [PATCH 23/27] drbd: make intelligent use of blkdev_issue_zeroout
Date: Sat, 13 Jan 2018 00:46:40 +0000 (UTC)
Message-ID: <alpine.LRH.2.11.1801130035010.13147@mail.ewheeler.net>
In-Reply-To: <20170405172125.22600-24-hch@lst.de>

Hello All,

We just noticed that discards to DRBD devices backed by dm-thin devices 
are fully allocating the thin blocks.

This behavior did not exist before commit
ee472d83 block: add a flags argument to (__)blkdev_issue_zeroout

The problem exists somewhere between
[working] c20cfc27 block: stop using blkdev_issue_write_same for zeroing
  and
[broken]  45c21793 drbd: implement REQ_OP_WRITE_ZEROES

Note that c20cfc27 works as expected, but 45c21793 discards blocks 
being zeroed on the dm-thin backing device. All commits between those two 
produce the following error:

blkdiscard: /dev/drbd/by-res/test: BLKDISCARD ioctl failed: Input/output error

Also note that issuing a blkdiscard to the backing device directly 
discards as you would expect. This is just a problem when sending discards 
through DRBD.

Is there an easy way to solve this in the short term, even if the ultimate 
fix is more involved?

Thank you for your help!

-Eric

--
Eric Wheeler

On Wed, 5 Apr 2017, Christoph Hellwig wrote:

> drbd always wants its discard wire operations to zero the blocks, so
> use blkdev_issue_zeroout with the BLKDEV_ZERO_UNMAP flag instead of
> reinventing it poorly.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/block/drbd/drbd_debugfs.c  |   3 --
>  drivers/block/drbd/drbd_int.h      |   6 ---
>  drivers/block/drbd/drbd_receiver.c | 102 ++-----------------------------------
>  drivers/block/drbd/drbd_req.c      |   6 +--
>  4 files changed, 7 insertions(+), 110 deletions(-)
> 
> diff --git a/drivers/block/drbd/drbd_debugfs.c b/drivers/block/drbd/drbd_debugfs.c
> index de5c3ee8a790..494837e59f23 100644
> --- a/drivers/block/drbd/drbd_debugfs.c
> +++ b/drivers/block/drbd/drbd_debugfs.c
> @@ -236,9 +236,6 @@ static void seq_print_peer_request_flags(struct seq_file *m, struct drbd_peer_re
>  	seq_print_rq_state_bit(m, f & EE_CALL_AL_COMPLETE_IO, &sep, "in-AL");
>  	seq_print_rq_state_bit(m, f & EE_SEND_WRITE_ACK, &sep, "C");
>  	seq_print_rq_state_bit(m, f & EE_MAY_SET_IN_SYNC, &sep, "set-in-sync");
> -
> -	if (f & EE_IS_TRIM)
> -		__seq_print_rq_state_bit(m, f & EE_IS_TRIM_USE_ZEROOUT, &sep, "zero-out", "trim");
>  	seq_print_rq_state_bit(m, f & EE_WRITE_SAME, &sep, "write-same");
>  	seq_putc(m, '\n');
>  }
> diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
> index 724d1c50fc52..d5da45bb03a6 100644
> --- a/drivers/block/drbd/drbd_int.h
> +++ b/drivers/block/drbd/drbd_int.h
> @@ -437,9 +437,6 @@ enum {
>  
>  	/* is this a TRIM aka REQ_DISCARD? */
>  	__EE_IS_TRIM,
> -	/* our lower level cannot handle trim,
> -	 * and we want to fall back to zeroout instead */
> -	__EE_IS_TRIM_USE_ZEROOUT,
>  
>  	/* In case a barrier failed,
>  	 * we need to resubmit without the barrier flag. */
> @@ -482,7 +479,6 @@ enum {
>  #define EE_CALL_AL_COMPLETE_IO (1<<__EE_CALL_AL_COMPLETE_IO)
>  #define EE_MAY_SET_IN_SYNC     (1<<__EE_MAY_SET_IN_SYNC)
>  #define EE_IS_TRIM             (1<<__EE_IS_TRIM)
> -#define EE_IS_TRIM_USE_ZEROOUT (1<<__EE_IS_TRIM_USE_ZEROOUT)
>  #define EE_RESUBMITTED         (1<<__EE_RESUBMITTED)
>  #define EE_WAS_ERROR           (1<<__EE_WAS_ERROR)
>  #define EE_HAS_DIGEST          (1<<__EE_HAS_DIGEST)
> @@ -1561,8 +1557,6 @@ extern void start_resync_timer_fn(unsigned long data);
>  extern void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req);
>  
>  /* drbd_receiver.c */
> -extern int drbd_issue_discard_or_zero_out(struct drbd_device *device,
> -		sector_t start, unsigned int nr_sectors, bool discard);
>  extern int drbd_receiver(struct drbd_thread *thi);
>  extern int drbd_ack_receiver(struct drbd_thread *thi);
>  extern void drbd_send_ping_wf(struct work_struct *ws);
> diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
> index dc9a6dcd431c..bc1d296581f9 100644
> --- a/drivers/block/drbd/drbd_receiver.c
> +++ b/drivers/block/drbd/drbd_receiver.c
> @@ -1448,108 +1448,14 @@ void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
>  		drbd_info(resource, "Method to ensure write ordering: %s\n", write_ordering_str[resource->write_ordering]);
>  }
>  
> -/*
> - * We *may* ignore the discard-zeroes-data setting, if so configured.
> - *
> - * Assumption is that it "discard_zeroes_data=0" is only because the backend
> - * may ignore partial unaligned discards.
> - *
> - * LVM/DM thin as of at least
> - *   LVM version:     2.02.115(2)-RHEL7 (2015-01-28)
> - *   Library version: 1.02.93-RHEL7 (2015-01-28)
> - *   Driver version:  4.29.0
> - * still behaves this way.
> - *
> - * For unaligned (wrt. alignment and granularity) or too small discards,
> - * we zero-out the initial (and/or) trailing unaligned partial chunks,
> - * but discard all the aligned full chunks.
> - *
> - * At least for LVM/DM thin, the result is effectively "discard_zeroes_data=1".
> - */
> -int drbd_issue_discard_or_zero_out(struct drbd_device *device, sector_t start, unsigned int nr_sectors, bool discard)
> -{
> -	struct block_device *bdev = device->ldev->backing_bdev;
> -	struct request_queue *q = bdev_get_queue(bdev);
> -	sector_t tmp, nr;
> -	unsigned int max_discard_sectors, granularity;
> -	int alignment;
> -	int err = 0;
> -
> -	if (!discard)
> -		goto zero_out;
> -
> -	/* Zero-sector (unknown) and one-sector granularities are the same.  */
> -	granularity = max(q->limits.discard_granularity >> 9, 1U);
> -	alignment = (bdev_discard_alignment(bdev) >> 9) % granularity;
> -
> -	max_discard_sectors = min(q->limits.max_discard_sectors, (1U << 22));
> -	max_discard_sectors -= max_discard_sectors % granularity;
> -	if (unlikely(!max_discard_sectors))
> -		goto zero_out;
> -
> -	if (nr_sectors < granularity)
> -		goto zero_out;
> -
> -	tmp = start;
> -	if (sector_div(tmp, granularity) != alignment) {
> -		if (nr_sectors < 2*granularity)
> -			goto zero_out;
> -		/* start + gran - (start + gran - align) % gran */
> -		tmp = start + granularity - alignment;
> -		tmp = start + granularity - sector_div(tmp, granularity);
> -
> -		nr = tmp - start;
> -		err |= blkdev_issue_zeroout(bdev, start, nr, GFP_NOIO,
> -				BLKDEV_ZERO_NOUNMAP);
> -		nr_sectors -= nr;
> -		start = tmp;
> -	}
> -	while (nr_sectors >= granularity) {
> -		nr = min_t(sector_t, nr_sectors, max_discard_sectors);
> -		err |= blkdev_issue_discard(bdev, start, nr, GFP_NOIO,
> -				BLKDEV_ZERO_NOUNMAP);
> -		nr_sectors -= nr;
> -		start += nr;
> -	}
> - zero_out:
> -	if (nr_sectors) {
> -		err |= blkdev_issue_zeroout(bdev, start, nr_sectors, GFP_NOIO,
> -				BLKDEV_ZERO_NOUNMAP);
> -	}
> -	return err != 0;
> -}
> -
> -static bool can_do_reliable_discards(struct drbd_device *device)
> -{
> -	struct request_queue *q = bdev_get_queue(device->ldev->backing_bdev);
> -	struct disk_conf *dc;
> -	bool can_do;
> -
> -	if (!blk_queue_discard(q))
> -		return false;
> -
> -	if (q->limits.discard_zeroes_data)
> -		return true;
> -
> -	rcu_read_lock();
> -	dc = rcu_dereference(device->ldev->disk_conf);
> -	can_do = dc->discard_zeroes_if_aligned;
> -	rcu_read_unlock();
> -	return can_do;
> -}
> -
>  static void drbd_issue_peer_discard(struct drbd_device *device, struct drbd_peer_request *peer_req)
>  {
> -	/* If the backend cannot discard, or does not guarantee
> -	 * read-back zeroes in discarded ranges, we fall back to
> -	 * zero-out.  Unless configuration specifically requested
> -	 * otherwise. */
> -	if (!can_do_reliable_discards(device))
> -		peer_req->flags |= EE_IS_TRIM_USE_ZEROOUT;
> +	struct block_device *bdev = device->ldev->backing_bdev;
>  
> -	if (drbd_issue_discard_or_zero_out(device, peer_req->i.sector,
> -	    peer_req->i.size >> 9, !(peer_req->flags & EE_IS_TRIM_USE_ZEROOUT)))
> +	if (blkdev_issue_zeroout(bdev, peer_req->i.sector, peer_req->i.size >> 9,
> +			GFP_NOIO, 0))
>  		peer_req->flags |= EE_WAS_ERROR;
> +
>  	drbd_endio_write_sec_final(peer_req);
>  }
>  
> diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
> index 652114ae1a8a..6da9ea8c48b6 100644
> --- a/drivers/block/drbd/drbd_req.c
> +++ b/drivers/block/drbd/drbd_req.c
> @@ -1148,10 +1148,10 @@ static int drbd_process_write_request(struct drbd_request *req)
>  
>  static void drbd_process_discard_req(struct drbd_request *req)
>  {
> -	int err = drbd_issue_discard_or_zero_out(req->device,
> -				req->i.sector, req->i.size >> 9, true);
> +	struct block_device *bdev = req->device->ldev->backing_bdev;
>  
> -	if (err)
> +	if (blkdev_issue_zeroout(bdev, req->i.sector, req->i.size >> 9,
> +			GFP_NOIO, 0))
>  		req->private_bio->bi_error = -EIO;
>  	bio_endio(req->private_bio);
>  }
> -- 
> 2.11.0
> 
> _______________________________________________
> drbd-dev mailing list
> drbd-dev@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-dev
> 
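
For readers following the quoted diff, here is a minimal userspace sketch of
how the removed drbd_issue_discard_or_zero_out() split a request into
zero-out and discard pieces. It is an illustration only, not kernel code:
plan_range(), emit(), struct plan and struct op are hypothetical names, the
kernel's sector_div() is replaced by plain %, and the actual block-layer
calls are replaced by recording the operation that would have been issued.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: none of these names exist in the kernel. */
enum op_kind { OP_ZEROOUT, OP_DISCARD };

struct op {
	enum op_kind kind;
	uint64_t start;	/* first 512-byte sector */
	uint64_t nr;	/* number of sectors */
};

#define MAX_OPS 64

struct plan {
	struct op ops[MAX_OPS];
	size_t count;
};

/* Record one operation instead of submitting I/O. */
static void emit(struct plan *p, enum op_kind kind, uint64_t start, uint64_t nr)
{
	if (p->count < MAX_OPS)
		p->ops[p->count++] = (struct op){ kind, start, nr };
}

/*
 * Mirrors the control flow of the removed function for the discard case:
 * requests that are too small are zeroed out, an unaligned head is zeroed
 * out up to the next aligned sector, and the rest is discarded in chunks
 * of at most max_discard_sectors. Whatever is left is zeroed out.
 */
void plan_range(struct plan *p, uint64_t start, uint64_t nr_sectors,
		uint64_t granularity, uint64_t alignment,
		uint64_t max_discard_sectors)
{
	uint64_t nr, tmp;

	p->count = 0;

	/* Zero-sector (unknown) and one-sector granularities are the same. */
	if (granularity == 0)
		granularity = 1;
	alignment %= granularity;

	if (max_discard_sectors > (1U << 22))
		max_discard_sectors = 1U << 22;
	max_discard_sectors -= max_discard_sectors % granularity;
	if (max_discard_sectors == 0 || nr_sectors < granularity)
		goto zero_out;

	if (start % granularity != alignment) {
		if (nr_sectors < 2 * granularity)
			goto zero_out;
		/* start + gran - (start + gran - align) % gran */
		tmp = start + granularity - alignment;
		tmp = start + granularity - tmp % granularity;

		nr = tmp - start;
		emit(p, OP_ZEROOUT, start, nr);
		nr_sectors -= nr;
		start = tmp;
	}
	while (nr_sectors >= granularity) {
		nr = nr_sectors < max_discard_sectors ?
			nr_sectors : max_discard_sectors;
		emit(p, OP_DISCARD, start, nr);
		nr_sectors -= nr;
		start += nr;
	}
zero_out:
	if (nr_sectors)
		emit(p, OP_ZEROOUT, start, nr_sectors);
}
```

With a granularity of 128 sectors, alignment 0 and a 1024-sector discard
limit, a request for 300 sectors starting at sector 100 becomes a 28-sector
zero-out at sector 100 followed by a 272-sector discard at sector 128. The
code added by the quoted patch instead issues a single blkdev_issue_zeroout()
over the whole range with flags 0, so the model has no discard step for it.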
