From: Christoph Hellwig <hch@infradead.org>
To: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	Christoph Hellwig <hch@infradead.org>,
	linux-block <linux-block@vger.kernel.org>,
	Damien Le Moal <Damien.LeMoal@wdc.com>,
	Keith Busch <kbusch@kernel.org>,
	"linux-scsi @ vger . kernel . org" <linux-scsi@vger.kernel.org>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	"linux-fsdevel @ vger . kernel . org"
	<linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v3 06/10] scsi: sd_zbc: emulate ZONE_APPEND commands
Date: Sat, 28 Mar 2020 01:51:06 -0700
Message-ID: <20200328085106.GA22315@infradead.org>
In-Reply-To: <20200327165012.34443-7-johannes.thumshirn@wdc.com>

> Since zone reset and finish operations can be issued concurrently with
> writes and zone append requests, ensure a coherent update of the zone
> write pointer offsets by also write locking the target zones for these
> zone management requests.

While they can be issued concurrently, you can't expect sane behavior
in that case.  So I'm not sure why we need the zone write lock for
these zone management requests.

> +++ b/drivers/scsi/sd.c
> @@ -1215,6 +1215,12 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd)
>  	else
>  		protect = 0;
>  
> +	if (req_op(rq) == REQ_OP_ZONE_APPEND) {
> +		ret = sd_zbc_prepare_zone_append(cmd, &lba, nr_blocks);
> +		if (ret)
> +			return ret;
> +	}

I'd move this up a few lines to keep all the PI related code together.

> +#define SD_ZBC_INVALID_WP_OFST	~(0u)
> +#define SD_ZBC_UPDATING_WP_OFST	(SD_ZBC_INVALID_WP_OFST - 1)

Given that these values go into seq_zones_wp_ofst, shouldn't the block
layer define them?

> +struct sd_zbc_zone_work {
> +	struct work_struct work;
> +	struct scsi_disk *sdkp;
> +	unsigned int zno;
> +	char buf[SD_BUF_SIZE];
> +};

Wouldn't it make sense to have one work_struct per scsi device and batch
updates?  That is, also query a decent-sized buffer covering a bunch of
zones and update them all at once?  Also, given that the other write
pointer caching code is in the block layer, why is this in SCSI?
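
To illustrate the batching idea, here is a rough userspace sketch, not
kernel code.  It assumes one report has already been fetched for a run of
zones and then refreshes every cached offset in a single pass.  The struct
zone_report_entry layout and the name wp_cache_update_batch() are
hypothetical; a real REPORT ZONES reply is a packed big-endian buffer that
would need parsing first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-decoded report entry (not the on-wire format). */
struct zone_report_entry {
	uint64_t start_lba;	/* first LBA of the zone */
	uint64_t wp_lba;	/* current write pointer LBA */
};

/*
 * Refresh the cached write pointer offsets for every zone described in
 * one report.  Entries that fall outside the cache or look inconsistent
 * are skipped.  Returns the number of cache slots updated.
 */
static size_t wp_cache_update_batch(uint32_t *wp_ofst, size_t nr_zones,
				    uint64_t zone_blocks,
				    const struct zone_report_entry *rep,
				    size_t nr_rep)
{
	size_t updated = 0;

	if (!zone_blocks)
		return 0;

	for (size_t i = 0; i < nr_rep; i++) {
		uint64_t zno = rep[i].start_lba / zone_blocks;

		if (zno >= nr_zones)
			continue;
		if (rep[i].wp_lba < rep[i].start_lba ||
		    rep[i].wp_lba - rep[i].start_lba > zone_blocks)
			continue;

		wp_ofst[zno] = (uint32_t)(rep[i].wp_lba - rep[i].start_lba);
		updated++;
	}

	return updated;
}
```

With something along these lines a single work item per device could
cover many zones per report instead of allocating one work item and one
buffer per zone.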

> +	spin_lock_bh(&sdkp->zone_wp_ofst_lock);
> +
> +	wp_ofst = rq->q->seq_zones_wp_ofst[zno];
> +
> +	if (wp_ofst == SD_ZBC_UPDATING_WP_OFST) {
> +		/* Write pointer offset update in progress: ask for a requeue */
> +		ret = BLK_STS_RESOURCE;
> +		goto err;
> +	}
> +
> +	if (wp_ofst == SD_ZBC_INVALID_WP_OFST) {
> +		/* Invalid write pointer offset: trigger an update from disk */
> +		ret = sd_zbc_update_wp_ofst(sdkp, zno);
> +		goto err;
> +	}
> +
> +	wp_ofst = sectors_to_logical(sdkp->device, wp_ofst);
> +	if (wp_ofst + nr_blocks > sdkp->zone_blocks) {
> +		ret = BLK_STS_IOERR;
> +		goto err;
> +	}
> +
> +	/* Set the LBA for the write command used to emulate zone append */
> +	*lba += wp_ofst;
> +
> +	spin_unlock_bh(&sdkp->zone_wp_ofst_lock);

This seems like a really good use case for cmpxchg.  But I guess
premature optimization is the root of all evil, so let's keep this in
mind for later.
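
For later reference, a minimal userspace sketch of what the cmpxchg idea
could look like, using C11 atomics rather than the kernel's cmpxchg().
The names wp_ofst_claim() and enum wp_state are hypothetical and not part
of the patch; the two sentinels mirror SD_ZBC_INVALID_WP_OFST and
SD_ZBC_UPDATING_WP_OFST.  It only covers the lock-free INVALID ->
UPDATING transition, not the bounds check or the LBA adjustment.

```c
#include <stdatomic.h>
#include <stdint.h>

#define WP_OFST_INVALID		(~0u)
#define WP_OFST_UPDATING	(WP_OFST_INVALID - 1)

enum wp_state {
	WP_READY,		/* *ofst holds a usable cached offset */
	WP_BUSY,		/* someone else is updating: requeue */
	WP_START_UPDATE,	/* caller won the race and must update */
};

static enum wp_state wp_ofst_claim(_Atomic uint32_t *slot, uint32_t *ofst)
{
	uint32_t seen = WP_OFST_INVALID;

	/*
	 * Atomically turn INVALID into UPDATING.  On failure, 'seen' is
	 * overwritten with the value currently stored in the slot.
	 */
	if (atomic_compare_exchange_strong(slot, &seen, WP_OFST_UPDATING))
		return WP_START_UPDATE;

	if (seen == WP_OFST_UPDATING)
		return WP_BUSY;

	*ofst = seen;
	return WP_READY;
}
```

Only one caller can observe WP_START_UPDATE for a given invalid slot, so
exactly one update from disk would be triggered without taking the
spinlock on this path.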

> +	/*
> +	 * For zone append, the zone was locked in sd_zbc_prepare_zone_append().
> +	 * For zone reset and zone finish, the zone was locked in
> +	 * sd_zbc_setup_zone_mgmt_cmnd().
> +	 * For regular writes, the zone is unlocked by the block layer elevator.
> +	 */
> +	return req_op(rq) == REQ_OP_ZONE_APPEND ||
> +		req_op(rq) == REQ_OP_ZONE_RESET ||
> +		req_op(rq) == REQ_OP_ZONE_FINISH;
> +}
> +
> +static bool sd_zbc_need_zone_wp_update(struct request *rq)
> +{
> +	if (req_op(rq) == REQ_OP_WRITE ||
> +	    req_op(rq) == REQ_OP_WRITE_ZEROES ||
> +	    req_op(rq) == REQ_OP_WRITE_SAME)
> +		return blk_rq_zone_is_seq(rq);
> +
> +	if (req_op(rq) == REQ_OP_ZONE_RESET_ALL)
> +		return true;
> +
> +	return sd_zbc_zone_needs_write_unlock(rq);

To me all this would look cleaner with a switch statement:

static bool sd_zbc_need_zone_wp_update(struct request *rq)
{
	switch (req_op(rq)) {
	case REQ_OP_ZONE_APPEND:
	case REQ_OP_ZONE_FINISH:
	case REQ_OP_ZONE_RESET:
	case REQ_OP_ZONE_RESET_ALL:
		return true;
	case REQ_OP_WRITE:
	case REQ_OP_WRITE_ZEROES:
	case REQ_OP_WRITE_SAME:
		return blk_rq_zone_is_seq(rq);
	default:
		return false;
	}
}

> +	if (!sd_zbc_need_zone_wp_update(rq))
> +		goto unlock_zone;

Split the wp update into a little helper?

> +void sd_zbc_init_disk(struct scsi_disk *sdkp)
> +{
> +	if (!sd_is_zoned(sdkp))
> +		return;
> +
> +	spin_lock_init(&sdkp->zone_wp_ofst_lock);

Shouldn't this lock also go into the block code where the cached
write pointer lives?
