From: Damien Le Moal <Damien.LeMoal@wdc.com>
To: "hch@infradead.org" <hch@infradead.org>
Cc: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Jens Axboe <axboe@kernel.dk>,
	linux-block <linux-block@vger.kernel.org>,
	Keith Busch <kbusch@kernel.org>,
	"linux-scsi @ vger . kernel . org" <linux-scsi@vger.kernel.org>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	"linux-fsdevel @ vger . kernel . org"
	<linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v3 06/10] scsi: sd_zbc: emulate ZONE_APPEND commands
Date: Sat, 28 Mar 2020 09:18:20 +0000
Message-ID: <CO2PR04MB23430A87641EDD359E5101FEE7CD0@CO2PR04MB2343.namprd04.prod.outlook.com>
In-Reply-To: 20200328090715.GA26719@infradead.org

On 2020/03/28 18:07, hch@infradead.org wrote:
> On Sat, Mar 28, 2020 at 09:02:43AM +0000, Damien Le Moal wrote:
>> On 2020/03/28 17:51, Christoph Hellwig wrote:
>>>> Since zone reset and finish operations can be issued concurrently with
>>>> writes and zone append requests, ensure a coherent update of the zone
>>>> write pointer offsets by also write locking the target zones for these
>>>> zone management requests.
>>>
>>> While they can be issued concurrently you can't expect sane behavior
>>> in that case.  So I'm not sure why we need the zone write lock in this
>>> case.
>>
>> The behavior will certainly not be sane for the buggy application doing writes
>> and resets to the same zone concurrently (I have debugged that several times in
>> the field). So I am not worried about that at all. The zone write lock here is
>> still used to make sure the wp cache stays in sync with the drive. Without it,
>> we could have races on completion update of the wp and get out of sync.
> 
> How do the applications expect to get sane results from that in general?

They do not get sane results :) Those are application bugs. I do not care about
those. What I do care about is that the wp cache stays in sync with the drive so
that it itself does not become the cause of errors.

Thinking about it again though, the error processing code doing a zone report and
wp cache refresh will trigger for any write error, even those resulting from dumb
application bugs. So protection or not, since the wp cache refresh will be done,
we could simply not do zone write locking for reset and finish since these are
really expected to be done without any in-flight write.
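
To make the argument above concrete, here is a small userspace model. It is
only a sketch and not code from the patch: the two sentinel macros are the
ones quoted from the patch further down, while NR_ZONES, drive_wp, wp_cache and
the three helper functions are hypothetical names invented for illustration.
It assumes that a failed write invalidates the cached offset and that error
handling then reloads it from a zone report.

```c
#include <assert.h>

/* Sentinel values as quoted from the patch under review. */
#define SD_ZBC_INVALID_WP_OFST	(~0u)
#define SD_ZBC_UPDATING_WP_OFST	(SD_ZBC_INVALID_WP_OFST - 1)

#define NR_ZONES 4u	/* hypothetical device size for this sketch */

/* Hypothetical stand-in for the drive: the authoritative write pointers. */
unsigned int drive_wp[NR_ZONES];
/* Hypothetical stand-in for the cached write pointer offsets. */
unsigned int wp_cache[NR_ZONES];

/* Successful write completion: advance the cached offset if it is valid. */
void wp_cache_write_done(unsigned int zno, unsigned int nr_sectors)
{
	assert(zno < NR_ZONES);
	if (wp_cache[zno] < SD_ZBC_UPDATING_WP_OFST)
		wp_cache[zno] += nr_sectors;
}

/* Failed write: the cached offset can no longer be trusted. */
void wp_cache_write_failed(unsigned int zno)
{
	assert(zno < NR_ZONES);
	wp_cache[zno] = SD_ZBC_INVALID_WP_OFST;
}

/* Models the zone report done by error handling: reload from the drive. */
void wp_cache_refresh(unsigned int zno)
{
	assert(zno < NR_ZONES);
	if (wp_cache[zno] != SD_ZBC_INVALID_WP_OFST)
		return;		/* nothing to repair */
	wp_cache[zno] = SD_ZBC_UPDATING_WP_OFST;	/* report in flight */
	wp_cache[zno] = drive_wp[zno];			/* report completed */
}
```

In this model, a reset that races with a write can leave wp_cache[zno] stale,
but the next write to that zone fails on the drive, and the failure path
(wp_cache_write_failed() followed by wp_cache_refresh()) brings the cache back
in sync without any lock having been taken for the reset.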

> But if you think protecting against that is worth the effort I think
> there should be a separate patch to take the zone write lock for
> reset/finish.

OK. That would be easy to add. But from the point above, I am now trying to
convince myself that this is not necessary.

> 
>>>> +#define SD_ZBC_INVALID_WP_OFST	~(0u)
>>>> +#define SD_ZBC_UPDATING_WP_OFST	(SD_ZBC_INVALID_WP_OFST - 1)
>>>
>>> Given that this goes into the seq_zones_wp_ofst shouldn't the block
>>> layer define these values?
>>
>> We could, at least the first one. The second one is really something that could
>> be considered completely driver dependent since other drivers doing this
>> emulation may handle the updating state differently.
>>
>> Since this is the only driver where this is needed, maybe we can keep this here
>> for now?
> 
> Well, I'd rather keep magic values for a field defined in common code
> in the common code.  Having behavior details spread over different
> modules makes code rather hard to follow.
> 
>>>> +struct sd_zbc_zone_work {
>>>> +	struct work_struct work;
>>>> +	struct scsi_disk *sdkp;
>>>> +	unsigned int zno;
>>>> +	char buf[SD_BUF_SIZE];
>>>> +};
>>>
>>> Wouldn't it make sense to have one work_struct per scsi device and batch
>>> updates?  That is, also query a decent-sized buffer with a bunch of
>>> zones and update them all at once?  Also given that the other write
>>> pointer caching code is in the block layer, why is this in SCSI?
>>
>> Again, because we thought this is driver dependent in the sense that other
>> drivers may want to handle invalid WP entries differently.
> 
> What sensible other strategy exists?  Never mind that I hope we never
> see another driver.  And as above - I really want to keep behavior
> together instead of weirdly split over different code bases.  My
> preference would still be to have it just in sd, but you gave some good
> arguments for keeping it in the block layer.  Maybe we need to take a
> deeper look and figure out a way to keep it isolated in SCSI.

OK. We can try again to see if we can keep all this WP caching in sd. The only
pain point is the revalidation as I explained before. Everything else would stay
pretty much the same and all be scsi specific. I will dig again to see what can
be done.

> 
>> Also, I think that
>> one work struct per device may be overkill. This is for error recovery and on
>> a normal healthy system, write errors are rare.
> 
> I think it is less overkill than the dynamic allocation scheme with
> the mempool and slab cache, that is why I suggested it.

Ah. OK. Good point.
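
For reference, the per-device scheme suggested above could look roughly like
the following userspace model. This is a sketch under assumptions, not kernel
code: struct dev_wp_update, its fields, and both functions are hypothetical
names, the "queued" flag merely stands in for a single embedded work_struct
being pending, and a 64-bit mask limits the model to 64 zones.

```c
#include <assert.h>
#include <stdbool.h>

#define DEV_NR_ZONES	64u	/* hypothetical; fits one 64-bit mask */
#define DEV_WP_INVALID	(~0u)

/* One instance per device, so nothing is allocated in the error path. */
struct dev_wp_update {
	bool queued;			/* models the single work item pending */
	unsigned long long pending;	/* one bit per zone needing a refresh */
};

unsigned int dev_drive_wp[DEV_NR_ZONES];	/* stand-in for the drive */
unsigned int dev_wp_cache[DEV_NR_ZONES];	/* stand-in for the wp cache */

/* Error path: flag the zone. Returns true if the work must be queued. */
bool dev_wp_mark_invalid(struct dev_wp_update *u, unsigned int zno)
{
	assert(zno < DEV_NR_ZONES);
	dev_wp_cache[zno] = DEV_WP_INVALID;
	u->pending |= 1ULL << zno;
	if (u->queued)
		return false;	/* already queued, the zone rides along */
	u->queued = true;
	return true;
}

/* Work function: refresh every flagged zone in one pass, return the count. */
unsigned int dev_wp_update_work(struct dev_wp_update *u)
{
	unsigned int zno, updated = 0;

	for (zno = 0; zno < DEV_NR_ZONES; zno++) {
		if (u->pending & (1ULL << zno)) {
			dev_wp_cache[zno] = dev_drive_wp[zno];
			updated++;
		}
	}
	u->pending = 0;
	u->queued = false;
	return updated;
}
```

Several write errors that arrive before the work runs are then served by a
single pass, which is the batching effect, and no mempool or slab cache is
needed because the only state lives in the per-device structure.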

> 


-- 
Damien Le Moal
Western Digital Research
