From: Coly Li
To: linux-block@vger.kernel.org, damien.lemoal@wdc.com, hare@suse.com, hch@lst.de, axboe@kernel.dk
Cc: linux-bcache@vger.kernel.org, kbusch@kernel.org, Coly Li, Ajay Joshi, Chaitanya Kulkarni, Hannes Reinecke, Johannes Thumshirn
Subject: [RFC PATCH v2 4/4] block: set bi_size to REQ_OP_ZONE_RESET bio
Date: Sat, 16 May 2020 11:54:34 +0800
Message-Id: <20200516035434.82809-5-colyli@suse.de>
In-Reply-To: <20200516035434.82809-1-colyli@suse.de>
References: <20200516035434.82809-1-colyli@suse.de>

Now for zoned devices, zone management ioctl
commands are converted into zone management bios and handled by
blkdev_zone_mgmt(). Four zone management operations are handled; their
op codes are:
 - REQ_OP_ZONE_RESET
   Reset the zone's write pointer and discard all previously stored data.
 - REQ_OP_ZONE_OPEN
   Open the zones in the specified sector range; no effect on data.
 - REQ_OP_ZONE_CLOSE
   Close the zones in the specified sector range; no effect on data.
 - REQ_OP_ZONE_FINISH
   Mark the zone as full; no effect on data.

All zone management bios have a size of 0 bytes, i.e. their bi_size is 0.
Zero-length bios work fine for all of these operations as long as a
zoned device (e.g. a host-managed SMR hard drive) cannot be used to
create a bcache device. Once it can, zero length is still fine for every
operation except REQ_OP_ZONE_RESET.

When a bcache device (a virtual block device that forwards bios, like
the md RAID drivers) is created on top of a zoned device and a fast SSD
is attached as the cache device, the bcache driver may cache data from
frequent random READ requests on the SSD to accelerate READ performance
for hot data.

When the bcache driver receives a REQ_OP_ZONE_RESET bio, then besides
forwarding the request to the underlying zoned device (e.g. a
host-managed SMR hard drive), it must also invalidate all data cached on
the SSD for the zone being reset. Otherwise bcache will keep serving
outdated cached data to READ requests, causing data inconsistency and
potential corruption.

In order to invalidate outdated data on the SSD for the reset zone,
bcache needs to know not only the start LBA but also the length of the
range being reset. Otherwise, bcache cannot accurately invalidate the
outdated cached data.

Is it possible to simply set bi_size inside the bcache driver? The
answer is NO. Although every REQ_OP_ZONE_RESET bio covers exactly one
zone (the zone size, i.e. q->limits.chunk_sectors), some other stacking
block driver may (in the future) sit between blkdev_zone_mgmt(), where
the zone management bio is created, and the bcache driver, and that
driver may split the bio before bcache receives it.
The best place to set bi_size is where the zone management bio is
composed, in blkdev_zone_mgmt(). Then, no matter how the bio is split
before the bcache driver receives it, bcache can always correctly
invalidate the range being reset.

This patch sets bi_size of the REQ_OP_ZONE_RESET bio for each zone being
reset. REQ_OP_ZONE_RESET_ALL is a special case: its bi_size is set to
the capacity of the whole drive, so that bcache can invalidate all data
cached on the SSD for the zoned backing device.

With this change, bcache can handle REQ_OP_ZONE_RESET bios in a way very
similar to REQ_OP_DISCARD bios, with very little change.

Signed-off-by: Coly Li
Cc: Ajay Joshi
Cc: Chaitanya Kulkarni
Cc: Christoph Hellwig
Cc: Damien Le Moal
Cc: Hannes Reinecke
Cc: Johannes Thumshirn
Cc: Keith Busch
---
Changelog:
v2: fix typo for REQ_OP_ZONE_RESET_ALL.
v1: initial version.

 block/blk-zoned.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 1e0708c68267..01d91314399b 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -227,11 +227,15 @@ int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op,
 		if (op == REQ_OP_ZONE_RESET &&
 		    blkdev_allow_reset_all_zones(bdev, sector, nr_sectors)) {
 			bio->bi_opf = REQ_OP_ZONE_RESET_ALL;
+			bio->bi_iter.bi_sector = sector;
+			bio->bi_iter.bi_size = nr_sectors << SECTOR_SHIFT;
 			break;
 		}
 
 		bio->bi_opf = op | REQ_SYNC;
 		bio->bi_iter.bi_sector = sector;
+		if (op == REQ_OP_ZONE_RESET)
+			bio->bi_iter.bi_size = zone_sectors << SECTOR_SHIFT;
 		sector += zone_sectors;
 
 		/* This may take a while, so be nice to others */
-- 
2.25.0