From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757067AbcHVX5b (ORCPT ); Mon, 22 Aug 2016 19:57:31 -0400 Received: from esa2.hgst.iphmx.com ([68.232.143.124]:58301 "EHLO esa2.hgst.iphmx.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752999AbcHVX51 (ORCPT ); Mon, 22 Aug 2016 19:57:27 -0400 X-IronPort-AV: E=Sophos;i="5.28,562,1464624000"; d="scan'208";a="11971559" Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=Damien.LeMoal@hgst.com; Subject: Re: [PATCH v2 2/4] On Discard either do Reset WP or Write Same To: Shaun Tancheff , , , References: <20160822043116.21168-1-shaun@tancheff.com> <20160822043116.21168-3-shaun@tancheff.com> CC: Jens Axboe , Christoph Hellwig , "James E . J . Bottomley" , "Martin K . Petersen" , Hannes Reinecke , Josh Bingaman , Dan Williams , Sagi Grimberg , Mike Christie , Toshi Kani , Ming Lei , Shaun Tancheff From: Damien Le Moal Message-ID: <53c2949f-f8b9-463f-2adf-faf4603429bb@hgst.com> Date: Tue, 23 Aug 2016 08:57:15 +0900 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160822043116.21168-3-shaun@tancheff.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [199.255.47.8] X-ClientProxiedBy: BY1PR0201CA0025.namprd02.prod.outlook.com (10.160.191.163) To BY2PR04MB1974.namprd04.prod.outlook.com (10.166.111.14) X-MS-Office365-Filtering-Correlation-Id: 76186c02-eb6f-426c-4030-08d3cae80e65 X-Microsoft-Exchange-Diagnostics: 1;BY2PR04MB1974;2:osjqvjfQR/yJzRpQNMe8WhCJoPSZB3xPzvnkT9T39fPvpdBCAn0RG6PPtcAzEtmX20cJNNIkxEuNNaZD2V9HcNtjTpq4DctQgiZCOVW95Tfhc5kxx9R9YVkE2Yh6laJkXh275zkpt8TUq54UYlTUdgDH80tQb59ZT82Ewx9Fq9eoBkxWbPuyZ0vToRFWXg0Z;3:by7PP4TbP0Lf3qiSeNLoPWS4GlGnrKhrWlX9NEBkZ7KhOsBGYra4MnF4o78/7yzCdQ/cDbTu9t5kLS9jQrQHXBiArcRKNcV/98rd6M5qFeFSx1hH+2veVY2tvlY36zQy;25:JUT8WyIEwKOqHuAx5U/O3wEjLsliMNEdwQm3nDswQmym9UXPnTgciBJhLAviKDVKA7tZdxhqZKtpRijWS1rIh6LItMeowdBz4tA4oR88H95re9NJvJJPw/l14/mPIDboFJ3CYgrmyxwXd1PtOUPmlOaBJj4O+tqa9+IiH2lkASQqlbdJAGtVIDazDyJTbrdL9p/2euGEOdeVE5+Eo/RKz+qMnJMf8FDkACvn5BmoVcJIaAm51Q74VIqeqeP1bqo7ZRvNB3s4z5+PUkOcPQAuBBIoOQ/fs87koSD8LMfwx3xRkfZ74DOAM70Q2Lp9a2Q6dZz3MYTN2Qm+g+YCrY+Q2HBdTqSF+r+gyLqQ8KFZFJmYnfOjm+op4XUwkC1kXKB4fsfnphohyClp/DTLiMq1/wcAKh/zBlCmuWZsZ4vn5Rs= X-Microsoft-Antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:BY2PR04MB1974; X-Microsoft-Exchange-Diagnostics: 1;BY2PR04MB1974;31:lxITansiadxyVcEiqu2ZlXKAbpSztrcma7FvhlGgvVnu2fpeneO1W3sBzJXchDpnXY/DdkHj+kcHKR1mfs43vVlKKKb9OJyJRUcj2rGnBeVbL5AIT1g2xdYhqbXv3AjCbnKaVmCpaaND+C+TaDYA7e+hG9ndtDDJlzgA9QLDvMeXcYHvq/Vfk+TFbsywViZ/cerBQuFhoMi5VYRcszako4SmjR6bsuwjXoIXfjXt4qo=;20:I4DCbz1HvP2asG5y7319B85nl1fAHDptrPlMgTshiUP0INyFqcoW90TMuQgmbFbeEB/0uefg8QlaeySZOsBBzBa8yQpaI/YhuK/HqD5DsXRRkvt6fhTmeO6oiL792ZQQ26JIYu4zobe2mHwQEVnBmqmJehELBt/7KgmFWOmWMz5NMGcMOP1gP0PCs3CCWnlr8agj3Yis76Vhpq+Sr5gjCRMF7cQN+TgFsFFhiUM84XlAo624YN+H0JfFYwDd2WmwyuR0DXvbxNVvGrqTVfNlQHMIismAuSFk/AGyeb/JaeF30P7rVgpFeYTBBmjdsylNGVF3yRIMbonaY2Rc0PwQ2sDSHC5/uC+KWn/Ta0/uGM24WeHazFc6SS14FfAAtwMW/yA83kLVdFxHCIFUvkNDcoJwCp7iGjxqDnz6bxb0b0lA/y/rJAux2zfMsFneLs9ambGqkKrBWJkb4F/4oM6Db7qYlx7Oun//jW1hFT99dACxYLGXm/o3p2tUgY07mb8e X-Microsoft-Antispam-PRVS: X-Exchange-Antispam-Report-Test: UriScan:(79135771888625)(56741022784758); X-Exchange-Antispam-Report-CFA-Test: BCL:0;PCL:0;RULEID:(6040176)(601004)(2401047)(8121501046)(5005006)(10201501046)(3002001)(6055026);SRVR:BY2PR04MB1974;BCL:0;PCL:0;RULEID:;SRVR:BY2PR04MB1974; X-Microsoft-Exchange-Diagnostics: 1;BY2PR04MB1974;4:vtdxVGGD7eZOb/IrC742m0D2ZL7VYuVZn/plp2wbpMmfullppdmgGMRFsK1gGj6yL1jMX7rl9zmyFwuu7U+kUt6pgCqgJtC6eydaHBnstdF8BAH/j9pGNOq8XLwFK06UCJw2J4X6Ha/GIOGc73c8gkMiBTLlaeutlwilGGi1af85OanmKa9cOXKnVM4+f1BFmyEAHSc1qkoB9kI1+yePZUSf1oJi35xIzxjoH7pzdpW4V5n5JS+wPv411CBLMicD9jSLyde3afSAlJeXtmHEUYEMDv+LVSibzOZTbRFuPq2cJSCDRfdE4hkE+jDy5LzKiwotG26IsySMLFgdzd/GZ0uK3aGInzxO+4bHSvMKhgPPQ9eNbUwi2E9QsFWwJPaxpLkGNS+3SPaE9o8O8Mh1n2VNdCXL6HVuUCPM9gjVvPORWjfuq6THh1CHHWUyHL+hrUkn06+A1m5ujfe/2PY/0ZYnvDOsCrjnFOOPoPwiD2Y= X-Forefront-PRVS: 00429279BA X-Forefront-Antispam-Report: SFV:NSPM;SFS:(10019020)(4630300001)(6009001)(7916002)(199003)(377424004)(51444003)(24454002)(189002)(4326007)(189998001)(97736004)(5001770100001)(2950100001)(7846002)(42186005)(4001350100001)(31696002)(92566002)(305945005)(7736002)(7416002)(66066001)(50986999)(77096005)(65956001)(65806001)(81166006)(33646002)(86362001)(31686004)(76176999)(81156014)(54356999)(50466002)(47776003)(101416001)(8676002)(23746002)(6116002)(105586002)(19580395003)(19580405001)(2201001)(2906002)(3846002)(64126003)(586003)(83506001)(106356001)(230700001)(15974865002)(5660300001)(36756003)(68736007)(7099028)(65826006)(18886075002);DIR:OUT;SFP:1102;SCL:1;SRVR:BY2PR04MB1974;H:damien-imac.dhcp.fujisawa.hgst.com;FPR:;SPF:None;PTR:InfoNoRecords;A:1;MX:1;LANG:en; X-Microsoft-Exchange-Diagnostics: =?Windows-1252?Q?1;BY2PR04MB1974;23:9LaRnWWR9wHl0oXRgwb/ILOfSnypyBtqxEhry?= =?Windows-1252?Q?q4W63dtFsRbohz7W5pUajYnjFWx/lHUGOh5R69wr3XE+EBbMzu8VWSl0?= =?Windows-1252?Q?+YjEZ+ytY3dOXg9+0wl9Peu3Ef3tvKFBzi0vT84IRsqtnzYyjbV4is+e?= =?Windows-1252?Q?nM/bC9f56lUvK2PncNZe44QmsLgHW6BGCdTEIYQRqKJiXUuCtizjS5tQ?= =?Windows-1252?Q?CfILhVp5Ss9qgrdzwt15YQrGjAIORW44H4JV0oVCdBFYx6/RSIGP1jbw?= =?Windows-1252?Q?GYA0bKioVi1nFD0j7eLypAWrkqV8CDeC9TW7TLA6NLo6/+iwbVgAtpcY?= =?Windows-1252?Q?zWWU8yeEbm/Gued/oNpw91wlYc4lusg/XaDGT8WVrGVmuTyFTRDM6Liq?= =?Windows-1252?Q?27MPhNgGl9L50oXk+wUq4MpPSvx/MQ9ExFmcvuIQTssHQRw6btEESZRJ?= =?Windows-1252?Q?teE8Zt7sU4jwCu6YM80v2B+GhmroEFoJ+9HerEsUeOULySHlekopZmlH?= =?Windows-1252?Q?/S5o4E2piuSx6F4zR0bdZL8pC3olsDW77qsNb794Un6DqkRq7IJwobR8?= =?Windows-1252?Q?2pgMdBsCFfpdoMWZ+qZyTjBz6YeSNVRJZ8gnE5X+3zTVpeaqdPK9Je3v?= =?Windows-1252?Q?ySbkgeTaA6AnZXrcOlYW4yhlvyXowZjYrPMssuhlsmIUUQNy9NUyjle/?= =?Windows-1252?Q?cWbagw1+ET4xL/VOX2s2FUzkLd+YEfQiSmEva68uoXrGWhcC5q0h81jW?= =?Windows-1252?Q?ltcXAf6Yzywnf0QXL1oBAIg/cFJYiciKOV7pc1rjmrs0+tIDFJ8Ui0cL?= =?Windows-1252?Q?F83DQrrxGTiG9SJvTFKZQN+UJewAl1468DHLLZ8KzNlDoAGRKI0WeUYi?= =?Windows-1252?Q?iDhuG2p5g88UJPhUJciLFObO350BMLSWsH9k5OdmG3Ynle8FuwGy/hXD?= =?Windows-1252?Q?WNM+e1sXq8SMkCCzNVr7sG0BkqG3HK0S9/jyQRwOZITUXP7pESq2nVT0?= =?Windows-1252?Q?xcwAZwoTdcwEkjbbxWKVUknf1GYINkOya/QO7eAKhBkwxVXPjbQcB5Kk?= =?Windows-1252?Q?fOtw0P/dZxkMrgbwu0TDBpXyp6K1czGy6z+bj/TyJDBtrGMzCMHvAvKs?= =?Windows-1252?Q?27UPGZKV6EuogBe4gv8r265NW099OxDUb9zaX+dMmUupmu1OHe5MgZYS?= =?Windows-1252?Q?JL/u6l+KkDdfxKNwIJcETTTv9hcmFP/C2spkuFTp5Z478g8IoZLWxY0d?= =?Windows-1252?Q?ArkcIJ8qRNkyTvQIiaoi1U1D9E+EJd0FqC4mDnE9USLT75hQ1DsCO/pv?= =?Windows-1252?Q?sP3QpGQnivVP6DqaHzo0ZYN+mogo9afHe2LARhyI/RN+k75aeDr7N4db?= =?Windows-1252?Q?QiAmSqkwCzGzbxN20OSAu8zqfU3mdP3J0p8TFVez0qa9xVXGhxO5/TYt?= =?Windows-1252?Q?8cpRA1b7ZLKHRbkCoT33x2jlwNV+1x9s2+9pvbw6sSYJZFkabchTMF/z?= =?Windows-1252?Q?odYtFw=3D?= X-Microsoft-Exchange-Diagnostics: 1;BY2PR04MB1974;6:bUKNilY9Uxq93OVQSam2LJxeaPIOAoeTIllIo6B5N3Z3CLeZmMDV0ytBALPRybME5ARIx1nzxqAF7ZOpBRIRWZWsxC9q0EIula2JvO6i6zo3Egupzm/3zry00djYHha7vnhvPGcGu3Q+umA0Cgs9FrWxKkPwikpKKB4mVr806B/If+XPgi6PY0D57S/q4kZx8jrOL6I46NbltMKo3qqTjrW33nwfyiTUnBPAJtbZU7pfFLap5kPAeh/K4evLp1KHfX2IFLggq+NnQp1UANmdeVjQLhmtyyZld94q2XPwzZooD/92APvoIbut/qJzBQcVXxP8VeGDLKDfT3AjXrGT3g==;5:85GnvfX6+KtRzV0mzdXf0Xe8HFTJZa1iU8f0YItLxLYL8xHMQGQPfAY9oXcWBkZb6nnNh1uvr0LoVWRWxYcMJwDJMXtfWCCYEMKDUtfLNtcEkFcV+mpii2YpcxdNfdZ06/MrjdIi2T6kLmpWf0aO/w==;24:Nzj3kDOoBusXmP26ZtjJI8mdRiiihPRCPTIEr0ZWgStcNMQrvgc5wQDDM6jSgUmiIvG0Je4Bmg9Jo5Ap37FX0+VEeWzwVbaVlG1I1AUH7m0=;7:QTUHVyge+NTSluRxIR8K9hOwHsTiFvp4Wt/g+GijbzoRH2Ka8dri0Uzn0n8SXQFXoWFRESK619YRH6iMxV4XN5cKXTA1TWRAsJ/dtoMgUrahfDfkJaOYlE8SC0e+AOHadwFlpuqNmYJV5Ag46JdFTXjrXZp9BLiCkZDjK8WXnpnp6jmBdfg8tPTSsIL2JFUPOz7ZOA2edLYzCVb7IPRYMoUJwytCQ1dMGEG6XDICxhVezLtnw126ZA8y9M0De8cd SpamDiagnosticOutput: 1:99 SpamDiagnosticMetadata: NSPM X-Microsoft-Exchange-Diagnostics: 1;BY2PR04MB1974;20:c8WuBynwPVVtvrc99zbd+QV7u4TSrxAOKSJN2Wr70P3YGPbOPUFv59UNCntOs2PGba/AFjl8VVeD274Y++XhIAUAPvWPrLt5NyUVFBH04A7bxKcI2MTpTGg63JnJV0zvJ7F+q6LBkxXiBtrS2Og+tg22NoKmbhwD2dWYqB2XioAwAwXJrlDQwDEU69aGqeNFDE0iWGOb4PnNIl111Exvc4POdC37TXe5w1xEkfzNdtELE/Tkn694eC5miRWYn0ss X-OriginatorOrg: hgst.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 22 Aug 2016 23:57:18.9453 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-Transport-CrossTenantHeadersStamped: BY2PR04MB1974 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Shaun, On 8/22/16 13:31, Shaun Tancheff wrote: [...] > -int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct request *rq, > - sector_t sector, unsigned int num_sectors) > +int sd_zbc_setup_discard(struct scsi_cmnd *cmd) > { > - struct blk_zone *zone; > + struct request *rq = cmd->request; > + struct scsi_device *sdp = cmd->device; > + struct scsi_disk *sdkp = scsi_disk(rq->rq_disk); > + sector_t sector = blk_rq_pos(rq); > + unsigned int nr_sectors = blk_rq_sectors(rq); > int ret = BLKPREP_OK; > + struct blk_zone *zone; > unsigned long flags; > + u32 wp_offset; > + bool use_write_same = false; > > zone = blk_lookup_zone(rq->q, sector); > - if (!zone) > + if (!zone) { > + /* Test for a runt zone before giving up */ > + if (sdp->type != TYPE_ZBC) { > + struct request_queue *q = rq->q; > + struct rb_node *node; > + > + node = rb_last(&q->zones); > + if (node) > + zone = rb_entry(node, struct blk_zone, node); > + if (zone) { > + spin_lock_irqsave(&zone->lock, flags); > + if ((zone->start + zone->len) <= sector) > + goto out; > + spin_unlock_irqrestore(&zone->lock, flags); > + zone = NULL; > + } > + } > return BLKPREP_KILL; > + } I do not understand the point of this code here to test for the runt zone. As long as sector is within the device maximum capacity (in 512B unit), blk_lookup_zone will return the pointer to the zone structure containing that sector (the RB-tree does not have any constraint regarding zone size). The only case where NULL would be returned is if discard is issued super early right after the disk is probed and before the zone refresh work has completed. We can certainly protect against that by delaying the discard. > > spin_lock_irqsave(&zone->lock, flags); > - > if (zone->state == BLK_ZONE_UNKNOWN || > zone->state == BLK_ZONE_BUSY) { > sd_zbc_debug_ratelimit(sdkp, > - "Discarding zone %zu state %x, deferring\n", > + "Discarding zone %zx state %x, deferring\n", Sector values are usually displayed in decimal. Why use Hex here ? At least "0x" would be needed to avoid confusion I think. > zone->start, zone->state); > ret = BLKPREP_DEFER; > goto out; > @@ -406,46 +428,80 @@ int sd_zbc_setup_discard(struct scsi_disk *sdkp, struct request *rq, > if (zone->state == BLK_ZONE_OFFLINE) { > /* let the drive fail the command */ > sd_zbc_debug_ratelimit(sdkp, > - "Discarding offline zone %zu\n", > + "Discarding offline zone %zx\n", > zone->start); > goto out; > } > - > - if (!blk_zone_is_smr(zone)) { > + if (blk_zone_is_cmr(zone)) { > + use_write_same = true; > sd_zbc_debug_ratelimit(sdkp, > - "Discarding %s zone %zu\n", > - blk_zone_is_cmr(zone) ? "CMR" : "unknown", > + "Discarding CMR zone %zx\n", > zone->start); > - ret = BLKPREP_DONE; > goto out; > } Some 10TB host managed disks out there have 1% conventional zone space, that is 100GB of capacity. When issuing a "reset all", doing a write same in these zones will take forever... If the user really wants zeroes in those zones, let it issue a zeroout. I think that it would a better choice to simply not report discard_zeroes_data as true and do nothing for conventional zones reset. > - if (blk_zone_is_empty(zone)) { > - sd_zbc_debug_ratelimit(sdkp, > - "Discarding empty zone %zu\n", > - zone->start); > - ret = BLKPREP_DONE; > + if (zone->start != sector || zone->len < nr_sectors) { > + sd_printk(KERN_ERR, sdkp, > + "Misaligned RESET WP %zx/%x on zone %zx/%zx\n", > + sector, nr_sectors, zone->start, zone->len); > + ret = BLKPREP_KILL; > goto out; > } > - > - if (zone->start != sector || > - zone->len < num_sectors) { > + /* Protect against Reset WP when more data had been written to the > + * zone than is being discarded. > + */ > + wp_offset = zone->wp - zone->start; > + if (wp_offset > nr_sectors) { > sd_printk(KERN_ERR, sdkp, > - "Misaligned RESET WP, start %zu/%zu " > - "len %zu/%u\n", > - zone->start, sector, zone->len, num_sectors); > + "Will Corrupt RESET WP %zx/%x/%x on zone %zx/%zx/%zx\n", > + sector, wp_offset, nr_sectors, > + zone->start, zone->wp, zone->len); > ret = BLKPREP_KILL; > goto out; > } > - > - /* > - * Opportunistic setting, will be fixed up with > - * zone update if RESET WRITE POINTER fails. > - */ > - zone->wp = zone->start; > + if (blk_zone_is_empty(zone)) { > + sd_zbc_debug_ratelimit(sdkp, > + "Discarding empty zone %zx [WP: %zx]\n", > + zone->start, zone->wp); > + ret = BLKPREP_DONE; > + goto out; > + } > > out: > spin_unlock_irqrestore(&zone->lock, flags); > > + if (ret != BLKPREP_OK) > + goto done; > + /* > + * blk_zone cache uses block layer sector units > + * but commands use device units > + */ > + sector >>= ilog2(sdp->sector_size) - 9; > + nr_sectors >>= ilog2(sdp->sector_size) - 9; > + > + if (use_write_same) { > + cmd->cmd_len = 16; > + cmd->cmnd[0] = WRITE_SAME_16; > + cmd->cmnd[1] = 0; /* UNMAP (not set) */ > + put_unaligned_be64(sector, &cmd->cmnd[2]); > + put_unaligned_be32(nr_sectors, &cmd->cmnd[10]); > + cmd->transfersize = sdp->sector_size; > + rq->timeout = SD_WRITE_SAME_TIMEOUT; > + } else { > + cmd->cmd_len = 16; > + cmd->cmnd[0] = ZBC_OUT; > + cmd->cmnd[1] = ZO_RESET_WRITE_POINTER; > + put_unaligned_be64(sector, &cmd->cmnd[2]); > + /* Reset Write Pointer doesn't have a payload */ > + cmd->transfersize = 0; > + cmd->sc_data_direction = DMA_NONE; > + /* > + * Opportunistic setting, will be fixed up with > + * zone update if RESET WRITE POINTER fails. > + */ > + zone->wp = zone->start; > + } > + > +done: > return ret; > } > > @@ -468,6 +524,9 @@ int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq, > > spin_lock_irqsave(&zone->lock, flags); > > + if (blk_zone_is_cmr(zone)) > + goto out; > + > if (zone->state == BLK_ZONE_UNKNOWN || > zone->state == BLK_ZONE_BUSY) { > sd_zbc_debug_ratelimit(sdkp, > @@ -476,16 +535,6 @@ int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq, > ret = BLKPREP_DEFER; > goto out; > } > - if (zone->state == BLK_ZONE_OFFLINE) { > - /* let the drive fail the command */ > - sd_zbc_debug_ratelimit(sdkp, > - "zone %zu offline\n", > - zone->start); > - goto out; > - } > - > - if (blk_zone_is_cmr(zone)) > - goto out; > > if (blk_zone_is_seq_pref(zone)) { > if (op_is_write(req_op(rq))) { > @@ -514,6 +563,14 @@ int sd_zbc_setup_read_write(struct scsi_disk *sdkp, struct request *rq, > goto out; > } > > + if (zone->state == BLK_ZONE_OFFLINE) { > + /* let the drive fail the command */ > + sd_zbc_debug_ratelimit(sdkp, > + "zone %zu offline\n", > + zone->start); > + goto out; > + } > + > if (op_is_write(req_op(rq))) { > if (zone->state == BLK_ZONE_READONLY) > goto out; > -- Damien Le Moal, Ph.D. Sr. Manager, System Software Group, HGST Research, HGST, a Western Digital brand Damien.LeMoal@hgst.com (+81) 0466-98-3593 (ext. 513593) 1 kirihara-cho, Fujisawa, Kanagawa, 252-0888 Japan www.hgst.com