Subject: Re: [RFC PATCH 00/17] btrfs zoned block device support
To: Naohiro Aota, David Sterba, linux-btrfs@vger.kernel.org
Cc: Chris Mason, Josef Bacik, linux-kernel@vger.kernel.org, Damien Le Moal, Bart Van Assche, Matias Bjorling
References: <20180809180450.5091-1-naota@elisp.net>
From: Hannes Reinecke
Message-ID: <9a37f119-5e9f-ef98-88a3-45c0f936d9ad@suse.com>
Date: Fri, 10 Aug 2018 09:26:03 +0200
In-Reply-To: <20180809180450.5091-1-naota@elisp.net>

On 08/09/2018 08:04 PM, Naohiro Aota wrote:
> This series adds zoned block device support to btrfs.
>
> A zoned block device consists of a number of zones. Zones are either
> conventional and accepting random writes or sequential and requiring that
> writes be issued in LBA order from each zone write pointer position. This
> patch series ensures that the sequential write constraint of sequential
> zones is respected while fundamentally not changing BtrFS block and I/O
> management for blocks stored in conventional zones.
>
> To achieve this, the default dev extent size of btrfs is changed on zoned
> block devices so that dev extents are always aligned to a zone. Allocation
> of blocks within a block group is changed so that the allocation is always
> sequential from the beginning of the block groups. To do so, an allocation
> pointer is added to block groups and used as the allocation hint. The
> allocation changes also ensure that blocks freed below the allocation
> pointer are ignored, resulting in sequential block allocation regardless of
> the block group usage.
>
> While the introduction of the allocation pointer ensures that blocks will be
> allocated sequentially, I/Os to write out newly allocated blocks may be
> issued out of order, causing errors when writing to sequential zones. This
> problem is solved by introducing a submit_buffer() function and changes to
> the internal I/O scheduler to ensure in-order issuing of write I/Os for
> each chunk and corresponding to the block allocation order in the chunk.
>
> The zones of a chunk are reset to allow reusing of the zone only when the
> block group is being freed, that is, when all the extents of the block group
> are unused.
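As an illustration of the allocation-pointer scheme described in the quoted paragraphs above, here is a minimal Python sketch. It is not code from the patch series: the class `BlockGroup`, its fields, and the method names are hypothetical, and it models one block group as a single sequential zone counted in blocks. It shows only the three stated properties: allocation always advances from the pointer, blocks freed below the pointer are not reused, and the pointer is rewound only once every extent in the group is unused.

```python
class BlockGroup:
    """Hypothetical model of a block group backed by one sequential zone."""

    def __init__(self, size):
        self.size = size      # capacity in blocks
        self.alloc_ptr = 0    # next block to hand out; only moves forward
        self.used = 0         # blocks still referenced by live extents

    def allocate(self, nblocks):
        """Return the start offset of a new extent, or None if it does not fit."""
        if nblocks <= 0:
            raise ValueError("nblocks must be positive")
        if self.alloc_ptr + nblocks > self.size:
            return None
        start = self.alloc_ptr
        self.alloc_ptr += nblocks
        self.used += nblocks
        return start

    def free(self, nblocks):
        """Drop nblocks from the live count; space below the pointer is NOT reused."""
        if nblocks <= 0 or nblocks > self.used:
            raise ValueError("invalid free size")
        self.used -= nblocks
        if self.used == 0:
            # Assumption: stands in for the zone reset done when the
            # block group is freed.
            self.alloc_ptr = 0


bg = BlockGroup(size=8)
first = bg.allocate(3)    # extent at offset 0
second = bg.allocate(2)   # extent at offset 3
bg.free(3)                # frees the first extent; pointer stays at 5
third = bg.allocate(3)    # placed at offset 5, not in the freed hole
```

In this model the hole left by the freed extent stays unusable until the whole group is empty, which is the cost of keeping writes strictly sequential.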
>
> For btrfs volumes composed of multiple zoned disks, restrictions are added
> to ensure that all disks have the same zone size. This matches the existing
> constraint that all dev extents in a chunk must have the same size.
>
> Testing the patchset requires zoned block devices. Even if you don't
> have zoned devices, you can use tcmu-runner [1] to emulate zoned block
> devices. It can export emulated zoned block devices via iSCSI. Please see
> the README.md of tcmu-runner [2] for how to create a zoned block
> device on tcmu-runner.
>
> [1] https://github.com/open-iscsi/tcmu-runner
> [2] https://github.com/open-iscsi/tcmu-runner/blob/master/README.md
>
> Patch 1 introduces the HMZONED incompatible feature flag to indicate that
> the btrfs volume was formatted for use on zoned block devices.
>
> Patches 2 and 3 implement functions to gather information on the zones of
> the device (zone type and write pointer position).
>
> Patch 4 restricts the possible locations of super blocks to conventional
> zones to preserve the existing update in-place mechanism for the super
> blocks.
>
> Patches 5 to 7 disable features which are not compatible with the sequential
> write constraints of zoned block devices. This includes fallocate and
> direct I/O support. Device replace is also disabled for now.
>
> Patches 8 and 9 tweak the extent buffer allocation for HMZONED mode to
> implement sequential block allocation in block groups and chunks.
>
> Patches 10 to 12 implement the new submit buffer I/O path to ensure sequential
> write I/O delivery to the device zones.
>
> Patches 13 to 16 modify several parts of btrfs to handle free blocks
> without breaking the sequential block allocation and sequential write order
> as well as zone reset for unused chunks.
>
> Finally, patch 17 adds the HMZONED feature to the list of supported
> features.
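The in-order write delivery that the cover letter attributes to the submit buffer (patches 10 to 12) can be pictured with the following Python sketch. It is an illustration under stated assumptions, not the series' implementation: `SubmitBuffer`, `submit`, `pending`, and `issued` are hypothetical names, offsets are in blocks, and the timeout-based expiry from patch 12 is not modeled. A write is held back until its start offset equals the current write pointer, then it and any now-contiguous held writes are issued.

```python
class SubmitBuffer:
    """Hypothetical reorder buffer for writes to one sequential zone."""

    def __init__(self, write_ptr=0):
        self.write_ptr = write_ptr  # offset the zone expects next
        self.pending = {}           # start offset -> length, held back
        self.issued = []            # (start, length) in issue order

    def submit(self, start, length):
        """Queue a write and issue everything that is now in order."""
        if length <= 0:
            raise ValueError("length must be positive")
        if start < self.write_ptr or start in self.pending:
            raise ValueError("write overlaps already submitted data")
        self.pending[start] = length
        # Drain: issue held writes for as long as one begins exactly
        # at the write pointer.
        while self.write_ptr in self.pending:
            n = self.pending.pop(self.write_ptr)
            self.issued.append((self.write_ptr, n))
            self.write_ptr += n


sb = SubmitBuffer()
sb.submit(4, 2)   # arrives early: held, nothing issued yet
sb.submit(0, 4)   # fills the gap: both writes are issued in offset order
```

A dictionary keyed by start offset keeps the drain loop simple; a real I/O path would also need locking and a policy for writes that never become contiguous, which is presumably what the timeout patch addresses.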
>
> Naohiro Aota (17):
>   btrfs: introduce HMZONED feature flag
>   btrfs: Get zone information of zoned block devices
>   btrfs: Check and enable HMZONED mode
>   btrfs: limit super block locations in HMZONED mode
>   btrfs: disable fallocate in HMZONED mode
>   btrfs: disable direct IO in HMZONED mode
>   btrfs: disable device replace in HMZONED mode
>   btrfs: align extent allocation to zone boundary
>   btrfs: do sequential allocation on HMZONED drives
>   btrfs: split btrfs_map_bio()
>   btrfs: introduce submit buffer
>   btrfs: expire submit buffer on timeout
>   btrfs: avoid sync IO prioritization on checksum in HMZONED mode
>   btrfs: redirty released extent buffers in sequential BGs
>   btrfs: reset zones of unused block groups
>   btrfs: wait existing extents before truncating
>   btrfs: enable to mount HMZONED incompat flag
>
And unfortunately this series fails to boot for me:

BTRFS error (device nvme0n1p5): zoned devices mixed with regular devices
BTRFS error (device nvme0n1p5): failed to init hmzoned mode: -22
BTRFS error (device nvme0n1p5): open_ctree failed

Needless to say, /dev/nvme0n1p5 is _not_ a zoned device.
Nor has the zoned device a btrfs superblock ATM.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke zSeries & Storage
hare@suse.com +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)