Date: Wed, 29 Jan 2020 10:53:30 +1100
From: Dave Chinner
To: Jeff Moyer
Cc: "Darrick J.
Wong", linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfsprogs: mkfs: don't default to the physical sector size if it's bigger than XFS_MAX_SECTORSIZE
Message-ID: <20200128235330.GG18610@dread.disaster.area>
References: <20200128161423.GO3447196@magnolia>

On Tue, Jan 28, 2020 at 11:54:17AM -0500, Jeff Moyer wrote:
> "Darrick J. Wong" writes:
>
> > Do we need to check that ft->lsectorsize <= XFS_MAX_SECTORSIZE too?
>
> We can. What would be the correct response to such a situation?

Fail. The device is outside of the bounds of the validated on-disk
format at that point.

Historically speaking, the on-disk format only supports up to 32kB
sector sizes and the limitation is a journal format limitation. That
is, the journal implementation is based around 32kB buffers, and
journal writes require atomic sector writes. Even though v2 logs can
support up to 256kB buffers, the format is still based around 32kB
"segments" within the larger buffer.

IIRC, this limitation is due to the sector-based "whole buffer
written" validation mechanism built into the log headers. The v1
format has an array that can store 64 individual sector indexes, so
for 512 byte sectors the max log buffer size is 32kB. As long as that
first sector of the log buffer is written atomically, it doesn't
matter what the sector size actually is.
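To make the v1 arithmetic above concrete, here is a rough sketch in C. It only illustrates the numbers quoted in this mail (64 indexes, 512 byte sectors, a 32kB limit); the constant names and the helper are made up for the example and are not the definitions used by the kernel or xfsprogs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values taken from the text above, NOT from xfsprogs headers. */
enum {
	V1_CYCLE_SLOTS    = 64,   /* sector indexes held in a v1 log header */
	BASIC_SECTOR      = 512,  /* bytes covered by each index */
	V1_MAX_LOG_BUFFER = V1_CYCLE_SLOTS * BASIC_SECTOR,
	MAX_SECTORSIZE    = 32 * 1024  /* assumed on-disk format limit */
};

/* 64 slots of 512 bytes each gives the 32kB v1 log buffer limit. */
static_assert(V1_MAX_LOG_BUFFER == 32 * 1024, "64 x 512 bytes = 32kB");

/*
 * Hypothetical helper: would a sector of this size fit within the
 * limits described above? A caller would fail when this returns false.
 */
static bool sector_size_supported(size_t sector_size)
{
	return sector_size >= BASIC_SECTOR && sector_size <= MAX_SECTORSIZE;
}
```

With these assumed values, a 32kB sector is accepted and a 64kB sector is rejected, which is the "Fail" case described at the top of this mail.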
However, we cannot do 32kB writes to a 64kB physical sector device,
and hence the v1 format cannot support log records larger than 32kB,
so the format is limited to 32kB sector sizes.

The v2 log format does increase the size of the log buffer, but it
does so by adding extension header sectors that contain the index
mapping for each 32kB segment. It works under the same principle as
the v1 log - as long as the first sector write is atomic, we can
validate the extension headers are intact and so validate all 32kB
segments of the larger buffer.

The v2 log format also allows for stripe unit padding of writes, such
that it will always write entire stripe units even when the log buffer
is not full. Hence it supports writing aligned sizes larger than 32kB,
even up to 256kB. However, it still relies on atomic sector writes for
validation.

When we moved to the v5 on-disk format, we also turned on CRC checking
of the log records. This means we are no longer dependent on atomic
sector writes to detect torn log writes - if the first 512 byte sector
is not written atomically, we can detect that in recovery on v5
filesystems.

That means we don't really care what the sector size used by the log
is anymore. We can do aligned log writes up to 256kB, and we can
detect torn single sector writes. Hence I suspect XFS_MAX_SECTORSIZE
is no longer relevant to v5 filesystems and we may be able to increase
it to 64kB without any issues.

We'd need to do quite a bit more code spelunking than I've just done,
not to mention validation testing, before we can do such a thing,
though...

> > (Not that I really expect disks with 64k LBA units...)
>
> It looks like you can tell loop to do that, but I wasn't able to make
> that happen in practice. I'm not quite sure why, though.

IIRC, it depends on what IO alignment capability the underlying
filesystem provides via its sector/block sizes.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com