Date: Wed, 6 Apr 2022 13:39:00 -0700
From: "Darrick J. Wong"
To: Dan Williams
Cc: Jane Chu, Christoph Hellwig, Shiyang Ruan, Linux Kernel Mailing List, linux-xfs, Linux NVDIMM, Linux MM, linux-fsdevel, david
Subject: Re: [PATCH v11 1/8] dax: Introduce holder for dax_device
Message-ID: <20220406203900.GR27690@magnolia>
References: <4fd95f0b-106f-6933-7bc6-9f0890012b53@fujitsu.com> <15a635d6-2069-2af5-15f8-1c0513487a2f@fujitsu.com> <4ed8baf7-7eb9-71e5-58ea-7c73b7e5bb73@fujitsu.com> <20220330161812.GA27649@magnolia>
X-Mailing-List: linux-xfs@vger.kernel.org

On Tue, Apr 05, 2022 at 06:22:48PM -0700, Dan Williams wrote:
> On Tue, Apr 5, 2022 at 5:55 PM Jane Chu wrote:
> >
> > On 3/30/2022 9:18 AM, Darrick J. Wong wrote:
> > > On Wed, Mar 30, 2022 at 08:49:29AM -0700, Christoph Hellwig wrote:
> > >> On Wed, Mar 30, 2022 at 06:58:21PM +0800, Shiyang Ruan wrote:
> > >>> As the code I pasted before, pmem driver will subtract its ->data_offset,
> > >>> which is byte-based. And the filesystem who implements ->notify_failure()
> > >>> will calculate the offset in unit of byte again.
> > >>>
> > >>> So, leave its function signature byte-based, to avoid repeated conversions.
> > >>
> > >> I'm actually fine either way, so I'll wait for Dan to comment.
> > >
> > > FWIW I'd convinced myself that the reason for using byte units is to
> > > make it possible to reduce the pmem failure blast radius to subpage
> > > units... but then I've also been distracted for months. :/
> > >
> >
> > Yes, thanks Darrick! I recall that.
> > Maybe just add a comment about why byte unit is used?
>
> I think we start with page failure notification and then figure out
> how to get finer grained through the dax interface in follow-on
> changes. Otherwise, for finer grained error handling support,
> memory_failure() would also need to be converted to stop upcasting
> cache-line granularity to page granularity failures.
> The native MCE notification communicates a 'struct mce' that can be in
> terms of sub-page bytes, but the memory management implications are all
> page based. I assume the FS implications are all FS-block-size based?

I wouldn't necessarily make that assumption -- for regular files, the
user program is in a better position to figure out how to reset the file
contents.

For fs metadata, it really depends. In principle, if (say) we could get
byte granularity poison info, we could look up the space usage within
the block to decide if the poisoned part was actually free space, in
which case we can correct the problem by (re)zeroing the affected bytes
to clear the poison.

Obviously, if the blast radius hits the internal space info or something
that was storing useful data, then you'd have to rebuild the whole block
(or the whole data structure), but that's not necessarily a given.

--D
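
For illustration only, the free-space check described above could be
sketched roughly as follows. This is not XFS or kernel code: struct
free_run, range_is_free(), handle_poison(), and the flat per-block
free-space table are all hypothetical, and the sketch assumes the table
itself was not hit by the poison and that adjacent free runs have
already been merged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: one run of free space inside a metadata block, in bytes. */
struct free_run {
	uint32_t off;	/* byte offset of the run within the block */
	uint32_t len;	/* length of the run in bytes */
};

enum poison_action {
	POISON_REZEROED,	/* poison was entirely in free space; bytes rewritten */
	POISON_REBUILD,		/* poison touched live data; caller must rebuild */
};

/*
 * Return true if [off, off + len) lies entirely inside one free run.
 * Assumes adjacent runs are merged; a range spanning two separate runs
 * is conservatively reported as not free.
 */
static bool range_is_free(const struct free_run *runs, size_t nr_runs,
			  uint32_t off, uint32_t len)
{
	for (size_t i = 0; i < nr_runs; i++) {
		uint64_t start = runs[i].off;
		uint64_t end = start + runs[i].len;

		if (off >= start && (uint64_t)off + len <= end)
			return true;
	}
	return false;
}

/*
 * Decide what to do about a byte-granularity poison report against an
 * in-memory copy of a metadata block.  The free-space table is passed
 * in separately and is assumed to be intact.
 */
static enum poison_action handle_poison(uint8_t *block, size_t block_size,
					const struct free_run *runs,
					size_t nr_runs,
					uint32_t off, uint32_t len)
{
	/* Reject empty or out-of-bounds reports rather than guessing. */
	if (len == 0 || (uint64_t)off + len > block_size)
		return POISON_REBUILD;

	if (!range_is_free(runs, nr_runs, off, len))
		return POISON_REBUILD;

	/* Nothing useful lived here, so rewriting zeroes loses no data. */
	memset(block + off, 0, len);
	return POISON_REZEROED;
}
```

With a page- or block-granularity report, off and len would cover the
whole block, range_is_free() would essentially never succeed, and every
report would fall through to the rebuild path. That is the practical
difference byte units would make here.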