Date: Wed, 24 Mar 2021 18:39:35 +0100
From: Christoph Hellwig
To: Dan Williams
Subject: Re: [PATCH v3 01/11] pagemap: Introduce ->memory_failure()
Message-ID: <20210324173935.GB12770@lst.de>
References: <20210324074751.GA1630@lst.de>
CC: Christoph Hellwig, "ruansy.fnst@fujitsu.com", Linux Kernel Mailing List, linux-xfs, linux-nvdimm, Linux MM, linux-fsdevel, device-mapper development, "Darrick J. Wong", david, Alasdair Kergon, Mike Snitzer, Goldwyn Rodrigues, "qi.fuli@fujitsu.com", "y-goto@fujitsu.com"
List-Id: "Linux-nvdimm developer list."

On Wed, Mar 24, 2021 at 09:37:01AM -0700, Dan Williams wrote:
> > Eww. As I said I think the right way is that the file system (or
> > other consumer) can register a set of callbacks for opening the device.
>
> How does that solve the problem of the driver being notified of all
> pfn failure events?

OK, I probably just showed that I need to spend more time looking at
your proposal vs. the actual code.

Don't we have a proper way for one of the nvdimm layers to own a specific
memory range and call directly into it instead of going through a notifier?

> Today pmem only finds out about the ones that are
> notified via native x86 machine check error handling via a notifier
> (yes "firmware-first" error handling fails to do the right thing for
> the pmem driver),

Did any kind of firmware-first error handling ever get anything right?
I wish people had learned that by now.

> or the ones that are eventually reported via address
> range scrub, but only for the nvdimms that implement range scrubbing.
> memory_failure() seems a reasonable catch-all point to route pfn
> failure events, in an arch-independent way, to interested drivers.
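The direct-dispatch alternative raised above (the layer that owns a memory range is called directly, instead of every listener being poked through a notifier) can be modeled in a few lines of userspace C. This is a minimal sketch under stated assumptions: `struct pfn_range_owner`, `register_pfn_range()` and `route_pfn_failure()` are hypothetical names invented for illustration, not kernel APIs, and a real version would need locking and a faster lookup than a linear scan.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type: a driver that claims a pfn range gets
 * failures in that range delivered to it directly. */
typedef int (*pfn_failure_fn)(void *owner_data, unsigned long pfn);

struct pfn_range_owner {
	unsigned long start_pfn;	/* first pfn owned (inclusive) */
	unsigned long end_pfn;		/* one past the last pfn owned */
	pfn_failure_fn handler;
	void *owner_data;
};

#define MAX_OWNERS 8
static struct pfn_range_owner owners[MAX_OWNERS];
static size_t nr_owners;

/* Claim [start, end). Returns 0 on success, -1 if the table is full or
 * the range overlaps one that is already claimed. */
static int register_pfn_range(unsigned long start, unsigned long end,
			      pfn_failure_fn handler, void *data)
{
	assert(start < end && handler != NULL);
	if (nr_owners == MAX_OWNERS)
		return -1;
	for (size_t i = 0; i < nr_owners; i++)
		if (start < owners[i].end_pfn && owners[i].start_pfn < end)
			return -1;	/* each pfn has at most one owner */
	owners[nr_owners++] = (struct pfn_range_owner){ start, end, handler, data };
	return 0;
}

/* Deliver a failed pfn to its single owner. Returns the owner's result,
 * or -1 if no range covers the pfn (the caller falls back to its
 * generic path). */
static int route_pfn_failure(unsigned long pfn)
{
	for (size_t i = 0; i < nr_owners; i++)
		if (pfn >= owners[i].start_pfn && pfn < owners[i].end_pfn)
			return owners[i].handler(owners[i].owner_data, pfn);
	return -1;
}

/* Example owner: counts failures reported against its range. */
static int count_failure(void *data, unsigned long pfn)
{
	(void)pfn;
	++*(int *)data;
	return 0;
}
```

With this shape an unclaimed pfn is reported back to the caller, whereas a notifier chain invokes every registered callback in turn and leaves each one to work out whether the pfn belongs to it.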
Yeah.

> I'm fine swapping out dax_device blocking_notifier chains for your
> proposal, but that does not address all the proposed reworks in my
> list which are:
>
> - delete "drivers/acpi/nfit/mce.c"
>
> - teach memory_failure() to be able to communicate range failure
>
> - enable memory_failure() to defer to a filesystem that can say
> "critical metadata is impacted, no point in trying to do file-by-file
> isolation, bring the whole fs down".

This all sounds sensible.
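The last two items in that list (reporting a whole failed range, and letting the filesystem choose between per-file isolation and shutting down) might look roughly like the following. Again a hypothetical sketch: `struct fake_fs`, `fs_range_failure()` and the `fs_failure_action` values are invented for illustration; a real filesystem would need some reverse mapping from pfns to files rather than just counting pfns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decision a filesystem hands back for a failed range. */
enum fs_failure_action {
	FS_ISOLATE_FILES,	/* only file data hit: handle file-by-file */
	FS_SHUTDOWN,		/* critical metadata hit: bring the whole fs down */
};

struct fake_fs {
	unsigned long meta_start, meta_end;	/* pfns [start, end) holding critical metadata */
	bool shut_down;
	unsigned long isolated_pfns;		/* data pfns passed on for per-file isolation */
};

/* Filesystem callback for a failed range [pfn, pfn + nr). */
static enum fs_failure_action fs_range_failure(struct fake_fs *fs,
					       unsigned long pfn,
					       unsigned long nr)
{
	assert(nr > 0);
	unsigned long end = pfn + nr;

	if (pfn < fs->meta_end && fs->meta_start < end) {
		/* Metadata overlap: no point isolating individual files. */
		fs->shut_down = true;
		return FS_SHUTDOWN;
	}
	/* Data only: a real fs would map each pfn back to its owning file. */
	fs->isolated_pfns += nr;
	return FS_ISOLATE_FILES;
}
```

Passing the range in one call, rather than one pfn at a time, lets the filesystem make the shutdown decision once for the whole failure.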