Date: Wed, 24 Mar 2021 18:39:35 +0100
From: Christoph Hellwig
To: Dan Williams
Cc: Christoph Hellwig, "ruansy.fnst@fujitsu.com", Linux Kernel Mailing List, linux-xfs, linux-nvdimm, Linux MM, linux-fsdevel, device-mapper development, "Darrick J. Wong", david, Alasdair Kergon, Mike Snitzer, Goldwyn Rodrigues, "qi.fuli@fujitsu.com", "y-goto@fujitsu.com"
Subject: Re: [PATCH v3 01/11] pagemap: Introduce ->memory_failure()
Message-ID: <20210324173935.GB12770@lst.de>
References: <20210324074751.GA1630@lst.de>

On Wed, Mar 24, 2021 at 09:37:01AM -0700, Dan Williams wrote:
> > Eww.  As I said I think the right way is that the file system (or
> > other consumer) can register a set of callbacks for opening the device.
>
> How does that solve the problem of the driver being notified of all
> pfn failure events?

Ok, I probably just showed I need to spend more time looking at
your proposal vs the actual code..

Don't we have a proper way for one of the nvdimm layers to own a
specific memory range, so that we can call directly into that layer
instead of going through a notifier?

> Today pmem only finds out about the ones that are
> notified via native x86 machine check error handling via a notifier
> (yes "firmware-first" error handling fails to do the right thing for
> the pmem driver),

Did any kind of firmware-first error handling ever get anything right?
I wish people would have learned that by now.

> or the ones that are eventually reported via address
> range scrub, but only for the nvdimms that implement range scrubbing.
> memory_failure() seems a reasonable catch all point to route pfn
> failure events, in an arch independent way, to interested drivers.

Yeah.

> I'm fine swapping out dax_device blocking_notifier chains for your
> proposal, but that does not address all the proposed reworks in my
> list which are:
>
> - delete "drivers/acpi/nfit/mce.c"
>
> - teach memory_failure() to be able to communicate range failure
>
> - enable memory_failure() to defer to a filesystem that can say
> "critical metadata is impacted, no point in trying to do file-by-file
> isolation, bring the whole fs down".

This all sounds sensible.
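
To make the direct-call idea concrete, here is a rough userspace sketch of an owner that registers for a pfn range, gets called with the failed sub-range, and can answer "bring the whole fs down". It is only an illustration under loud assumptions: every name in it (range_owner, register_range_owner, report_range_failure, the mf_verdict values, toy_fs) is made up for this example and is not an existing kernel interface, and the real memory_failure() path is far more involved.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; none of this is kernel API. */
enum mf_verdict {
	MF_UNOWNED = 0,		/* no registered owner covers the range */
	MF_ISOLATED,		/* owner coped, e.g. file-by-file isolation */
	MF_FS_SHUTDOWN,		/* owner says critical metadata is impacted */
};

struct range_owner {
	unsigned long start_pfn;	/* first pfn owned */
	unsigned long nr_pfns;		/* number of pfns owned */
	/* called directly with the failed range, clipped to this owner */
	enum mf_verdict (*memory_failure)(struct range_owner *owner,
					  unsigned long pfn, unsigned long nr);
	void *priv;			/* owner private data */
};

#define MAX_OWNERS 8
static struct range_owner *owners[MAX_OWNERS];
static size_t nr_owners;

void register_range_owner(struct range_owner *owner)
{
	assert(nr_owners < MAX_OWNERS);
	owners[nr_owners++] = owner;
}

/* Route a failed pfn range to every overlapping owner, keep the worst verdict. */
enum mf_verdict report_range_failure(unsigned long pfn, unsigned long nr)
{
	enum mf_verdict worst = MF_UNOWNED;
	unsigned long end = pfn + nr;

	for (size_t i = 0; i < nr_owners; i++) {
		struct range_owner *o = owners[i];
		unsigned long o_end = o->start_pfn + o->nr_pfns;
		unsigned long lo = pfn > o->start_pfn ? pfn : o->start_pfn;
		unsigned long hi = end < o_end ? end : o_end;

		if (lo >= hi)
			continue;	/* no overlap with this owner */

		enum mf_verdict v = o->memory_failure(o, lo, hi - lo);
		if (v > worst)
			worst = v;
	}
	return worst;
}

/* Toy "filesystem": a failure touching its metadata pfns is fatal. */
struct toy_fs {
	unsigned long meta_start_pfn;
	unsigned long meta_nr_pfns;
};

enum mf_verdict toy_fs_memory_failure(struct range_owner *owner,
				      unsigned long pfn, unsigned long nr)
{
	struct toy_fs *fs = owner->priv;
	unsigned long meta_end = fs->meta_start_pfn + fs->meta_nr_pfns;

	if (pfn < meta_end && pfn + nr > fs->meta_start_pfn)
		return MF_FS_SHUTDOWN;
	return MF_ISOLATED;
}
```

The point of the sketch is only the shape: the reporter looks up who owns the range and calls that owner, rather than broadcasting each pfn on a notifier chain and hoping an interested party is listening. A caller would fill in a struct range_owner with toy_fs_memory_failure as the callback, register it, and then act on the verdict returned by report_range_failure().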