From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3E88C433DB for ; Wed, 24 Mar 2021 17:40:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9A02261A19 for ; Wed, 24 Mar 2021 17:40:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237071AbhCXRj6 (ORCPT ); Wed, 24 Mar 2021 13:39:58 -0400 Received: from verein.lst.de ([213.95.11.211]:38051 "EHLO verein.lst.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237143AbhCXRjl (ORCPT ); Wed, 24 Mar 2021 13:39:41 -0400 Received: by verein.lst.de (Postfix, from userid 2407) id 1520C68BEB; Wed, 24 Mar 2021 18:39:36 +0100 (CET) Date: Wed, 24 Mar 2021 18:39:35 +0100 From: Christoph Hellwig To: Dan Williams Cc: Christoph Hellwig , "ruansy.fnst@fujitsu.com" , Linux Kernel Mailing List , linux-xfs , linux-nvdimm , Linux MM , linux-fsdevel , device-mapper development , "Darrick J. Wong" , david , Alasdair Kergon , Mike Snitzer , Goldwyn Rodrigues , "qi.fuli@fujitsu.com" , "y-goto@fujitsu.com" Subject: Re: [PATCH v3 01/11] pagemap: Introduce ->memory_failure() Message-ID: <20210324173935.GB12770@lst.de> References: <20210324074751.GA1630@lst.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.17 (2007-11-01) Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Wed, Mar 24, 2021 at 09:37:01AM -0700, Dan Williams wrote: > > Eww. As I said I think the right way is that the file system (or > > other consumer) can register a set of callbacks for opening the device. > > How does that solve the problem of the driver being notified of all > pfn failure events? Ok, I probably just showed I need to spend more time looking at your proposal vs the actual code.. Don't we have a proper way how one of the nvdimm layers own a spefific memory range and call directly into that instead of through a notifier? > Today pmem only finds out about the ones that are > notified via native x86 machine check error handling via a notifier > (yes "firmware-first" error handling fails to do the right thing for > the pmem driver), Did any kind of firmware-first error handling ever get anything right? I wish people would have learned that by now. > or the ones that are eventually reported via address > range scrub, but only for the nvdimms that implement range scrubbing. > memory_failure() seems a reasonable catch all point to route pfn > failure events, in an arch independent way, to interested drivers. Yeah. > I'm fine swapping out dax_device blocking_notiier chains for your > proposal, but that does not address all the proposed reworks in my > list which are: > > - delete "drivers/acpi/nfit/mce.c" > > - teach memory_failure() to be able to communicate range failure > > - enable memory_failure() to defer to a filesystem that can say > "critical metadata is impacted, no point in trying to do file-by-file > isolation, bring the whole fs down". This all sounds sensible.