From: Jane Chu <jane.chu@oracle.com>
To: Shiyang Ruan <ruansy.fnst@fujitsu.com>,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org,
	nvdimm@lists.linux.dev, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, dm-devel@redhat.com
Cc: djwong@kernel.org, dan.j.williams@intel.com, david@fromorbit.com,
	hch@lst.de, agk@redhat.com, snitzer@redhat.com
Subject: Re: [PATCH RESEND v6 1/9] pagemap: Introduce ->memory_failure()
Date: Tue, 17 Aug 2021 23:08:40 -0700
Message-ID: <78c22960-3f6d-8e5d-890a-72915236bedc@oracle.com>
In-Reply-To: <beee643c-0fd9-b0f7-5330-0d64bde499d3@oracle.com>


On 8/17/2021 10:43 PM, Jane Chu wrote:
> More information -
> 
> On 8/16/2021 10:20 AM, Jane Chu wrote:
>> Hi, ShiYang,
>>
>> I applied the v6 patch series to my 5.14-rc3 tree, which is what you
>> indicated v6 was based on, and injected a hardware poison.
>>
>> I'm seeing the same problem that was reported a while ago after the
>> poison was consumed - in the SIGBUS payload, the si_addr is missing:
>>
>> ** SIGBUS(7): canjmp=1, whichstep=0, **
>> ** si_addr(0x(nil)), si_lsb(0xC), si_code(0x4, BUS_MCEERR_AR) **
>>
>> The si_addr ought to be 0x7f6568000000 - the vaddr of the first page
>> in this case.
> 
> The failure came from here:
> 
> [PATCH RESEND v6 6/9] xfs: Implement ->notify_failure() for XFS
> 
> +static int
> +xfs_dax_notify_failure(
> ...
> +    if (!xfs_sb_version_hasrmapbt(&mp->m_sb)) {
> +        xfs_warn(mp, "notify_failure() needs rmapbt enabled!");
> +        return -EOPNOTSUPP;
> +    }
> 
> I am not familiar with XFS, but I have a few questions I hope to get
> answered:
> 
> 1) What does it take, and what does it cost, to make
>     xfs_sb_version_hasrmapbt(&mp->m_sb) return true?
> 
> 2) For a running environment that fails the above check, is it
>     okay to leave the poison handling in limbo, and if so, why?
> 
> 3) If the above regression is not acceptable, any potential remedy?

How about moving the check to before the notifier registration, and
registering only if the check passes?  That seems better than the
alternative, which is to fall back to the legacy memory_failure
handling when the filesystem returns -EOPNOTSUPP.
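
To make the idea concrete, here is a rough userspace sketch of doing the
check at registration time.  Every name in it (fs_info, dax_dev,
fs_register_holder, handle_memory_failure, mf_path) is made up for
illustration and is not taken from the v6 series or from the kernel; it
only models the control flow, assuming that a device with no registered
holder is handled by the legacy path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the v6 series. */
struct fs_info {
	bool has_rmapbt;	/* filesystem has the reverse-mapping btree */
};

typedef int (*notify_fn)(struct fs_info *fs, unsigned long pfn);

struct dax_dev {
	struct fs_info *holder;	/* NULL when no filesystem is registered */
	notify_fn notify_failure;
};

enum mf_path { MF_PATH_LEGACY, MF_PATH_FS };

/* Filesystem handler: it only runs if registration succeeded, so it
 * no longer needs its own rmapbt check. */
static int fs_notify_failure(struct fs_info *fs, unsigned long pfn)
{
	(void)fs;
	(void)pfn;
	return 0;
}

/* The feature check moves here, ahead of the registration itself. */
static int fs_register_holder(struct dax_dev *dev, struct fs_info *fs)
{
	if (!fs->has_rmapbt)
		return -EOPNOTSUPP;	/* nothing is registered */
	dev->holder = fs;
	dev->notify_failure = fs_notify_failure;
	return 0;
}

/* Dispatch: no holder means the legacy path is taken up front, rather
 * than after the filesystem has already refused the notification. */
static enum mf_path handle_memory_failure(struct dax_dev *dev,
					  unsigned long pfn, int *rc)
{
	if (!dev->holder) {
		*rc = 0;
		return MF_PATH_LEGACY;
	}
	*rc = dev->notify_failure(dev->holder, pfn);
	return MF_PATH_FS;
}
```

With this shape, a filesystem without rmapbt never shows up as a holder,
so the poison path does not have to interpret -EOPNOTSUPP at failure
time.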

thanks,
-jane

> 
> thanks!
> -jane
> 
> 
>>
>> Something is not right...
>>
>> thanks,
>> -jane
>>
>>
>> On 8/5/2021 6:17 PM, Jane Chu wrote:
>>> The filesystem part of the pmem failure handling is built on at least
>>> PAGE_SIZE granularity, inherited from general memory_failure
>>> handling.  However, with Intel's DCPMEM technology, the error blast
>>> radius is no more than 256 bytes, and it might get smaller with future
>>> hardware generations, along with an advanced atomic 64B write to clear
>>> the poison.  But I don't see how any of that could be incorporated,
>>> given that the filesystem is notified of a corruption by pfn rather
>>> than by an exact address.
>>>
>>> So I guess this question is also for Dan: how to avoid unnecessarily
>>> repairing a PMD range for a 256B corrupt range going forward?
>>>
>>> thanks,
>>> -jane
>>>
>>>
>>> On 7/30/2021 3:01 AM, Shiyang Ruan wrote:
>>>> When a memory failure occurs, we call this function, which is
>>>> implemented by each kind of device.  For the fsdax case, the pmem
>>>> device driver implements it.  The pmem device driver finds the
>>>> filesystem in which the corrupted page is located, and finally calls
>>>> the filesystem handler to deal with the error.
>>>>
>>>> The filesystem will try to recover the corrupted data if necessary.
>>>
>>
> 

