linux-kernel.vger.kernel.org archive mirror
From: Dan Williams <dan.j.williams@intel.com>
To: Jane Chu <jane.chu@oracle.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"vishal.l.verma@intel.com" <vishal.l.verma@intel.com>,
	"dave.jiang@intel.com" <dave.jiang@intel.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"nvdimm@lists.linux.dev" <nvdimm@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] acpi/nfit: badrange report spill over to clean range
Date: Wed, 13 Jul 2022 17:24:10 -0700
Message-ID: <62cf622a32e1_16b52e294ea@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <09df842d-d8e4-0594-56b0-b4bb9ea37b67@oracle.com>

Jane Chu wrote:
> On 7/12/2022 5:48 PM, Dan Williams wrote:
> > Jane Chu wrote:
> >> Commit 7917f9cdb503 ("acpi/nfit: rely on mce->misc to determine poison
> >> granularity") changed nfit_handle_mce() callback to report badrange for
> >> each poison at an alignment indicated by 1ULL << MCI_MISC_ADDR_LSB(mce->misc)
> >> instead of the hardcoded L1_CACHE_BYTES. However, recently on a server
> >> populated with Intel DCPMEM v2 DIMMs, it appears that
> >> 1UL << MCI_MISC_ADDR_LSB(mce->misc) turns out to be 4KiB, or 8 512-byte blocks.
> >> Consequently, injecting 2 back-to-back poisons via ndctl results in
> >> 8 poisons being reported.
> >>
> >> [29076.590281] {3}[Hardware Error]:   physical_address: 0x00000040a0602400
> >> [..]
> >> [29076.619447] Memory failure: 0x40a0602: recovery action for dax page: Recovered
> >> [29076.627519] mce: [Hardware Error]: Machine check events logged
> >> [29076.634033] nfit ACPI0012:00: addr in SPA 1 (0x4080000000, 0x1f80000000)
> >> [29076.648805] nd_bus ndbus0: XXX nvdimm_bus_add_badrange: (0x40a0602000, 0x1000)
> >> [..]
> >> [29078.634817] {4}[Hardware Error]:   physical_address: 0x00000040a0602600
> >> [..]
> >> [29079.595327] nfit ACPI0012:00: addr in SPA 1 (0x4080000000, 0x1f80000000)
> >> [29079.610106] nd_bus ndbus0: XXX nvdimm_bus_add_badrange: (0x40a0602000, 0x1000)
> >> [..]
> >> {
> >>    "dev":"namespace0.0",
> >>    "mode":"fsdax",
> >>    "map":"dev",
> >>    "size":33820770304,
> >>    "uuid":"a1b0f07f-747f-40a8-bcd4-de1560a1ef75",
> >>    "sector_size":512,
> >>    "align":2097152,
> >>    "blockdev":"pmem0",
> >>    "badblock_count":8,
> >>    "badblocks":[
> >>      {
> >>        "offset":8208,
> >>        "length":8,
> >>        "dimms":[
> >>          "nmem0"
> >>        ]
> >>      }
> >>    ]
> >> }
> >>
> >> So, 1UL << MCI_MISC_ADDR_LSB(mce->misc) is an unreliable indicator for poison
> >> radius and shouldn't be used.  Moreover, as each injected poison is being
> >> reported independently, any alignment up to 512 bytes appears to work:
> >> L1_CACHE_BYTES (though inaccurate), or 256 bytes (as ars->length reports),
> >> or 512 bytes.
> >>
> >> To get around this issue, 512-bytes is chosen as the alignment because
> >>    a. it happens to be the badblock granularity,
> >>    b. ndctl inject-error cannot inject more than one poison to a 512-byte block,
> >>    c. architecture agnostic
> > 
> > I am failing to see the kernel bug? Yes, you injected less than 8
> > "badblocks" of poison and the hardware reported 8 blocks of poison, but
> > that's not the kernel's fault, that's the hardware. What happens when
> > hardware really does detect 8 blocks of consecutive poison and this
> > implementation decides to only record 1 at a time?
> 
> In that case, there will be 8 reports of the poisons by APEI GHES,

Why would there be 8 reports for just one poison consumption event?

> ARS scan will also report 8 poisons, each will get to be added to the 
> bad range via nvdimm_bus_add_badrange(), so none of them will be missed.

Right, that's what I'm saying about the proposed change: trim the
reported poison by what is returned from a "short" ARS. Recall that
short-ARS just reads from a staging buffer that the BIOS knows about; it
need not go all the way to hardware.

> In the above 2 poison example, the poison in 0x00000040a0602400 and in 
> 0x00000040a0602600 were separately reported.

Separately reported, each with a 4K alignment?
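
For reference, a minimal userspace sketch of the alignment arithmetic
under discussion. It assumes the low 6 bits of mce->misc hold the LSB
(as MCI_MISC_ADDR_LSB() extracts) and that the LSB was 12 here, which is
inferred from the 0x1000 length in the log above, not read from the
actual MSR value. The struct and helper names are made up for
illustration and are not the nfit code.

```c
#include <assert.h>
#include <stdint.h>

/* Same extraction as the kernel's MCI_MISC_ADDR_LSB(): low 6 bits. */
#define MISC_ADDR_LSB(misc) ((unsigned int)((misc) & 0x3f))

/* Hypothetical type for illustration only. */
struct bad_range {
	uint64_t start;   /* report address aligned down to the granule */
	uint64_t length;  /* granule size in bytes */
};

/* Range covered by one MCE report at the granularity misc advertises. */
static struct bad_range mce_report_range(uint64_t addr, uint64_t misc)
{
	uint64_t align = 1ULL << MISC_ADDR_LSB(misc);
	struct bad_range r = { addr & ~(align - 1), align };

	return r;
}

/* Number of 512-byte badblock sectors the range touches. */
static uint64_t sectors_touched(struct bad_range r)
{
	assert(r.length > 0);  /* an empty range would underflow below */
	uint64_t first = r.start >> 9;
	uint64_t last = (r.start + r.length - 1) >> 9;

	return last - first + 1;
}
```

With an LSB of 12, both 0x40a0602400 and 0x40a0602600 collapse to the
same (0x40a0602000, 0x1000) range, i.e. 8 sectors, which matches the two
nvdimm_bus_add_badrange lines in the log. With an LSB of 6 each report
would stay inside a single sector.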

> > It seems the fix you want is for the hardware to report the precise
> > error bounds and that 1UL << MCI_MISC_ADDR_LSB(mce->misc) does not have
> > that precision in this case.
> 
> That field describes a 4K range even for a single poison, it confuses 
> people unnecessarily.

I agree with you on the problem statement, it's the fix where I have
questions.

> > However, the ARS engine likely can return the precise error ranges so I
> > think the fix is to just use the address range indicated by 1UL <<
> > MCI_MISC_ADDR_LSB(mce->misc) to filter the results from a short ARS
> > scrub request to ask the device for the precise error list.
> 
> You mean for nfit_handle_mce() callback to issue a short ARS per each 
> poison report over a 4K range 

Over a L1_CACHE_BYTES range...

> in order to decide the precise range as a workaround of the hardware
> issue?  If there are 8 poisons detected, there will be 8 short ARS,
> are we sure we want to do that?

Seems ok to me, short ARS is meant to be cheap. I would hope there are
no latency concerns in this path.

> also, for now, is it possible to log more than 1 poison per 512byte
> block?

For the badrange tracking, no. So this would just be a check to say
"Yes, CPU, I see you think the whole 4K is gone, but let's double check
with more precise information for what gets placed in the badrange
tracking".
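
Roughly, the double check could look like the sketch below: take the
window the MCE reported, clip each short-ARS error record to it, and
only hand the clipped result to the badrange tracking. This is a
standalone illustration, not a patch. The struct and both helpers are
hypothetical, and the 256-byte record length is taken from the
ars->length remark earlier in the thread, not from a real ARS status
buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical byte range: used for the MCE window and ARS records. */
struct range {
	uint64_t start;
	uint64_t length;
};

/*
 * Clip one ARS error record to the window the MCE reported.
 * Returns false when the record lies entirely outside the window.
 */
static bool clip_to_window(struct range window, struct range rec,
			   struct range *out)
{
	uint64_t w_end = window.start + window.length;  /* exclusive */
	uint64_t r_end = rec.start + rec.length;        /* exclusive */
	uint64_t start = rec.start > window.start ? rec.start : window.start;
	uint64_t end = r_end < w_end ? r_end : w_end;

	if (start >= end)
		return false;
	out->start = start;
	out->length = end - start;
	return true;
}

/* Widen a byte range to whole 512-byte sectors (badblocks granularity). */
static struct range widen_to_sectors(struct range r)
{
	assert(r.length > 0);
	uint64_t start = r.start & ~511ULL;
	uint64_t end = (r.start + r.length + 511) & ~511ULL;
	struct range s = { start, end - start };

	return s;
}
```

For the first injection above, a window of (0x40a0602000, 0x1000) and an
assumed ARS record of (0x40a0602400, 0x100) clip to the record itself,
which widens to a single 512-byte sector instead of eight.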
