All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: Alison Schofield <alison.schofield@intel.com>,
	Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>, <nvdimm@lists.linux.dev>,
	<linux-cxl@vger.kernel.org>
Subject: Re: [ndctl PATCH v6 0/7] Support poison list retrieval
Date: Thu, 18 Jan 2024 15:55:00 -0800	[thread overview]
Message-ID: <65a9ba5469bc5_37ad29426@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <Zam1iPjxXA9iiUOl@aschofie-mobl2>

Alison Schofield wrote:
[..]
> > >         "dpa":1073741824,
> > >         "dpa_length":64,
> > 
> > The dpa_length is also the hpa_length, right? So maybe just call the
> > field "length".
> > 
> 
> No, the length only refers to the device address space. I don't think
> the hpa is guaranteed to be contiguous, so only the starting hpa addr
> is offered.
> 
> hmm..should we call it 'size' because that seems to imply less
> contiguous-ness than length?

The only way the length could be discontiguous in HPA space is if the
error length is greater than the interleave granularity. Given poison is
tracked in cachelines and the smallest granularity is 4 cachelines it is
unlikely to hit the mutiple HPA case.

However, I think the kernel side should aim to preclude that from
happening. Given that this is relying on the kernel's translation I
would make it so that the kernel never leaves the impacted HPAs as
ambiguous. For example, if the interleave_granularity of the region is
256 and the DPA length is 512, it would be helpful if the *kernel* split
that into multiple trace events to communicate the multiple impacted
HPAs rather than leave it as an exercise to userspace.

> Which should it be 'dpa_length' or 'size' (or 'length')

I recall we used "length" for the number of badblocks in "ndctl list
--media-errors", might as well keep in consistent.

> > >         "hpa":1035355557888,
> > >         "source":"Injected"
> > >       },
> > >       {
> > >         "region":"region5",
> > >         "dpa":1073745920,
> > >         "dpa_length":64,
> > >         "hpa":1035355566080,
> > >         "source":"Injected"
> > 
> > This "source" field feels like debug data. In production nobody is going
> > to be doing poison injection, and if the administrator injected it then
> > its implied they know that status. Otherwise a media-error is a
> > media-error regardless of the source.
> 
> From CXL Spec Tabel 8-140 Sources can be: 
> 
> Unknown.
> External. Poison received from a source external to the device.
> Internal. The device generated poison from an internal source.
> Injected. The error was injected into the device for testing purposes.
> Vendor Specific.
> 
> On the v5 review, Erwin commented:
> >> This is how I would use source.
> >> "external" = don't expect to see a cxl media error, look elsewhere like a UCNA or a mem_data error in the RP's CXL.CM RAS.
> >> "internal" = expect to see a media error for more information.
> >> "injected" = somebody injected the error, no service action needed except to maybe tighten up your security.
> >> "vendor" = see vendor
> 
> If it's not presented here, user can look it up in the cxl_poison trace
> event directly.
> 
> I think we should keep this as is.

Ah, I had forgotten Erwin's comment, yeah, showing "external" vs
"internal" looks useful, "injected" gets to come along for the ride, and
if any vendor actually ships that "vendor" status that's a good
indication to the end user to go shopping for a device that plays better
with open standards.

Might be useful to capture Erwin's analysis of how to use that field in
the man page, if it's not there already.

  reply	other threads:[~2024-01-18 23:55 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-18  0:27 [ndctl PATCH v6 0/7] Support poison list retrieval alison.schofield
2024-01-18  0:28 ` [PATCH v6 1/7] libcxl: add interfaces for GET_POISON_LIST mailbox commands alison.schofield
2024-01-19 18:32   ` Dave Jiang
2024-01-18  0:28 ` [PATCH v6 2/7] cxl: add an optional pid check to event parsing alison.schofield
2024-01-19 18:35   ` Dave Jiang
2024-01-18  0:28 ` [PATCH v6 3/7] cxl/event_trace: add a private context for private parsers alison.schofield
2024-01-19 21:08   ` Dave Jiang
2024-01-18  0:28 ` [PATCH v6 4/7] cxl/event_trace: add helpers get_field_[string|data]() alison.schofield
2024-01-19 21:18   ` Dave Jiang
2024-01-18  0:28 ` [PATCH v6 5/7] cxl/list: collect and parse media_error records alison.schofield
2024-01-18  0:28 ` [PATCH v6 6/7] cxl/list: add --media-errors option to cxl list alison.schofield
2024-01-18  0:28 ` [PATCH v6 7/7] cxl/test: add cxl-poison.sh unit test alison.schofield
2024-01-18 21:56 ` [ndctl PATCH v6 0/7] Support poison list retrieval Dan Williams
2024-01-18 23:34   ` Alison Schofield
2024-01-18 23:55     ` Dan Williams [this message]
2024-02-07 22:54       ` Alison Schofield

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=65a9ba5469bc5_37ad29426@dwillia2-xfh.jf.intel.com.notmuch \
    --to=dan.j.williams@intel.com \
    --cc=alison.schofield@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.