From: Alison Schofield <alison.schofield@intel.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>,
	nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org
Subject: Re: [ndctl PATCH v6 0/7] Support poison list retrieval
Date: Wed, 7 Feb 2024 14:54:04 -0800
Message-ID: <ZcQKDCiNXYqdtQ/z@aschofie-mobl2>
In-Reply-To: <65a9ba5469bc5_37ad29426@dwillia2-xfh.jf.intel.com.notmuch>

On Thu, Jan 18, 2024 at 03:55:00PM -0800, Dan Williams wrote:
> Alison Schofield wrote:
> [..]
> > > >         "dpa":1073741824,
> > > >         "dpa_length":64,
> > > 
> > > The dpa_length is also the hpa_length, right? So maybe just call the
> > > field "length".
> > > 
> > 
> > No, the length only refers to the device address space. I don't think
> > the hpa is guaranteed to be contiguous, so only the starting hpa addr
> > is offered.
> > 
> > hmm..should we call it 'size' because that seems to imply less
> > contiguous-ness than length?
> 
> The only way the length could be discontiguous in HPA space is if the
> error length is greater than the interleave granularity. Given poison is
> tracked in cachelines and the smallest granularity is 4 cachelines it is
> unlikely to hit the multiple HPA case.

Hi Dan,

Circling back to this issue, as I'm posting an updated rev.

I'm not getting how *only* an error length greater than IG can lead to
discontiguous HPA. If the poison starts on the last 64 bytes of an IG and
has a length greater than 64 bytes, we go beyond the endpoint's mapping,
even if that length is less than IG.

In the layout below, if the device underlying endpoint2 reports
^poison^ as shown, it is discontiguous in HPA space.

HPA 0..........................................................N
ep1 ..........          ..........          ..........    
ep2           ..........          ..........          ..........
bad                   ^poison^ 
good                  ^po          ison^

'bad' is what happens today if length is applied to HPA
'good' is what is right
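
To make the picture concrete, here is a rough Python sketch of the
translation. It assumes plain modulo interleave (no XOR arithmetic), a DPA
offset taken relative to the start of the endpoint decoder's DPA range, and
an HPA offset relative to the region base. The helper names
(dpa_to_hpa_offset, hpa_extents) and the 2-way, 256-byte example values are
made up for illustration; this is not the driver's code.

```python
def dpa_to_hpa_offset(dpa_off, ways, gran, pos):
    """Map a DPA offset to an HPA offset under simple modulo interleave.

    dpa_off: offset from the start of the decoder's DPA range (assumption)
    ways:    number of endpoints interleaved in the region
    gran:    interleave granularity (IG) in bytes
    pos:     this endpoint's position in the interleave set (0-based)
    """
    granule = dpa_off // gran          # which of this endpoint's granules
    within = dpa_off % gran            # byte offset inside that granule
    return granule * gran * ways + pos * gran + within


def hpa_extents(dpa_off, length, ways, gran, pos):
    """Return the HPA extents [(hpa_off, len), ...] covered by a DPA range."""
    extents = []
    end = dpa_off + length
    while dpa_off < end:
        # Never step past the end of the current granule.
        chunk = min(end - dpa_off, gran - dpa_off % gran)
        hpa = dpa_to_hpa_offset(dpa_off, ways, gran, pos)
        if extents and extents[-1][0] + extents[-1][1] == hpa:
            # Contiguous with the previous extent (e.g. ways == 1): merge.
            extents[-1] = (extents[-1][0], extents[-1][1] + chunk)
        else:
            extents.append((hpa, chunk))
        dpa_off += chunk
    return extents


# ep2 (pos 1) of a 2-way, 256-byte IG region; poison starts on the last
# 64 bytes of a granule and is 128 bytes long, i.e. shorter than IG.
print(hpa_extents(192, 128, ways=2, gran=256, pos=1))
# prints [(448, 64), (768, 64)]
```

Applying the 128-byte length directly to the starting HPA would instead
claim HPA offsets 448..575, the 'bad' row above, where 512..575 belongs to
ep1 in this model.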

Am I missing something wrt cachelines you mention?

> 
> However, I think the kernel side should aim to preclude that from
> happening. Given that this is relying on the kernel's translation I
> would make it so that the kernel never leaves the impacted HPAs as
> ambiguous. For example, if the interleave_granularity of the region is
> 256 and the DPA length is 512, it would be helpful if the *kernel* split
> that into multiple trace events to communicate the multiple impacted
> HPAs rather than leave it as an exercise to userspace.
> 

That's a familiar plan that we rejected in the driver implementation.
As defined, a cxl_poison event reports a starting dpa, a dpa_length,
and the starting hpa if the address is mapped. That left userspace to do
the HPA translation work.

We can move that work to the driver independent of this ndctl work.
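
As a sketch of what that splitting could look like, wherever it ends up
living: the hypothetical helper below cuts one (dpa, dpa_length) record at
interleave granularity boundaries so that each piece maps to a single
contiguous HPA span. It assumes the decoder's DPA base is aligned to the
granularity, so boundaries fall on multiples of IG in DPA space.
split_poison_record and its dict keys simply mirror the JSON field names
quoted above; neither is an existing kernel or ndctl interface.

```python
def split_poison_record(dpa, dpa_length, gran):
    """Split one poison record into pieces that never cross a granule boundary.

    Assumes granule boundaries sit at multiples of `gran` in DPA space.
    """
    records = []
    end = dpa + dpa_length
    while dpa < end:
        # Bytes left before the next granule boundary, capped at the range end.
        chunk = min(end - dpa, gran - dpa % gran)
        records.append({"dpa": dpa, "dpa_length": chunk})
        dpa += chunk
    return records


# The example from the quoted text: interleave_granularity 256, DPA length 512.
for rec in split_poison_record(1073741824, 512, 256):
    print(rec)
```

With those inputs the single 512-byte record becomes two 256-byte records,
one per granule, each of which would carry its own starting hpa.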

> 
> Might be useful to capture Erwin's analysis of how to use that field in
> the man page, if it's not there already.

The man page now has the definitions of the source field and a spec
reference. I don't see the cxl list man page as the place to offer
media-error troubleshooting tips.

