From: Dan Williams <dan.j.williams@intel.com>
To: Vikram Sethi <vsethi@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"Yasunori Gotou (Fujitsu)" <y-goto@fujitsu.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"catalin.marinas@arm.com" <Catalin.Marinas@arm.com>,
	James Morse <james.morse@arm.com>
Cc: "Natu, Mahesh" <mahesh.natu@intel.com>
Subject: RE: Questions about CXL device (type 3 memory) hotplug
Date: Wed, 24 May 2023 14:20:23 -0700
Message-ID: <646e7f96f33e2_33fb3294c1@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <BN8PR12MB3330831F2E666E9BB1319E66BD419@BN8PR12MB3330.namprd12.prod.outlook.com>

Vikram Sethi wrote:
[..]
> > I don't understand this failure mode. Accelerator is added, driver sets up an
> > HDM decode range and triggers CPU cache invalidation before mapping the
> > memory into page tables. Wouldn't the device, upon receiving an invalidation
> > request, just snoop its caches and say "nothing for me to do"?
> 
> Device's snoop filter is in a clean reset/power on state. It is not
> tracking anything checked out by the host CPU/peer.  If it starts
> receiving writebacks or even CleanEvicts for its memory, 

CleanEvict is a device-to-host request. We are talking about
host-to-device requests, which are only SnpData, SnpInv, and SnpCur,
right?

> looks like an unexpected coherency message and I know of at least one
> implementation that triggers an error interrupt in response. I don't
> know of a statement in the specification that this is expected and
> that implementations should ignore it. If there is such a statement,
> could you please point me to it?

All the specification says (CXL 3.0, section 3.2.4.4, "Host to Device
Requests") is what to do *if* the device is holding that cacheline.

If a device fails when it receives one of those requests for a line it
does not hold, then how can this work in the nominal case, where the
device does not own any given cacheline?
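
A minimal sketch of the nominal case in question, as a toy software model
rather than real device logic. Everything except the three snoop names is
hypothetical: the response classes, the four-entry cache, and the function
names are illustrative and are not the specification's opcodes or any
kernel API. The point it shows is only that a snoop for a line the device
does not hold resolves to a benign "miss" response instead of an error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-to-device snoop requests named in the message above. */
enum h2d_snoop { SNP_DATA, SNP_INV, SNP_CUR };

/* Hypothetical response classes; NOT the specification's opcode names. */
enum d2h_rsp { RSP_MISS_INVALID, RSP_HIT_KEPT, RSP_HIT_DROPPED };

#define DEV_CACHE_LINES 4	/* assumed toy size */

struct dev_line {
	uint64_t addr;
	bool valid;
};

struct dev_cache {
	struct dev_line lines[DEV_CACHE_LINES];
};

/* Return the valid entry tracking @addr, or NULL if the device holds none. */
static struct dev_line *dev_lookup(struct dev_cache *c, uint64_t addr)
{
	for (size_t i = 0; i < DEV_CACHE_LINES; i++)
		if (c->lines[i].valid && c->lines[i].addr == addr)
			return &c->lines[i];
	return NULL;
}

/*
 * Toy snoop handler. A miss is the common case and is answered with
 * "invalid, nothing to do". On a hit, SnpInv drops the line; SnpData and
 * SnpCur keep it (a deliberate simplification of the real state changes).
 */
enum d2h_rsp dev_handle_snoop(struct dev_cache *c, enum h2d_snoop op,
			      uint64_t addr)
{
	struct dev_line *line = dev_lookup(c, addr);

	if (!line)
		return RSP_MISS_INVALID;

	if (op == SNP_INV) {
		line->valid = false;
		return RSP_HIT_DROPPED;
	}

	return RSP_HIT_KEPT;
}
```

In this model a device fresh out of reset has every entry invalid, so any
snoop it receives takes the miss path.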

> Remove memory needs a cache flush IMO, in a way that prevents
> speculative fetches.  This can be done in kernel with uncacheable
> mappings alone, if possible in the arch callback, or via FW call. 

That assumes that the kernel owns all mappings. I worry about mappings
that the kernel cannot see, like x86 SMM. That's why the kernel currently
invalidates before the next usage, but I am not opposed to also flushing
on remove if the current solution is causing device failures in practice.

Can you confirm that the current kernel arrangement is causing failures
in practice, or is this a theoretical concern? ...and if it is happening
in practice do you have the example patch that fixes it?
