RE: Questions about CXL device (type 3 memory) hotplug

From: Dan Williams <dan.j.williams@intel.com>
To: Vikram Sethi <vsethi@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"Yasunori Gotou (Fujitsu)" <y-goto@fujitsu.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"catalin.marinas@arm.com" <Catalin.Marinas@arm.com>,
	James Morse <james.morse@arm.com>
Cc: "Natu, Mahesh" <mahesh.natu@intel.com>
Subject: RE: Questions about CXL device (type 3 memory) hotplug
Date: Wed, 24 May 2023 14:20:23 -0700
Message-ID: <646e7f96f33e2_33fb3294c1@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <BN8PR12MB3330831F2E666E9BB1319E66BD419@BN8PR12MB3330.namprd12.prod.outlook.com>

Vikram Sethi wrote:
[..]
> > I don't understand this failure mode. Accelerator is added, driver sets up an
> > HDM decode range and triggers CPU cache invalidation before mapping the
> > memory into page tables. Wouldn't the device, upon receiving an invalidation
> > request, just snoop its caches and say "nothing for me to do"?
> 
> Device's snoop filter is in a clean reset/power on state. It is not
> tracking anything checked out by the host CPU/peer.  If it starts
> receiving writebacks or even CleanEvicts for its memory, 

CleanEvict is a device-to-host request. We are talking about
host-to-device requests, which are only SnpData, SnpInv, and SnpCur,
right?

> looks like an unexpected coherency message and I know of at least one
> implementation that triggers an error interrupt in response. I don't
> know of a statement in the specification that this is expected and
> implementations should ignore it. If there is such a statement, could
> you please point me to it?

All the specification says (CXL 3.0, section 3.2.4.4, "Host to Device
Requests") is what to do *if* the device is holding that cacheline.

If a device fails when it receives one of those requests for a line it
does not hold, then how can this work in the nominal case, where the
device does not own some arbitrary cacheline?
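
To make the question concrete, here is a toy C model of the behavior I
would expect: a snoop for a line the device does not hold is a miss,
not an error. All names (toy_dev, toy_snoop, RSP_MISS, RSP_HIT) are
hypothetical, and the state handling is deliberately simplified. This
is a sketch, not the specification's opcode set and not any real
device's logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-to-device snoop request types named in this thread. */
enum h2d_req { SNP_DATA, SNP_INV, SNP_CUR };

/* Hypothetical, simplified responses; not the spec's opcode names. */
enum snoop_rsp { RSP_MISS, RSP_HIT };

#define TOY_NR_LINES 4

/* Toy device cache: a few line addresses plus a valid bit each. */
struct toy_dev {
	uint64_t held[TOY_NR_LINES];
	bool valid[TOY_NR_LINES];
};

/* Return the slot holding @addr, or -1 if the device does not hold it. */
static int toy_lookup(const struct toy_dev *dev, uint64_t addr)
{
	for (size_t i = 0; i < TOY_NR_LINES; i++)
		if (dev->valid[i] && dev->held[i] == addr)
			return (int)i;
	return -1;
}

/* Handle one host-to-device snoop for @addr. */
static enum snoop_rsp toy_snoop(struct toy_dev *dev, enum h2d_req req,
				uint64_t addr)
{
	int idx;

	assert(dev != NULL);
	idx = toy_lookup(dev, addr);

	/* Nothing held, including right after reset: nothing to do. */
	if (idx < 0)
		return RSP_MISS;

	/* Simplification: only an invalidating snoop changes state here. */
	if (req == SNP_INV)
		dev->valid[idx] = false;
	return RSP_HIT;
}
```

With a zero-initialized struct toy_dev, which stands in for the clean
reset state, every call to toy_snoop() returns RSP_MISS regardless of
the request type.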

> Remove memory needs a cache flush IMO, in a way that prevents
> speculative fetches.  This can be done in kernel with uncacheable
> mappings alone, if possible in the arch callback, or via FW call. 

That assumes that the kernel owns all mappings. I worry about mappings
that the kernel cannot see, such as x86 SMM. That's why it's currently
an invalidate before next usage, but I am not opposed to also flushing
on remove if the current solution is causing device failures in
practice.
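
The ordering argument can be sketched with a toy C model. It is not
kernel code: toy_cache, toy_touch(), toy_remove_region() and
toy_setup_region() are hypothetical names, and "flush" here just clears
a flag per line. The assumption it illustrates is that an agent the
kernel cannot see may touch the region after a remove-time flush, so
only the invalidate before next use is guaranteed to start clean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_LINES 8

/* Toy CPU cache: true means a (possibly stale) line is cached. */
struct toy_cache {
	bool cached[TOY_LINES];
};

/* Any agent with a mapping, including one the kernel cannot see. */
static void toy_touch(struct toy_cache *c, size_t line)
{
	assert(line < TOY_LINES);
	c->cached[line] = true;
}

/* Stand-in for writing back and invalidating the whole region. */
static void toy_flush_all(struct toy_cache *c)
{
	for (size_t i = 0; i < TOY_LINES; i++)
		c->cached[i] = false;
}

/* Count lines still cached for the region. */
static size_t toy_stale_lines(const struct toy_cache *c)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_LINES; i++)
		if (c->cached[i])
			n++;
	return n;
}

/* Optional policy: flush at the time the region is removed. */
static void toy_remove_region(struct toy_cache *c, bool flush_on_remove)
{
	if (flush_on_remove)
		toy_flush_all(c);
}

/* Current policy: invalidate before the region is used again. */
static void toy_setup_region(struct toy_cache *c)
{
	toy_flush_all(c);
}
```

In this model, a toy_touch() that lands between toy_remove_region(c,
true) and toy_setup_region(c) leaves a stale line that only the setup
path clears. Flushing on remove can be added on top, but it does not
replace the invalidate before use.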

Can you confirm that the current kernel arrangement is causing failures
in practice, or is this a theoretical concern? ...and if it is happening
in practice do you have the example patch that fixes it?