From: Alistair Popple <apopple@nvidia.com>
To: Vikram Sethi <vsethi@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
David Hildenbrand <david@redhat.com>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"Natu, Mahesh" <mahesh.natu@intel.com>,
"Rudoff, Andy" <andy.rudoff@intel.com>,
Jeff Smith <JSMITH@nvidia.com>,
Mark Hairgrove <mhairgrove@nvidia.com>,
"jglisse@redhat.com" <jglisse@redhat.com>,
Linux MM <linux-mm@kvack.org>,
Linux ACPI <linux-acpi@vger.kernel.org>,
"Anshuman Khandual" <anshuman.khandual@arm.com>,
"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
Samer El-Haj-Mahmoud <Samer.El-Haj-Mahmoud@arm.com>,
Shanker Donthineni <sdonthineni@nvidia.com>,
Joao Martins <joao.m.martins@oracle.com>
Subject: Re: Onlining CXL Type2 device coherent memory
Date: Tue, 3 Nov 2020 14:56:20 +1100
Message-ID: <6645807.zUfqAqQW0h@nvdebian>
In-Reply-To: <BL0PR12MB2532F7D105A1DC2E41B13DF2BD100@BL0PR12MB2532.namprd12.prod.outlook.com>
On Tuesday, 3 November 2020 6:25:23 AM AEDT Vikram Sethi wrote:
> > > > be sufficient, but depending if driver had done the add_memory in
> > > > probe, it perhaps would be onerous to have to remove_memory as well
> > > > before reset, and then add it back after reset. I realize you’re
> > > > saying such a procedure would be abusing hotplug framework, and we
> > > > could perhaps require that memory be removed prior to reset, but not
> > > > clear to me that it *must* be removed for correctness.
I'm not sure exactly what you meant by "unavailable", but on some platforms
(e.g. PowerPC) it must be removed for correctness if hardware access to the
memory is going away for any period of time. remove_memory() is what makes it
safe to physically remove the memory, as it triggers things like cache
flushing. Without this, PPC would see memory-failure machine checks if it ever
tried to write back any dirty cache lines to the now inaccessible memory.
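To make that lifecycle concrete, a driver that onlines HDM in probe would pair
it with a teardown path along these lines. This is only a hedged sketch against
the in-kernel hotplug API; the cxl_dev structure and its fields are invented
for illustration, the resource name is an assumption, and the exact signatures
of add_memory_driver_managed()/offline_and_remove_memory() vary across kernel
versions:

```c
/* Hedged kernel-side sketch, not a buildable driver: struct cxl_dev
 * and its fields are made up for illustration. */
#include <linux/memory_hotplug.h>

static int cxl_probe_online_hdm(struct cxl_dev *dev)
{
	/* Expose the device's HDM range to the buddy allocator. The
	 * "driver managed" variant marks the range so kexec and
	 * friends don't treat it as ordinary System RAM. */
	return add_memory_driver_managed(dev->nid, dev->hdm_base,
					 dev->hdm_size,
					 "System RAM (cxl)", MHP_NONE);
}

static void cxl_pre_reset_offline_hdm(struct cxl_dev *dev)
{
	/* Before a reset makes the HDM window inaccessible, tear the
	 * range fully down. On PPC this teardown is what flushes
	 * dirty cache lines; skipping it risks machine checks on a
	 * later writeback to the dead window. */
	offline_and_remove_memory(dev->hdm_base, dev->hdm_size);
}
```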
> > > > Another usecase of offlining without removing HDM could be around
> > > > Virtualization/passing entire device with its memory to a VM. If
> > > > device was being used in the host kernel, and is then unbound, and
> > > > bound to vfio-pci (vfio-cxl?), would we expect vfio-pci to
> > > > add_memory_driver_managed?
> > >
> > > At least for passing through memory to VMs (via KVM), you don't actually
> > > need struct pages / memory exposed to the buddy via
> > > add_memory_driver_managed(). Actually, doing that sounds like the wrong
> > > approach.
> > >
> > > E.g., you would "allocate" the memory via devdax/dax_hmat and directly
> > > map the resulting device into guest address space. At least that's what
> > > some people are doing with
>
> How does memory_failure forwarding to guest work in that case?
> IIUC it doesn't without a struct page in the host.
> For normal memory, when VM consumes poison, host kernel signals
> Userspace with SIGBUS and si-code that says Action Required, which
> QEMU injects to the guest.
> IBM had done something like you suggest with coherent GPU memory and IIUC
> memory_failure forwarding to guest VM does not work there.
>
> kernel https://lkml.org/lkml/2018/12/20/103
> QEMU: https://patchwork.kernel.org/patch/10831455/
The above patches simply allow the coherent GPU physical memory ranges to get
mapped into a guest VM in a similar way to an MMIO range (ie. without a struct
page in the host). So you are correct in that they do not deal with forwarding
failures to a guest VM.
Any GPU memory failure on PPC would currently get sent to the host in the same
way as a normal system memory failure (ie. machine check). So in theory
notification to a guest would work the same as a normal system memory failure.
I say in theory because, when I last looked at this some time back, a guest
kernel on PPC was not notified of memory errors.
- Alistair
> I would think we *do want* memory errors to be sent to a VM.
>
> >
> > ...and Joao is working to see if the host kernel can skip allocating
> > 'struct page' or do it on demand if the guest ever requests host
> > kernel services on its memory. Typically it does not so host 'struct
> > page' space for devdax memory ranges goes wasted.
> Is memory_failure forwarded to and handled by guest?
>
Thread overview: 17+ messages
2020-10-28 23:05 Onlining CXL Type2 device coherent memory Vikram Sethi
2020-10-29 14:50 ` Ben Widawsky
2020-10-30 20:37 ` Dan Williams
2020-10-30 20:59 ` Matthew Wilcox
2020-10-30 23:38 ` Dan Williams
2020-10-30 22:39 ` Vikram Sethi
2020-11-02 17:47 ` Dan Williams
2020-10-31 10:21 ` David Hildenbrand
2020-10-31 16:51 ` Dan Williams
2020-11-02 9:51 ` David Hildenbrand
2020-11-02 16:17 ` Vikram Sethi
2020-11-02 17:53 ` David Hildenbrand
2020-11-02 18:03 ` Dan Williams
2020-11-02 19:25 ` Vikram Sethi
2020-11-02 19:45 ` Dan Williams
2020-11-03 3:56 ` Alistair Popple [this message]
2020-11-02 18:34 ` Jonathan Cameron