linux-kernel.vger.kernel.org archive mirror
From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Jue Wang <juew@google.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86/mm: Don't try to change poison pages to uncacheable in a guest
Date: Mon, 18 May 2020 11:26:29 -0700
Message-ID: <20200518182629.GA2957@agluck-desk2.amr.corp.intel.com>
In-Reply-To: <20200518165500.GD25034@zn.tnic>

On Mon, May 18, 2020 at 06:55:00PM +0200, Borislav Petkov wrote:
> On Mon, May 18, 2020 at 08:36:25AM -0700, Luck, Tony wrote:
> > The VMM gets the page fault (because the unmapping of the guest
> > physical address is at the VMM EPT level).  The VMM can't map a new
> > page into that guest physical address because it has no way to
> > replace the contents of the old page.  The VMM could pass the #PF
> > to the guest, but that would just confuse the guest (its page tables
> > all say that the page is still valid). In this particular case the
> > page is part of the 1:1 kernel map. So the kernel will OOPS (I think).
> 
> ...
> 
> > Please explain how a guest (that doesn't even know that it is a guest)
> > is going to figure out that the EPT tables (that it has no way to access)
> > have marked this page invalid in guest physical address space.
> 
> So somewhere BUS_MCEERR_AR was mentioned. So I'm assuming the error
> severity was "action required". What does happen in the kernel, on
> baremetal, with an AR error in kernel space, i.e., kernel memory?

Outside of the now infamous memcpy_mcsafe(), any kernel consumption
of poison results in a panic, because the mce_severity() code will
trip this case:

        MCESEV(
                PANIC, "Data load in unrecoverable area of kernel",
                SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_DATA),
                KERNEL
                ),
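
For readers unfamiliar with the severity table: an entry's MASK(mask, result) pair matches when the MCi_STATUS bits selected by `mask` are exactly equal to `result`. The sketch below models only that bit test for the entry quoted above. The bit positions follow the x86 MCi_STATUS layout as defined in arch/x86/include/asm/mce.h; `struct sev_rule`, `kernel_data_load_panic` and `rule_matches()` are hypothetical names, and the real mce_severity() loop also checks things such as SER support and the KERNEL execution context, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MCi_STATUS bits and MCA codes, copied for a standalone build
 * (values as in arch/x86/include/asm/mce.h). */
#define MCI_STATUS_OVER   (1ULL << 62)  /* error overflow */
#define MCI_STATUS_UC     (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_MISCV  (1ULL << 59)  /* MCi_MISC register valid */
#define MCI_STATUS_ADDRV  (1ULL << 58)  /* MCi_ADDR register valid */
#define MCI_STATUS_S      (1ULL << 56)  /* signaled via machine check */
#define MCI_STATUS_AR     (1ULL << 55)  /* action required */
#define MCACOD            0xefffULL     /* MCA error code field, bit 12 masked off */
#define MCACOD_DATA       0x0134ULL     /* data load */

/* Composite masks as used in the severity table. */
#define MCI_UC_SAR (MCI_STATUS_UC | MCI_STATUS_S | MCI_STATUS_AR)
#define MCI_ADDR   (MCI_STATUS_ADDRV | MCI_STATUS_MISCV)

/* Hypothetical, simplified rule: only the MASK(mask, result) part of MCESEV(). */
struct sev_rule {
    uint64_t mask;    /* which status bits to examine */
    uint64_t result;  /* required value of those bits */
    const char *msg;
};

static const struct sev_rule kernel_data_load_panic = {
    .mask   = MCI_STATUS_OVER | MCI_UC_SAR | MCI_ADDR | MCACOD,
    .result = MCI_UC_SAR | MCI_ADDR | MCACOD_DATA,
    .msg    = "Data load in unrecoverable area of kernel",
};

/* True when the status bits selected by r->mask equal r->result. */
static bool rule_matches(const struct sev_rule *r, uint64_t status)
{
    /* A result bit outside the mask could never match; catch bad rules. */
    assert((r->result & ~r->mask) == 0);
    return (status & r->mask) == r->result;
}
```

So an uncorrected, signaled, action-required data load with valid address and misc registers matches, while the same status with OVER set, or with the instruction-fetch code 0x0150 in place of 0x0134, does not. Bits outside the mask (for example VAL and EN) have no effect on the match.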

> If we can't fixup the exception, we die.
> 
> So why should the guest behave any differently?

We don't see this particular problem on bare metal because a CLFLUSH
instruction isn't *consuming* data. It's just evicting lines from
the cache to memory. It still references the virtual address, which
works fine on bare metal because the kernel 1:1 map is still active.
But in the guest case the guest physical address has gone away (the
VMM has removed it from the EPT), so the CLFLUSH traps to the VMM.
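
The difference between the two cases comes down to the extra translation stage. A minimal sketch, assuming a toy model with one present bit per frame for the guest's own page tables and one for the VMM's EPT, and with guest virtual addresses in the 1:1 map already reduced to guest physical addresses (all names are hypothetical): on bare metal there is effectively no second stage, so the flush succeeds; in a guest the first stage still succeeds but the second does not, and the access exits to the VMM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NR_FRAMES  16   /* tiny, hypothetical guest-physical address space */

enum flush_result {
    FLUSH_OK,        /* both stages translate: line evicted, no data consumed */
    FLUSH_GUEST_PF,  /* the guest's own page tables say "not present" */
    FLUSH_EPT_EXIT   /* guest tables say present, EPT says not: VM exit */
};

struct two_stage {
    bool guest_present[NR_FRAMES]; /* guest page tables (e.g. the kernel 1:1 map) */
    bool ept_present[NR_FRAMES];   /* VMM-owned EPT; treat as all true on bare metal */
};

/* Start with every frame mapped at both stages. */
static void two_stage_init(struct two_stage *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < NR_FRAMES; i++) {
        t->guest_present[i] = true;
        t->ept_present[i] = true;
    }
}

/* Outcome of a CLFLUSH to guest-physical address gpa in this model. */
static enum flush_result model_clflush(const struct two_stage *t, uint64_t gpa)
{
    uint64_t gfn = gpa >> PAGE_SHIFT;

    if (gfn >= NR_FRAMES || !t->guest_present[gfn])
        return FLUSH_GUEST_PF;
    if (!t->ept_present[gfn])
        return FLUSH_EPT_EXIT;
    return FLUSH_OK;
}
```

Clearing `ept_present[3]` while leaving `guest_present[3]` set reproduces the situation in this thread: the guest believes the page is mapped, yet every access to it exits.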

> Now, if you want for the guest to be more "robust" and handle that
> thing, fine. But then you'd need an explicit way to tell the guest
> kernel: "you've just had an MCE and I unmapped the page" so that the
> guest kernel can figure out what to do, even if that means panicking.
> 
> I.e., signal in an explicit way that EPT violation Jue is talking about
> in the other mail.
> 
> You can inject a #PF or better yet the *first* MCE which is being
> injected should say with a bit somewhere "I unmapped the address in
> m->addr". So that the guest kernel can handle that properly and know
> what *exactly* it is getting an MCE for.

That question only makes any sense if you know you are running as a
guest and that someone else has unmapped the address. It's a meaningless
question to ask if you are running bare metal. So we'd still have a check
for FEATURE_HYPERVISOR.
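
The check itself is cheap. A minimal sketch, assuming the decision is made from the raw value of ECX returned by CPUID leaf 1: bit 31 of that register is the architectural "hypervisor present" bit, which the kernel reports as X86_FEATURE_HYPERVISOR. The `poison_page_action()` policy and its enum are hypothetical and only illustrate skipping the uncacheable conversion when running as a guest; this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 31: "hypervisor present" (X86_FEATURE_HYPERVISOR). */
#define CPUID1_ECX_HYPERVISOR (1u << 31)

static bool running_as_guest(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID1_ECX_HYPERVISOR) != 0;
}

/* Hypothetical policy: only try to make a poisoned page uncacheable
 * on bare metal. */
enum poison_action { POISON_MAKE_UC, POISON_LEAVE_ALONE };

static enum poison_action poison_page_action(uint32_t cpuid1_ecx)
{
    /* In a guest the VMM may already have removed the page from the EPT,
     * so flushing its cache lines would only trap. */
    return running_as_guest(cpuid1_ecx) ? POISON_LEAVE_ALONE : POISON_MAKE_UC;
}
```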

> What I don't like is the "am I running as a guest" check. Because
> someone else would come later and say, err, I'm not virtualizing this
> portion of MCA either, lemme add another "am I guest" check.
> 
> Sure, it is a lot easier but when stuff like that starts spreading
> around in the MCE code, then we can just as well disable MCE when
> virtualized altogether. It would be a lot easier for everybody.

Maybe it isn't pretty. But I don't see another practical solution.

The VMM is doing exactly the right thing here. It should not trust
that the guest will behave and not touch the poison location again.
If/when the guest does touch the poison, the right action is
for the VMM to fake a new machine check to the guest.
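
A minimal sketch of that VMM-side policy, assuming a hypothetical VMM that keeps a small list of guest frame numbers whose backing host page was poisoned (every name here is illustrative, not any real VMM's API): on an EPT violation it injects a machine check for a poisoned frame instead of quietly backing it with a new page whose contents it cannot reconstruct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define MAX_POISONED 8

/* Hypothetical bookkeeping: guest frames whose host page is poisoned. */
struct vmm_state {
    uint64_t poisoned_gfn[MAX_POISONED];
    size_t nr_poisoned;
};

enum exit_action {
    EXIT_MAP_PAGE,    /* ordinary EPT fault: back the frame and resume */
    EXIT_INJECT_MCE   /* frame was removed because of poison: fake a machine check */
};

static bool gfn_is_poisoned(const struct vmm_state *vmm, uint64_t gfn)
{
    for (size_t i = 0; i < vmm->nr_poisoned; i++)
        if (vmm->poisoned_gfn[i] == gfn)
            return true;
    return false;
}

/* Record a poisoned frame after the host has handled the original error. */
static void vmm_mark_poisoned(struct vmm_state *vmm, uint64_t gfn)
{
    assert(vmm->nr_poisoned < MAX_POISONED);
    vmm->poisoned_gfn[vmm->nr_poisoned++] = gfn;
}

/* Decide what to do on an EPT violation at guest-physical address gpa. */
static enum exit_action vmm_handle_ept_violation(const struct vmm_state *vmm,
                                                 uint64_t gpa)
{
    /* Never silently map a fresh page: the old contents are gone. */
    return gfn_is_poisoned(vmm, gpa >> PAGE_SHIFT) ? EXIT_INJECT_MCE
                                                   : EXIT_MAP_PAGE;
}
```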

Theoretically the VMM could decode the instruction that the guest
was trying to use on the poison page and decide "oh, this is that
weird case in Linux where it's just trying to CLFLUSH the page. I'll
just step the return IP past the CLFLUSH and let the guest continue".

But that doesn't sound at all reasonable to me (especially as the
next step is to realize that Linux is going to repeat that for every
cache line in the page, 64 of them in a 4K page with 64-byte lines,
so you would also want the VMM to fudge the register contents to skip
to the end of the loop and avoid another 63 VMEXITs).
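
The arithmetic behind "another 63 VMEXITs" can be checked with a small model of the flush loop. This sketch assumes 64-byte cache lines and a 4K page, and follows the general shape of the kernel's clflush_cache_range() (align the start down to a line boundary, then step one line at a time); the helper names are hypothetical. If each CLFLUSH to the vanished page exits to the VMM, stepping the IP past a single CLFLUSH still leaves one exit per remaining line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the cache lines a flush loop touches for [start, start + size). */
static size_t lines_flushed(uintptr_t start, size_t size, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0); /* power of two */
    if (size == 0)
        return 0;

    uintptr_t end = start + size;
    size_t n = 0;

    /* Align down to a line boundary, then one CLFLUSH per line. */
    for (uintptr_t p = start & ~(uintptr_t)(line - 1); p < end; p += line)
        n++;
    return n;
}

/* Exits left over after the first one if every flushed line traps. */
static size_t extra_vmexits(uintptr_t start, size_t size, size_t line)
{
    size_t n = lines_flushed(start, size, line);
    return n ? n - 1 : 0;
}
```

For a page-aligned 4096-byte range this counts 64 lines, so 63 further exits after the first; an unaligned range can touch one extra line because the start is rounded down.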

N.B. Linux wants to switch the page to uncacheable so that in the
persistent memory case the filesystem code can continue to access
the other "blocks" in the page, rather than lose all of them. That's
futile in the case where the VMM took the whole 4K away. Maybe Dan
needs to think about the guest case too.

-Tony
