From: "Huang, Kai" <kai.huang@intel.com>
To: "kirill.shutemov@linux.intel.com" <kirill.shutemov@linux.intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Hansen, Dave" <dave.hansen@intel.com>,
"david@redhat.com" <david@redhat.com>,
"bagasdotme@gmail.com" <bagasdotme@gmail.com>,
"ak@linux.intel.com" <ak@linux.intel.com>,
"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Chatre, Reinette" <reinette.chatre@intel.com>, "Christopherson,,
Sean" <seanjc@google.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"Yamahata, Isaku" <isaku.yamahata@intel.com>,
"Luck, Tony" <tony.luck@intel.com>,
"peterz@infradead.org" <peterz@infradead.org>,
"Shahar, Sagi" <sagis@google.com>,
"imammedo@redhat.com" <imammedo@redhat.com>,
"Gao, Chao" <chao.gao@intel.com>,
"Brown, Len" <len.brown@intel.com>,
"sathyanarayanan.kuppuswamy@linux.intel.com"
<sathyanarayanan.kuppuswamy@linux.intel.com>,
"Huang, Ying" <ying.huang@intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>
Subject: Re: [PATCH v11 19/20] x86/mce: Improve error log of kernel space TDX #MC due to erratum
Date: Mon, 12 Jun 2023 03:08:40 +0000 [thread overview]
Message-ID: <90aefcfd663c654197c5878e410f55cc4473eb79.camel@intel.com> (raw)
In-Reply-To: <20230609131754.dhii5ctfwtzx667o@box.shutemov.name>
On Fri, 2023-06-09 at 16:17 +0300, kirill.shutemov@linux.intel.com wrote:
> On Mon, Jun 05, 2023 at 02:27:32AM +1200, Kai Huang wrote:
> > The first few generations of TDX hardware have an erratum. Triggering
> > it in Linux requires some kind of kernel bug involving relatively exotic
> > memory writes to TDX private memory and will manifest via
> > spurious-looking machine checks when reading the affected memory.
> >
> > == Background ==
> >
> > Virtually all kernel memory accesses operations happen in full
> > cachelines. In practice, writing a "byte" of memory usually reads a 64
> > byte cacheline of memory, modifies it, then writes the whole line back.
> > Those operations do not trigger this problem.
> >
> > This problem is triggered by "partial" writes where a write transaction
> > of less than cacheline lands at the memory controller. The CPU does
> > these via non-temporal write instructions (like MOVNTI), or through
> > UC/WC memory mappings. The issue can also be triggered away from the
> > CPU by devices doing partial writes via DMA.
> >
> > == Problem ==
> >
> > A partial write to a TDX private memory cacheline will silently "poison"
> > the line. Subsequent reads will consume the poison and generate a
> > machine check. According to the TDX hardware spec, neither of these
> > things should have happened.
> >
> > To add insult to injury, the Linux machine code will present these as a
> > literal "Hardware error" when they were, in fact, a software-triggered
> > issue.
> >
> > == Solution ==
> >
> > In the end, this issue is hard to trigger. Rather than do something
> > rash (and incomplete) like unmap TDX private memory from the direct map,
> > improve the machine check handler.
> >
> > Currently, the #MC handler doesn't distinguish whether the memory is
> > TDX private memory or not but just dump, for instance, below message:
> >
> > [...] mce: [Hardware Error]: CPU 147: Machine Check Exception: f Bank 1: bd80000000100134
> > [...] mce: [Hardware Error]: RIP 10:<ffffffffadb69870> {__tlb_remove_page_size+0x10/0xa0}
> > ...
> > [...] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> > [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
> > [...] Kernel panic - not syncing: Fatal local machine check
> >
> > Which says "Hardware Error" and "Data load in unrecoverable area of
> > kernel".
> >
> > Ideally, it's better for the log to say "software bug around TDX private
> > memory" instead of "Hardware Error". But in reality the real hardware
> > memory error can happen, and sadly such software-triggered #MC cannot be
> > distinguished from the real hardware error. Also, the error message is
> > used by userspace tool 'mcelog' to parse, so changing the output may
> > break userspace.
> >
> > So keep the "Hardware Error". The "Data load in unrecoverable area of
> > kernel" is also helpful, so keep it too.
> >
> > Instead of modifying above error log, improve the error log by printing
> > additional TDX related message to make the log like:
> >
> > ...
> > [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
> > [...] mce: [Hardware Error]: Machine Check: Memory error from TDX private memory. May be result of CPU erratum.
>
> The message mentions one part of issue -- CPU erratum -- but misses the
> other required part -- kernel bug that makes kernel access the memory it
> not suppose to.
>
How about below?
"Memory error from TDX private memory. May be result of CPU erratum caused by
kernel bug."
next prev parent reply other threads:[~2023-06-12 3:10 UTC|newest]
Thread overview: 174+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-04 14:27 [PATCH v11 00/20] TDX host kernel support Kai Huang
2023-06-04 14:27 ` [PATCH v11 01/20] x86/tdx: Define TDX supported page sizes as macros Kai Huang
2023-06-04 14:27 ` [PATCH v11 02/20] x86/virt/tdx: Detect TDX during kernel boot Kai Huang
2023-06-06 14:00 ` Sathyanarayanan Kuppuswamy
2023-06-06 22:58 ` Huang, Kai
2023-06-06 23:44 ` Isaku Yamahata
2023-06-19 12:12 ` David Hildenbrand
2023-06-19 23:58 ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 03/20] x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC Kai Huang
2023-06-08 0:08 ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 04/20] x86/cpu: Detect TDX partial write machine check erratum Kai Huang
2023-06-06 12:38 ` kirill.shutemov
2023-06-06 22:58 ` Huang, Kai
2023-06-07 15:06 ` kirill.shutemov
2023-06-07 14:15 ` Dave Hansen
2023-06-07 22:43 ` Huang, Kai
2023-06-19 11:37 ` Huang, Kai
2023-06-20 15:44 ` Dave Hansen
2023-06-20 23:11 ` Huang, Kai
2023-06-19 12:21 ` David Hildenbrand
2023-06-20 10:31 ` Huang, Kai
2023-06-20 15:39 ` Dave Hansen
2023-06-20 16:03 ` David Hildenbrand
2023-06-20 16:21 ` Dave Hansen
2023-06-04 14:27 ` [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure Kai Huang
2023-06-06 23:55 ` Isaku Yamahata
2023-06-07 14:24 ` Dave Hansen
2023-06-07 18:53 ` Isaku Yamahata
2023-06-07 19:27 ` Dave Hansen
2023-06-07 19:47 ` Isaku Yamahata
2023-06-07 20:08 ` Sean Christopherson
2023-06-07 20:22 ` Dave Hansen
2023-06-08 0:51 ` Huang, Kai
2023-06-08 13:50 ` Dave Hansen
2023-06-07 22:56 ` Huang, Kai
2023-06-08 14:05 ` Dave Hansen
2023-06-19 12:52 ` David Hildenbrand
2023-06-20 10:37 ` Huang, Kai
2023-06-20 12:20 ` kirill.shutemov
2023-06-20 12:39 ` David Hildenbrand
2023-06-20 15:15 ` Dave Hansen
2023-06-04 14:27 ` [PATCH v11 06/20] x86/virt/tdx: Handle SEAMCALL running out of entropy error Kai Huang
2023-06-07 8:19 ` Isaku Yamahata
2023-06-07 15:08 ` Dave Hansen
2023-06-07 23:36 ` Huang, Kai
2023-06-08 0:29 ` Dave Hansen
2023-06-08 0:08 ` kirill.shutemov
2023-06-09 14:42 ` Nikolay Borisov
2023-06-12 11:04 ` Huang, Kai
2023-06-19 13:00 ` David Hildenbrand
2023-06-20 10:39 ` Huang, Kai
2023-06-20 11:14 ` David Hildenbrand
2023-06-04 14:27 ` [PATCH v11 07/20] x86/virt/tdx: Add skeleton to enable TDX on demand Kai Huang
2023-06-05 21:23 ` Isaku Yamahata
2023-06-05 23:04 ` Huang, Kai
2023-06-05 23:08 ` Dave Hansen
2023-06-05 23:24 ` Huang, Kai
2023-06-07 15:22 ` Dave Hansen
2023-06-08 2:10 ` Huang, Kai
2023-06-08 13:43 ` Dave Hansen
2023-06-12 11:21 ` Huang, Kai
2023-06-19 13:16 ` David Hildenbrand
2023-06-19 23:28 ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory Kai Huang
2023-06-07 15:25 ` Dave Hansen
2023-06-08 0:27 ` kirill.shutemov
2023-06-08 2:40 ` Huang, Kai
2023-06-08 11:41 ` kirill.shutemov
2023-06-08 13:13 ` Dave Hansen
2023-06-12 2:00 ` Huang, Kai
2023-06-08 23:29 ` Isaku Yamahata
2023-06-08 23:54 ` kirill.shutemov
2023-06-09 1:33 ` Isaku Yamahata
2023-06-09 10:02 ` kirill.shutemov
2023-06-12 2:00 ` Huang, Kai
2023-06-19 13:29 ` David Hildenbrand
2023-06-19 23:51 ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 09/20] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory Kai Huang
2023-06-07 15:48 ` Dave Hansen
2023-06-07 23:22 ` Huang, Kai
2023-06-08 22:40 ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 10/20] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions Kai Huang
2023-06-07 15:54 ` Dave Hansen
2023-06-07 15:57 ` Dave Hansen
2023-06-08 10:18 ` Huang, Kai
2023-06-08 22:52 ` kirill.shutemov
2023-06-12 2:21 ` Huang, Kai
2023-06-12 3:01 ` Dave Hansen
2023-06-04 14:27 ` [PATCH v11 11/20] x86/virt/tdx: Fill out " Kai Huang
2023-06-07 16:05 ` Dave Hansen
2023-06-08 10:48 ` Huang, Kai
2023-06-08 13:11 ` Dave Hansen
2023-06-12 2:33 ` Huang, Kai
2023-06-12 14:33 ` kirill.shutemov
2023-06-12 22:10 ` Huang, Kai
2023-06-13 10:18 ` kirill.shutemov
2023-06-13 23:19 ` Huang, Kai
2023-06-08 23:02 ` kirill.shutemov
2023-06-12 2:25 ` Huang, Kai
2023-06-09 4:01 ` Sathyanarayanan Kuppuswamy
2023-06-12 2:28 ` Huang, Kai
2023-06-14 12:31 ` Nikolay Borisov
2023-06-14 22:45 ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 12/20] x86/virt/tdx: Allocate and set up PAMTs for TDMRs Kai Huang
2023-06-08 23:24 ` kirill.shutemov
2023-06-08 23:43 ` Dave Hansen
2023-06-12 2:52 ` Huang, Kai
2023-06-25 15:38 ` Huang, Kai
2023-06-15 7:48 ` Nikolay Borisov
2023-06-04 14:27 ` [PATCH v11 13/20] x86/virt/tdx: Designate reserved areas for all TDMRs Kai Huang
2023-06-08 23:53 ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 14/20] x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID Kai Huang
2023-06-08 23:53 ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 15/20] x86/virt/tdx: Configure global KeyID on all packages Kai Huang
2023-06-08 23:53 ` kirill.shutemov
2023-06-15 8:12 ` Nikolay Borisov
2023-06-15 22:24 ` Huang, Kai
2023-06-19 14:56 ` kirill.shutemov
2023-06-19 23:38 ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 16/20] x86/virt/tdx: Initialize all TDMRs Kai Huang
2023-06-09 10:03 ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 17/20] x86/kexec: Flush cache of TDX private memory Kai Huang
2023-06-09 10:14 ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 18/20] x86: Handle TDX erratum to reset TDX private memory during kexec() and reboot Kai Huang
2023-06-09 13:23 ` kirill.shutemov
2023-06-12 3:06 ` Huang, Kai
2023-06-12 7:58 ` kirill.shutemov
2023-06-12 10:27 ` Huang, Kai
2023-06-12 11:48 ` kirill.shutemov
2023-06-12 13:18 ` David Laight
2023-06-12 13:47 ` Dave Hansen
2023-06-13 0:51 ` Huang, Kai
2023-06-13 11:05 ` kirill.shutemov
2023-06-14 0:15 ` Huang, Kai
2023-06-13 14:25 ` Dave Hansen
2023-06-13 23:18 ` Huang, Kai
2023-06-14 0:24 ` Dave Hansen
2023-06-14 0:38 ` Huang, Kai
2023-06-14 0:42 ` Huang, Kai
2023-06-19 11:43 ` Huang, Kai
2023-06-19 14:31 ` Dave Hansen
2023-06-19 14:46 ` kirill.shutemov
2023-06-19 23:35 ` Huang, Kai
2023-06-19 23:41 ` Dave Hansen
2023-06-20 0:56 ` Huang, Kai
2023-06-20 1:06 ` Dave Hansen
2023-06-20 7:58 ` Peter Zijlstra
2023-06-25 15:30 ` Huang, Kai
2023-06-25 23:26 ` Huang, Kai
2023-06-20 7:48 ` Peter Zijlstra
2023-06-20 8:11 ` Peter Zijlstra
2023-06-20 10:42 ` Huang, Kai
2023-06-20 10:56 ` Peter Zijlstra
2023-06-14 9:33 ` Huang, Kai
2023-06-14 10:02 ` kirill.shutemov
2023-06-14 10:58 ` Huang, Kai
2023-06-14 11:08 ` kirill.shutemov
2023-06-14 11:17 ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 19/20] x86/mce: Improve error log of kernel space TDX #MC due to erratum Kai Huang
2023-06-05 2:13 ` Xiaoyao Li
2023-06-05 23:05 ` Huang, Kai
2023-06-09 13:17 ` kirill.shutemov
2023-06-12 3:08 ` Huang, Kai [this message]
2023-06-12 7:59 ` kirill.shutemov
2023-06-12 13:51 ` Dave Hansen
2023-06-12 23:31 ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 20/20] Documentation/x86: Add documentation for TDX host support Kai Huang
2023-06-08 23:56 ` Dave Hansen
2023-06-12 3:41 ` Huang, Kai
2023-06-16 9:02 ` Nikolay Borisov
2023-06-16 16:26 ` Dave Hansen
2023-06-06 0:36 ` [PATCH v11 00/20] TDX host kernel support Isaku Yamahata
2023-06-08 21:03 ` Dan Williams
2023-06-12 10:56 ` Huang, Kai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=90aefcfd663c654197c5878e410f55cc4473eb79.camel@intel.com \
--to=kai.huang@intel.com \
--cc=ak@linux.intel.com \
--cc=bagasdotme@gmail.com \
--cc=chao.gao@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave.hansen@intel.com \
--cc=david@redhat.com \
--cc=imammedo@redhat.com \
--cc=isaku.yamahata@intel.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=kvm@vger.kernel.org \
--cc=len.brown@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=rafael.j.wysocki@intel.com \
--cc=reinette.chatre@intel.com \
--cc=sagis@google.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=seanjc@google.com \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).