kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Huang, Kai" <kai.huang@intel.com>
To: "kirill.shutemov@linux.intel.com" <kirill.shutemov@linux.intel.com>
Cc: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Hansen, Dave" <dave.hansen@intel.com>,
	"david@redhat.com" <david@redhat.com>,
	"bagasdotme@gmail.com" <bagasdotme@gmail.com>,
	"ak@linux.intel.com" <ak@linux.intel.com>,
	"Wysocki, Rafael J" <rafael.j.wysocki@intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Chatre, Reinette" <reinette.chatre@intel.com>, "Christopherson,,
	Sean" <seanjc@google.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"Yamahata, Isaku" <isaku.yamahata@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"Shahar, Sagi" <sagis@google.com>,
	"imammedo@redhat.com" <imammedo@redhat.com>,
	"Gao, Chao" <chao.gao@intel.com>,
	"Brown, Len" <len.brown@intel.com>,
	"sathyanarayanan.kuppuswamy@linux.intel.com" 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>
Subject: Re: [PATCH v11 19/20] x86/mce: Improve error log of kernel space TDX #MC due to erratum
Date: Mon, 12 Jun 2023 03:08:40 +0000	[thread overview]
Message-ID: <90aefcfd663c654197c5878e410f55cc4473eb79.camel@intel.com> (raw)
In-Reply-To: <20230609131754.dhii5ctfwtzx667o@box.shutemov.name>

On Fri, 2023-06-09 at 16:17 +0300, kirill.shutemov@linux.intel.com wrote:
> On Mon, Jun 05, 2023 at 02:27:32AM +1200, Kai Huang wrote:
> > The first few generations of TDX hardware have an erratum.  Triggering
> > it in Linux requires some kind of kernel bug involving relatively exotic
> > memory writes to TDX private memory and will manifest via
> > spurious-looking machine checks when reading the affected memory.
> > 
> > == Background ==
> > 
> > Virtually all kernel memory accesses operations happen in full
> > cachelines.  In practice, writing a "byte" of memory usually reads a 64
> > byte cacheline of memory, modifies it, then writes the whole line back.
> > Those operations do not trigger this problem.
> > 
> > This problem is triggered by "partial" writes where a write transaction
> > of less than cacheline lands at the memory controller.  The CPU does
> > these via non-temporal write instructions (like MOVNTI), or through
> > UC/WC memory mappings.  The issue can also be triggered away from the
> > CPU by devices doing partial writes via DMA.
> > 
> > == Problem ==
> > 
> > A partial write to a TDX private memory cacheline will silently "poison"
> > the line.  Subsequent reads will consume the poison and generate a
> > machine check.  According to the TDX hardware spec, neither of these
> > things should have happened.
> > 
> > To add insult to injury, the Linux machine code will present these as a
> > literal "Hardware error" when they were, in fact, a software-triggered
> > issue.
> > 
> > == Solution ==
> > 
> > In the end, this issue is hard to trigger.  Rather than do something
> > rash (and incomplete) like unmap TDX private memory from the direct map,
> > improve the machine check handler.
> > 
> > Currently, the #MC handler doesn't distinguish whether the memory is
> > TDX private memory or not but just dump, for instance, below message:
> > 
> >  [...] mce: [Hardware Error]: CPU 147: Machine Check Exception: f Bank 1: bd80000000100134
> >  [...] mce: [Hardware Error]: RIP 10:<ffffffffadb69870> {__tlb_remove_page_size+0x10/0xa0}
> >  	...
> >  [...] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> >  [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
> >  [...] Kernel panic - not syncing: Fatal local machine check
> > 
> > Which says "Hardware Error" and "Data load in unrecoverable area of
> > kernel".
> > 
> > Ideally, it's better for the log to say "software bug around TDX private
> > memory" instead of "Hardware Error".  But in reality the real hardware
> > memory error can happen, and sadly such software-triggered #MC cannot be
> > distinguished from the real hardware error.  Also, the error message is
> > used by userspace tool 'mcelog' to parse, so changing the output may
> > break userspace.
> > 
> > So keep the "Hardware Error".  The "Data load in unrecoverable area of
> > kernel" is also helpful, so keep it too.
> > 
> > Instead of modifying above error log, improve the error log by printing
> > additional TDX related message to make the log like:
> > 
> >   ...
> >  [...] mce: [Hardware Error]: Machine check: Data load in unrecoverable area of kernel
> >  [...] mce: [Hardware Error]: Machine Check: Memory error from TDX private memory. May be result of CPU erratum.
> 
> The message mentions one part of issue -- CPU erratum -- but misses the
> other required part -- kernel bug that makes kernel access the memory it
> not suppose to.
> 

How about below?

"Memory error from TDX private memory. May be result of CPU erratum caused by
kernel bug."




  reply	other threads:[~2023-06-12  3:10 UTC|newest]

Thread overview: 174+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-04 14:27 [PATCH v11 00/20] TDX host kernel support Kai Huang
2023-06-04 14:27 ` [PATCH v11 01/20] x86/tdx: Define TDX supported page sizes as macros Kai Huang
2023-06-04 14:27 ` [PATCH v11 02/20] x86/virt/tdx: Detect TDX during kernel boot Kai Huang
2023-06-06 14:00   ` Sathyanarayanan Kuppuswamy
2023-06-06 22:58     ` Huang, Kai
2023-06-06 23:44   ` Isaku Yamahata
2023-06-19 12:12   ` David Hildenbrand
2023-06-19 23:58     ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 03/20] x86/virt/tdx: Make INTEL_TDX_HOST depend on X86_X2APIC Kai Huang
2023-06-08  0:08   ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 04/20] x86/cpu: Detect TDX partial write machine check erratum Kai Huang
2023-06-06 12:38   ` kirill.shutemov
2023-06-06 22:58     ` Huang, Kai
2023-06-07 15:06       ` kirill.shutemov
2023-06-07 14:15   ` Dave Hansen
2023-06-07 22:43     ` Huang, Kai
2023-06-19 11:37       ` Huang, Kai
2023-06-20 15:44         ` Dave Hansen
2023-06-20 23:11           ` Huang, Kai
2023-06-19 12:21   ` David Hildenbrand
2023-06-20 10:31     ` Huang, Kai
2023-06-20 15:39     ` Dave Hansen
2023-06-20 16:03       ` David Hildenbrand
2023-06-20 16:21         ` Dave Hansen
2023-06-04 14:27 ` [PATCH v11 05/20] x86/virt/tdx: Add SEAMCALL infrastructure Kai Huang
2023-06-06 23:55   ` Isaku Yamahata
2023-06-07 14:24   ` Dave Hansen
2023-06-07 18:53     ` Isaku Yamahata
2023-06-07 19:27       ` Dave Hansen
2023-06-07 19:47         ` Isaku Yamahata
2023-06-07 20:08           ` Sean Christopherson
2023-06-07 20:22             ` Dave Hansen
2023-06-08  0:51               ` Huang, Kai
2023-06-08 13:50                 ` Dave Hansen
2023-06-07 22:56     ` Huang, Kai
2023-06-08 14:05       ` Dave Hansen
2023-06-19 12:52   ` David Hildenbrand
2023-06-20 10:37     ` Huang, Kai
2023-06-20 12:20       ` kirill.shutemov
2023-06-20 12:39         ` David Hildenbrand
2023-06-20 15:15     ` Dave Hansen
2023-06-04 14:27 ` [PATCH v11 06/20] x86/virt/tdx: Handle SEAMCALL running out of entropy error Kai Huang
2023-06-07  8:19   ` Isaku Yamahata
2023-06-07 15:08   ` Dave Hansen
2023-06-07 23:36     ` Huang, Kai
2023-06-08  0:29       ` Dave Hansen
2023-06-08  0:08   ` kirill.shutemov
2023-06-09 14:42   ` Nikolay Borisov
2023-06-12 11:04     ` Huang, Kai
2023-06-19 13:00   ` David Hildenbrand
2023-06-20 10:39     ` Huang, Kai
2023-06-20 11:14       ` David Hildenbrand
2023-06-04 14:27 ` [PATCH v11 07/20] x86/virt/tdx: Add skeleton to enable TDX on demand Kai Huang
2023-06-05 21:23   ` Isaku Yamahata
2023-06-05 23:04     ` Huang, Kai
2023-06-05 23:08       ` Dave Hansen
2023-06-05 23:24         ` Huang, Kai
2023-06-07 15:22   ` Dave Hansen
2023-06-08  2:10     ` Huang, Kai
2023-06-08 13:43       ` Dave Hansen
2023-06-12 11:21         ` Huang, Kai
2023-06-19 13:16   ` David Hildenbrand
2023-06-19 23:28     ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 08/20] x86/virt/tdx: Get information about TDX module and TDX-capable memory Kai Huang
2023-06-07 15:25   ` Dave Hansen
2023-06-08  0:27   ` kirill.shutemov
2023-06-08  2:40     ` Huang, Kai
2023-06-08 11:41       ` kirill.shutemov
2023-06-08 13:13         ` Dave Hansen
2023-06-12  2:00           ` Huang, Kai
2023-06-08 23:29         ` Isaku Yamahata
2023-06-08 23:54           ` kirill.shutemov
2023-06-09  1:33             ` Isaku Yamahata
2023-06-09 10:02   ` kirill.shutemov
2023-06-12  2:00     ` Huang, Kai
2023-06-19 13:29   ` David Hildenbrand
2023-06-19 23:51     ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 09/20] x86/virt/tdx: Use all system memory when initializing TDX module as TDX memory Kai Huang
2023-06-07 15:48   ` Dave Hansen
2023-06-07 23:22     ` Huang, Kai
2023-06-08 22:40   ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 10/20] x86/virt/tdx: Add placeholder to construct TDMRs to cover all TDX memory regions Kai Huang
2023-06-07 15:54   ` Dave Hansen
2023-06-07 15:57   ` Dave Hansen
2023-06-08 10:18     ` Huang, Kai
2023-06-08 22:52   ` kirill.shutemov
2023-06-12  2:21     ` Huang, Kai
2023-06-12  3:01       ` Dave Hansen
2023-06-04 14:27 ` [PATCH v11 11/20] x86/virt/tdx: Fill out " Kai Huang
2023-06-07 16:05   ` Dave Hansen
2023-06-08 10:48     ` Huang, Kai
2023-06-08 13:11       ` Dave Hansen
2023-06-12  2:33         ` Huang, Kai
2023-06-12 14:33           ` kirill.shutemov
2023-06-12 22:10             ` Huang, Kai
2023-06-13 10:18               ` kirill.shutemov
2023-06-13 23:19                 ` Huang, Kai
2023-06-08 23:02   ` kirill.shutemov
2023-06-12  2:25     ` Huang, Kai
2023-06-09  4:01   ` Sathyanarayanan Kuppuswamy
2023-06-12  2:28     ` Huang, Kai
2023-06-14 12:31   ` Nikolay Borisov
2023-06-14 22:45     ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 12/20] x86/virt/tdx: Allocate and set up PAMTs for TDMRs Kai Huang
2023-06-08 23:24   ` kirill.shutemov
2023-06-08 23:43     ` Dave Hansen
2023-06-12  2:52       ` Huang, Kai
2023-06-25 15:38     ` Huang, Kai
2023-06-15  7:48   ` Nikolay Borisov
2023-06-04 14:27 ` [PATCH v11 13/20] x86/virt/tdx: Designate reserved areas for all TDMRs Kai Huang
2023-06-08 23:53   ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 14/20] x86/virt/tdx: Configure TDX module with the TDMRs and global KeyID Kai Huang
2023-06-08 23:53   ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 15/20] x86/virt/tdx: Configure global KeyID on all packages Kai Huang
2023-06-08 23:53   ` kirill.shutemov
2023-06-15  8:12   ` Nikolay Borisov
2023-06-15 22:24     ` Huang, Kai
2023-06-19 14:56       ` kirill.shutemov
2023-06-19 23:38         ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 16/20] x86/virt/tdx: Initialize all TDMRs Kai Huang
2023-06-09 10:03   ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 17/20] x86/kexec: Flush cache of TDX private memory Kai Huang
2023-06-09 10:14   ` kirill.shutemov
2023-06-04 14:27 ` [PATCH v11 18/20] x86: Handle TDX erratum to reset TDX private memory during kexec() and reboot Kai Huang
2023-06-09 13:23   ` kirill.shutemov
2023-06-12  3:06     ` Huang, Kai
2023-06-12  7:58       ` kirill.shutemov
2023-06-12 10:27         ` Huang, Kai
2023-06-12 11:48           ` kirill.shutemov
2023-06-12 13:18             ` David Laight
2023-06-12 13:47           ` Dave Hansen
2023-06-13  0:51             ` Huang, Kai
2023-06-13 11:05               ` kirill.shutemov
2023-06-14  0:15                 ` Huang, Kai
2023-06-13 14:25               ` Dave Hansen
2023-06-13 23:18                 ` Huang, Kai
2023-06-14  0:24                   ` Dave Hansen
2023-06-14  0:38                     ` Huang, Kai
2023-06-14  0:42                       ` Huang, Kai
2023-06-19 11:43             ` Huang, Kai
2023-06-19 14:31               ` Dave Hansen
2023-06-19 14:46                 ` kirill.shutemov
2023-06-19 23:35                   ` Huang, Kai
2023-06-19 23:41                   ` Dave Hansen
2023-06-20  0:56                     ` Huang, Kai
2023-06-20  1:06                       ` Dave Hansen
2023-06-20  7:58                         ` Peter Zijlstra
2023-06-25 15:30                         ` Huang, Kai
2023-06-25 23:26                           ` Huang, Kai
2023-06-20  7:48                     ` Peter Zijlstra
2023-06-20  8:11       ` Peter Zijlstra
2023-06-20 10:42         ` Huang, Kai
2023-06-20 10:56           ` Peter Zijlstra
2023-06-14  9:33   ` Huang, Kai
2023-06-14 10:02     ` kirill.shutemov
2023-06-14 10:58       ` Huang, Kai
2023-06-14 11:08         ` kirill.shutemov
2023-06-14 11:17           ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 19/20] x86/mce: Improve error log of kernel space TDX #MC due to erratum Kai Huang
2023-06-05  2:13   ` Xiaoyao Li
2023-06-05 23:05     ` Huang, Kai
2023-06-09 13:17   ` kirill.shutemov
2023-06-12  3:08     ` Huang, Kai [this message]
2023-06-12  7:59       ` kirill.shutemov
2023-06-12 13:51         ` Dave Hansen
2023-06-12 23:31           ` Huang, Kai
2023-06-04 14:27 ` [PATCH v11 20/20] Documentation/x86: Add documentation for TDX host support Kai Huang
2023-06-08 23:56   ` Dave Hansen
2023-06-12  3:41     ` Huang, Kai
2023-06-16  9:02   ` Nikolay Borisov
2023-06-16 16:26     ` Dave Hansen
2023-06-06  0:36 ` [PATCH v11 00/20] TDX host kernel support Isaku Yamahata
2023-06-08 21:03 ` Dan Williams
2023-06-12 10:56   ` Huang, Kai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=90aefcfd663c654197c5878e410f55cc4473eb79.camel@intel.com \
    --to=kai.huang@intel.com \
    --cc=ak@linux.intel.com \
    --cc=bagasdotme@gmail.com \
    --cc=chao.gao@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=isaku.yamahata@intel.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=reinette.chatre@intel.com \
    --cc=sagis@google.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).