From: Andy Lutomirski <>
To: Dave Hansen <>
Cc: Ravi V Shankar <>,
	Peter Zijlstra <>,
	H Peter Anvin <>,
	Jean-Philippe Brucker <>,
	Dave Jiang <>,
	Ashok Raj <>, x86 <>,
	amd-gfx <>,
	Christoph Hellwig <>,
	Ingo Molnar <>, Fenghua Yu <>,
	Borislav Petkov <>, Andy Lutomirski <>,
	Thomas Gleixner <>,
	Tony Luck <>,
	Felix Kuehling <>,
	linux-kernel <>,
	iommu <>,
	Jacob Jun Pan <>,
	David Woodhouse <>
Subject: Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
Date: Mon, 3 Aug 2020 10:16:50 -0700
Message-ID: <>
In-Reply-To: <>

On Mon, Aug 3, 2020 at 9:37 AM Dave Hansen <> wrote:
> On 8/3/20 8:12 AM, Andy Lutomirski wrote:
> > I could easily be convinced that the PASID fixup is so trivial and so
> > obviously free of misfiring in a way that causes an infinite loop that
> > this code is fine.  But I think we first need to answer the bigger
> > question of why we're doing a lazy fixup in the first place.
> There was an (internal to Intel) implementation of this about a year ago
> that used smp_call_function_many() to force the MSR state into all
> threads of a process.  I took one look at it, decided there was a 0%
> chance of it actually functioning and recommended we find another way.
> While I'm sure it could be done more efficiently, the implementation I
> looked at took ~200 lines of code and comments.  It started to look too
> much like another instance of mm->cpumask for my comfort.

If I were implementing this, I would try making switch_mm_irqs_off()
do, roughly:

void load_mm_pasid(...) {
  if (cpu_feature_enabled(X86_FEATURE_ENQCMD))
    tsk->xstate[offset] = READ_ONCE(next->context.pasid);
}

This costs one cache miss, although the cache line in question is
about to be read anyway.  It might be faster to, instead, do:

void load_mm_pasid(...) {
  u32 pasid = READ_ONCE(next->context.pasid);

  if (tsk->xstate[offset] != pasid)
    tsk->xstate[offset] = pasid;
}

so we don't dirty the cache line in the common case.  The actual
generated code ought to be pretty good -- surely the offset of PASID
in XSTATE is an entry in an array somewhere that can be found with a
single read, right?
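
For illustration only, here is a minimal user-space sketch of the second
variant (compare first, store only on change).  It is not kernel code:
struct mm_model, struct task_model, PASID_SLOT, the stores counter and
READ_ONCE_U32() are all hypothetical stand-ins invented for this sketch,
and the real offset of PASID in XSTATE would have to be looked up rather
than hard-coded.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the kernel's READ_ONCE(): a single volatile load. */
#define READ_ONCE_U32(x) (*(const volatile uint32_t *)&(x))

/* Hypothetical slot index of PASID inside the modeled XSTATE buffer. */
#define PASID_SLOT 3

/* Hypothetical model of the per-mm state (next->context.pasid). */
struct mm_model {
	uint32_t pasid;
};

/* Hypothetical model of the task's saved XSTATE area. */
struct task_model {
	uint32_t xstate[16];
	unsigned int stores;	/* counts writes to the slot, for illustration */
};

/* Copy the mm's PASID into the task's slot, writing only when it differs. */
void load_mm_pasid_model(struct task_model *tsk, const struct mm_model *next)
{
	uint32_t pasid;

	assert(tsk && next);
	pasid = READ_ONCE_U32(next->pasid);

	/* Skipping the store in the common case leaves the cache line clean. */
	if (tsk->xstate[PASID_SLOT] != pasid) {
		tsk->xstate[PASID_SLOT] = pasid;
		tsk->stores++;
	}
}
```

Calling load_mm_pasid_model() twice with an unchanged PASID that differs
from the saved slot performs one store; the second call only reads.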

The READ_ONCE is because this could race against a write to
context.pasid, so this code needs to be at the end of the function
where it's protected by mm_cpumask.  With all this done, the pasid
update is just on_each_cpu_mask(mm_cpumask(mm), load_mm_pasid, mm,

This looks like maybe 20 lines of code.  As an added bonus, it lets us
free PASIDs early if we ever decide we want to.
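
As a rough sketch of the update path described above, the following
user-space model publishes a new PASID in the mm and then runs the load
callback on every CPU whose bit is set in the mm's CPU mask.  Everything
here is an assumption made for illustration: a plain loop stands in for
the kernel's cross-CPU call, a u32 bitmask stands in for mm_cpumask(),
and the *_model names and NR_CPUS_MODEL are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 8		/* hypothetical CPU count for the model */

/* Hypothetical model of an mm: its PASID and the CPUs running it. */
struct mm_model {
	uint32_t pasid;		/* models mm->context.pasid */
	uint32_t cpumask;	/* bit n set: CPU n currently runs this mm */
};

/* Hypothetical model of per-CPU state. */
struct cpu_model {
	const struct mm_model *loaded_mm;	/* mm loaded on this CPU */
	uint32_t pasid_state;			/* models the PASID slot in XSTATE */
};

/* Per-CPU callback: refresh the PASID slot if this CPU runs the given mm. */
void load_mm_pasid_model(struct cpu_model *cpu, void *info)
{
	const struct mm_model *mm = (const struct mm_model *)info;

	if (cpu->loaded_mm == mm && cpu->pasid_state != mm->pasid)
		cpu->pasid_state = mm->pasid;
}

/* Plain loop standing in for a cross-CPU call over the CPUs in mask. */
void on_each_cpu_mask_model(struct cpu_model *cpus, uint32_t mask,
			    void (*func)(struct cpu_model *, void *),
			    void *info)
{
	unsigned int n;

	for (n = 0; n < NR_CPUS_MODEL; n++)
		if (mask & (UINT32_C(1) << n))
			func(&cpus[n], info);
}

/* Publish a new PASID, then push it to every CPU running the mm. */
void set_mm_pasid_model(struct cpu_model *cpus, struct mm_model *mm,
			uint32_t pasid)
{
	assert(cpus && mm);
	mm->pasid = pasid;
	on_each_cpu_mask_model(cpus, mm->cpumask, load_mm_pasid_model, mm);
}
```

The model is single-threaded, so it shows only the control flow; it says
nothing about the ordering against a concurrent context switch that the
READ_ONCE and mm_cpumask discussion above is concerned with.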

May I take this opportunity to ask Intel to please put some real
thought into future pieces of CPU state?  Here's a summary of some
things we have:

- Normal extended state (FPU, XMM, etc): genuinely per thread and only
ever used explicitly.  Actually makes sense with XSAVE(C/S).

- PKRU: affects CPL0-originated memory accesses, so it needs to be
eagerly loaded in the kernel.  Does not make sense with XRSTOR(C/S),
but it's in there anyway.

- CR3: per-mm state.  Makes some sense in CR3, but it's tangled up
with CR4 in nasty ways.

- LDTR: per-mm on Linux and mostly obsolete everywhere.  In its own
register, so it's not a big deal.

- PASID: per-mm state (surely Intel always intended it to be per-mm,
since it's for shared _virtual memory_!).  But for some reason it's in
an MSR (which is slow), and it's cleverly, but not that cleverly,
accessible with XSAVES/XRSTORS.  Doesn't actually make sense.  Also,
PASID is lazy-loadable, but the mechanism for telling the kernel that
a lazy load is needed got flubbed.

- TILE: genuinely per-thread, but it's expensive so it's
lazy-loadable.  But the lazy-load mechanism reuses #NM, and it's not
fully disambiguated from the other use of #NM.  So it sort of works,
but it's gross.

- "KERNEL_GS_BASE", i.e. the shadow GS base.  This is logically
per-user-thread state, but it's only accessible in MSRs.  For some
reason this is *not* in XSAVES/XRSTORS state, nor is there any
efficient way to access it at all.

- Segment registers: can't be properly saved except by hypervisors,
and can almost, but not quite, be properly loaded (assuming the state
was sane to begin with) by normal kernels.  Just don't try to load 1,
2, or 3 into any of them.

Sometimes I think that this is all intended to be as confusing as
possible and that it's all a ploy to keep context switches slow and
complicated.  Maybe Intel doesn't actually want to compete with other
architectures that can context switch quickly?

It would be really nice if we had a clean way to load per-mm state
(see my private emails about this), a clean way to load CPL3 register
state, and a clean way to load per-user-thread *kernel* register state
(e.g. PKRU and probably PKRS).  And there should be an exception that
says "user code accessed a lazy-loaded resource that isn't loaded, and
this is the resource it tried to access".

