LKML Archive on
 help / color / Atom feed
From: Linus Torvalds <>
To: Thomas Gleixner <>
Cc: Benjamin Herrenschmidt <>,
	Ingo Molnar <>, Balbir Singh <>,
	Peter Zijlstra <>,
	Andrew Morton <>,
	Borislav Petkov <>,
	Linux Kernel Mailing List <>,
	Andrew Lutomirski <>
Subject: Re: [GIT PULL] x86/mm changes for v5.8
Date: Tue, 2 Jun 2020 12:14:43 -0700
Message-ID: <> (raw)
In-Reply-To: <>

On Tue, Jun 2, 2020 at 11:29 AM Thomas Gleixner <> wrote:
> It's trivial enough to fix. We have a static key already which is
> telling us whether SMT scheduling is active.

.. but should we do it here, in switch_mm() in the first place?

Should it perhaps just return an error if user land tries to set the
"flush L1$" thing on an SMT system? And no, I don't think we care at
all if people then start playing games and enabling/disabling SMT
dynamically while applications are working. At that point the admin
kets to keep both of the broken pieces.

Also, see my other point about how "switch_mm()" really isn't even a
protection domain switch to begin with. We're still in the exact same
protection domain we used to be in, with the exact same direct access
to L1D$.

Why would we flush the caches on a random and irrelevant event in
kernel space? switch_mm() simply isn't relevant for caches (well,
unless you have fully virtual ones, but that's a completely different

Wouldn't it be more sensible to treat it more like TIF_NEED_FPU_LOAD -
have a per-cpu "I need to flush the cache" variable, and then the only
thing a context switch does is to see if the user changed (or
whatever) and then set the bit, and set TIF_NOTIFY_RESUME in the

Because the L1D$ flush isn't a kernel issue, it's a "don't let user
space try to attack it" issue. The kernel can already read it if it
wants to.

And that's just the "big picture" issues I see. In the big picture,
doing this when SMT is enabled is unbelievably stupid. And in the big
picture, context switch really isn't a security domain change wrt the

The more I look at those patches, the more I go "that's just wrong" on
some of the "small picture" implementation details.

Here's just a few cases that I reacted to

Actual low-level flushing code:

 (1) the SW fallback

     (a) is only defined on Intel, and initializing the silly cache
flush pages on any other vendor will fail.

     (b) seems to assume that 16 pages (order-4) is sufficient and
necessary. Probably "good enough", but it's also an example of "yeah,
that's expensive".

     (c) and if I read the code correctly, trying to flush the L1D$ on
non-intel without the HW support, it causes a WARN_ON_ONCE()! WTF?

 (2) the HW case is done for any vendor, if it reports the "I have the MSR"

 (3) the VMX support certainly has various sanity checks like "oh, CPU
doesn't have X86_BUG_L1TF, then I won't do this even if there was some
kernel command line to say I should". But the new prctrl doesn't have
anything like that. It just enables that L1D$ thing mindlessly,
thinking that user-land software somehow knows what it's doing. BS.

 (4) what does L1D_FLUSH_POPULATE_TLB mean?

   That "option" makes zero sense. It pre-populates the TLB before
doing the accesses to the L1D$ pages in the SW case, but nothing at
all explains why that is needed. It's clearly not needed for the
caller, since the TLB population only happens for the SW fallback, not
for the HW one.

   No documentation, no nothing. It's enabled for the VMX case, not
for the non-vmx case, which makes me suspect it's some crazy "work
around vm monitor page faults, because we know our SW flush fallback
is just random garbage".

In other words, there are those big "high-level design" questions, but
also several oddities in just the implementation details.

I really get the feeling that this feature just isn't ready.

Ingo - would you mind sending me a pull request for the (independent)
TLB cleanups part of that x86/mm tree? Because everything up to and
including commit bd1de2a7aace ("x86/tlb/uv: Add a forward declaration
for struct flush_tlb_info") looks sane.

It's only this L1D$ flushing thing at the end of that branch that I
think isn't fully cooked.


  reply index

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-01 17:01 Ingo Molnar
2020-06-01 21:42 ` Linus Torvalds
2020-06-02  2:35   ` Linus Torvalds
2020-06-02 10:25     ` Singh, Balbir
2020-06-02  7:33   ` Ingo Molnar
2020-06-02  9:37     ` Benjamin Herrenschmidt
2020-06-02 18:28       ` Thomas Gleixner
2020-06-02 19:14         ` Linus Torvalds [this message]
2020-06-02 23:01           ` Singh, Balbir
2020-06-02 23:28             ` Linus Torvalds
2020-06-03  1:31               ` Singh, Balbir
2020-06-04 17:29           ` [GIT PULL v2] " Ingo Molnar
2020-06-05  2:41             ` Linus Torvalds
2020-06-05  8:11               ` [GIT PULL v3] " Ingo Molnar
2020-06-05 20:40                 ` pr-tracker-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='' \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on

Archives are clonable:
	git clone --mirror lkml/git/0.git
	git clone --mirror lkml/git/1.git
	git clone --mirror lkml/git/2.git
	git clone --mirror lkml/git/3.git
	git clone --mirror lkml/git/4.git
	git clone --mirror lkml/git/5.git
	git clone --mirror lkml/git/6.git
	git clone --mirror lkml/git/7.git
	git clone --mirror lkml/git/8.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ \
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone