From: Andi Kleen <andi@firstfloor.org>
To: speck@linutronix.de
Cc: Andi Kleen <ak@linux.intel.com>
Subject: [MODERATED] [PATCH v6 10/43] MDSv6
Date: Sun, 24 Feb 2019 07:07:16 -0800
Message-ID: <fd985a6564dd500cf316665c5de823cb13843a1d.1551019522.git.ak@linux.intel.com>
In-Reply-To: <cover.1551019522.git.ak@linux.intel.com>

From: Andi Kleen <ak@linux.intel.com>
Subject:  mds: Add documentation for clear cpu usage

This includes the theory and some guidelines for subsystem/driver
maintainers.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 Documentation/clearcpu.txt | 261 +++++++++++++++++++++++++++++++++++++
 1 file changed, 261 insertions(+)
 create mode 100644 Documentation/clearcpu.txt

diff --git a/Documentation/clearcpu.txt b/Documentation/clearcpu.txt
new file mode 100644
index 000000000000..a45e5d82868a
--- /dev/null
+++ b/Documentation/clearcpu.txt
@@ -0,0 +1,261 @@
+
+Security model for Microarchitectural Data Sampling
+===================================================
+
+Some CPUs can leave read or written data in internal buffers,
+which might later be sampled through side effects.
+For more details see CVE-2018-12126, CVE-2018-12130 and CVE-2018-12127.
+
+This can be avoided by explicitly clearing the CPU state.
+
+We attempt to avoid leaking data between different processes,
+and also some sensitive data, like cryptographic data, to
+user space.
+
+We support three modes:
+
+(1) mitigation off (mds=off)
+(2) clear only when needed (default)
+(3) clear on every kernel exit, or guest entry (mds=full)
+
+(1) and (3) are trivial; the rest of the document discusses (2).
+
+In general option (3) is the most conservative choice. It does
+not make ST assumptions about leaking data.
+
+Basic requirements and assumptions
+----------------------------------
+
+Kernel addresses and kernel temporary data are not sensitive.
+
+User data is sensitive, but only for other processes.
+
+User data is anything in the user address space, or data buffers
+directly copied from/to the user (e.g. read/write). It does not
+include metadata, or flag settings. For example packet headers
+or file names are not sensitive in this model.
+
+Block IO data (but not meta data) is sensitive.
+
+Most data structures in the kernel are not sensitive.
+
+Kernel data is sensitive when it involves cryptographic keys.
+
+We consider data from input devices (such as key presses)
+sensitive. We also consider sound data or terminal
+data sensitive.
+
+We assume that only data actually accessed by the kernel by explicit
+instructions can be leaked.  Note that this may not always be
+true; in theory prefetching or speculation may touch more. The assumption
+is that if this happens it will be very low bandwidth and hard
+to control due to the existing Spectre and other mitigations,
+such as memory randomization.  If a user is concerned about this they
+need to use mds=full.
+
+Guidance for driver/subsystem developers
+----------------------------------------
+
+[These generally need to be enforced in code review for new code now]
+
+When you touch user supplied data of *other* processes in system call
+context, add lazy_clear_cpu().
+
+For the cases below we care only about data from other processes.
+Touching non-cryptographic data from the current process is always allowed.
+
+Touching only pointers to user data is always allowed.
+
+When your interrupt handler touches user data directly, mark it with IRQF_USER_DATA.
+
+When your tasklet touches user data directly, mark it TASKLET_USER_DATA
+using tasklet_init_flags or DECLARE_TASKLET_USERDATA*.
+
+When your timer touches user data, mark it with TIMER_USER_DATA.
+If it is a hrtimer and touches user data, mark it with HRTIMER_MODE_USER_DATA.
+
+When your irq poll handler touches user data, add lazy_clear_cpu().
+
+For networking code, make sure to only touch user data through
+skb_push/put/copy [add more], unless it is data from the current
+process. If that is not ensured, add lazy_clear_cpu() or
+lazy_clear_cpu_interrupt().
+
+Any cryptographic code touching key data should use memzero_explicit
+or kzfree to free the data.
+
+If your RCU callback touches user data add lazy_clear_cpu().
+
+These steps are currently only needed for code that runs on MDS affected
+CPUs, which is currently only x86. But it might be worth being prepared
+in case other architectures become affected too.
+
+Implementation details/assumptions
+----------------------------------
+
+Any buffer clearing is done lazily on next kernel exit. lazy_clear*
+is only a few fast instructions that set a flag, with no cache
+misses, and can be used frequently even in fast paths.
+
+Protecting process data
+-----------------------
+
+If a system call touches data of its own process, CPU state does not
+need to be cleared, because the process already has access to it.
+
+On context switching we clear data, unless the context switch is
+inside a process. We also clear after any context switches from kernel
+threads.
+
+Cryptographic keys inside the kernel should be protected.
+We assume they use kzfree() or memzero_explicit() to clear
+state, so these functions trigger a cpu clear.
+
+Hard Interrupts and tasklets
+----------------------------
+
+Most interrupt handlers for modern devices do not touch
+user data, because they rely on DMA and only manipulate
+pointers. They have been audited.
+
+Some handlers copy data, but often use strategic
+functions which can be marked with a lazy clear,
+for example memcpy_from/to_io and swiotlb (see below
+for a full list).
+
+Some handlers touch user data without using these strategic
+functions; these have to be marked with IRQF_USER_DATA.
+All in-tree handlers have been audited.
+
+Softirqs
+--------
+
+Softirqs are handled case by case:
+
+        TIMER: see timers below.
+        NET_*: see networking below.
+        BLOCK: do not touch user data, except
+	for a few using kmap_atomic. We have a lazy_clear_cpu_interrupt()
+	in kmap_atomic for this case.
+
+        IRQ_POLL: generally do not touch user data
+        TASKLET: see tasklets below
+        SCHED:   only touches scheduler metadata
+        RCU:	RCU handlers generally only free.
+
+Networking
+----------
+
+This is only about network code running in hard interrupt
+or softirq or timer context. Per process network code
+generally only touches data for the current process,
+so does not need any changes.
+
+In principle packet data should be encrypted for the wire anyway,
+but we try to avoid leaking it regardless.
+
+For networking code, any skb functions that are likely
+touching non-header packet data schedule a CPU clear at next
+kernel exit. This includes skb_copy and related, skb_put/push,
+and checksum functions.  We assume that any networking code touching
+packet data uses these functions.
+
+NMIs / machine checks
+---------------------
+
+Assume they don't touch other processes' user data. Most NMI
+handlers are fairly simple and only concerned with
+some non-user hardware state. The machine check handlers and perf PMI
+handlers are complicated (e.g. perf can touch user stack), but they
+never touch any data not of the current process.
+
+Other interrupts
+----------------
+
+SMP function interrupt callbacks have been audited and don't touch
+any user data.
+
+Clear points
+------------
+
+We schedule clears in some centralized functions to minimize impact
+on the overall code.
+
+Always clear:
+
+kernel preemption		undefined state, need to always clear
+context switch			protect user / kernel thread data
+VM entry			protect host against guest
+
+Always schedule clear for next kernel exit:
+
+kzfree / memzero_explicit	keys and crypto data
+
+Only schedule clear for next exit when called in interrupts:
+
+kmap_atomic			block drivers touching user process data
+memcpy_from/to_io		drivers copying IO data
+insw*, outs*
+input_event			input drivers touching user IO data
+serio_interrupt
+tty_insert_*			tty drivers touching user input IO data
+swiotlb				bounce buffers touching IO data
+sg_copy_*			scsi drivers touching IO data in interrupts
+skb_put, skb_copy_*		networking code touching IO data
+skb_*csum*
+snd_pcm_period_elapsed,		sound drivers touching IO data
+snd_rawmidi_transmit/receive
+snd_timer_interrupt
+
+Sandboxes
+---------
+
+We don't do anything special for seccomp processes.
+
+If there is a sandbox inside the process, the process itself should
+take care of clearing its own sensitive data before running sandbox
+code. This would include data touched by system calls.
+
+BPF
+---
+
+Assume BPF execution does not touch other user's data, so does
+not need to schedule a clear for itself.
+
+BPF could attack the rest of the kernel if it can successfully
+measure side channel side effects.
+
+When the BPF program was loaded unprivileged, always clear the CPU
+to prevent any exploits written in BPF using side channels to read
+data leaked from other kernel code.
+
+We only do this when running in an interrupt, or if a CPU clear is
+already scheduled (which means for example there was a context
+switch, or crypto operation before).
+
+In process context we assume the code only accesses data of the
+current user and check that the BPF running was loaded by the
+same user so even if data leaked it would not cross privilege
+boundaries.
+
+Technically we would only need to do this if the BPF program
+contains conditional branches and loads dominated by them, but
+let's assume that nearly all do.
+
+This could be further optimized by batching clears for
+many similar eBPF executions in a row (e.g. for packet
+processing). This would need ensuring that no sensitive
+data is touched in between the eBPF executions, and also
+that all eBPF scripts are set up by the same uid.
+We could add such optimizations later based on
+profile data.
+
+Virtualization
+--------------
+
+When entering a guest in KVM we clear to avoid any leakage to a guest.
+Normally this is done implicitly as part of the L1TF mitigation,
+except on a few CPUs that are not vulnerable to L1TF and need
+an explicit clear. It relies on L1TF being enabled. It also uses the
+"fast exit" optimization that only clears if an interrupt or context switch
+happened during a VMexit, unless mds=full is used.
-- 
2.17.2
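
To illustrate the lazy clearing scheme described under "Implementation
details/assumptions" in the patch above, here is a minimal user-space sketch
in C. It is not the code from this patch series. The struct cpu_model, the
enum mds_mode, and the helpers clear_cpu() and kernel_exit() are hypothetical
names introduced only for this example; only the name lazy_clear_cpu() and the
three modes (mds=off, default, mds=full) come from the document. The sketch
assumes that marking is just setting a flag and that the expensive clear runs
at most once per kernel exit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three modes described in the document. */
enum mds_mode { MDS_OFF, MDS_LAZY, MDS_FULL };

/* Hypothetical per-CPU state; the real kernel layout is not shown here. */
struct cpu_model {
	enum mds_mode mode;   /* mds=off, default (lazy), or mds=full */
	bool clear_pending;   /* set by lazy_clear_cpu(), consumed at exit */
	unsigned int clears;  /* number of buffer clears actually performed */
};

/* Cheap marker: only sets a flag, the clear itself is deferred. */
static void lazy_clear_cpu(struct cpu_model *cpu)
{
	cpu->clear_pending = true;
}

/* Stand-in for the real CPU buffer clearing sequence. */
static void clear_cpu(struct cpu_model *cpu)
{
	cpu->clears++;
	cpu->clear_pending = false;
}

/* Models the return to user space: decide whether to clear now. */
static void kernel_exit(struct cpu_model *cpu)
{
	assert(cpu->mode == MDS_OFF || cpu->mode == MDS_LAZY ||
	       cpu->mode == MDS_FULL);

	if (cpu->mode == MDS_OFF)
		return;                 /* mitigation disabled */
	if (cpu->mode == MDS_FULL || cpu->clear_pending)
		clear_cpu(cpu);         /* always, or only when marked */
}
```

In this model, several lazy_clear_cpu() calls before one kernel_exit() result
in a single clear, which is why the marker can be placed in fast paths.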
