All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andi Kleen <ak@linux.intel.com>
To: speck@linutronix.de
Subject: [MODERATED] [PATCH 2/8] L1TFv3 7
Date: Thu,  3 May 2018 20:23:23 -0700	[thread overview]
Message-ID: <1bef0e393f925379b76cb689bfb3fdbfc052e716.1525403858.git.ak@linux.intel.com> (raw)
In-Reply-To: <cover.1525403858.git.ak@linux.intel.com>
In-Reply-To: <cover.1525403858.git.ak@linux.intel.com>

With L1 terminal fault the CPU speculates into unmapped PTEs, and
resulting side effects allow to read the memory the PTE is pointing
too, if its values are still in the L1 cache.

For swapped out pages Linux uses unmapped PTEs and stores a swap entry
into them.

We need to make sure the swap entry is not pointing to valid memory,
which requires setting higher bits (between bit 36 and bit 45) that
are inside the CPUs physical address space, but outside any real
memory.

To do this we invert the offset to make sure the higher bits are always
set, as long as the swap file is not too big.

Here's a patch that switches the order of "type" and
"offset" in the x86-64 encoding, in addition to doing the binary 'not' on
the offset.

That means that now the offset is bits 9-58 in the page table, and that
the offset is in the bits that hardware generally doesn't care about.

That, in turn, means that if you have a desktop chip with only 40 bits of
physical addressing, now that the offset starts at bit 9, you still have
to have 30 bits of offset actually *in use* until bit 39 ends up being
clear.

So that's 4 terabyte of swap space (because the offset is counted in
pages, so 30 bits of offset is 42 bits of actual coverage). With bigger
physical addressing, that obviously grows further, until you hit the limit
of the offset (at 50 bits of offset - 62 bits of actual swap file
coverage).

Note there is no workaround for 32bit !PAE, or on systems which
have more than MAX_PA/2 memory. The later case is very unlikely
to happen on real systems.

[updated description and minor tweaks by AK]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 arch/x86/include/asm/pgtable_64.h | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 877bc27718ae..593c3cf259dd 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -273,7 +273,7 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
  *
  * |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2| 1|0| <- bit number
  * |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U| W|P| <- bit names
- * | OFFSET (14->63) | TYPE (9-13)  |0|0|X|X| X| X|X|SD|0| <- swp entry
+ * | TYPE (59-63) | ~OFFSET (9-58)  |0|0|X|X| X| X|X|SD|0| <- swp entry
  *
  * G (8) is aliased and used as a PROT_NONE indicator for
  * !present ptes.  We need to start storing swap entries above
@@ -286,20 +286,34 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
  *
  * Bit 7 in swp entry should be 0 because pmd_present checks not only P,
  * but also L and G.
+ *
+ * The offset is inverted by a binary not operation to make the high
+ * physical bits set.
  */
-#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
-#define SWP_TYPE_BITS 5
-/* Place the offset above the type: */
-#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS)
+#define SWP_TYPE_BITS		5
+
+#define SWP_OFFSET_FIRST_BIT	(_PAGE_BIT_PROTNONE + 1)
+
+/* We always extract/encode the offset by shifting it all the way up, and then down again */
+#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST_BIT+SWP_TYPE_BITS)
 
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
 
-#define __swp_type(x)			(((x).val >> (SWP_TYPE_FIRST_BIT)) \
-					 & ((1U << SWP_TYPE_BITS) - 1))
-#define __swp_offset(x)			((x).val >> SWP_OFFSET_FIRST_BIT)
-#define __swp_entry(type, offset)	((swp_entry_t) { \
-					 ((type) << (SWP_TYPE_FIRST_BIT)) \
-					 | ((offset) << SWP_OFFSET_FIRST_BIT) })
+/* Extract the high bits for type */
+#define __swp_type(x) ((x).val >> (64 - SWP_TYPE_BITS))
+
+/* Shift up (to get rid of type), then down to get value */
+#define __swp_offset(x) (~(x).val << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT)
+
+/*
+ * Shift the offset up "too far" by TYPE bits, then down again
+ * The offset is inverted by a binary not operation to make the high
+ * physical bits set.
+ */
+#define __swp_entry(type, offset) ((swp_entry_t) { \
+	(~(unsigned long)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
+	| ((unsigned long)(type) << (64-SWP_TYPE_BITS)) })
+
 #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })
 #define __pmd_to_swp_entry(pmd)		((swp_entry_t) { pmd_val((pmd)) })
 #define __swp_entry_to_pte(x)		((pte_t) { .pte = (x).val })
-- 
2.14.3

  parent reply	other threads:[~2018-05-04  3:23 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-04  3:23 [MODERATED] [PATCH 0/8] L1TFv3 4 Andi Kleen
2018-05-04  3:23 ` [MODERATED] [PATCH 1/8] L1TFv3 8 Andi Kleen
2018-05-04 13:42   ` [MODERATED] " Michal Hocko
2018-05-04 14:07     ` Andi Kleen
2018-05-04  3:23 ` Andi Kleen [this message]
2018-05-07 11:45   ` [MODERATED] Re: [PATCH 2/8] L1TFv3 7 Vlastimil Babka
2018-05-04  3:23 ` [MODERATED] [PATCH 3/8] L1TFv3 1 Andi Kleen
2018-05-04  3:55   ` [MODERATED] " Linus Torvalds
2018-05-04 13:42   ` Michal Hocko
2018-05-07 12:38   ` Vlastimil Babka
2018-05-07 13:41     ` Andi Kleen
2018-05-07 18:01       ` Thomas Gleixner
2018-05-07 18:21         ` [MODERATED] " Andi Kleen
2018-05-07 20:03           ` Thomas Gleixner
2018-05-04  3:23 ` [MODERATED] [PATCH 4/8] L1TFv3 6 Andi Kleen
2018-05-04  3:23 ` [MODERATED] [PATCH 5/8] L1TFv3 2 Andi Kleen
2018-05-04  3:23 ` [MODERATED] [PATCH 6/8] L1TFv3 0 Andi Kleen
2018-05-04  3:23 ` [MODERATED] [PATCH 7/8] L1TFv3 5 Andi Kleen
2018-05-04 13:43   ` [MODERATED] " Michal Hocko
2018-05-04 14:11     ` Andi Kleen
2018-05-04 14:21       ` Michal Hocko
2018-05-04  3:23 ` [MODERATED] [PATCH 8/8] L1TFv3 3 Andi Kleen
2018-05-04 14:19   ` [MODERATED] " Andi Kleen
2018-05-04 14:34     ` Michal Hocko
2018-05-04 15:53       ` Andi Kleen
2018-05-04 16:26         ` Michal Hocko
2018-05-04 22:15   ` Dave Hansen
2018-05-05  3:55     ` Andi Kleen
2018-05-04  3:54 ` [MODERATED] Re: [PATCH 0/8] L1TFv3 4 Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1bef0e393f925379b76cb689bfb3fdbfc052e716.1525403858.git.ak@linux.intel.com \
    --to=ak@linux.intel.com \
    --cc=speck@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.