Subject: [MODERATED] Re: [PATCH 8/8] L1TFv8 6
From: Vlastimil Babka
Date: Fri, 22 Jun 2018 17:46:45 +0200
To: speck@linutronix.de
In-Reply-To: <260fce1e-c5fe-cace-56a8-a83c2a41f115@suse.cz>
References: <20180614150632.E064C61183@crypto-ml.lab.linutronix.de>
 <4ad5c4d2-7721-729e-3af6-6c8ed84dda9f@suse.cz>
 <260fce1e-c5fe-cace-56a8-a83c2a41f115@suse.cz>

On 06/21/2018 01:43 PM, speck for Vlastimil Babka wrote:
> On 06/21/2018 11:02 AM, speck for Vlastimil Babka wrote:
>> On 06/14/2018 12:48 AM, speck for Andi Kleen wrote:
>>> +unsigned long max_swapfile_size(void)
>>> +{
>>> +	unsigned long pages;
>>> +
>>> +	pages = generic_max_swapfile_size();
>>> +
>>> +	if (boot_cpu_has_bug(X86_BUG_L1TF)) {
>>> +		/* Limit the swap file size to MAX_PA/2 for L1TF workaround */
>>> +		pages = min_t(unsigned long, l1tf_pfn_limit() + 1, pages);
>>
>> Is this actually correct? IIUC l1tf_pfn_limit() is in page granularity,
>> i.e. pages are encoded in bits 12 to $LIMIT, but we have swap offsets in
>> bits 9 to $LIMIT (after patch 2/8), i.e. 3 bits more? Same for the
>> limits described in the changelog?
>
> Yeah, I was able to verify this with some printk's, constructing a pte
> with the max allowed offset and printing it. In a VM with 42bit limits,
> the pte is 7ffffc000000000, so the unusable bits start at 38, not 41.
>
> Also, after more digging into this, I suspect that the PAE case is
> currently not mitigated. The pgtable-3level.h macros don't seem to flip
> the bits. Also, swap entries there use only the high pte word, whereas
> most of the bits that are safe to use are in the low word.

I've been trying to fix the PAE case and here's the current result. Note
that it's only compile tested, so this is just an RFC and testing is
welcome.

I changed the swap entry format to mimic the 64bit one, as neither 32bit
word on its own has enough "safe" bits to avoid limiting the swap size to
a few GB. Because the macro machinery doesn't expect the arch-dependent
swap entry format to be 32bit while the pte is 64bit, the result is even
more macros, sorry about that.
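To make the 3 bit difference above concrete, below is a rough
userspace-only sketch (my own toy code, not part of the series; the bit
positions and the 42bit MAX_PA value are taken from the discussion above,
and l1tf_pfn_limit() is re-implemented here purely for illustration) that
computes the naive limit from the original hunk and the limit that would
actually be usable:

/*
 * Userspace illustration only: l1tf_pfn_limit() counts page frames
 * (pte bits 12..$LIMIT), while the swap offset after patch 2/8 starts
 * at pte bit 9, so the usable offset limit is 3 bits higher.
 */
#include <stdio.h>

#define PAGE_SHIFT		12
#define SWP_OFFSET_FIRST_BIT	9	/* offset starts at pte bit 9 */

/* highest pfn still below MAX_PA/2 (toy re-implementation) */
static unsigned long long l1tf_pfn_limit(unsigned int max_pa_bits)
{
	return (1ULL << (max_pa_bits - 1 - PAGE_SHIFT)) - 1;
}

int main(void)
{
	unsigned int max_pa_bits = 42;	/* the VM from above */
	unsigned long long naive, usable;

	/* what the original hunk allows: offsets capped like pfns */
	naive = l1tf_pfn_limit(max_pa_bits) + 1;

	/*
	 * The offset field sits 3 bits below the pfn field, so an offset
	 * value only reaches physical bit (offset bits + 9) and the limit
	 * can be 2^3 = 8 times larger.
	 */
	usable = naive << (PAGE_SHIFT - SWP_OFFSET_FIRST_BIT);

	printf("naive limit:  %llu pages (%llu GiB of swap)\n",
	       naive, naive >> (30 - PAGE_SHIFT));
	printf("usable limit: %llu pages (%llu GiB of swap)\n",
	       usable, usable >> (30 - PAGE_SHIFT));
	return 0;
}

For the 42bit VM it prints 2048 GiB for the naive limit vs 16384 GiB
usable, i.e. the expected factor of 8.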
-----8<-----
From 6f8c1176e99fbf56dc8a29a4d279a5770e45fd4f Mon Sep 17 00:00:00 2001
From: Vlastimil Babka
Date: Fri, 22 Jun 2018 17:39:33 +0200
Subject: [PATCH] adjust PAE swap encoding for l1tf

---
 arch/x86/include/asm/pgtable-3level.h | 35 +++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index 76ab26a99e6e..a1d9ab21f8ea 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -241,12 +241,43 @@ static inline pud_t native_pudp_get_and_clear(pud_t *pudp)
 #endif
 
 /* Encode and de-code a swap entry */
+#define SWP_TYPE_BITS 5
+
+#define SWP_OFFSET_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
+
+/* We always extract/encode the offset by shifting it all the way up, and then down again */
+#define SWP_OFFSET_SHIFT (SWP_OFFSET_FIRST_BIT+SWP_TYPE_BITS)
+
 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > 5)
 #define __swp_type(x)			(((x).val) & 0x1f)
 #define __swp_offset(x)			((x).val >> 5)
 #define __swp_entry(type, offset)	((swp_entry_t){(type) | (offset) << 5})
-#define __pte_to_swp_entry(pte)		((swp_entry_t){ (pte).pte_high })
-#define __swp_entry_to_pte(x)		((pte_t){ { .pte_high = (x).val } })
+
+/*
+ * Normally, __swp_entry() converts from arch-independent swp_entry_t to
+ * arch-dependent swp_entry_t, and __swp_entry_to_pte() just stores the result
+ * to pte. But here we have 32bit swp_entry_t and 64bit pte, and need to use the
+ * whole 64 bits. Thus, we shift the "real" arch-dependent conversion to
+ * __swp_entry_to_pte() through the following helper macro based on 64bit
+ * __swp_entry().
+ */
+#define __swp_pteval_entry(type, offset) ((pteval_t) { \
+	(~(pteval_t)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
+	| ((pteval_t)(type) << (64-SWP_TYPE_BITS)) })
+
+#define __swp_entry_to_pte(x)	((pte_t){ .pte = \
+		__swp_pteval_entry(__swp_type(x), __swp_offset(x)) })
+/*
+ * Analogically, __pte_to_swp_entry() doesn't just extract the arch-dependent
+ * swp_entry_t, but also has to convert it from 64bit to the 32bit
+ * intermediate representation, using the following macros based on 64bit
+ * __swp_type() and __swp_offset().
+ */
+#define __pteval_swp_type(x) ((unsigned long)((x).pte >> (64 - SWP_TYPE_BITS)))
+#define __pteval_swp_offset(x) ((unsigned long)(~((x).pte) << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT))
+
+#define __pte_to_swp_entry(pte)	(__swp_entry(__pteval_swp_type(pte), \
+					     __pteval_swp_offset(pte)))
 
 #define gup_get_pte gup_get_pte
 /*
-- 
2.17.1
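Since the above is only compile tested, here is also a small standalone
userspace harness I hacked up to sanity-check the macros (assumptions:
_PAGE_BIT_PROTNONE == 8 as on x86, a 64bit pteval_t and a 32bit
swp_entry_t.val; the encode/decode macros themselves are copied from the
patch). It round-trips a (type, offset) pair through the new encoding and
prints the resulting pte:

/*
 * Throwaway userspace check of the new PAE swap encoding.
 * Build and run with e.g.: gcc -Wall pae-swap-test.c && ./a.out
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t pteval_t;
typedef struct { pteval_t pte; } pte_t;
typedef struct { uint32_t val; } swp_entry_t;	/* 32bit arch-dependent entry */

#define _PAGE_BIT_PROTNONE	8	/* assumed, == _PAGE_BIT_GLOBAL on x86 */
#define SWP_TYPE_BITS		5
#define SWP_OFFSET_FIRST_BIT	(_PAGE_BIT_PROTNONE + 1)
#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)

/* 32bit intermediate representation, as in pgtable-3level.h */
#define __swp_type(x)		(((x).val) & 0x1f)
#define __swp_offset(x)		((x).val >> 5)
#define __swp_entry(type, offset) ((swp_entry_t){(type) | (offset) << 5})

/* the new 64bit encode/decode helpers, copied from the patch */
#define __swp_pteval_entry(type, offset) ((pteval_t) { \
	(~(pteval_t)(offset) << SWP_OFFSET_SHIFT >> SWP_TYPE_BITS) \
	| ((pteval_t)(type) << (64 - SWP_TYPE_BITS)) })
#define __swp_entry_to_pte(x)	((pte_t){ .pte = \
	__swp_pteval_entry(__swp_type(x), __swp_offset(x)) })
#define __pteval_swp_type(x)	((unsigned long)((x).pte >> (64 - SWP_TYPE_BITS)))
#define __pteval_swp_offset(x)	((unsigned long)(~((x).pte) << SWP_TYPE_BITS >> SWP_OFFSET_SHIFT))
#define __pte_to_swp_entry(pte)	(__swp_entry(__pteval_swp_type(pte), \
					     __pteval_swp_offset(pte)))

int main(void)
{
	unsigned int type = 3;
	unsigned int offset = 0x123456;	/* fits in the 27 offset bits */

	swp_entry_t entry = __swp_entry(type, offset);
	pte_t pte = __swp_entry_to_pte(entry);
	swp_entry_t back = __pte_to_swp_entry(pte);

	/* encode/decode must round-trip through the 64bit pte */
	assert(__swp_type(back) == type);
	assert(__swp_offset(back) == offset);

	/*
	 * The offset is stored inverted, so for a small offset the upper
	 * physical-address bits of the pte are all ones, i.e. they point
	 * far above any real memory.
	 */
	printf("pte = %#llx\n", (unsigned long long)pte.pte);
	return 0;
}

The asserts check the 32bit <-> 64bit round trip and the printed pte
shows the offset landing inverted in the high bits, which is the property
the mitigation relies on.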