linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Dave Hansen <dave@sr71.net>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	bp@alien8.de, ak@linux.intel.com, dave.hansen@intel.com,
	dave.hansen@linux.intel.com
Subject: Re: [PATCH 1/4] x86, swap: move swap offset/type up in PTE to work around erratum
Date: Wed, 13 Jul 2016 17:19:05 +0200
Message-ID: <20160713151905.GB20693@dhcp22.suse.cz>
In-Reply-To: <20160708001911.9A3FD2B6@viggo.jf.intel.com>

On Thu 07-07-16 17:19:11, Dave Hansen wrote:
> 
> From: Dave Hansen <dave.hansen@linux.intel.com>
> 
> This erratum can result in Accessed/Dirty getting set by the hardware
> when we do not expect them to be (on !Present PTEs).
> 
> Instead of trying to fix them up after this happens, we just
> allow the bits to get set and try to ignore them.  We do this by
> shifting the layout of the bits we use for swap offset/type in
> our 64-bit PTEs.
> 
> It looks like this:
> 
> bitnrs: |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0|
> names:  |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
> before: |         OFFSET (9-63)          |0|X|X| TYPE(1-5) |0|
>  after: | OFFSET (14-63)  |  TYPE (9-13) |0|X|X|X| X| X|X|X|0|
> 
> Note that D was already a don't care (X) even before.  We just
> move TYPE up and turn its old spot (which could be hit by the
> A bit) into all don't cares.
> 
> We take 5 bits away from the offset, but that still leaves us
> with 50 bits which lets us index into a 62-bit swapfile (4 EiB).
> I think that's probably fine for the moment.  We could
> theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it
> doesn't gain us anything.
> 
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>

Yes, this seems like the safest option. Feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>
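
For anyone checking the numbers in the changelog, here is a minimal
sketch of the arithmetic. It assumes the "after" row of the diagram
(TYPE in bits 9-13, OFFSET in bits 14-63) and 4 KiB pages. All names
below are hypothetical and are not taken from the patch.

```c
#include <assert.h>

/* Hypothetical names; values follow the "after" row of the diagram. */
enum {
	PTE_BITS           = 64,
	TYPE_FIRST_BIT     = 9,  /* first bit above G/PROT_NONE (bit 8) */
	TYPE_BITS          = 5,  /* TYPE occupies bits 9-13 */
	OFFSET_FIRST_BIT   = TYPE_FIRST_BIT + TYPE_BITS, /* 14 */
	ASSUMED_PAGE_SHIFT = 12, /* assumption: 4 KiB pages */
};

/* Number of PTE bits left for the swap offset. */
static inline unsigned int swp_offset_bits(unsigned int offset_first_bit)
{
	return PTE_BITS - offset_first_bit;
}

/* log2 of the largest swapfile size, in bytes, that the offset can index. */
static inline unsigned int swp_addressable_bits(unsigned int offset_first_bit,
						unsigned int page_shift)
{
	return swp_offset_bits(offset_first_bit) + page_shift;
}

/* Runtime self-check of the figures quoted in the changelog. */
static inline void swp_layout_selfcheck(void)
{
	/* 50 offset bits remain after moving TYPE up. */
	assert(swp_offset_bits(OFFSET_FIRST_BIT) == 50);
	/* 50 + 12 = 62 bits of swapfile, i.e. 4 EiB. */
	assert(swp_addressable_bits(OFFSET_FIRST_BIT, ASSUMED_PAGE_SHIFT) == 62);
	/* The old layout started the offset at bit 9, so 5 bits are lost. */
	assert(swp_offset_bits(9) - swp_offset_bits(OFFSET_FIRST_BIT) == 5);
}
```

Calling swp_layout_selfcheck() should pass silently under these
assumptions; a different page size only changes the last term.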

> ---
> 
>  b/arch/x86/include/asm/pgtable_64.h |   26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff -puN arch/x86/include/asm/pgtable_64.h~knl-strays-10-move-swp-pte-bits arch/x86/include/asm/pgtable_64.h
> --- a/arch/x86/include/asm/pgtable_64.h~knl-strays-10-move-swp-pte-bits	2016-07-07 17:17:43.556746185 -0700
> +++ b/arch/x86/include/asm/pgtable_64.h	2016-07-07 17:17:43.559746319 -0700
> @@ -140,18 +140,32 @@ static inline int pgd_large(pgd_t pgd) {
>  #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
>  #define pte_unmap(pte) ((void)(pte))/* NOP */
>  
> -/* Encode and de-code a swap entry */
> +/*
> + * Encode and de-code a swap entry
> + *
> + * |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0| <- bit number
> + * |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P| <- bit names
> + * | OFFSET (14->63) | TYPE (9-13)  |0|X|X|X| X| X|X|X|0| <- swp entry
> + *
> + * G (8) is aliased and used as a PROT_NONE indicator for
> + * !present ptes.  We need to start storing swap entries above
> + * there.  We also need to avoid using A and D because of an
> + * erratum where they can be incorrectly set by hardware on
> + * non-present PTEs.
> + */
> +#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
>  #define SWP_TYPE_BITS 5
> -#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
> +/* Place the offset above the type: */
> +#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS)
>  
>  #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
>  
> -#define __swp_type(x)			(((x).val >> (_PAGE_BIT_PRESENT + 1)) \
> +#define __swp_type(x)			(((x).val >> (SWP_TYPE_FIRST_BIT)) \
>  					 & ((1U << SWP_TYPE_BITS) - 1))
> -#define __swp_offset(x)			((x).val >> SWP_OFFSET_SHIFT)
> +#define __swp_offset(x)			((x).val >> SWP_OFFSET_FIRST_BIT)
>  #define __swp_entry(type, offset)	((swp_entry_t) { \
> -					 ((type) << (_PAGE_BIT_PRESENT + 1)) \
> -					 | ((offset) << SWP_OFFSET_SHIFT) })
> +					 ((type) << (SWP_TYPE_FIRST_BIT)) \
> +					 | ((offset) << SWP_OFFSET_FIRST_BIT) })
>  #define __pte_to_swp_entry(pte)		((swp_entry_t) { pte_val((pte)) })
>  #define __swp_entry_to_pte(x)		((pte_t) { .pte = (x).val })
>  
> _
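
The "let the bits get set and ignore them" idea can be illustrated in
userspace. This is only a rough sketch, not the kernel code: it assumes
the layout from the diagram (TYPE at bits 9-13, OFFSET from bit 14), and
the sk_ names and the sketch_swp_entry_t type are hypothetical stand-ins
for the kernel's macros and swp_entry_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's swp_entry_t. */
typedef struct { uint64_t val; } sketch_swp_entry_t;

#define SK_PAGE_BIT_ACCESSED  5
#define SK_PAGE_BIT_DIRTY     6
#define SK_PAGE_BIT_PROTNONE  8	/* aliases G */

#define SK_TYPE_FIRST_BIT   (SK_PAGE_BIT_PROTNONE + 1)         /* 9 */
#define SK_TYPE_BITS        5
#define SK_OFFSET_FIRST_BIT (SK_TYPE_FIRST_BIT + SK_TYPE_BITS) /* 14, per the diagram */

/* Pack type and offset above bit 8; bits 0-8 stay clear. */
static inline sketch_swp_entry_t sk_swp_entry(uint64_t type, uint64_t offset)
{
	sketch_swp_entry_t e = {
		(type << SK_TYPE_FIRST_BIT) | (offset << SK_OFFSET_FIRST_BIT)
	};
	return e;
}

/* Extract the 5-bit type field. */
static inline unsigned int sk_swp_type(sketch_swp_entry_t e)
{
	return (unsigned int)((e.val >> SK_TYPE_FIRST_BIT) &
			      ((UINT64_C(1) << SK_TYPE_BITS) - 1));
}

/* Extract the offset; everything below bit 14 is shifted out. */
static inline uint64_t sk_swp_offset(sketch_swp_entry_t e)
{
	return e.val >> SK_OFFSET_FIRST_BIT;
}

/* Model the erratum: hardware sets A and D on a non-present PTE. */
static inline sketch_swp_entry_t sk_stray_ad(sketch_swp_entry_t e)
{
	e.val |= (UINT64_C(1) << SK_PAGE_BIT_ACCESSED) |
		 (UINT64_C(1) << SK_PAGE_BIT_DIRTY);
	return e;
}
```

With this layout, decoding sk_stray_ad(sk_swp_entry(type, offset))
yields the same type and offset as the clean entry, because both
decoders shift the low bits away. Under the old layout, where TYPE
covered bits 1-5, a stray A bit (5) would have landed inside the type
field.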

-- 
Michal Hocko
SUSE Labs

Thread overview: 22+ messages
2016-07-08  0:19 [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum Dave Hansen
2016-07-08  0:19 ` [PATCH 1/4] x86, swap: move swap offset/type up in PTE to work around erratum Dave Hansen
2016-07-13  8:03   ` [tip:x86/mm] x86/mm: Move " tip-bot for Dave Hansen
2016-07-13 15:19   ` Michal Hocko [this message]
2016-07-08  0:19 ` [PATCH 2/4] x86, pagetable: ignore A/D bits in pte/pmd/pud_none() Dave Hansen
2016-07-13  8:03   ` [tip:x86/mm] x86/mm: Ignore " tip-bot for Dave Hansen
2016-07-13 15:21   ` [PATCH 2/4] x86, pagetable: ignore " Michal Hocko
2016-07-13 15:47     ` Dave Hansen
2016-07-14  6:13       ` Michal Hocko
2016-07-08  0:19 ` [PATCH 3/4] x86: disallow running with 32-bit PTEs to work around erratum Dave Hansen
2016-07-13  8:04   ` [tip:x86/mm] x86/mm: Disallow " tip-bot for Dave Hansen
2016-07-08  0:19 ` [PATCH 4/4] x86: use pte_none() to test for empty PTE Dave Hansen
2016-07-13  8:04   ` [tip:x86/mm] x86/mm: Use " tip-bot for Dave Hansen
2016-07-13 15:18   ` [PATCH 4/4] x86: use " Michal Hocko
2016-07-13 15:23     ` Julia Lawall
2016-07-13 15:49     ` Julia Lawall
2016-07-13 16:28       ` Dave Hansen
2016-07-14 13:47   ` Vlastimil Babka
2016-07-14 14:24     ` Dave Hansen
2016-07-14 14:50       ` David Vrabel
2016-07-13  9:54 ` [PATCH 0/4] [RFC][v4] Workaround for Xeon Phi PTE A/D bits erratum Vlastimil Babka
  -- strict thread matches above, loose matches on Subject: below --
2016-07-01 17:46 Dave Hansen
2016-07-01 17:47 ` [PATCH 1/4] x86, swap: move swap offset/type up in PTE to work around erratum Dave Hansen
