From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751121AbcGMPTZ (ORCPT ); Wed, 13 Jul 2016 11:19:25 -0400
Received: from mail-wm0-f68.google.com ([74.125.82.68]:33079 "EHLO
	mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750854AbcGMPTQ (ORCPT );
	Wed, 13 Jul 2016 11:19:16 -0400
Date: Wed, 13 Jul 2016 17:19:05 +0200
From: Michal Hocko
To: Dave Hansen
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, linux-mm@kvack.org,
	torvalds@linux-foundation.org, akpm@linux-foundation.org,
	bp@alien8.de, ak@linux.intel.com, dave.hansen@intel.com,
	dave.hansen@linux.intel.com
Subject: Re: [PATCH 1/4] x86, swap: move swap offset/type up in PTE to work around erratum
Message-ID: <20160713151905.GB20693@dhcp22.suse.cz>
References: <20160708001909.FB2443E2@viggo.jf.intel.com>
	<20160708001911.9A3FD2B6@viggo.jf.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160708001911.9A3FD2B6@viggo.jf.intel.com>
User-Agent: Mutt/1.6.0 (2016-04-01)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 07-07-16 17:19:11, Dave Hansen wrote:
>
> From: Dave Hansen
>
> This erratum can result in Accessed/Dirty getting set by the hardware
> when we do not expect them to be (on !Present PTEs).
>
> Instead of trying to fix them up after this happens, we just
> allow the bits to get set and try to ignore them.  We do this by
> shifting the layout of the bits we use for swap offset/type in
> our 64-bit PTEs.
>
> It looks like this:
>
> bitnrs: |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0|
> names:  |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
> before: |         OFFSET (9-63)          |0|X|X| TYPE(1-5) |0|
> after:  | OFFSET (14-63)  |  TYPE (9-13) |0|X|X|X| X| X|X|X|0|
>
> Note that D was already a don't care (X) even before.
> We just move TYPE up and turn its old spot (which could be hit by the
> A bit) into all don't cares.
>
> We take 5 bits away from the offset, but that still leaves us
> with 50 bits which lets us index into a 62-bit swapfile (4 EiB).
> I think that's probably fine for the moment.  We could
> theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it
> doesn't gain us anything.
>
> Signed-off-by: Dave Hansen

Yes, this seems like the safest option. Feel free to add
Acked-by: Michal Hocko

> ---
>
>  b/arch/x86/include/asm/pgtable_64.h |   26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
>
> diff -puN arch/x86/include/asm/pgtable_64.h~knl-strays-10-move-swp-pte-bits arch/x86/include/asm/pgtable_64.h
> --- a/arch/x86/include/asm/pgtable_64.h~knl-strays-10-move-swp-pte-bits 2016-07-07 17:17:43.556746185 -0700
> +++ b/arch/x86/include/asm/pgtable_64.h 2016-07-07 17:17:43.559746319 -0700
> @@ -140,18 +140,32 @@ static inline int pgd_large(pgd_t pgd) {
>  #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
>  #define pte_unmap(pte) ((void)(pte))/* NOP */
>
> -/* Encode and de-code a swap entry */
> +/*
> + * Encode and de-code a swap entry
> + *
> + * |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0| <- bit number
> + * |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P| <- bit names
> + * | OFFSET (14->63) | TYPE (10-13) |0|X|X|X| X| X|X|X|0| <- swp entry
> + *
> + * G (8) is aliased and used as a PROT_NONE indicator for
> + * !present ptes.  We need to start storing swap entries above
> + * there.  We also need to avoid using A and D because of an
> + * erratum where they can be incorrectly set by hardware on
> + * non-present PTEs.
> + */
> +#define SWP_TYPE_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
>  #define SWP_TYPE_BITS 5
> -#define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
> +/* Place the offset above the type: */
> +#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS + 1)
>
>  #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)
>
> -#define __swp_type(x)                  (((x).val >> (_PAGE_BIT_PRESENT + 1)) \
> +#define __swp_type(x)                  (((x).val >> (SWP_TYPE_FIRST_BIT)) \
>                                          & ((1U << SWP_TYPE_BITS) - 1))
> -#define __swp_offset(x)                ((x).val >> SWP_OFFSET_SHIFT)
> +#define __swp_offset(x)                ((x).val >> SWP_OFFSET_FIRST_BIT)
>  #define __swp_entry(type, offset)      ((swp_entry_t) { \
> -                                        ((type) << (_PAGE_BIT_PRESENT + 1)) \
> -                                        | ((offset) << SWP_OFFSET_SHIFT) })
> +                                        ((type) << (SWP_TYPE_FIRST_BIT)) \
> +                                        | ((offset) << SWP_OFFSET_FIRST_BIT) })
>  #define __pte_to_swp_entry(pte)        ((swp_entry_t) { pte_val((pte)) })
>  #define __swp_entry_to_pte(x)          ((pte_t) { .pte = (x).val })
>
> _

-- 
Michal Hocko
SUSE Labs
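
For readers following the bit arithmetic, here is a rough user-space sketch of the "after:" layout from the patch description (TYPE in bits 9-13, OFFSET in bits 14-63). It is not the kernel code: every SKETCH_* constant and sketch_* function is a hypothetical name made up for illustration, and plain uint64_t stands in for swp_entry_t. It assumes G/PROT_NONE sits at bit 8, as the quoted comment says. Note that the quoted SWP_OFFSET_FIRST_BIT carries an extra "+ 1", which under that assumption would evaluate to 15 rather than the 14 shown in the tables; the sketch follows the tables.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout, copied from the "after:" row of the description:
 *   bit 0      P (0 for a swap entry)
 *   bits 1-7   don't care (A = 5 and D = 6 may be set by the erratum)
 *   bit 8      G, aliased as PROT_NONE (0 for a swap entry)
 *   bits 9-13  swap TYPE
 *   bits 14-63 swap OFFSET
 * All names below are hypothetical, not kernel identifiers.
 */
#define SKETCH_TYPE_FIRST_BIT   9
#define SKETCH_TYPE_BITS        5
#define SKETCH_OFFSET_FIRST_BIT (SKETCH_TYPE_FIRST_BIT + SKETCH_TYPE_BITS)
#define SKETCH_OFFSET_BITS      (64 - SKETCH_OFFSET_FIRST_BIT)

/* Mask covering the don't-care bits 1-7. */
#define SKETCH_DONT_CARE_MASK   0xFEULL

/* Pack type and offset; the caller must keep type below 32. */
static uint64_t sketch_swp_entry(uint64_t type, uint64_t offset)
{
    return (type << SKETCH_TYPE_FIRST_BIT) |
           (offset << SKETCH_OFFSET_FIRST_BIT);
}

/* Extract the 5-bit type field. */
static unsigned sketch_swp_type(uint64_t val)
{
    return (unsigned)((val >> SKETCH_TYPE_FIRST_BIT) &
                      ((1U << SKETCH_TYPE_BITS) - 1));
}

/* Extract the offset field (everything from bit 14 up). */
static uint64_t sketch_swp_offset(uint64_t val)
{
    return val >> SKETCH_OFFSET_FIRST_BIT;
}

/*
 * Simulate stray hardware writes by setting every don't-care bit,
 * then check that type and offset still decode unchanged and that
 * the P bit stays clear. Returns 1 on success, 0 otherwise.
 */
static int sketch_decode_ignores_stray_bits(uint64_t type, uint64_t offset)
{
    uint64_t clean = sketch_swp_entry(type, offset);
    uint64_t stray = clean | SKETCH_DONT_CARE_MASK;

    return sketch_swp_type(stray) == type &&
           sketch_swp_offset(stray) == offset &&
           (stray & 1) == 0;
}
```

With this layout the offset field is 64 - 14 = 50 bits wide, which matches the "50 bits" in the description; with 4 KiB pages (an assumption here) that is 2^50 * 2^12 = 2^62 bytes, i.e. the 4 EiB figure quoted above.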