From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753671AbcFONGZ (ORCPT ); Wed, 15 Jun 2016 09:06:25 -0400 Received: from mga11.intel.com ([192.55.52.93]:36184 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753107AbcFONGY convert rfc822-to-8bit (ORCPT ); Wed, 15 Jun 2016 09:06:24 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.26,476,1459839600"; d="scan'208";a="976194202" From: "Anaczkowski, Lukasz" To: Dave Hansen , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" , "tglx@linutronix.de" , "mingo@redhat.com" , "ak@linux.intel.com" , "kirill.shutemov@linux.intel.com" , "mhocko@suse.com" , "akpm@linux-foundation.org" , "hpa@zytor.com" CC: "Srinivasappa, Harish" , "Odzioba, Lukasz" Subject: RE: [PATCH] Linux VM workaround for Knights Landing A/D leak Thread-Topic: [PATCH] Linux VM workaround for Knights Landing A/D leak Thread-Index: AQHRxlWzvl3HK2neZUGqLjgPvidVhJ/pJIQAgADxqrA= Date: Wed, 15 Jun 2016 13:06:17 +0000 Message-ID: References: <1465919919-2093-1-git-send-email-lukasz.anaczkowski@intel.com> <57603CBE.7090702@linux.intel.com> In-Reply-To: <57603CBE.7090702@linux.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiYWNiM2ZhNTktOTM3OC00ZWQ4LWFkMjAtZDRkZDEzNDY4NWU2IiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjkuNi42IiwiVHJ1c3RlZExhYmVsSGFzaCI6Ik85eU5ycmFYYXBMXC8wcUJ4OGwrQnFjWjZEUk9wWDZtNFdOdHVaYlF6ZHVzPSJ9 x-ctpclassification: CTP_IC x-originating-ip: [163.33.239.180] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 8BIT MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Dave Hansen [mailto:dave.hansen@linux.intel.com] Sent: Tuesday, June 14, 2016 7:20 PM >> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h ... >> +extern void fix_pte_leak(struct mm_struct *mm, unsigned long addr, >> + pte_t *ptep); > Doesn't hugetlb.h somehow #include pgtable.h? So why double-define > fix_pte_leak()? It's other way round - pgtable.h somehow includes hugetlb.h. I've removed duplicated fix_pte_leak() declaration. >> diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h >> index 2ee7811..6fa4079 100644 >> --- a/arch/x86/include/asm/pgtable_64.h >> +++ b/arch/x86/include/asm/pgtable_64.h >> @@ -178,6 +178,12 @@ extern void cleanup_highmap(void); >> extern void init_extra_mapping_uc(unsigned long phys, unsigned long size); >> extern void init_extra_mapping_wb(unsigned long phys, unsigned long size); >> >> +#define ARCH_HAS_NEEDS_SWAP_PTL 1 >> +static inline bool arch_needs_swap_ptl(void) >> +{ >> + return boot_cpu_has_bug(X86_BUG_PTE_LEAK); >> +} >> + >> #endif /* !__ASSEMBLY__ */ > I think we need a comment on this sucker. I'm not sure we should lean > solely on the commit message to record why we need this until the end of > time. OK. >> + if (c->x86_model == 87) { > Please use the macros in here for the model id: OK. > http://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/tree/arch/x86/include/asm/intel-family.h > We also probably want to prefix the pr_info() with something like > "x86/intel:". OK >> +/* >> + * Workaround for KNL issue: > Please be specific about what this "KNL issue" *is*. OK >> + * A thread that is going to page fault due to P=0, may still >> + * non atomically set A or D bits, which could corrupt swap entries. >> + * Always flush the other CPUs and clear the PTE again to avoid >> + * this leakage. We are excluded using the pagetable lock. >> + */ >> + >> +void fix_pte_leak(struct mm_struct *mm, unsigned long addr, pte_t *ptep) >> +{ >> + if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids) { >> + trace_tlb_flush(TLB_LOCAL_SHOOTDOWN, TLB_FLUSH_ALL); >> + flush_tlb_others(mm_cpumask(mm), mm, addr, >> + addr + PAGE_SIZE); >> + mb(); >> + set_pte(ptep, __pte(0)); >> + } >> +} > > I think the comment here is a bit sparse. Can we add some more details, > like: > > Entering here, the current CPU just cleared the PTE. But, > another thread may have raced and set the A or D bits, or be > _about_ to set the bits. Shooting their TLB entry down will > ensure they see the cleared PTE and will not set A or D. > > and by the set_pte(): > > Clear the PTE one more time, in case the other thread set A/D > before we sent the TLB flush. Thanks, Lukasz