From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: [PATCH 5/5] selftests/vm/pkeys: exercise x86 XSAVE init state
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Dave Hansen, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, luto@kernel.org,
	shuah@kernel.org, babu.moger@amd.com, dave.kleikamp@oracle.com,
	linuxram@us.ibm.com, bauerman@linux.ibm.com, bigeasy@linutronix.de
From: Dave Hansen
Date: Thu, 27 May 2021 16:51:19 -0700
References: <20210527235109.B2A9F45F@viggo.jf.intel.com>
In-Reply-To: <20210527235109.B2A9F45F@viggo.jf.intel.com>
Message-Id: <20210527235119.9D443084@viggo.jf.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Dave Hansen

On x86, there is a set of instructions used to save and restore
register state collectively known as the XSAVE architecture.  There
are about a dozen different features managed with XSAVE.  The
protection keys register, PKRU, is one of those features.

The hardware optimizes XSAVE by tracking when the state has not
changed from its initial (init) state.  In this case, it can avoid
the cost of writing state to memory (it would usually just be a bunch
of 0's).  When the pkey register is 0x0, the hardware may optionally
track the register as being in the init state and optimize away the
writes.  AMD CPUs do this more aggressively than Intel CPUs.

On x86, PKRU is rarely in its (very permissive) init state.  Instead,
the value defaults to something very restrictive.  It is not
surprising that bugs have popped up in the rare cases when PKRU
reaches its init state.

Add a protection key selftest which gets the protection keys register
into its init state in a way that should work on Intel and AMD.
Then, do a bunch of pkey register reads to watch for inadvertent
changes.
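For readers unfamiliar with the XSAVE builtins, the core trick is
small: XSAVE the PKRU component into a buffer, clear its bit in the
XSTATE_BV bitmap in the XSAVE header (byte offset 512), then XRSTOR,
which loads the component's init state.  Below is a minimal standalone
sketch of that sequence -- not the selftest itself -- which assumes a
CPU with protection keys enabled and that 1 MB comfortably covers the
XSAVE area, the same assumption the patch makes:

	/* Hypothetical sketch; build with something like: gcc -O2 -mxsave pkru_init_sketch.c */
	#include <stdint.h>
	#include <sys/mman.h>

	#define XSTATE_PKEY		0x200ULL	/* XSTATE_BV bit 9: PKRU */
	#define XSTATE_BV_OFFSET	512		/* XSAVE header starts at byte 512 */
	#define BUF_SIZE		(1UL << 20)	/* oversized, as in the patch */

	int main(void)
	{
		/* Anonymous mmap is zeroed and page-aligned; XSAVE needs 64-byte alignment */
		uint64_t *buf = mmap(NULL, BUF_SIZE, PROT_READ|PROT_WRITE,
				     MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
		if (buf == MAP_FAILED)
			return 1;

		/* XSAVE the PKRU component so the buffer and header are valid */
		__builtin_ia32_xsave(buf, XSTATE_PKEY);

		/* Clear XSTATE_BV[9] so XRSTOR sees PKRU as "not present" ... */
		buf[XSTATE_BV_OFFSET / sizeof(uint64_t)] &= ~XSTATE_PKEY;

		/* ... which makes XRSTOR load the PKRU init state (all zeroes) */
		__builtin_ia32_xrstor(buf, XSTATE_PKEY);

		munmap(buf, BUF_SIZE);
		return 0;
	}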
This adds "-mxsave" to CFLAGS for all the x86 vm selftests in order to allow use of the XSAVE instruction __builtin functions. This will make the builtins available on all of the vm selftests, but is expected to be harmless. Signed-off-by: Dave Hansen Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x86@kernel.org Cc: Andy Lutomirski Cc: Shuah Khan Cc: Babu Moger Cc: Dave Kleikamp Cc: Ram Pai Cc: Thiago Jung Bauermann Cc: Sebastian Andrzej Siewior --- b/tools/testing/selftests/vm/Makefile | 4 - b/tools/testing/selftests/vm/pkey-x86.h | 1 b/tools/testing/selftests/vm/protection_keys.c | 71 +++++++++++++++++++++++++ 3 files changed, 74 insertions(+), 2 deletions(-) diff -puN tools/testing/selftests/vm/Makefile~init-pkru-selftest tools/testing/selftests/vm/Makefile --- a/tools/testing/selftests/vm/Makefile~init-pkru-selftest 2021-05-27 16:40:28.299705459 -0700 +++ b/tools/testing/selftests/vm/Makefile 2021-05-27 16:40:28.315705459 -0700 @@ -99,7 +99,7 @@ $(1) $(1)_64: $(OUTPUT)/$(1)_64 endef ifeq ($(CAN_BUILD_I386),1) -$(BINARIES_32): CFLAGS += -m32 +$(BINARIES_32): CFLAGS += -m32 -mxsave $(BINARIES_32): LDLIBS += -lrt -ldl -lm $(BINARIES_32): $(OUTPUT)/%_32: %.c $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@ @@ -107,7 +107,7 @@ $(foreach t,$(TARGETS),$(eval $(call gen endif ifeq ($(CAN_BUILD_X86_64),1) -$(BINARIES_64): CFLAGS += -m64 +$(BINARIES_64): CFLAGS += -m64 -mxsave $(BINARIES_64): LDLIBS += -lrt -ldl $(BINARIES_64): $(OUTPUT)/%_64: %.c $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@ diff -puN tools/testing/selftests/vm/pkey-x86.h~init-pkru-selftest tools/testing/selftests/vm/pkey-x86.h --- a/tools/testing/selftests/vm/pkey-x86.h~init-pkru-selftest 2021-05-27 16:40:28.301705459 -0700 +++ b/tools/testing/selftests/vm/pkey-x86.h 2021-05-27 16:40:28.315705459 -0700 @@ -126,6 +126,7 @@ static inline u32 pkey_bit_position(int #define XSTATE_PKEY_BIT (9) #define XSTATE_PKEY 0x200 +#define XSTATE_BV_OFFSET 512 int pkey_reg_xstate_offset(void) { diff -puN tools/testing/selftests/vm/protection_keys.c~init-pkru-selftest tools/testing/selftests/vm/protection_keys.c --- a/tools/testing/selftests/vm/protection_keys.c~init-pkru-selftest 2021-05-27 16:40:28.303705459 -0700 +++ b/tools/testing/selftests/vm/protection_keys.c 2021-05-27 16:40:28.314705459 -0700 @@ -1278,6 +1278,76 @@ void test_pkey_alloc_exhaust(int *ptr, u } } +void arch_force_pkey_reg_init(void) +{ +#if defined(__i386__) || defined(__x86_64__) /* arch */ + u64 *buf; + + /* + * All keys should be allocated and set to allow reads and + * writes, so the register should be all 0. If not, just + * skip the test. + */ + if (read_pkey_reg()) + return; + + /* + * Just allocate an absurd about of memory rather than + * doing the XSAVE size enumeration dance. + */ + buf = mmap(NULL, 1*MB, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); + + /* These __builtins require compiling with -mxsave */ + + /* XSAVE to build a valid buffer: */ + __builtin_ia32_xsave(buf, XSTATE_PKEY); + /* Clear XSTATE_BV[PKRU]: */ + buf[XSTATE_BV_OFFSET/sizeof(u64)] &= ~XSTATE_PKEY; + /* XRSTOR will likely get PKRU back to the init state: */ + __builtin_ia32_xrstor(buf, XSTATE_PKEY); + + munmap(buf, 1*MB); +#endif +} + + +/* + * This is mostly useless on ppc for now. But it will not + * hurt anything and should give some better coverage as + * a long-running test that continually checks the pkey + * register. 
+ */
+void test_pkey_init_state(int *ptr, u16 pkey)
+{
+	int err;
+	int allocated_pkeys[NR_PKEYS] = {0};
+	int nr_allocated_pkeys = 0;
+	int i;
+
+	for (i = 0; i < NR_PKEYS*3; i++) {
+		int new_pkey = alloc_pkey();
+
+		allocated_pkeys[nr_allocated_pkeys++] = new_pkey;
+	}
+
+	dprintf3("%s()::%d\n", __func__, __LINE__);
+
+	arch_force_pkey_reg_init();
+
+	/*
+	 * Loop for a bit, hoping to exercise the kernel
+	 * context switch code.
+	 */
+	for (i = 0; i < 1000000; i++)
+		read_pkey_reg();
+
+	for (i = 0; i < nr_allocated_pkeys; i++) {
+		err = sys_pkey_free(allocated_pkeys[i]);
+		pkey_assert(!err);
+		read_pkey_reg(); /* for shadow checking */
+	}
+}
+
 /*
  * pkey 0 is special.  It is allocated by default, so you do not
  * have to call pkey_alloc() to use it first.  Make sure that it
@@ -1502,6 +1572,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey)
 	test_implicit_mprotect_exec_only_memory,
 	test_mprotect_with_pkey_0,
 	test_ptrace_of_child,
+	test_pkey_init_state,
 	test_pkey_syscalls_on_non_allocated_pkey,
 	test_pkey_syscalls_bad_args,
 	test_pkey_alloc_exhaust,
_
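A side note for anyone verifying the init-state behavior on real
hardware: where the CPU supports XGETBV with ECX=1
(CPUID.(EAX=0DH,ECX=1):EAX[2]), the XINUSE bitmap shows whether the
processor currently tracks PKRU as being in its init state.  A
standalone sketch, not part of the patch, and only a hint since init
tracking is optional per component:

	/* Hypothetical sketch; build with something like: gcc -O2 xinuse_sketch.c */
	#include <cpuid.h>
	#include <stdio.h>

	#define XSTATE_PKEY	0x200ULL	/* component 9: PKRU */

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx, lo, hi;
		unsigned long long xinuse;

		/* CPUID.(EAX=0DH,ECX=1):EAX bit 2 => XGETBV with ECX=1 is supported */
		if (!__get_cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx) || !(eax & 0x4)) {
			printf("XGETBV(1)/XINUSE not supported on this CPU\n");
			return 0;
		}

		/*
		 * XGETBV with ECX=1 returns XCR0 & XINUSE; a 0 bit means the
		 * component is in its init state (or not enabled at all).
		 */
		asm volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(1));
		xinuse = ((unsigned long long)hi << 32) | lo;

		printf("PKRU currently tracked as %s\n",
		       (xinuse & XSTATE_PKEY) ? "in use (non-init)" : "init state");
		return 0;
	}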
E=Sophos;i="5.83,228,1616482800"; d="scan'208";a="477705335" Received: from viggo.jf.intel.com (HELO localhost.localdomain) ([10.54.77.144]) by orsmga001.jf.intel.com with ESMTP; 27 May 2021 16:57:41 -0700 Subject: [PATCH 5/5] selftests/vm/pkeys: exercise x86 XSAVE init state To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org,Dave Hansen ,tglx@linutronix.de,mingo@redhat.com,bp@alien8.de,x86@kernel.org,luto@kernel.org,shuah@kernel.org,babu.moger@amd.com,dave.kleikamp@oracle.com,linuxram@us.ibm.com,bauerman@linux.ibm.com,bigeasy@linutronix.de From: Dave Hansen Date: Thu, 27 May 2021 16:51:19 -0700 References: <20210527235109.B2A9F45F@viggo.jf.intel.com> In-Reply-To: <20210527235109.B2A9F45F@viggo.jf.intel.com> Message-Id: <20210527235119.9D443084@viggo.jf.intel.com> X-Rspamd-Queue-Id: 1810790012EB Authentication-Results: imf19.hostedemail.com; dkim=none; spf=none (imf19.hostedemail.com: domain of dave.hansen@linux.intel.com has no SPF policy when checking 134.134.136.100) smtp.mailfrom=dave.hansen@linux.intel.com; dmarc=fail reason="No valid SPF, No valid DKIM" header.from=intel.com (policy=none) X-Rspamd-Server: rspam03 X-Stat-Signature: rhzgjyty5krkt7imeypo3ytkcmnh4t5u X-HE-Tag: 1622159852-391916 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Dave Hansen On x86, there is a set of instructions used to save and restore register state collectively known as the XSAVE architecture. There are about a dozen different features managed with XSAVE. The protection keys register, PKRU, is one of those features. The hardware optimizes XSAVE by tracking when the state has not changed from its initial (init) state. In this case, it can avoid the cost of writing state to memory (it would usually just be a bunch of 0's). When the pkey register is 0x0 the hardware optionally choose to track the register as being in the init state (optimize away the writes). AMD CPUs do this more aggressively compared to Intel. On x86, PKRU is rarely in its (very permissive) init state. Instead, the value defaults to something very restrictive. It is not surprising that bugs have popped up in the rare cases when PKRU reaches its init state. Add a protection key selftest which gets the protection keys register into its init state in a way that should work on Intel and AMD. Then, do a bunch of pkey register reads to watch for inadvertent changes. This adds "-mxsave" to CFLAGS for all the x86 vm selftests in order to allow use of the XSAVE instruction __builtin functions. This will make the builtins available on all of the vm selftests, but is expected to be harmless. 
Signed-off-by: Dave Hansen Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: x86@kernel.org Cc: Andy Lutomirski Cc: Shuah Khan Cc: Babu Moger Cc: Dave Kleikamp Cc: Ram Pai Cc: Thiago Jung Bauermann Cc: Sebastian Andrzej Siewior --- b/tools/testing/selftests/vm/Makefile | 4 - b/tools/testing/selftests/vm/pkey-x86.h | 1 b/tools/testing/selftests/vm/protection_keys.c | 71 +++++++++++++++++++++++++ 3 files changed, 74 insertions(+), 2 deletions(-) diff -puN tools/testing/selftests/vm/Makefile~init-pkru-selftest tools/testing/selftests/vm/Makefile --- a/tools/testing/selftests/vm/Makefile~init-pkru-selftest 2021-05-27 16:40:28.299705459 -0700 +++ b/tools/testing/selftests/vm/Makefile 2021-05-27 16:40:28.315705459 -0700 @@ -99,7 +99,7 @@ $(1) $(1)_64: $(OUTPUT)/$(1)_64 endef ifeq ($(CAN_BUILD_I386),1) -$(BINARIES_32): CFLAGS += -m32 +$(BINARIES_32): CFLAGS += -m32 -mxsave $(BINARIES_32): LDLIBS += -lrt -ldl -lm $(BINARIES_32): $(OUTPUT)/%_32: %.c $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@ @@ -107,7 +107,7 @@ $(foreach t,$(TARGETS),$(eval $(call gen endif ifeq ($(CAN_BUILD_X86_64),1) -$(BINARIES_64): CFLAGS += -m64 +$(BINARIES_64): CFLAGS += -m64 -mxsave $(BINARIES_64): LDLIBS += -lrt -ldl $(BINARIES_64): $(OUTPUT)/%_64: %.c $(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@ diff -puN tools/testing/selftests/vm/pkey-x86.h~init-pkru-selftest tools/testing/selftests/vm/pkey-x86.h --- a/tools/testing/selftests/vm/pkey-x86.h~init-pkru-selftest 2021-05-27 16:40:28.301705459 -0700 +++ b/tools/testing/selftests/vm/pkey-x86.h 2021-05-27 16:40:28.315705459 -0700 @@ -126,6 +126,7 @@ static inline u32 pkey_bit_position(int #define XSTATE_PKEY_BIT (9) #define XSTATE_PKEY 0x200 +#define XSTATE_BV_OFFSET 512 int pkey_reg_xstate_offset(void) { diff -puN tools/testing/selftests/vm/protection_keys.c~init-pkru-selftest tools/testing/selftests/vm/protection_keys.c --- a/tools/testing/selftests/vm/protection_keys.c~init-pkru-selftest 2021-05-27 16:40:28.303705459 -0700 +++ b/tools/testing/selftests/vm/protection_keys.c 2021-05-27 16:40:28.314705459 -0700 @@ -1278,6 +1278,76 @@ void test_pkey_alloc_exhaust(int *ptr, u } } +void arch_force_pkey_reg_init(void) +{ +#if defined(__i386__) || defined(__x86_64__) /* arch */ + u64 *buf; + + /* + * All keys should be allocated and set to allow reads and + * writes, so the register should be all 0. If not, just + * skip the test. + */ + if (read_pkey_reg()) + return; + + /* + * Just allocate an absurd about of memory rather than + * doing the XSAVE size enumeration dance. + */ + buf = mmap(NULL, 1*MB, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); + + /* These __builtins require compiling with -mxsave */ + + /* XSAVE to build a valid buffer: */ + __builtin_ia32_xsave(buf, XSTATE_PKEY); + /* Clear XSTATE_BV[PKRU]: */ + buf[XSTATE_BV_OFFSET/sizeof(u64)] &= ~XSTATE_PKEY; + /* XRSTOR will likely get PKRU back to the init state: */ + __builtin_ia32_xrstor(buf, XSTATE_PKEY); + + munmap(buf, 1*MB); +#endif +} + + +/* + * This is mostly useless on ppc for now. But it will not + * hurt anything and should give some better coverage as + * a long-running test that continually checks the pkey + * register. 
+ */ +void test_pkey_init_state(int *ptr, u16 pkey) +{ + int err; + int allocated_pkeys[NR_PKEYS] = {0}; + int nr_allocated_pkeys = 0; + int i; + + for (i = 0; i < NR_PKEYS*3; i++) { + int new_pkey = alloc_pkey(); + + allocated_pkeys[nr_allocated_pkeys++] = new_pkey; + } + + dprintf3("%s()::%d\n", __func__, __LINE__); + + arch_force_pkey_reg_init(); + + /* + * Loop for a bit, hoping to get exercise the kernel + * context switch code. + */ + for (i = 0; i < 1000000; i++) + read_pkey_reg(); + + for (i = 0; i < nr_allocated_pkeys; i++) { + err = sys_pkey_free(allocated_pkeys[i]); + pkey_assert(!err); + read_pkey_reg(); /* for shadow checking */ + } +} + /* * pkey 0 is special. It is allocated by default, so you do not * have to call pkey_alloc() to use it first. Make sure that it @@ -1502,6 +1572,7 @@ void (*pkey_tests[])(int *ptr, u16 pkey) test_implicit_mprotect_exec_only_memory, test_mprotect_with_pkey_0, test_ptrace_of_child, + test_pkey_init_state, test_pkey_syscalls_on_non_allocated_pkey, test_pkey_syscalls_bad_args, test_pkey_alloc_exhaust, _