From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eugene Syromiatnikov Subject: Re: [RFC PATCH v4 3/9] x86/cet/ibt: Add IBT legacy code bitmap allocation function Date: Fri, 5 Oct 2018 19:26:22 +0200 Message-ID: <20181005172622.GD19360@asgard.redhat.com> References: <20180921150553.21016-1-yu-cheng.yu@intel.com> <20180921150553.21016-4-yu-cheng.yu@intel.com> <20181003195702.GF32759@asgard.redhat.com> <5BF3AE8F-CC2A-4160-9FF6-FEA171A76371@amacapital.net> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Return-path: Content-Disposition: inline In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org To: Andy Lutomirski Cc: Yu-cheng Yu , X86 ML , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , LKML , linux-doc@vger.kernel.org, Linux-MM , linux-arch , Linux API , Arnd Bergmann , Balbir Singh , Cyrill Gorcunov , Dave Hansen , Florian Weimer , "H. J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit Oleg Nesterov List-Id: linux-arch.vger.kernel.org On Fri, Oct 05, 2018 at 10:07:46AM -0700, Andy Lutomirski wrote: > On Fri, Oct 5, 2018 at 10:03 AM Yu-cheng Yu wrote: > > > > On Fri, 2018-10-05 at 09:28 -0700, Andy Lutomirski wrote: > > > > On Oct 5, 2018, at 9:13 AM, Yu-cheng Yu wrote: > > > > > > > > > On Wed, 2018-10-03 at 21:57 +0200, Eugene Syromiatnikov wrote: > > > > > > On Fri, Sep 21, 2018 at 08:05:47AM -0700, Yu-cheng Yu wrote: > > > > > > Indirect branch tracking provides an optional legacy code bitmap > > > > > > that indicates locations of non-IBT compatible code. When set, > > > > > > each bit in the bitmap represents a page in the linear address is > > > > > > legacy code. > > > > > > > > > > > > We allocate the bitmap only when the application requests it. > > > > > > Most applications do not need the bitmap. > > > > > > > > > > > > Signed-off-by: Yu-cheng Yu > > > > > > --- > > > > > > arch/x86/kernel/cet.c | 45 +++++++++++++++++++++++++++++++++++++++++++ > > > > > > 1 file changed, 45 insertions(+) > > > > > > > > > > > > diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c > > > > > > index 6adfe795d692..a65d9745af08 100644 > > > > > > --- a/arch/x86/kernel/cet.c > > > > > > +++ b/arch/x86/kernel/cet.c > > > > > > @@ -314,3 +314,48 @@ void cet_disable_ibt(void) > > > > > > wrmsrl(MSR_IA32_U_CET, r); > > > > > > current->thread.cet.ibt_enabled = 0; > > > > > > } > > > > > > + > > > > > > +int cet_setup_ibt_bitmap(void) > > > > > > +{ > > > > > > + u64 r; > > > > > > + unsigned long bitmap; > > > > > > + unsigned long size; > > > > > > + > > > > > > + if (!cpu_feature_enabled(X86_FEATURE_IBT)) > > > > > > + return -EOPNOTSUPP; > > > > > > + > > > > > > + if (!current->thread.cet.ibt_bitmap_addr) { > > > > > > + /* > > > > > > + * Calculate size and put in thread header. > > > > > > + * may_expand_vm() needs this information. > > > > > > + */ > > > > > > + size = TASK_SIZE / PAGE_SIZE / BITS_PER_BYTE; > > > > > > > > > > TASK_SIZE_MAX is likely needed here, as an application can easily switch > > > > > between long an 32-bit protected mode. And then the case of a CPU that > > > > > doesn't support 5LPT. > > > > > > > > If we had calculated bitmap size from TASK_SIZE_MAX, all 32-bit apps would > > > > have > > > > failed the allocation for bitmap size > TASK_SIZE. Please see values below, > > > > which is printed from the current code. > > > > > > > > Yu-cheng > > > > > > > > > > > > x64: > > > > TASK_SIZE_MAX = 0000 7fff ffff f000 > > > > TASK_SIZE = 0000 7fff ffff f000 > > > > bitmap size = 0000 0000 ffff ffff > > > > > > > > x32: > > > > TASK_SIZE_MAX = 0000 7fff ffff f000 > > > > TASK_SIZE = 0000 0000 ffff e000 > > > > bitmap size = 0000 0000 0001 ffff > > > > > > > > > > I haven’t followed all the details here, but I have a general policy of > > > objecting to any new use of TASK_SIZE. If you really really need to depend on > > > 32-bitness in new code, please figure out what exactly you mean by “32-bit” > > > and use an explicit check. > > > > The explicit check would be: > > > > test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : TASK_SIZE_MAX > > > > which is the same as TASK_SIZE. > > But this is only ever done in response to a syscall, right? So > wouldn't in_compat_syscall() be the right check? > > Also, this whole thing makes me extremely nervous. The MSR only > contains the start address, not the size, right? So what prevents > some goof from causing the CPU to read way past the end of the bitmap > if the bitmap is short because the kernel thought it was supposed to > be 32-bit? That's what I've mentioned initially: every syscall made with int 0x80 is interpreted as compat, even if it was made from long mode. > I'm inclined to suggest something awful-ish: always allocate the > bitmap as though it's for a 64-bit process, and just let it be at a > high address. And add a syscall or arch_prctl() to manipulate it for > the benefit of 32-bit programs that can't address it directly. That's likely the only way to go. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:42188 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727941AbeJFAZp (ORCPT ); Fri, 5 Oct 2018 20:25:45 -0400 Date: Fri, 5 Oct 2018 19:26:22 +0200 From: Eugene Syromiatnikov Subject: Re: [RFC PATCH v4 3/9] x86/cet/ibt: Add IBT legacy code bitmap allocation function Message-ID: <20181005172622.GD19360@asgard.redhat.com> References: <20180921150553.21016-1-yu-cheng.yu@intel.com> <20180921150553.21016-4-yu-cheng.yu@intel.com> <20181003195702.GF32759@asgard.redhat.com> <5BF3AE8F-CC2A-4160-9FF6-FEA171A76371@amacapital.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Sender: linux-arch-owner@vger.kernel.org List-ID: To: Andy Lutomirski Cc: Yu-cheng Yu , X86 ML , "H. Peter Anvin" , Thomas Gleixner , Ingo Molnar , LKML , linux-doc@vger.kernel.org, Linux-MM , linux-arch , Linux API , Arnd Bergmann , Balbir Singh , Cyrill Gorcunov , Dave Hansen , Florian Weimer , "H. J. Lu" , Jann Horn , Jonathan Corbet , Kees Cook , Mike Kravetz , Nadav Amit , Oleg Nesterov , Pavel Machek , Peter Zijlstra , Randy Dunlap , "Ravi V. Shankar" , "Shanbhogue, Vedvyas" Message-ID: <20181005172622.xPh8lhQhXg6Y2Wu_EcDTEOEmFZd1ie9HdlGb63E6EJU@z> On Fri, Oct 05, 2018 at 10:07:46AM -0700, Andy Lutomirski wrote: > On Fri, Oct 5, 2018 at 10:03 AM Yu-cheng Yu wrote: > > > > On Fri, 2018-10-05 at 09:28 -0700, Andy Lutomirski wrote: > > > > On Oct 5, 2018, at 9:13 AM, Yu-cheng Yu wrote: > > > > > > > > > On Wed, 2018-10-03 at 21:57 +0200, Eugene Syromiatnikov wrote: > > > > > > On Fri, Sep 21, 2018 at 08:05:47AM -0700, Yu-cheng Yu wrote: > > > > > > Indirect branch tracking provides an optional legacy code bitmap > > > > > > that indicates locations of non-IBT compatible code. When set, > > > > > > each bit in the bitmap represents a page in the linear address is > > > > > > legacy code. > > > > > > > > > > > > We allocate the bitmap only when the application requests it. > > > > > > Most applications do not need the bitmap. > > > > > > > > > > > > Signed-off-by: Yu-cheng Yu > > > > > > --- > > > > > > arch/x86/kernel/cet.c | 45 +++++++++++++++++++++++++++++++++++++++++++ > > > > > > 1 file changed, 45 insertions(+) > > > > > > > > > > > > diff --git a/arch/x86/kernel/cet.c b/arch/x86/kernel/cet.c > > > > > > index 6adfe795d692..a65d9745af08 100644 > > > > > > --- a/arch/x86/kernel/cet.c > > > > > > +++ b/arch/x86/kernel/cet.c > > > > > > @@ -314,3 +314,48 @@ void cet_disable_ibt(void) > > > > > > wrmsrl(MSR_IA32_U_CET, r); > > > > > > current->thread.cet.ibt_enabled = 0; > > > > > > } > > > > > > + > > > > > > +int cet_setup_ibt_bitmap(void) > > > > > > +{ > > > > > > + u64 r; > > > > > > + unsigned long bitmap; > > > > > > + unsigned long size; > > > > > > + > > > > > > + if (!cpu_feature_enabled(X86_FEATURE_IBT)) > > > > > > + return -EOPNOTSUPP; > > > > > > + > > > > > > + if (!current->thread.cet.ibt_bitmap_addr) { > > > > > > + /* > > > > > > + * Calculate size and put in thread header. > > > > > > + * may_expand_vm() needs this information. > > > > > > + */ > > > > > > + size = TASK_SIZE / PAGE_SIZE / BITS_PER_BYTE; > > > > > > > > > > TASK_SIZE_MAX is likely needed here, as an application can easily switch > > > > > between long an 32-bit protected mode. And then the case of a CPU that > > > > > doesn't support 5LPT. > > > > > > > > If we had calculated bitmap size from TASK_SIZE_MAX, all 32-bit apps would > > > > have > > > > failed the allocation for bitmap size > TASK_SIZE. Please see values below, > > > > which is printed from the current code. > > > > > > > > Yu-cheng > > > > > > > > > > > > x64: > > > > TASK_SIZE_MAX = 0000 7fff ffff f000 > > > > TASK_SIZE = 0000 7fff ffff f000 > > > > bitmap size = 0000 0000 ffff ffff > > > > > > > > x32: > > > > TASK_SIZE_MAX = 0000 7fff ffff f000 > > > > TASK_SIZE = 0000 0000 ffff e000 > > > > bitmap size = 0000 0000 0001 ffff > > > > > > > > > > I haven’t followed all the details here, but I have a general policy of > > > objecting to any new use of TASK_SIZE. If you really really need to depend on > > > 32-bitness in new code, please figure out what exactly you mean by “32-bit” > > > and use an explicit check. > > > > The explicit check would be: > > > > test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET : TASK_SIZE_MAX > > > > which is the same as TASK_SIZE. > > But this is only ever done in response to a syscall, right? So > wouldn't in_compat_syscall() be the right check? > > Also, this whole thing makes me extremely nervous. The MSR only > contains the start address, not the size, right? So what prevents > some goof from causing the CPU to read way past the end of the bitmap > if the bitmap is short because the kernel thought it was supposed to > be 32-bit? That's what I've mentioned initially: every syscall made with int 0x80 is interpreted as compat, even if it was made from long mode. > I'm inclined to suggest something awful-ish: always allocate the > bitmap as though it's for a 64-bit process, and just let it be at a > high address. And add a syscall or arch_prctl() to manipulate it for > the benefit of 32-bit programs that can't address it directly. That's likely the only way to go.