From: Ira Weiny <ira.weiny@intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>, x86@kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Fenghua Yu <fenghua.yu@intel.com>, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH RFC V2 17/17] x86/entry: Preserve PKRS MSR across exceptions
Date: Tue, 21 Jul 2020 22:27:09 -0700
Message-ID: <20200722052709.GB478587@iweiny-DESK2.sc.intel.com>
In-Reply-To: <20200717100610.GH10769@hirez.programming.kicks-ass.net>

On Fri, Jul 17, 2020 at 12:06:10PM +0200, Peter Zijlstra wrote:
> On Fri, Jul 17, 2020 at 12:20:56AM -0700, ira.weiny@intel.com wrote:
> > First I'm not sure if adding this state to idtentry_state and having
> > that state copied is the right way to go. It seems like we should start
> > passing this by reference instead of value. But for now this works as
> > an RFC. Comments?
>
> As long as you keep sizeof(struct idtentry_state_t) <= sizeof(u64) or
> possibly 2*sizeof(unsigned long), code gen shouldn't be too horrid IIRC.
> You'll have to look at what the compiler makes of it.
>
> > Second, I'm not 100% happy with having to save the reference count in
> > the exception handler. It seems like a very ugly layering violation but
> > I don't see a way around it at the moment.
>
> So I've been struggling with that API, all the way from
> pks_update_protection() to that dev_access_{en,dis}able(). I _really_
> hate it, but I see how you ended up with it.
>
> I wanted to propose something like:
>
> u32 current_pkey_save(int pkey, unsigned flags)
> {
> 	u32 *lpkr = get_cpu_ptr(&local_pkr);
> 	u32 pkr, saved = *lpkr;
>
> 	pkr = update_pkey_reg(saved, pkey, flags);
> 	if (pkr != saved)
> 		wrpkr(pkr);
>
> 	put_cpu_ptr(&local_pkr);
> 	return saved;
> }
>
> void current_pkey_restore(u32 pkr)
> {
> 	u32 *lpkr = get_cpu_ptr(&local_pkr);
> 	if (*lpkr != pkr)
> 		wrpkr(pkr);
> 	put_cpu_ptr(&local_pkr);
> }
>
> Together with:
>
> void pkey_switch(struct task_struct *prev, struct task_struct *next)
> {
> 	prev->pkr = this_cpu_read(local_pkr);
> 	if (prev->pkr != next->pkr)
> 		wrpkr(next->pkr);
> }
>
> But that's actually hard to frob into the kmap() model :-( The upside is
> that you only have 1 word of state, instead of the 2 you have now.

I'm still mulling over if what you have really helps us. It seems like we may
be able to remove the percpu pkrs_cache variable... But I'm hesitant to do so.

Regardless that is not the big issue right now...

>
> > Third, this patch has gone through a couple of revisions as I've had
> > crashes which just don't make sense to me. One particular issue I've
> > had is taking a MCE during memcpy_mcsafe causing my WARN_ON() to fire.
> > The code path was a pmem copy and the ref count should have been
> > elevated due to dev_access_enable() but why was
> > idtentry_enter()->idt_save_pkrs() not called I don't know.
>
> Because MCEs are NMI-like and don't go through the normal interrupt
> path. MCEs are an abomination, please wear all the protective devices
> you can lay hands on when delving into that.

I've been really digging into this today and I'm very concerned that I'm
completely missing something WRT idtentry_enter() and idtentry_exit().

I've instrumented idt_{save,restore}_pkrs(), and __dev_access_{en,dis}able()
with trace_printk()'s.

With this debug code, I have found an instance where it seems like
idtentry_enter() is called without a corresponding idtentry_exit(). This has
left the thread ref counter at 0 which results in very bad things happening
when __dev_access_disable() is called and the ref count goes negative.

Effectively this seems to be happening:

...
	// ref == 0
	dev_access_enable()	// ref += 1 ==> disable protection
		// exception (which one I don't know)
		idtentry_enter()
			// ref = 0
			_handler()	// or whatever code...
			// *_exit() not called [at least there is no trace_printk() output]...
			// Regardless of trace output, the ref is left at 0
	dev_access_disable()	// ref -= 1 ==> -1 ==> does not enable protection
	(Bad stuff is bound to happen now...)
...

The ref count ends up completely messed up after this sequence... and the PKRS
register gets out of sync as well. This is starting to make some sense of how
I was getting what seemed like random crashes before.

Unfortunately I don't understand the idt entry/exit code well enough to see
clearly what is going on. Is there any reason idtentry_exit() is not called
after idtentry_enter()? Perhaps some NMI/MCE or 'not normal' exception code
path I'm not seeing? In my searches I see a corresponding exit for every
enter. But MCE is pretty hard to follow.

Also is there any chance that the process could be getting scheduled and that
is causing an issue?

Thanks,
Ira
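
The enter-without-exit sequence in the trace above can be reproduced in a small userspace model. What follows is only a sketch under stated assumptions: the struct and every model_* name are hypothetical stand-ins rather than the patch's code, the save step does nothing beyond zeroing the count (as the trace shows), and protection is a plain boolean instead of the PKRS MSR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-thread state; not the kernel layout. */
struct thread_model {
	int ref;		/* dev_access reference count */
	bool protection_on;	/* true while stray-write protection is active */
};

/* Models dev_access_enable(): taking the first reference lifts protection. */
static void model_dev_access_enable(struct thread_model *t)
{
	if (t->ref++ == 0)
		t->protection_on = false;
}

/* Models dev_access_disable(): dropping to exactly 0 restores protection. */
static void model_dev_access_disable(struct thread_model *t)
{
	if (--t->ref == 0)
		t->protection_on = true;
}

/* Models the save at exception entry: stash the count and restart from 0. */
static int model_idt_save(struct thread_model *t)
{
	int saved = t->ref;

	t->ref = 0;
	return saved;
}

/* Models the restore at exception exit: put the stashed count back. */
static void model_idt_restore(struct thread_model *t, int saved)
{
	t->ref = saved;
}

/*
 * Runs enable -> exception -> disable for one thread. When exit_called is
 * false the restore step is skipped, as in the trace. Returns the final
 * count and reports the final protection state through *protection_on.
 */
static int model_run(bool exit_called, bool *protection_on)
{
	struct thread_model t = { .ref = 0, .protection_on = true };
	int saved;

	model_dev_access_enable(&t);
	assert(t.ref == 1);

	saved = model_idt_save(&t);	/* exception entry */
	/* the handler body would run here */
	if (exit_called)
		model_idt_restore(&t, saved);

	model_dev_access_disable(&t);

	*protection_on = t.protection_on;
	return t.ref;
}
```

With the exit path skipped, model_run(false, &prot) ends with the count at -1 and protection still lifted, which is the state the trace describes. With the exit path taken, model_run(true, &prot) ends at 0 with protection back on.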