* [MODERATED] [PATCH 0/2] more sampling fun
From: mark gross <mgross@linux.intel.com>
Date: 2020-02-19 22:45 UTC
To: speck
Subject: [PATCH 0/2] Special Register Buffer Data Sampling patch set

Special Register Buffer Data Sampling (SRBDS) is a sampling-type vulnerability
that leaks data across cores sharing the HW-RNG on vulnerable processors.

The leak is fixed by a microcode update, and the mitigation is enabled by
default. The new microcode serializes processor access during execution of
RDRAND or RDSEED and ensures that the shared buffer is overwritten before it
is released for reuse.

The mitigation impacts the throughput of the RDRAND and RDSEED instructions,
and the latency of RT processing running on the socket while RDRAND or RDSEED
is executing. A microbenchmark that calls RDRAND many times shows a 10x
slowdown.

This patch set enables kernel command line control of the mitigation and
exports vulnerability and mitigation status. It includes two patches:

The first patch updates cpu_vuln_whitelist with support for a 16-bit field
for enumerating by stepping as well as by vendor, family, and model.

The second patch enables command line control of the mitigation as well as
the sysfs export of vulnerability status.

The documentation patch is pending completion of the official white paper,
so that the in-tree documentation can be kept consistent with it.

The microcode defaults to enabling the mitigation.

mark gross (2):
  Add capability to specify a range of steppings in the vulnerability
    white list structure.
  WIP SRBDS mitigation enabling.

 arch/x86/include/asm/cpu_device_id.h | 12 ++++
 arch/x86/include/asm/cpufeatures.h   |  3 +
 arch/x86/include/asm/msr-index.h     |  4 ++
 arch/x86/kernel/cpu/bugs.c           | 84 ++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c         | 52 ++++++++++++-----
 arch/x86/kernel/cpu/cpu.h            | 10 ++++
 arch/x86/kernel/cpu/intel.c          |  2 +
 arch/x86/kernel/cpu/match.c          | 26 +++++++++
 drivers/base/cpu.c                   |  8 +++
 9 files changed, 187 insertions(+), 14 deletions(-)

--
2.17.1
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: mark gross
Date: 2020-02-20  1:53 UTC
To: speck

Sorry for the partial sequence. I'm not sure where the problem is and will
follow up with IT to see if it's on my end.

--mark

On Wed, Feb 19, 2020 at 02:45:22PM -0800, speck for mark gross wrote:
> From: mark gross <mgross@linux.intel.com>
> Subject: [PATCH 0/2] Special Register Buffer Data Sampling patch set
> [...]
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: Greg KH
Date: 2020-02-20  8:14 UTC
To: speck

On Wed, Feb 19, 2020 at 02:45:22PM -0800, speck for mark gross wrote:
> The mitigation impacts the throughput of the RDRAND and RDSEED instructions
> and latency of RT processing running on the socket while executing RDRAND or
> RDSEED. The microbenchmark of calling RDRAND many times shows a 10x slowdown.

Then we need to stop using RDRAND internally for our "give me a random
number" API, which has spread to more and more parts of the kernel.

Here's a patch that does so:
https://lore.kernel.org/lkml/20200216161836.1976-1-Jason@zx2c4.com/
which I'm going to advise get merged now and backported to the stable
branches.

thanks,

greg k-h
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: Greg KH
Date: 2020-02-20 14:27 UTC
To: speck

On Thu, Feb 20, 2020 at 09:14:20AM +0100, speck for Greg KH wrote:
> Then we need to stop using RDRAND internally for our "give me a random
> number" API, which has spread to more and more parts of the kernel.
>
> Here's a patch that does so:
> https://lore.kernel.org/lkml/20200216161836.1976-1-Jason@zx2c4.com/
> which I'm going to advise get merged now and backported to the stable
> branches.

Note, the author of that patch has reached out to me to say they found this
same issue. He did so independently, so odds are others already know about
this. He found it because he was wondering why RDRAND was so slow on newer
systems, and then traced things backwards like all the other researchers in
this area.

So, what's the timeline here? Looks like this is already "in the wild" from
what I can tell.

greg k-h
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: mark gross
Date: 2020-02-20 15:40 UTC
To: speck

On Thu, Feb 20, 2020 at 03:27:20PM +0100, speck for Greg KH wrote:
> Note, the author of that patch has reached out to me to say they found
> this same issue. He did so independently so odds are others already
> know about this. He found it because he was wondering why rdrand was so
> slow on newer systems, and then traced things backwards like all the
> other researchers in this area.

Are you saying the author has seen the RNG data leaking across processors,
or just the slowdown?

> So, what's the timeline here? Looks like this is already "in the wild"
> from what I can tell.

The uCode mitigation is coming out with the 2020.1 IPU (Intel Platform
Update, the first ucode update of 2020), which I believe is slated for an
official May disclosure.

--mark
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: Greg KH
Date: 2020-02-20 16:18 UTC
To: speck

On Thu, Feb 20, 2020 at 07:40:07AM -0800, speck for mark gross wrote:
> Are you saying the author has seen the RNG data leaking across processors
> or the slowdown?

Slowdown from what? Relative to older processors, yes. Leaking across, yes,
I think so.

> The uCode mitigation is coming out with the 2020.1 IPU (intel platform
> update) (first ucode update of 2020) that I believe is slated for an
> official May disclosure.

{sigh}

That's a long time; we should just fix up the kernel internally to not worry
about this now, so that we are not sitting on this any longer.

thanks,

greg k-h
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: Andi Kleen
Date: 2020-02-20 14:55 UTC
To: speck

> Then we need to stop using RDRAND internally for our "give me a random
> number" API, which has spread to more and more parts of the kernel.

Only if that API is called frequently enough. AFAIK it is not. Normally it's
used for rare rekeying of hash tables etc., which doesn't happen very often.

> Here's a patch that does so:
> https://lore.kernel.org/lkml/20200216161836.1976-1-Jason@zx2c4.com/
> which I'm going to advise get merged now and backported to the stable
> branches.

I don't see any reason at this point. Only do it if there's an actual
indication of a problem.

-Andi
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: Greg KH
Date: 2020-02-20 15:05 UTC
To: speck

On Thu, Feb 20, 2020 at 06:55:10AM -0800, speck for Andi Kleen wrote:
> Only if that API is called frequently enough. AFAIK it is not.

It's called from all sorts of places in the kernel today.

> Normally it's used for rare rekeying of hash tables etc., which
> doesn't happen very often.

"normally" :)

> Don't see any reason at this point. Only do it if there's an actual
> indication of a problem.

It's slow; why wouldn't we stop using it when we already have a much faster
call?

thanks,

greg k-h
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: Andi Kleen
Date: 2020-02-20 16:55 UTC
To: speck

On Thu, Feb 20, 2020 at 04:05:49PM +0100, speck for Greg KH wrote:
> It's called from all sorts of places in the kernel today.

I looked through the users. The ones that weren't clearly slow path were SCTP
connection setup, exec, and some strange abuse in bcache that should probably
be fixed anyway. I'm still not sure it's worth it in general, but the exec
use is for seeding the user space RNG, so that's the one that might actually
be worth it.

-Andi
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: Josh Poimboeuf
Date: 2020-02-20 21:51 UTC
To: speck

On Thu, Feb 20, 2020 at 06:55:10AM -0800, speck for Andi Kleen wrote:
> Don't see any reason at this point. Only do it if there's an actual
> indication of a problem.

Internal testing of the SRBDS beta microcode on Kaby Lake is showing
significant slowdowns in several syscall microbenchmarks.

One pthread_create() microbenchmark had a ~48% slowdown. We confirmed it was
due to RDRAND in get_random_u64(). In this case I think the path was:

  clone()
    _do_fork()
      copy_process()
        dup_task_struct()
          get_random_canary()  (due to CONFIG_STACKPROTECTOR)
            get_random_long()
              get_random_u64()
                arch_get_random_long()
                  RDRAND

-- Josh
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: Andi Kleen
Date: 2020-02-20 22:15 UTC
To: speck

On Thu, Feb 20, 2020 at 03:51:32PM -0600, speck for Josh Poimboeuf wrote:
> One pthread_create() microbenchmark had a ~48% slowdown. We confirmed
> it was due to RDRAND in get_random_u64().

Okay, thanks. Then we need the patch Greg pointed out.

-Andi
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: Kees Cook
Date: 2020-02-20 22:59 UTC
To: speck

On Thu, Feb 20, 2020 at 02:15:22PM -0800, speck for Andi Kleen wrote:
> Okay, thanks. Then we need the patch Greg pointed out.

Agreed; RDRAND should be used to _seed_ the RNG, not be a replacement for it.

-- Kees Cook
* [MODERATED] Re: [PATCH 0/2] more sampling fun
From: mark gross
Date: 2020-02-20 15:09 UTC
To: speck

On Thu, Feb 20, 2020 at 09:14:20AM +0100, speck for Greg KH wrote:
> Then we need to stop using RDRAND internally for our "give me a random
> number" API, which has spread to more and more parts of the kernel.

FWIW, even if the opcodes are slowed by 10x there is no humanly noticeable
impact on boot times. IMO it's more a risk to RT applications than anything
else.

--mark
* [MODERATED] Additional sampling fun
From: Borislav Petkov
Date: 2020-02-28 16:21 UTC
To: speck

On Thu, Feb 20, 2020 at 09:14:20AM +0100, speck for Greg KH wrote:
> Then we need to stop using RDRAND internally for our "give me a random
> number" API, which has spread to more and more parts of the kernel.
>
> Here's a patch that does so:
> https://lore.kernel.org/lkml/20200216161836.1976-1-Jason@zx2c4.com/
> which I'm going to advise get merged now and backported to the stable
> branches.

So one of our guys - Nicolai Stange - was looking at this wrt backporting it
to trees, and there's another problem in add_interrupt_randomness() which
could potentially turn out to be nasty.

We asked him to write it up for speck@ (he's not subscribed) so that we can
discuss it here first. Here is the deal in his own words:

"In the context of the get_random_long() patch posted at [1], I noticed that
there's also a RDSEED insn issued from the interrupt path, which perhaps
might have undesired effects performance-wise.

More specifically, add_interrupt_randomness() would issue one RDSEED either
once a second or every 64 interrupts, whichever comes first:

	void add_interrupt_randomness(int irq, int irq_flags)
	{
		fast_mix(fast_pool);	/* increments fast_pool->count */
		...
		if ((fast_pool->count < 64) &&
		    !time_after(now, fast_pool->last + HZ))
			return;
		...
		fast_pool->last = now;
		if (arch_get_random_seed_long(&seed)) {}
		...
	}

So while this certainly won't matter much on average, I'm still wondering
whether or not this RDSEED could potentially cause IRQ latency spikes
relevant e.g. to -RT and/or under high IRQ load?

FWIW, the commit introducing the arch_get_random_seed_long() invocation to
add_interrupt_randomness() was commit 83664a6928a4 ("random: Use
arch_get_random_seed*() at init time and once a second").

I can only guess, but I think the motivation for mixing
arch_get_random_seed_long() from the interrupt path was probably to sync
that with the IRQ rate. That is, to make sure that the entropy mixed from
RDSEED doesn't dominate the interrupt entropy source.

[1] https://lkml.kernel.org/r/20200216161836.1976-1-Jason@zx2c4.com
"

Thx.

--
Regards/Gruss,
    Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
* [MODERATED] Re: Additional sampling fun
From: Greg KH
Date: 2020-02-28 16:34 UTC
To: speck

On Fri, Feb 28, 2020 at 05:21:40PM +0100, speck for Borislav Petkov wrote:
> > Here's a patch that does so:
> > https://lore.kernel.org/lkml/20200216161836.1976-1-Jason@zx2c4.com/
> > which I'm going to advise get merged now and backported to the stable
> > branches.

Note, the above patch (well, the v2 version) is now merged and should show
up in the next -rc1 release.

> So while this certainly won't matter much on average, I'm still
> wondering whether or not this RDSEED could potentially cause IRQ
> latency spikes relevant e.g. to -RT and/or under high IRQ load?

Ugh. I think we need to drag Jason into this as well, but really, talking
about that can be done on the mailing list, as there's nothing wrong with
trying to get that slow code out of the IRQ path today, right?

thanks,

greg k-h
* [MODERATED] Re: Additional sampling fun
From: mark gross
Date: 2020-02-28 17:38 UTC
To: speck

On Fri, Feb 28, 2020 at 05:34:47PM +0100, speck for Greg KH wrote:
> Ugh. I think we need to drag Jason into this as well, but really,
> talking about that can be done on the mailing list as there's nothing
> wrong with trying to get that slow code out of the irq path today,
> right?

FWIW, unless someone is abusing rdrand/rdseed I don't think the impact of
the mitigation will be measurable. Running multiple instances hammering
rdrand in a loop will show nonlinear impacts due to bus lock contention,
but I don't think there are any contention issues at once per 64 IRQs or
once a second. You are looking at approximately O(100 cycles) vs
O(1000 cycles) every second or every 64th interrupt. I don't think you'll
be able to measure the impact of that (unless you force contention on the
HW bus lock).

--mark
* Re: Additional sampling fun 2020-02-28 17:38 ` mark gross @ 2020-02-28 17:44 ` Thomas Gleixner 2020-02-28 18:09 ` [MODERATED] " Luck, Tony 0 siblings, 1 reply; 21+ messages in thread From: Thomas Gleixner @ 2020-02-28 17:44 UTC (permalink / raw) To: speck speck for mark gross <speck@linutronix.de> writes: > On Fri, Feb 28, 2020 at 05:34:47PM +0100, speck for Greg KH wrote: >> On Fri, Feb 28, 2020 at 05:21:40PM +0100, speck for Borislav Petkov wrote: >> Ugh. I think we need to drag Jason into this as well, but really, >> talking about that can be done on the mailing list as there's nothing >> wrong with trying to get that slow code out of the irq path today, >> right? >> > FWIW unless someone is abusing rdrand/rdseed I don't think the impact of the > mitigation will be measurable. Running multiple instances of spanking rdrand > in a loop will show nonlinear impacts due to bus lock contention but, I don't > think there is any contention issues with once/64IRS's or once a second. you > are looking at approximately O(100cycles) vrs O(1000cycles) every second or > every 64th interrupt. I don't think you'll be able to measure the impact of > that. (unless you force lock contention on the HW bus lock) Have several cores with 10k+ interrupts per second and, if you're unlucky, they start to contend; then the every-64th-interrupt RDSEED will be quite prominently measurable. But I agree with Greg that we can tackle this on LKML without mentioning that particular issue. Thanks, tglx ^ permalink raw reply [flat|nested] 21+ messages in thread
* [MODERATED] Re: Additional sampling fun 2020-02-28 17:44 ` Thomas Gleixner @ 2020-02-28 18:09 ` Luck, Tony 2020-02-28 18:40 ` Borislav Petkov 2020-02-28 21:53 ` Thomas Gleixner 0 siblings, 2 replies; 21+ messages in thread From: Luck, Tony @ 2020-02-28 18:09 UTC (permalink / raw) To: speck On Fri, Feb 28, 2020 at 06:44:48PM +0100, speck for Thomas Gleixner wrote: > speck for mark gross <speck@linutronix.de> writes: > > On Fri, Feb 28, 2020 at 05:34:47PM +0100, speck for Greg KH wrote: > >> On Fri, Feb 28, 2020 at 05:21:40PM +0100, speck for Borislav Petkov wrote: > >> Ugh. I think we need to drag Jason into this as well, but really, > >> talking about that can be done on the mailing list as there's nothing > >> wrong with trying to get that slow code out of the irq path today, > >> right? > >> > > FWIW unless someone is abusing rdrand/rdseed I don't think the impact of the > > mitigation will be measurable. Running multiple instances of spanking rdrand > > in a loop will show nonlinear impacts due to bus lock contention but, I don't > > think there is any contention issues with once/64IRS's or once a second. you > > are looking at approximately O(100cycles) vrs O(1000cycles) every second or > > every 64th interrupt. I don't think you'll be able to measure the impact of > > that. (unless you force lock contention on the HW bus lock) > > Have several cores with a 10k+ interrupts per second and if you're > unlucky they start to contend, then the every 64th interrupt will be > measurable quite prominent. > > But I agree with Greg, that we can tackle this on LKML without > mentioning that particular issue. That code really shouldn't ever have been using RDSEED (which is documented as NOT scaling across invocations on multiple cores). -Tony ^ permalink raw reply [flat|nested] 21+ messages in thread
* [MODERATED] Re: Additional sampling fun 2020-02-28 18:09 ` [MODERATED] " Luck, Tony @ 2020-02-28 18:40 ` Borislav Petkov 2020-02-28 21:53 ` Thomas Gleixner 1 sibling, 0 replies; 21+ messages in thread From: Borislav Petkov @ 2020-02-28 18:40 UTC (permalink / raw) To: speck On Fri, Feb 28, 2020 at 10:09:38AM -0800, speck for Luck, Tony wrote: > That code really shouldn't ever have been using RDSEED (which is documented > as NOT scaling across invocations on multiple cores). Ripping it out would be very easy and the problem would be solved. :-) -- Regards/Gruss, Boris. SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg -- ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Additional sampling fun 2020-02-28 18:09 ` [MODERATED] " Luck, Tony 2020-02-28 18:40 ` Borislav Petkov @ 2020-02-28 21:53 ` Thomas Gleixner 2020-03-03 1:03 ` [MODERATED] " Luck, Tony 1 sibling, 1 reply; 21+ messages in thread From: Thomas Gleixner @ 2020-02-28 21:53 UTC (permalink / raw) To: speck speck for "Luck, Tony" <speck@linutronix.de> writes: > On Fri, Feb 28, 2020 at 06:44:48PM +0100, speck for Thomas Gleixner wrote: >> Have several cores with a 10k+ interrupts per second and if you're >> unlucky they start to contend, then the every 64th interrupt will be >> measurable quite prominent. >> >> But I agree with Greg, that we can tackle this on LKML without >> mentioning that particular issue. > > That code really shouldn't ever have been using RDSEED (which is documented > as NOT scaling across invocations on multiple cores). The only thing the SDM says is: Under heavy load, with multiple cores executing RDSEED in parallel, it is possible for the demand of random numbers by software processes/threads to exceed the rate at which the random number generator hardware can supply them. This will lead to the RDSEED instruction returning no data transitorily. The RDSEED instruction indicates the occurrence of this situation by clearing the CF flag. It does not say whether it's slow to return CF=0. And if it does, the current code just ignores it and carries on. So the question is whether the original RDSEED is slow already in the contended case or if the ucode mitigation will make it so. Thanks, tglx ^ permalink raw reply [flat|nested] 21+ messages in thread
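The SDM passage above describes RDSEED signalling "no data" via CF=0 rather than stalling, and the usual advice (e.g. in Intel's DRNG implementation guide) is a bounded retry loop on CF=0. Here is a sketch of that control flow with the instruction itself replaced by a stub; sim_rdseed, sim_busy_calls and get_seed_retry are invented names for this sketch, since the point is the caller's retry pattern, not the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Stub for RDSEED: returns false (CF=0, no data) for the next
 * sim_busy_calls invocations, then succeeds. This models the
 * transitory failure the SDM describes; a real implementation would
 * execute the rdseed instruction and test the carry flag.
 */
static int sim_busy_calls;

static bool sim_rdseed(uint64_t *out)
{
	if (sim_busy_calls > 0) {
		sim_busy_calls--;
		return false;			/* CF=0: no seed available yet */
	}
	*out = 0x1234567890abcdefULL;		/* placeholder "entropy" */
	return true;				/* CF=1: *out holds a seed */
}

/*
 * Bounded retry loop: try up to max_retries + 1 times, then report
 * failure instead of spinning indefinitely on a busy seed source.
 */
static bool get_seed_retry(uint64_t *out, int max_retries)
{
	for (int i = 0; i <= max_retries; i++)
		if (sim_rdseed(out))
			return true;
	return false;
}
```

The add_interrupt_randomness() path effectively uses zero retries: a CF=0 result is silently dropped, which is the "just ignores it and carries on" behavior noted above.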
* [MODERATED] Re: Additional sampling fun 2020-02-28 21:53 ` Thomas Gleixner @ 2020-03-03 1:03 ` Luck, Tony 0 siblings, 0 replies; 21+ messages in thread From: Luck, Tony @ 2020-03-03 1:03 UTC (permalink / raw) To: speck On Fri, Feb 28, 2020 at 10:53:25PM +0100, speck for Thomas Gleixner wrote: > speck for "Luck, Tony" <speck@linutronix.de> writes: > > On Fri, Feb 28, 2020 at 06:44:48PM +0100, speck for Thomas Gleixner wrote: > >> Have several cores with a 10k+ interrupts per second and if you're > >> unlucky they start to contend, then the every 64th interrupt will be > >> measurable quite prominent. > >> > >> But I agree with Greg, that we can tackle this on LKML without > >> mentioning that particular issue. > > > > That code really shouldn't ever have been using RDSEED (which is documented > > as NOT scaling across invocations on multiple cores). > > The only thing what the SDM says is: > > Under heavy load, with multiple cores executing RDSEED in parallel, it > is possible for the demand of random numbers by software > processes/threads to exceed the rate at which the random number > generator hardware can supply them. This will lead to the RDSEED > instruction returning no data transitorily. The RDSEED instruction > indicates the occurrence of this situation by clearing the CF flag. > > I does not tell that it's slow to return CF=0. And if it does the > current code just ignores it and carries on. The "Intel® Digital Random Number Generator (DRNG) Software Implementation Guide"[1] provides graphs in section 3.4.1 showing that RDRAND throughput scales linearly with the number of threads up to some saturation level (at about 10 threads in the graph ... but the footnote says that these results are "estimated" and "for informational purposes only"). I had thought that there was an RDSEED graph, but that must have been in some internal document. 
> So the question is whether the original RDSEED is slow already in the > contended case or if the ucode mitigation will make it so. If you plot throughput for RDSEED you'll get a flat line. Whatever byte rate you got from one thread is all there is. -Tony [1] https://software.intel.com/en-us/articles/intel-digital-random-number-generator-drng-software-implementation-guide ^ permalink raw reply [flat|nested] 21+ messages in thread
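The "flat line" point can be stated as a toy first-order model. All constants here are normalized and purely illustrative, with the roughly-10-thread RDRAND saturation point taken loosely from the estimated graph in the DRNG guide cited above; the function names are invented for this sketch.

```c
#include <assert.h>

/*
 * Toy model: RDRAND total throughput scales with thread count until
 * the DRNG saturates (near 10 threads in the guide's estimated graph).
 */
static double rdrand_total_rate(int threads)
{
	const double per_thread = 1.0;	/* one thread's rate, normalized */
	const double saturation = 10.0;	/* illustrative saturation point */
	double demand = threads * per_thread;

	return demand < saturation ? demand : saturation;
}

/*
 * RDSEED: the conditioner's output rate is fixed, so total throughput
 * is flat and each extra thread only shrinks the per-thread share.
 */
static double rdseed_total_rate(int threads)
{
	(void)threads;			/* thread count does not matter */
	return 1.0;
}
```

Under this model, adding RDSEED callers never buys more total entropy; it only divides the same fixed output among more threads.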
end of thread, other threads:[~2020-03-03 1:03 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-02-19 22:45 [MODERATED] [PATCH 0/2] more sampling fun 0 mark gross 2020-02-20 1:53 ` [MODERATED] " mark gross 2020-02-20 8:14 ` Greg KH 2020-02-20 14:27 ` Greg KH 2020-02-20 15:40 ` mark gross 2020-02-20 16:18 ` Greg KH 2020-02-20 14:55 ` Andi Kleen 2020-02-20 15:05 ` Greg KH 2020-02-20 16:55 ` Andi Kleen 2020-02-20 21:51 ` Josh Poimboeuf 2020-02-20 22:15 ` Andi Kleen 2020-02-20 22:59 ` Kees Cook 2020-02-20 15:09 ` mark gross 2020-02-28 16:21 ` [MODERATED] Additional sampling fun Borislav Petkov 2020-02-28 16:34 ` [MODERATED] " Greg KH 2020-02-28 17:38 ` mark gross 2020-02-28 17:44 ` Thomas Gleixner 2020-02-28 18:09 ` [MODERATED] " Luck, Tony 2020-02-28 18:40 ` Borislav Petkov 2020-02-28 21:53 ` Thomas Gleixner 2020-03-03 1:03 ` [MODERATED] " Luck, Tony