linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* x86/random: Speculation to the rescue
@ 2019-09-28 22:24 Thomas Gleixner
  2019-09-28 23:53 ` Linus Torvalds
  0 siblings, 1 reply; 37+ messages in thread
From: Thomas Gleixner @ 2019-09-28 22:24 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Theodore Ts'o, Nicholas Mc Guire, x86,
	Andy Lutomirski, Kees Cook

Nicholas presented the idea to (ab)use speculative execution for random
number generation years ago at the Real-Time Linux Workshop:

   https://static.lwn.net/images/conf/rtlws11/random-hardware.pdf

The idea is to use the non-constant execution time of loops on speculative
CPUs. The trick is to "time" the execution with rdtsc() and having enough
other instructions within the loop which make the uops scheduling
non-deterministic.

A trivial loop:

	for (i = 0; i < nbits; i++) {
		t = rdtsc();
		if (t & 0x01ULL)
			d |= (1ULL << i);
	}

does not provide enough entropy, but adding a pointless construct into the
loop makes it work:

	for (i = 0; i < nbits; i++) {
		t = __sw_hweight64(rdtsc() + rdtsc();
		if (t & 0x01ULL)
			d |= (1ULL << i);
	}

The interesting part is that rdtsc() can be speculated and the software
variant of hweight64() is adding enough 'useless' instructions to make the
uops scheduling and therefore the execution time random enough so that bit
zero of the result yields useful entropy.

To verify that this is true, there is a debugfs interface which provides
two files:

    /sys/kernel/debug/x86/rnd_fill
    /sys/kernel/debug/x86/rnd_data

Writing anything into 'rnd_fill' triggers the entropy collection via the
above algorithm and stores 64bit in one go in a data array after running
the 64bit value through a lame folding algorithm (Fibonacci LFSR).

When the write returns (which takes less than 10ms), the buffer is filled
with 4096 64bit entries, i.e. 262144 bits. The resulting data was read out
from rnd_data and tested against dieharder and the test01 Rabbit suite.

Most runs pass all tests, but both test suites find occasionally a few
tests which they considers weak, but the weak points are not the same on
the failing runs. The number of weak findings depends on the
microarchitecture, older CPUs expose it more than newer ones, which is not
suprising as the speculation gets more crazy on newer generations.

As the folding algorithm is lazy and primitive, it's not necessarily a
proof that the approach is wrong.  Feeding the entropy into the proper
kernel random generator should make that go away. Aside of that as the
'weak' points are randomly moving around there is no real way to attack
them in my opinion (but I might be wrong as usual).

What's interesting is that even with the trivial 64bit folding the result
is surprisingly good. That means that the speculative execution provides
usable entropy.

A visual comparison of 262144 bits retrieved from /dev/random and from the
rng_data file is here:

  https://tglx.de/~tglx/random/index.html

The tests were successfully run on various generations of Intel and AMD
hardware, but of course that needs way more investigation than a few runs
on a few machines.

As this depends on the microarchitecure and the pipeline depth this needs
at least some basic runtime verification mechanism before utilizing this.

But contrary to the last two years experience, speculation seems to have
it's useful sides as well :)

Suggested-by: Nicholas Mc Guire <hofrat@opentech.at>
Not-Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/processor.h |    2 
 arch/x86/kernel/cpu/common.c     |    4 +
 arch/x86/kernel/tsc.c            |   95 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -988,4 +988,6 @@ enum mds_mitigations {
 	MDS_MITIGATION_VMWERV,
 };
 
+extern bool cpu_has_speculation;
+
 #endif /* _ASM_X86_PROCESSOR_H */
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -77,6 +77,9 @@ EXPORT_SYMBOL(smp_num_siblings);
 /* Last level cache ID of each logical CPU */
 DEFINE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id) = BAD_APICID;
 
+/* Indicator that the CPU is speculating */
+bool cpu_has_speculation __ro_after_init;
+
 /* correctly size the local cpu masks */
 void __init setup_cpu_local_masks(void)
 {
@@ -1099,6 +1102,7 @@ static void __init cpu_set_bug_bits(stru
 	if (cpu_matches(NO_SPECULATION))
 		return;
 
+	cpu_has_speculation = true;
 	setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
 	setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
 
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -14,6 +14,7 @@
 #include <linux/percpu.h>
 #include <linux/timex.h>
 #include <linux/static_key.h>
+#include <linux/debugfs.h>
 
 #include <asm/hpet.h>
 #include <asm/timer.h>
@@ -1531,3 +1532,97 @@ unsigned long calibrate_delay_is_known(v
 	return 0;
 }
 #endif
+
+unsigned int arch_get_speculative_entropy(u64 *buf, unsigned int nentries)
+{
+	unsigned int n, i;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC) || !cpu_has_speculation)
+		return 0;
+
+	for (n = 0; n < nentries; n++) {
+		u64 d = 0;
+
+		/*
+		 * The loop below does not execute in constant time on
+		 * speculative CPUs. The trick is to do useless operations
+		 * between the TSC reads, i.e. calling __sw_hweight64().
+		 * This makes uops scheduling sufficiently random resulting
+		 * in useable entropy.
+		 */
+		for (i = 0; i < 64; i++) {
+			u64 t = rdtsc() + __sw_hweight64(rdtsc());
+
+			if (t & 0x01ULL)
+				d |= (1ULL << i);
+		}
+		buf[n] = d;
+	}
+	return nentries;
+}
+
+#define BUFSIZE 4096
+static u64 rng_array[4096];
+
+static struct debugfs_blob_wrapper rng_blob = {
+	.data = rng_array,
+	.size = sizeof(rng_array),
+};
+
+static u64 fold(u64 d)
+{
+	unsigned int i;
+	u64 res = d;
+
+	/*
+	 * Lazy and trivial folding just for testing purposes.
+	 *
+	 * Fibonacci LFSR with the primitive polynomial:
+	 * x^64 + x^61 + x^56 + x^31 + x^28 + x^23 + 1
+	 */
+	for (i = 1; i < 64; i++) {
+		u64 tmp = d << 63;
+
+		tmp = tmp >> 63;
+
+		tmp ^= ((res >> 63) & 1);
+		tmp ^= ((res >> 60) & 1);
+		tmp ^= ((res >> 55) & 1);
+		tmp ^= ((res >> 30) & 1);
+		tmp ^= ((res >> 27) & 1);
+		tmp ^= ((res >> 22) & 1);
+		res <<= 1;
+		res ^= tmp;
+	}
+	return res;
+}
+
+static ssize_t rngfill_write(struct file *file, const char __user *user_buf,
+			     size_t count, loff_t *ppos)
+{
+	unsigned int n;
+	u64 d;
+
+	for (n = 0; n < BUFSIZE; n++) {
+		if (!arch_get_speculative_entropy(&d, 1))
+			return -ENODEV;
+		rng_array[n] = fold(d);
+	}
+	return count;
+}
+
+static const struct file_operations fops_rngfill = {
+	.write = rngfill_write,
+	.llseek = default_llseek,
+};
+
+static int __init rng_debug_init(void)
+{
+	debugfs_create_file("rng_fill", S_IWUSR,
+			    arch_debugfs_dir, NULL, &fops_rngfill);
+
+	debugfs_create_blob("rng_data", 0444, arch_debugfs_dir, &rng_blob);
+
+	return 0;
+}
+late_initcall(rng_debug_init);

^ permalink raw reply	[flat|nested] 37+ messages in thread
* Re: x86/random: Speculation to the rescue
@ 2019-10-01  2:14 hgntkwis
  0 siblings, 0 replies; 37+ messages in thread
From: hgntkwis @ 2019-10-01  2:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: LKML

Why not get entropy from the white noise that can be obtained from any  
attached ADC? Audio cards, some SBCs, and microcontrollers all have  
ADCs.
Not that I'm familiar with when the kernel first needs entropy or an  
expert in the field.

Thanks



-------------------------------------------------
This free account was provided by VFEmail.net - report spam to abuse@vfemail.net
 
ONLY AT VFEmail! - Use our Metadata Mitigator to keep your email out of the NSA's hands!
$24.95 ONETIME Lifetime accounts with Privacy Features!  
15GB disk! No bandwidth quotas!
Commercial and Bulk Mail Options!  

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2019-10-15 21:50 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-28 22:24 x86/random: Speculation to the rescue Thomas Gleixner
2019-09-28 23:53 ` Linus Torvalds
2019-09-29  7:40   ` Thomas Gleixner
2019-09-29  8:05   ` Alexander E. Patrakov
2019-09-30  1:16   ` Linus Torvalds
2019-09-30  2:59     ` Linus Torvalds
2019-09-30  6:10       ` Borislav Petkov
2019-09-30 16:06         ` Linus Torvalds
2019-10-01 13:51           ` Borislav Petkov
2019-10-01 17:14             ` Linus Torvalds
2019-10-01 17:50               ` [PATCH] char/random: Add a newline at the end of the file Borislav Petkov
2019-09-30 18:05         ` x86/random: Speculation to the rescue Kees Cook
2019-09-30  3:37     ` Theodore Y. Ts'o
2019-09-30 13:16       ` Theodore Y. Ts'o
2019-09-30 16:15         ` Linus Torvalds
2019-09-30 16:32           ` Peter Zijlstra
2019-09-30 17:03             ` Linus Torvalds
2019-10-01 10:28           ` David Laight
2019-10-15 21:50             ` Thomas Gleixner
2019-10-01 16:15   ` Ahmed S. Darwish
2019-10-01 16:37     ` Kees Cook
2019-10-01 17:18       ` Ahmed S. Darwish
2019-10-01 17:25     ` Linus Torvalds
2019-10-06 12:07       ` Pavel Machek
2019-10-02 12:01     ` Theodore Y. Ts'o
2019-10-06 11:41   ` Pavel Machek
2019-10-06 17:26     ` Linus Torvalds
2019-10-06 17:35       ` Pavel Machek
2019-10-06 18:06         ` Linus Torvalds
2019-10-06 18:21           ` Pavel Machek
2019-10-06 18:26             ` Linus Torvalds
2019-10-07 11:47             ` Theodore Y. Ts'o
2019-10-07 22:18               ` Pavel Machek
2019-10-08 11:33                 ` David Laight
2019-10-09  8:02                   ` Pavel Machek
2019-10-09  9:37                     ` David Laight
2019-10-01  2:14 hgntkwis

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).