linux-kernel.vger.kernel.org archive mirror
* [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param
@ 2022-03-28 11:17 Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 02/43] regulator: rpi-panel: Handle I2C errors/timing to the Atmel Sasha Levin
                   ` (41 more replies)
  0 siblings, 42 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Casey Schaufler, syzbot+d1e3b1d92d25abf97943, James Morris,
	Paul Moore, Sasha Levin, jmorris, serge, stephen.smalley.work,
	eparis, linux-security-module, selinux

From: Casey Schaufler <casey@schaufler-ca.com>

[ Upstream commit ecff30575b5ad0eda149aadad247b7f75411fd47 ]

The usual LSM hook "bail on fail" scheme doesn't work for cases where
a security module may return an error code indicating that it does not
recognize an input.  In this particular case Smack sees a mount option
that it recognizes, and returns 0. A call to a BPF hook follows, which
returns -ENOPARAM, and that confuses the caller because Smack has
already processed its data.

The SELinux hook incorrectly returns 1 on success. There was a time
when this was correct, but the current expectation is that it return 0
on success. This is repaired.

Reported-by: syzbot+d1e3b1d92d25abf97943@syzkaller.appspotmail.com
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 security/security.c      | 17 +++++++++++++++--
 security/selinux/hooks.c |  5 ++---
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/security/security.c b/security/security.c
index 22261d79f333..f101a53a63ed 100644
--- a/security/security.c
+++ b/security/security.c
@@ -884,9 +884,22 @@ int security_fs_context_dup(struct fs_context *fc, struct fs_context *src_fc)
 	return call_int_hook(fs_context_dup, 0, fc, src_fc);
 }
 
-int security_fs_context_parse_param(struct fs_context *fc, struct fs_parameter *param)
+int security_fs_context_parse_param(struct fs_context *fc,
+				    struct fs_parameter *param)
 {
-	return call_int_hook(fs_context_parse_param, -ENOPARAM, fc, param);
+	struct security_hook_list *hp;
+	int trc;
+	int rc = -ENOPARAM;
+
+	hlist_for_each_entry(hp, &security_hook_heads.fs_context_parse_param,
+			     list) {
+		trc = hp->hook.fs_context_parse_param(fc, param);
+		if (trc == 0)
+			rc = 0;
+		else if (trc != -ENOPARAM)
+			return trc;
+	}
+	return rc;
 }
 
 int security_sb_alloc(struct super_block *sb)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 5b6895e4fc29..371f67a37f9a 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -2860,10 +2860,9 @@ static int selinux_fs_context_parse_param(struct fs_context *fc,
 		return opt;
 
 	rc = selinux_add_opt(opt, param->string, &fc->security);
-	if (!rc) {
+	if (!rc)
 		param->string = NULL;
-		rc = 1;
-	}
+
 	return rc;
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
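
To see the repaired aggregation scheme in isolation: each hook returns 0
(the module handled the parameter), -ENOPARAM (not its parameter), or a
real error. Below is a minimal user-space sketch of that loop, with
hypothetical hooks standing in for Smack and BPF; 519 is used as a
stand-in for the kernel-internal ENOPARAM value.

#include <stdio.h>

#define ENOPARAM 519	/* stand-in for the kernel-internal errno */

/* Hypothetical hooks: Smack claims the parameter, BPF does not. */
static int smack_hook(void) { return 0; }
static int bpf_hook(void)   { return -ENOPARAM; }

int main(void)
{
	int (*hooks[])(void) = { smack_hook, bpf_hook };
	int rc = -ENOPARAM;	/* nothing has claimed the parameter yet */

	for (unsigned int i = 0; i < 2; i++) {
		int trc = hooks[i]();

		if (trc == 0)
			rc = 0;		/* remember a successful claim */
		else if (trc != -ENOPARAM)
			return trc;	/* real error: bail immediately */
	}

	/* prints 0: BPF's -ENOPARAM no longer overrides Smack's success */
	printf("rc = %d\n", rc);
	return 0;
}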

* [PATCH AUTOSEL 5.17 02/43] regulator: rpi-panel: Handle I2C errors/timing to the Atmel
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 03/43] crypto: hisilicon/qm - cleanup warning in qm_vf_read_qos Sasha Levin
                   ` (40 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dave Stevenson, Detlev Casanova, Mark Brown, Sasha Levin, lgirdwood

From: Dave Stevenson <dave.stevenson@raspberrypi.com>

[ Upstream commit 5665eee7a3800430e7dc3ef6f25722476b603186 ]

The Atmel is doing some things in the I2C ISR, during which
period it will not respond to further commands. This is
particularly true of the POWERON command.

Increase delays appropriately, and retry should I2C errors be
reported.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.com>
Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com>
Link: https://lore.kernel.org/r/20220124220129.158891-3-detlev.casanova@collabora.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../regulator/rpi-panel-attiny-regulator.c    | 56 +++++++++++++++----
 1 file changed, 46 insertions(+), 10 deletions(-)

diff --git a/drivers/regulator/rpi-panel-attiny-regulator.c b/drivers/regulator/rpi-panel-attiny-regulator.c
index ee46bfbf5eee..991b4730d768 100644
--- a/drivers/regulator/rpi-panel-attiny-regulator.c
+++ b/drivers/regulator/rpi-panel-attiny-regulator.c
@@ -37,11 +37,24 @@ static const struct regmap_config attiny_regmap_config = {
 static int attiny_lcd_power_enable(struct regulator_dev *rdev)
 {
 	unsigned int data;
+	int ret, i;
 
 	regmap_write(rdev->regmap, REG_POWERON, 1);
+	msleep(80);
+
 	/* Wait for nPWRDWN to go low to indicate poweron is done. */
-	regmap_read_poll_timeout(rdev->regmap, REG_PORTB, data,
-					data & BIT(0), 10, 1000000);
+	for (i = 0; i < 20; i++) {
+		ret = regmap_read(rdev->regmap, REG_PORTB, &data);
+		if (!ret) {
+			if (data & BIT(0))
+				break;
+		}
+		usleep_range(10000, 12000);
+	}
+	usleep_range(10000, 12000);
+
+	if (ret)
+		pr_err("%s: regmap_read_poll_timeout failed %d\n", __func__, ret);
 
 	/* Default to the same orientation as the closed source
 	 * firmware used for the panel.  Runtime rotation
@@ -57,23 +70,34 @@ static int attiny_lcd_power_disable(struct regulator_dev *rdev)
 {
 	regmap_write(rdev->regmap, REG_PWM, 0);
 	regmap_write(rdev->regmap, REG_POWERON, 0);
-	udelay(1);
+	msleep(30);
 	return 0;
 }
 
 static int attiny_lcd_power_is_enabled(struct regulator_dev *rdev)
 {
 	unsigned int data;
-	int ret;
+	int ret, i;
 
-	ret = regmap_read(rdev->regmap, REG_POWERON, &data);
+	for (i = 0; i < 10; i++) {
+		ret = regmap_read(rdev->regmap, REG_POWERON, &data);
+		if (!ret)
+			break;
+		usleep_range(10000, 12000);
+	}
 	if (ret < 0)
 		return ret;
 
 	if (!(data & BIT(0)))
 		return 0;
 
-	ret = regmap_read(rdev->regmap, REG_PORTB, &data);
+	for (i = 0; i < 10; i++) {
+		ret = regmap_read(rdev->regmap, REG_PORTB, &data);
+		if (!ret)
+			break;
+		usleep_range(10000, 12000);
+	}
+
 	if (ret < 0)
 		return ret;
 
@@ -103,20 +127,32 @@ static int attiny_update_status(struct backlight_device *bl)
 {
 	struct regmap *regmap = bl_get_data(bl);
 	int brightness = bl->props.brightness;
+	int ret, i;
 
 	if (bl->props.power != FB_BLANK_UNBLANK ||
 	    bl->props.fb_blank != FB_BLANK_UNBLANK)
 		brightness = 0;
 
-	return regmap_write(regmap, REG_PWM, brightness);
+	for (i = 0; i < 10; i++) {
+		ret = regmap_write(regmap, REG_PWM, brightness);
+		if (!ret)
+			break;
+	}
+
+	return ret;
 }
 
 static int attiny_get_brightness(struct backlight_device *bl)
 {
 	struct regmap *regmap = bl_get_data(bl);
-	int ret, brightness;
+	int ret, brightness, i;
+
+	for (i = 0; i < 10; i++) {
+		ret = regmap_read(regmap, REG_PWM, &brightness);
+		if (!ret)
+			break;
+	}
 
-	ret = regmap_read(regmap, REG_PWM, &brightness);
 	if (ret)
 		return ret;
 
@@ -166,7 +202,7 @@ static int attiny_i2c_probe(struct i2c_client *i2c,
 	}
 
 	regmap_write(regmap, REG_POWERON, 0);
-	mdelay(1);
+	msleep(30);
 
 	config.dev = &i2c->dev;
 	config.regmap = regmap;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
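
The core pattern of the fix, repeated in each accessor, is a bounded
retry loop that treats a transient I2C error as retryable and sleeps
between attempts. A self-contained sketch of the same shape, with a
hypothetical bus_read() standing in for regmap_read():

#include <stdio.h>

/* Hypothetical bus access: fails while the Atmel is busy in its ISR. */
static int busy = 3;
static int bus_read(unsigned int reg, unsigned int *val)
{
	if (busy-- > 0)
		return -5;	/* -EIO: device did not respond */
	*val = 0x01;		/* nPWRDWN low: power-on complete */
	return 0;
}

static int read_with_retry(unsigned int reg, unsigned int *val, int tries)
{
	int ret = -5;

	for (int i = 0; i < tries; i++) {
		ret = bus_read(reg, val);
		if (!ret)
			break;
		/* the driver sleeps here: usleep_range(10000, 12000) */
	}
	return ret;
}

int main(void)
{
	unsigned int v = 0;
	int ret = read_with_retry(0 /* REG_PORTB */, &v, 10);

	printf("ret=%d val=%#x\n", ret, v);	/* succeeds on the 4th try */
	return 0;
}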

* [PATCH AUTOSEL 5.17 03/43] crypto: hisilicon/qm - cleanup warning in qm_vf_read_qos
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 02/43] regulator: rpi-panel: Handle I2C errors/timing to the Atmel Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 04/43] crypto: octeontx2 - CN10K CPT to RNM workaround Sasha Levin
                   ` (39 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kai Ye, Herbert Xu, Sasha Levin, wangzhou1, davem, linux-crypto

From: Kai Ye <yekai13@huawei.com>

[ Upstream commit 05b3bade290d6c940701f97f3233c07cfe27205d ]

The kernel test robot reported this warning: Uninitialized variable: ret.
The code flow may return the value of ret directly while it is still
uninitialized. Fix it by giving ret a defined initial value.

Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/crypto/hisilicon/qm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index c5b84a5ea350..3b29c8993b8c 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -4295,7 +4295,7 @@ static void qm_vf_get_qos(struct hisi_qm *qm, u32 fun_num)
 static int qm_vf_read_qos(struct hisi_qm *qm)
 {
 	int cnt = 0;
-	int ret;
+	int ret = -EINVAL;
 
 	/* reset mailbox qos val */
 	qm->mb_qos = 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
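
The underlying bug class: a path through the function can return ret
before anything has assigned it. A minimal reproduction of the pattern
and of the fix (the function name is hypothetical):

#include <errno.h>
#include <stdio.h>

static int read_qos(int attempts)
{
	int ret = -EINVAL;	/* the fix: defined on every path */

	for (int i = 0; i < attempts; i++)
		ret = 0;	/* stands in for a mailbox read */

	/* with attempts == 0 and no initializer, this returned garbage */
	return ret;
}

int main(void)
{
	printf("%d\n", read_qos(0));	/* -22, not an indeterminate value */
	return 0;
}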

* [PATCH AUTOSEL 5.17 04/43] crypto: octeontx2 - CN10K CPT to RNM workaround
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 02/43] regulator: rpi-panel: Handle I2C errors/timing to the Atmel Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 03/43] crypto: hisilicon/qm - cleanup warning in qm_vf_read_qos Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 05/43] gcc-plugins/stackleak: Exactly match strings instead of prefixes Sasha Levin
                   ` (38 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Srujana Challa, Shijith Thotton, Herbert Xu, Sasha Levin,
	bbrezillon, arno, davem, dan.carpenter, ardb, keescook,
	jiapeng.chong, linux-crypto

From: Srujana Challa <schalla@marvell.com>

[ Upstream commit bd9305b0cb69bfe98885a63a9e6231ae92e822e2 ]

When software sets CPT_AF_CTL[RNM_REQ_EN]=1 and RNM is not producing
entropy (i.e., RNM_ENTROPY_STATUS[NORMAL_CNT] < 0x40), the first cycle of
the response may be lost due to a conditional clocking issue. Due to
this, the subsequent random number stream will be corrupted. So, this
patch adds support to ensure RNM_ENTROPY_STATUS[NORMAL_CNT] = 0x40
before writing CPT_AF_CTL[RNM_REQ_EN] = 1, as a workaround.

Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Shijith Thotton <sthotton@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../marvell/octeontx2/otx2_cptpf_ucode.c      | 43 ++++++++++++++++++-
 1 file changed, 42 insertions(+), 1 deletion(-)

diff --git a/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c b/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c
index 1b4d425bbf0e..7fd4503d9cfc 100644
--- a/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c
+++ b/drivers/crypto/marvell/octeontx2/otx2_cptpf_ucode.c
@@ -1076,6 +1076,39 @@ static void delete_engine_grps(struct pci_dev *pdev,
 		delete_engine_group(&pdev->dev, &eng_grps->grp[i]);
 }
 
+#define PCI_DEVID_CN10K_RNM 0xA098
+#define RNM_ENTROPY_STATUS  0x8
+
+static void rnm_to_cpt_errata_fixup(struct device *dev)
+{
+	struct pci_dev *pdev;
+	void __iomem *base;
+	int timeout = 5000;
+
+	pdev = pci_get_device(PCI_VENDOR_ID_CAVIUM, PCI_DEVID_CN10K_RNM, NULL);
+	if (!pdev)
+		return;
+
+	base = pci_ioremap_bar(pdev, 0);
+	if (!base)
+		goto put_pdev;
+
+	while ((readq(base + RNM_ENTROPY_STATUS) & 0x7F) != 0x40) {
+		cpu_relax();
+		udelay(1);
+		timeout--;
+		if (!timeout) {
+			dev_warn(dev, "RNM is not producing entropy\n");
+			break;
+		}
+	}
+
+	iounmap(base);
+
+put_pdev:
+	pci_dev_put(pdev);
+}
+
 int otx2_cpt_get_eng_grp(struct otx2_cpt_eng_grps *eng_grps, int eng_type)
 {
 
@@ -1189,9 +1222,17 @@ int otx2_cpt_create_eng_grps(struct otx2_cptpf_dev *cptpf,
 
 	if (is_dev_otx2(pdev))
 		goto unlock;
+
+	/*
+	 * Ensure RNM_ENTROPY_STATUS[NORMAL_CNT] = 0x40 before writing
+	 * CPT_AF_CTL[RNM_REQ_EN] = 1 as a workaround for HW errata.
+	 */
+	rnm_to_cpt_errata_fixup(&pdev->dev);
+
 	/*
 	 * Configure engine group mask to allow context prefetching
-	 * for the groups.
+	 * for the groups and enable random number request, to enable
+	 * CPT to request random numbers from RNM.
 	 */
 	otx2_cpt_write_af_reg(&cptpf->afpf_mbox, pdev, CPT_AF_CTL,
 			      OTX2_CPT_ALL_ENG_GRPS_MASK << 3 | BIT_ULL(16),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
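
Stripped of the PCI plumbing, the workaround is a bounded busy-wait on a
status register: poll until NORMAL_CNT (the low bits of
RNM_ENTROPY_STATUS) reaches 0x40, and give up after a fixed budget. A
runnable sketch with a simulated register (the simulation is
hypothetical; the 0x7F mask, 0x40 target, and 5000 budget are from the
patch):

#include <stdint.h>
#include <stdio.h>

static uint64_t rnm_entropy_status;	/* simulated RNM_ENTROPY_STATUS */

static uint64_t read_status(void)
{
	/* stand-in for readq(): the HW counter fills up over time */
	if ((rnm_entropy_status & 0x7F) < 0x40)
		rnm_entropy_status++;
	return rnm_entropy_status;
}

int main(void)
{
	int timeout = 5000;

	/* Wait for NORMAL_CNT (low 7 bits) to reach 0x40. */
	while ((read_status() & 0x7F) != 0x40) {
		/* the driver does cpu_relax(); udelay(1); here */
		if (!--timeout) {
			fprintf(stderr, "RNM is not producing entropy\n");
			break;
		}
	}
	printf("NORMAL_CNT=%#llx, budget left=%d\n",
	       (unsigned long long)(rnm_entropy_status & 0x7F), timeout);
	return 0;
}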

* [PATCH AUTOSEL 5.17 05/43] gcc-plugins/stackleak: Exactly match strings instead of prefixes
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (2 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 04/43] crypto: octeontx2 - CN10K CPT to RNM workaround Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 06/43] rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion Sasha Levin
                   ` (37 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kees Cook, Alexander Popov, Sasha Levin, linux-hardening

From: Kees Cook <keescook@chromium.org>

[ Upstream commit 27e9faf415dbf94af19b9c827842435edbc1fbbc ]

Since STRING_CST may not be NUL terminated, strncmp() was used to check
for equality. However, this can spuriously match longer section names
whose start matches the tested-for string. Test for exact equality by
also checking the length and the presence of NUL termination.

Cc: Alexander Popov <alex.popov@linux.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 scripts/gcc-plugins/stackleak_plugin.c | 25 +++++++++++++++++++++----
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/scripts/gcc-plugins/stackleak_plugin.c b/scripts/gcc-plugins/stackleak_plugin.c
index e9db7dcb3e5f..b04aa8e91a41 100644
--- a/scripts/gcc-plugins/stackleak_plugin.c
+++ b/scripts/gcc-plugins/stackleak_plugin.c
@@ -429,6 +429,23 @@ static unsigned int stackleak_cleanup_execute(void)
 	return 0;
 }
 
+/*
+ * STRING_CST may or may not be NUL terminated:
+ * https://gcc.gnu.org/onlinedocs/gccint/Constant-expressions.html
+ */
+static inline bool string_equal(tree node, const char *string, int length)
+{
+	if (TREE_STRING_LENGTH(node) < length)
+		return false;
+	if (TREE_STRING_LENGTH(node) > length + 1)
+		return false;
+	if (TREE_STRING_LENGTH(node) == length + 1 &&
+	    TREE_STRING_POINTER(node)[length] != '\0')
+		return false;
+	return !memcmp(TREE_STRING_POINTER(node), string, length);
+}
+#define STRING_EQUAL(node, str)	string_equal(node, str, strlen(str))
+
 static bool stackleak_gate(void)
 {
 	tree section;
@@ -438,13 +455,13 @@ static bool stackleak_gate(void)
 	if (section && TREE_VALUE(section)) {
 		section = TREE_VALUE(TREE_VALUE(section));
 
-		if (!strncmp(TREE_STRING_POINTER(section), ".init.text", 10))
+		if (STRING_EQUAL(section, ".init.text"))
 			return false;
-		if (!strncmp(TREE_STRING_POINTER(section), ".devinit.text", 13))
+		if (STRING_EQUAL(section, ".devinit.text"))
 			return false;
-		if (!strncmp(TREE_STRING_POINTER(section), ".cpuinit.text", 13))
+		if (STRING_EQUAL(section, ".cpuinit.text"))
 			return false;
-		if (!strncmp(TREE_STRING_POINTER(section), ".meminit.text", 13))
+		if (STRING_EQUAL(section, ".meminit.text"))
 			return false;
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
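
The pitfall is easy to demonstrate in isolation: strncmp() bounded by the
length of the tested-for string accepts any longer name that shares that
prefix. A small sketch (plain C strings here; the plugin itself operates
on GCC STRING_CST nodes):

#include <stdio.h>
#include <string.h>

int main(void)
{
	const char *section = ".init.text.unlikely";	/* longer name */
	const char *wanted  = ".init.text";
	size_t len = strlen(wanted);

	/* Prefix test: reports a match even though the names differ. */
	printf("strncmp: %d\n", strncmp(section, wanted, len));	/* 0 */

	/* Exact test in the spirit of STRING_EQUAL(): the length must
	 * also match before the bytes are compared. */
	int equal = strlen(section) == len && !memcmp(section, wanted, len);
	printf("exact:   %d\n", equal);				/* 0 */
	return 0;
}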

* [PATCH AUTOSEL 5.17 06/43] rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (3 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 05/43] gcc-plugins/stackleak: Exactly match strings instead of prefixes Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 07/43] pinctrl: npcm: Fix broken references to chip->parent_device Sasha Levin
                   ` (36 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: David Woodhouse, Paul E . McKenney, Sasha Levin, frederic,
	quic_neeraju, josh, rcu

From: David Woodhouse <dwmw@amazon.co.uk>

[ Upstream commit 82980b1622d97017053c6792382469d7dc26a486 ]

If we allow architectures to bring APs online in parallel, then we end
up requiring rcu_cpu_starting() to be reentrant. But currently, the
manipulation of rnp->ofl_seq is not thread-safe.

However, rnp->ofl_seq is also fairly much pointless anyway since both
rcu_cpu_starting() and rcu_report_dead() hold rcu_state.ofl_lock for
fairly much the whole time that rnp->ofl_seq is set to an odd number
to indicate that an operation is in progress.

So drop rnp->ofl_seq completely, and use only rcu_state.ofl_lock.

This has a couple of minor complexities: lockdep will complain when we
take rcu_state.ofl_lock, and currently accepts the 'excuse' of having
an odd value in rnp->ofl_seq. So switch it to an arch_spinlock_t to
avoid that false positive complaint. Since we're killing rnp->ofl_seq
of course that 'excuse' has to be changed too, so make it check for
arch_spin_is_locked(rcu_state.ofl_lock).

There's no arch_spin_lock_irqsave() so we have to manually save and
restore local interrupts around the locking.

At Paul's request based on Neeraj's analysis, make rcu_gp_init not just
wait but *exclude* any CPU online/offline activity, which was fairly
much true already by virtue of it holding rcu_state.ofl_lock.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/rcu/tree.c | 71 ++++++++++++++++++++++++-----------------------
 kernel/rcu/tree.h |  4 +--
 2 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index a4c25a6283b0..73a4c9d07b86 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -91,7 +91,7 @@ static struct rcu_state rcu_state = {
 	.abbr = RCU_ABBR,
 	.exp_mutex = __MUTEX_INITIALIZER(rcu_state.exp_mutex),
 	.exp_wake_mutex = __MUTEX_INITIALIZER(rcu_state.exp_wake_mutex),
-	.ofl_lock = __RAW_SPIN_LOCK_UNLOCKED(rcu_state.ofl_lock),
+	.ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
 };
 
 /* Dump rcu_node combining tree at boot to verify correct setup. */
@@ -1175,7 +1175,15 @@ bool rcu_lockdep_current_cpu_online(void)
 	preempt_disable_notrace();
 	rdp = this_cpu_ptr(&rcu_data);
 	rnp = rdp->mynode;
-	if (rdp->grpmask & rcu_rnp_online_cpus(rnp) || READ_ONCE(rnp->ofl_seq) & 0x1)
+	/*
+	 * Strictly, we care here about the case where the current CPU is
+	 * in rcu_cpu_starting() and thus has an excuse for rdp->grpmask
+	 * not being up to date. So arch_spin_is_locked() might have a
+	 * false positive if it's held by some *other* CPU, but that's
+	 * OK because that just means a false *negative* on the warning.
+	 */
+	if (rdp->grpmask & rcu_rnp_online_cpus(rnp) ||
+	    arch_spin_is_locked(&rcu_state.ofl_lock))
 		ret = true;
 	preempt_enable_notrace();
 	return ret;
@@ -1739,7 +1747,6 @@ static void rcu_strict_gp_boundary(void *unused)
  */
 static noinline_for_stack bool rcu_gp_init(void)
 {
-	unsigned long firstseq;
 	unsigned long flags;
 	unsigned long oldmask;
 	unsigned long mask;
@@ -1782,22 +1789,17 @@ static noinline_for_stack bool rcu_gp_init(void)
 	 * of RCU's Requirements documentation.
 	 */
 	WRITE_ONCE(rcu_state.gp_state, RCU_GP_ONOFF);
+	/* Exclude CPU hotplug operations. */
 	rcu_for_each_leaf_node(rnp) {
-		// Wait for CPU-hotplug operations that might have
-		// started before this grace period did.
-		smp_mb(); // Pair with barriers used when updating ->ofl_seq to odd values.
-		firstseq = READ_ONCE(rnp->ofl_seq);
-		if (firstseq & 0x1)
-			while (firstseq == READ_ONCE(rnp->ofl_seq))
-				schedule_timeout_idle(1);  // Can't wake unless RCU is watching.
-		smp_mb(); // Pair with barriers used when updating ->ofl_seq to even values.
-		raw_spin_lock(&rcu_state.ofl_lock);
-		raw_spin_lock_irq_rcu_node(rnp);
+		local_irq_save(flags);
+		arch_spin_lock(&rcu_state.ofl_lock);
+		raw_spin_lock_rcu_node(rnp);
 		if (rnp->qsmaskinit == rnp->qsmaskinitnext &&
 		    !rnp->wait_blkd_tasks) {
 			/* Nothing to do on this leaf rcu_node structure. */
-			raw_spin_unlock_irq_rcu_node(rnp);
-			raw_spin_unlock(&rcu_state.ofl_lock);
+			raw_spin_unlock_rcu_node(rnp);
+			arch_spin_unlock(&rcu_state.ofl_lock);
+			local_irq_restore(flags);
 			continue;
 		}
 
@@ -1832,8 +1834,9 @@ static noinline_for_stack bool rcu_gp_init(void)
 				rcu_cleanup_dead_rnp(rnp);
 		}
 
-		raw_spin_unlock_irq_rcu_node(rnp);
-		raw_spin_unlock(&rcu_state.ofl_lock);
+		raw_spin_unlock_rcu_node(rnp);
+		arch_spin_unlock(&rcu_state.ofl_lock);
+		local_irq_restore(flags);
 	}
 	rcu_gp_slow(gp_preinit_delay); /* Races with CPU hotplug. */
 
@@ -4287,11 +4290,10 @@ void rcu_cpu_starting(unsigned int cpu)
 
 	rnp = rdp->mynode;
 	mask = rdp->grpmask;
-	WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
-	WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
+	local_irq_save(flags);
+	arch_spin_lock(&rcu_state.ofl_lock);
 	rcu_dynticks_eqs_online();
-	smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
-	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	raw_spin_lock_rcu_node(rnp);
 	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext | mask);
 	newcpu = !(rnp->expmaskinitnext & mask);
 	rnp->expmaskinitnext |= mask;
@@ -4304,15 +4306,18 @@ void rcu_cpu_starting(unsigned int cpu)
 
 	/* An incoming CPU should never be blocking a grace period. */
 	if (WARN_ON_ONCE(rnp->qsmask & mask)) { /* RCU waiting on incoming CPU? */
+		/* rcu_report_qs_rnp() *really* wants some flags to restore */
+		unsigned long flags2;
+
+		local_irq_save(flags2);
 		rcu_disable_urgency_upon_qs(rdp);
 		/* Report QS -after- changing ->qsmaskinitnext! */
-		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
+		rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags2);
 	} else {
-		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
+		raw_spin_unlock_rcu_node(rnp);
 	}
-	smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
-	WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
-	WARN_ON_ONCE(rnp->ofl_seq & 0x1);
+	arch_spin_unlock(&rcu_state.ofl_lock);
+	local_irq_restore(flags);
 	smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
 }
 
@@ -4326,7 +4331,7 @@ void rcu_cpu_starting(unsigned int cpu)
  */
 void rcu_report_dead(unsigned int cpu)
 {
-	unsigned long flags;
+	unsigned long flags, seq_flags;
 	unsigned long mask;
 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
 	struct rcu_node *rnp = rdp->mynode;  /* Outgoing CPU's rdp & rnp. */
@@ -4340,10 +4345,8 @@ void rcu_report_dead(unsigned int cpu)
 
 	/* Remove outgoing CPU from mask in the leaf rcu_node structure. */
 	mask = rdp->grpmask;
-	WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
-	WARN_ON_ONCE(!(rnp->ofl_seq & 0x1));
-	smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
-	raw_spin_lock(&rcu_state.ofl_lock);
+	local_irq_save(seq_flags);
+	arch_spin_lock(&rcu_state.ofl_lock);
 	raw_spin_lock_irqsave_rcu_node(rnp, flags); /* Enforce GP memory-order guarantee. */
 	rdp->rcu_ofl_gp_seq = READ_ONCE(rcu_state.gp_seq);
 	rdp->rcu_ofl_gp_flags = READ_ONCE(rcu_state.gp_flags);
@@ -4354,10 +4357,8 @@ void rcu_report_dead(unsigned int cpu)
 	}
 	WRITE_ONCE(rnp->qsmaskinitnext, rnp->qsmaskinitnext & ~mask);
 	raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
-	raw_spin_unlock(&rcu_state.ofl_lock);
-	smp_mb(); // Pair with rcu_gp_cleanup()'s ->ofl_seq barrier().
-	WRITE_ONCE(rnp->ofl_seq, rnp->ofl_seq + 1);
-	WARN_ON_ONCE(rnp->ofl_seq & 0x1);
+	arch_spin_unlock(&rcu_state.ofl_lock);
+	local_irq_restore(seq_flags);
 
 	rdp->cpu_started = false;
 }
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 486fc901bd08..4b4bcef8a974 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -56,8 +56,6 @@ struct rcu_node {
 				/*  Initialized from ->qsmaskinitnext at the */
 				/*  beginning of each grace period. */
 	unsigned long qsmaskinitnext;
-	unsigned long ofl_seq;	/* CPU-hotplug operation sequence count. */
-				/* Online CPUs for next grace period. */
 	unsigned long expmask;	/* CPUs or groups that need to check in */
 				/*  to allow the current expedited GP */
 				/*  to complete. */
@@ -355,7 +353,7 @@ struct rcu_state {
 	const char *name;			/* Name of structure. */
 	char abbr;				/* Abbreviated name. */
 
-	raw_spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
+	arch_spinlock_t ofl_lock ____cacheline_internodealigned_in_smp;
 						/* Synchronize offline with */
 						/*  GP pre-initialization. */
 };
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
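
Since there is no arch_spin_lock_irqsave(), the commit open-codes the
save/lock and unlock/restore pairs. The shape of that pairing, shown with
stub primitives so the sketch is self-contained (the stubs are
stand-ins, not the per-architecture kernel implementations):

#include <stdbool.h>
#include <stdio.h>

static bool irqs_on = true;
static bool ofl_locked;

static unsigned long local_irq_save_(void)
{
	bool f = irqs_on;

	irqs_on = false;
	return f;
}
static void local_irq_restore_(unsigned long f)	{ irqs_on = f; }
static void arch_spin_lock_(bool *l)		{ *l = true; }
static void arch_spin_unlock_(bool *l)		{ *l = false; }

int main(void)
{
	unsigned long flags;

	/* Disable interrupts first, then take the raw arch lock ... */
	flags = local_irq_save_();
	arch_spin_lock_(&ofl_locked);

	printf("critical section: irqs off=%d lock held=%d\n",
	       !irqs_on, ofl_locked);

	/* ... and release in the opposite order. */
	arch_spin_unlock_(&ofl_locked);
	local_irq_restore_(flags);
	return 0;
}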

* [PATCH AUTOSEL 5.17 07/43] pinctrl: npcm: Fix broken references to chip->parent_device
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (4 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 06/43] rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 08/43] rcu: Mark writes to the rcu_segcblist structure's ->flags field Sasha Levin
                   ` (35 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Marc Zyngier, Bartosz Golaszewski, Sasha Levin, avifishman70,
	tmaimon77, tali.perry1, linus.walleij, openbmc, linux-gpio

From: Marc Zyngier <maz@kernel.org>

[ Upstream commit f7e53e2255808ca3abcc8f38d18ad0823425e771 ]

The npcm driver has a bunch of references to the irq_chip parent_device
field, but never sets it.

Fix it by fishing that reference from somewhere else, but it is
obvious that these debug statements were never used. Also remove
an unused field in a local data structure.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Bartosz Golaszewski <brgl@bgdev.pl>
Link: https://lore.kernel.org/r/20220201120310.878267-11-maz@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c | 25 +++++++++++------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c b/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c
index 4d81908d6725..ba536fd4d674 100644
--- a/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c
+++ b/drivers/pinctrl/nuvoton/pinctrl-npcm7xx.c
@@ -78,7 +78,6 @@ struct npcm7xx_gpio {
 	struct gpio_chip	gc;
 	int			irqbase;
 	int			irq;
-	void			*priv;
 	struct irq_chip		irq_chip;
 	u32			pinctrl_id;
 	int (*direction_input)(struct gpio_chip *chip, unsigned offset);
@@ -226,7 +225,7 @@ static void npcmgpio_irq_handler(struct irq_desc *desc)
 	chained_irq_enter(chip, desc);
 	sts = ioread32(bank->base + NPCM7XX_GP_N_EVST);
 	en  = ioread32(bank->base + NPCM7XX_GP_N_EVEN);
-	dev_dbg(chip->parent_device, "==> got irq sts %.8x %.8x\n", sts,
+	dev_dbg(bank->gc.parent, "==> got irq sts %.8x %.8x\n", sts,
 		en);
 
 	sts &= en;
@@ -241,33 +240,33 @@ static int npcmgpio_set_irq_type(struct irq_data *d, unsigned int type)
 		gpiochip_get_data(irq_data_get_irq_chip_data(d));
 	unsigned int gpio = BIT(d->hwirq);
 
-	dev_dbg(d->chip->parent_device, "setirqtype: %u.%u = %u\n", gpio,
+	dev_dbg(bank->gc.parent, "setirqtype: %u.%u = %u\n", gpio,
 		d->irq, type);
 	switch (type) {
 	case IRQ_TYPE_EDGE_RISING:
-		dev_dbg(d->chip->parent_device, "edge.rising\n");
+		dev_dbg(bank->gc.parent, "edge.rising\n");
 		npcm_gpio_clr(&bank->gc, bank->base + NPCM7XX_GP_N_EVBE, gpio);
 		npcm_gpio_clr(&bank->gc, bank->base + NPCM7XX_GP_N_POL, gpio);
 		break;
 	case IRQ_TYPE_EDGE_FALLING:
-		dev_dbg(d->chip->parent_device, "edge.falling\n");
+		dev_dbg(bank->gc.parent, "edge.falling\n");
 		npcm_gpio_clr(&bank->gc, bank->base + NPCM7XX_GP_N_EVBE, gpio);
 		npcm_gpio_set(&bank->gc, bank->base + NPCM7XX_GP_N_POL, gpio);
 		break;
 	case IRQ_TYPE_EDGE_BOTH:
-		dev_dbg(d->chip->parent_device, "edge.both\n");
+		dev_dbg(bank->gc.parent, "edge.both\n");
 		npcm_gpio_set(&bank->gc, bank->base + NPCM7XX_GP_N_EVBE, gpio);
 		break;
 	case IRQ_TYPE_LEVEL_LOW:
-		dev_dbg(d->chip->parent_device, "level.low\n");
+		dev_dbg(bank->gc.parent, "level.low\n");
 		npcm_gpio_set(&bank->gc, bank->base + NPCM7XX_GP_N_POL, gpio);
 		break;
 	case IRQ_TYPE_LEVEL_HIGH:
-		dev_dbg(d->chip->parent_device, "level.high\n");
+		dev_dbg(bank->gc.parent, "level.high\n");
 		npcm_gpio_clr(&bank->gc, bank->base + NPCM7XX_GP_N_POL, gpio);
 		break;
 	default:
-		dev_dbg(d->chip->parent_device, "invalid irq type\n");
+		dev_dbg(bank->gc.parent, "invalid irq type\n");
 		return -EINVAL;
 	}
 
@@ -289,7 +288,7 @@ static void npcmgpio_irq_ack(struct irq_data *d)
 		gpiochip_get_data(irq_data_get_irq_chip_data(d));
 	unsigned int gpio = d->hwirq;
 
-	dev_dbg(d->chip->parent_device, "irq_ack: %u.%u\n", gpio, d->irq);
+	dev_dbg(bank->gc.parent, "irq_ack: %u.%u\n", gpio, d->irq);
 	iowrite32(BIT(gpio), bank->base + NPCM7XX_GP_N_EVST);
 }
 
@@ -301,7 +300,7 @@ static void npcmgpio_irq_mask(struct irq_data *d)
 	unsigned int gpio = d->hwirq;
 
 	/* Clear events */
-	dev_dbg(d->chip->parent_device, "irq_mask: %u.%u\n", gpio, d->irq);
+	dev_dbg(bank->gc.parent, "irq_mask: %u.%u\n", gpio, d->irq);
 	iowrite32(BIT(gpio), bank->base + NPCM7XX_GP_N_EVENC);
 }
 
@@ -313,7 +312,7 @@ static void npcmgpio_irq_unmask(struct irq_data *d)
 	unsigned int gpio = d->hwirq;
 
 	/* Enable events */
-	dev_dbg(d->chip->parent_device, "irq_unmask: %u.%u\n", gpio, d->irq);
+	dev_dbg(bank->gc.parent, "irq_unmask: %u.%u\n", gpio, d->irq);
 	iowrite32(BIT(gpio), bank->base + NPCM7XX_GP_N_EVENS);
 }
 
@@ -323,7 +322,7 @@ static unsigned int npcmgpio_irq_startup(struct irq_data *d)
 	unsigned int gpio = d->hwirq;
 
 	/* active-high, input, clear interrupt, enable interrupt */
-	dev_dbg(d->chip->parent_device, "startup: %u.%u\n", gpio, d->irq);
+	dev_dbg(gc->parent, "startup: %u.%u\n", gpio, d->irq);
 	npcmgpio_direction_input(gc, gpio);
 	npcmgpio_irq_ack(d);
 	npcmgpio_irq_unmask(d);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 08/43] rcu: Mark writes to the rcu_segcblist structure's ->flags field
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (5 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 07/43] pinctrl: npcm: Fix broken references to chip->parent_device Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 09/43] block: throttle split bio in case of iops limit Sasha Levin
                   ` (34 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Paul E. McKenney, Zhouyi Zhou, Frederic Weisbecker, Sasha Levin,
	quic_neeraju, josh, rcu

From: "Paul E. McKenney" <paulmck@kernel.org>

[ Upstream commit c09929031018913b5783872a8b8cdddef4a543c7 ]

KCSAN reports data races between the rcu_segcblist_clear_flags() and
rcu_segcblist_set_flags() functions, though misreporting the latter
as a call to rcu_segcblist_is_enabled() from call_rcu().  This commit
converts the updates of this field to WRITE_ONCE(), relying on the
resulting unmarked reads to continue to detect buggy concurrent writes
to this field.

Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/rcu/rcu_segcblist.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h
index e373fbe44da5..431cee212467 100644
--- a/kernel/rcu/rcu_segcblist.h
+++ b/kernel/rcu/rcu_segcblist.h
@@ -56,13 +56,13 @@ static inline long rcu_segcblist_n_cbs(struct rcu_segcblist *rsclp)
 static inline void rcu_segcblist_set_flags(struct rcu_segcblist *rsclp,
 					   int flags)
 {
-	rsclp->flags |= flags;
+	WRITE_ONCE(rsclp->flags, rsclp->flags | flags);
 }
 
 static inline void rcu_segcblist_clear_flags(struct rcu_segcblist *rsclp,
 					     int flags)
 {
-	rsclp->flags &= ~flags;
+	WRITE_ONCE(rsclp->flags, rsclp->flags & ~flags);
 }
 
 static inline bool rcu_segcblist_test_flags(struct rcu_segcblist *rsclp,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
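
What WRITE_ONCE() changes here is only the store: it becomes a single
access the compiler may not tear, fuse, or re-issue, while the read side
stays plain so KCSAN can still flag genuinely buggy concurrent writers.
A rough user-space analogue (the macro below is a simplification of the
kernel's):

#include <stdio.h>

/* Simplified analogue: a volatile store the compiler cannot split. */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int flags;

static void set_flags(int f)	{ WRITE_ONCE(flags, flags | f); }
static void clear_flags(int f)	{ WRITE_ONCE(flags, flags & ~f); }

int main(void)
{
	set_flags(0x1 | 0x4);
	clear_flags(0x4);
	printf("flags=%#x\n", flags);	/* 0x1 */
	return 0;
}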

* [PATCH AUTOSEL 5.17 09/43] block: throttle split bio in case of iops limit
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (6 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 08/43] rcu: Mark writes to the rcu_segcblist structure's ->flags field Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 10/43] memstick/mspro_block: fix handling of read-only devices Sasha Levin
                   ` (33 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Ming Lei, Ning Li, Tejun Heo, Chunguang Xu, Jens Axboe,
	Sasha Levin, linux-block, cgroups

From: Ming Lei <ming.lei@redhat.com>

[ Upstream commit 9f5ede3c01f9951b0ae7d68b28762ad51d9bacc8 ]

Commit 111be8839817 ("block-throttle: avoid double charge") marks a bio as
BIO_THROTTLED unconditionally once __blk_throtl_bio() has been called on
it, so that the bio is never passed into __blk_throtl_bio() again. This
avoids a double charge in the case of bio splitting. It is reasonable for
the read/write throughput limit, but not for the IOPS limit, because the
block layer provides io accounting for each split bio.

Chunguang Xu has already observed this issue and fixed it in commit
4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large IO scenarios").
However, that patch only covers bio splitting in __blk_queue_split(); bios
are also split in other ways, such as bio_split() followed by
submit_bio_noacct().

This patch tries to fix the issue in one generic way by always charging
the bio for the iops limit in blk_throtl_bio(). This is reasonable: a
re-submitted or fast-cloned bio is charged if it is submitted to the same
disk/queue, and BIO_THROTTLED is cleared if bio->bi_bdev is changed.

This new approach gives a much smoother and more stable iops limit than
commit 4f1e9630afe6 ("blk-throtl: optimize IOPS throttle for large IO
scenarios"), since that commit cannot actually throttle the current split
bios.

Also, this approach does not introduce a new double iops charge in
blk_throtl_dispatch_work_fn(), in which blk_throtl_bio() is no longer
called.

Reported-by: Ning Li <lining2020x@163.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220216044514.2903784-7-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 block/blk-merge.c    |  2 --
 block/blk-throttle.c | 10 +++++++---
 block/blk-throttle.h |  2 --
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 4de34a332c9f..f5255991b773 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -368,8 +368,6 @@ void __blk_queue_split(struct request_queue *q, struct bio **bio,
 		trace_block_split(split, (*bio)->bi_iter.bi_sector);
 		submit_bio_noacct(*bio);
 		*bio = split;
-
-		blk_throtl_charge_bio_split(*bio);
 	}
 }
 
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 7c462c006b26..87769b337fc5 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -808,7 +808,8 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
-	if (bps_limit == U64_MAX) {
+	/* no need to throttle if this bio's bytes have been accounted */
+	if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) {
 		if (wait)
 			*wait = 0;
 		return true;
@@ -920,9 +921,12 @@ static void throtl_charge_bio(struct throtl_grp *tg, struct bio *bio)
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
 	/* Charge the bio to the group */
-	tg->bytes_disp[rw] += bio_size;
+	if (!bio_flagged(bio, BIO_THROTTLED)) {
+		tg->bytes_disp[rw] += bio_size;
+		tg->last_bytes_disp[rw] += bio_size;
+	}
+
 	tg->io_disp[rw]++;
-	tg->last_bytes_disp[rw] += bio_size;
 	tg->last_io_disp[rw]++;
 
 	/*
diff --git a/block/blk-throttle.h b/block/blk-throttle.h
index 175f03abd9e4..cb43f4417d6e 100644
--- a/block/blk-throttle.h
+++ b/block/blk-throttle.h
@@ -170,8 +170,6 @@ static inline bool blk_throtl_bio(struct bio *bio)
 {
 	struct throtl_grp *tg = blkg_to_tg(bio->bi_blkg);
 
-	if (bio_flagged(bio, BIO_THROTTLED))
-		return false;
 	if (!tg->has_rules[bio_data_dir(bio)])
 		return false;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
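
The resulting accounting rule: bytes are charged at most once per bio
(guarded by BIO_THROTTLED), while iops are charged on every submission,
split or not. A compact sketch of that rule (the structures are
simplified, not the block-layer types):

#include <stdbool.h>
#include <stdio.h>

struct bio { bool throttled; unsigned long size; };

static unsigned long bytes_disp, io_disp;

static void throtl_charge(struct bio *bio)
{
	if (!bio->throttled)		/* bytes: only on first sight */
		bytes_disp += bio->size;
	io_disp++;			/* iops: every submission */
	bio->throttled = true;
}

int main(void)
{
	struct bio parent = { .size = 1024 * 1024 };

	throtl_charge(&parent);		/* first submission */

	struct bio split = parent;	/* a split inherits BIO_THROTTLED */
	split.size = 512 * 1024;
	throtl_charge(&split);		/* counted for iops, not for bytes */

	printf("bytes=%lu ios=%lu\n", bytes_disp, io_disp); /* 1048576 2 */
	return 0;
}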

* [PATCH AUTOSEL 5.17 10/43] memstick/mspro_block: fix handling of read-only devices
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (7 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 09/43] block: throttle split bio in case of iops limit Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 11/43] block/bfq_wf2q: correct weight to ioprio Sasha Levin
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Christoph Hellwig, Jens Axboe, Sasha Levin, maximlevitsky, oakad,
	ulf.hansson, baijiaju1990, chaitanya.kulkarni, mcgrof, linux-mmc

From: Christoph Hellwig <hch@lst.de>

[ Upstream commit 6dab421bfe06a59bf8f212a72e34673e8acf2018 ]

Use set_disk_ro to propagate the read-only state to the block layer
instead of checking for it in ->open and leaking a reference in case
of a read-only device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220215094514.3828912-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/memstick/core/mspro_block.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/memstick/core/mspro_block.c b/drivers/memstick/core/mspro_block.c
index c0450397b673..7ea312f0840e 100644
--- a/drivers/memstick/core/mspro_block.c
+++ b/drivers/memstick/core/mspro_block.c
@@ -186,13 +186,8 @@ static int mspro_block_bd_open(struct block_device *bdev, fmode_t mode)
 
 	mutex_lock(&mspro_block_disk_lock);
 
-	if (msb && msb->card) {
+	if (msb && msb->card)
 		msb->usage_count++;
-		if ((mode & FMODE_WRITE) && msb->read_only)
-			rc = -EROFS;
-		else
-			rc = 0;
-	}
 
 	mutex_unlock(&mspro_block_disk_lock);
 
@@ -1239,6 +1234,9 @@ static int mspro_block_init_disk(struct memstick_dev *card)
 	set_capacity(msb->disk, capacity);
 	dev_dbg(&card->dev, "capacity set %ld\n", capacity);
 
+	if (msb->read_only)
+		set_disk_ro(msb->disk, true);
+
 	rc = device_add_disk(&card->dev, msb->disk, NULL);
 	if (rc)
 		goto out_cleanup_disk;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
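
The reference leak the commit message alludes to: the old ->open bumped
usage_count before deciding to refuse a write open with -EROFS, and the
error path never dropped the count again. The shape of that bug, reduced
to a bare counter (hypothetical, not the driver code):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

static int usage_count;

static int old_open(bool want_write, bool read_only)
{
	usage_count++;			/* taken unconditionally ... */
	if (want_write && read_only)
		return -EROFS;		/* ... but never dropped here */
	return 0;
}

int main(void)
{
	old_open(true, true);		/* the open fails, yet the ref leaks */
	printf("usage_count=%d (expected 0)\n", usage_count);
	return 0;
}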

* [PATCH AUTOSEL 5.17 11/43] block/bfq_wf2q: correct weight to ioprio
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (8 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 10/43] memstick/mspro_block: fix handling of read-only devices Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 12/43] crypto: xts - Add softdep on ecb Sasha Levin
                   ` (31 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Yahu Gao, Paolo Valente, Jens Axboe, Sasha Levin, linux-block

From: Yahu Gao <gaoyahu19@gmail.com>

[ Upstream commit bcd2be763252f3a4d5fc4d6008d4d96c601ee74b ]

The return value is ioprio * BFQ_WEIGHT_CONVERSION_COEFF or 0.
What we want is ioprio or 0.
Correct this by changing the calculation.

Signed-off-by: Yahu Gao <gaoyahu19@gmail.com>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20220107065859.25689-1-gaoyahu19@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 block/bfq-wf2q.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/bfq-wf2q.c b/block/bfq-wf2q.c
index b74cc0da118e..709b901de3ca 100644
--- a/block/bfq-wf2q.c
+++ b/block/bfq-wf2q.c
@@ -519,7 +519,7 @@ unsigned short bfq_ioprio_to_weight(int ioprio)
 static unsigned short bfq_weight_to_ioprio(int weight)
 {
 	return max_t(int, 0,
-		     IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF - weight);
+		     IOPRIO_NR_LEVELS - weight / BFQ_WEIGHT_CONVERSION_COEFF);
 }
 
 static void bfq_get_entity(struct bfq_entity *entity)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
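
With IOPRIO_NR_LEVELS = 8 and BFQ_WEIGHT_CONVERSION_COEFF = 10 (the
values bfq uses for these constants), the asymmetry is easy to check
numerically; only the corrected formula inverts bfq_ioprio_to_weight():

#include <stdio.h>

#define IOPRIO_NR_LEVELS 8
#define BFQ_WEIGHT_CONVERSION_COEFF 10

/* Forward mapping used by bfq: ioprio 4 -> weight 40. */
static int ioprio_to_weight(int ioprio)
{
	return (IOPRIO_NR_LEVELS - ioprio) * BFQ_WEIGHT_CONVERSION_COEFF;
}

static int old_weight_to_ioprio(int weight)	/* before the patch */
{
	int v = IOPRIO_NR_LEVELS * BFQ_WEIGHT_CONVERSION_COEFF - weight;

	return v > 0 ? v : 0;
}

static int new_weight_to_ioprio(int weight)	/* after the patch */
{
	int v = IOPRIO_NR_LEVELS - weight / BFQ_WEIGHT_CONVERSION_COEFF;

	return v > 0 ? v : 0;
}

int main(void)
{
	int w = ioprio_to_weight(4);

	printf("weight=%d old=%d new=%d\n",
	       w, old_weight_to_ioprio(w), new_weight_to_ioprio(w));
	/* weight=40 old=40 new=4: only the new formula round-trips */
	return 0;
}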

* [PATCH AUTOSEL 5.17 12/43] crypto: xts - Add softdep on ecb
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (9 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 11/43] block/bfq_wf2q: correct weight to ioprio Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 13/43] crypto: hisilicon/sec - not need to enable sm4 extra mode at HW V3 Sasha Levin
                   ` (30 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Herbert Xu, rftc, Sasha Levin, davem, linux-crypto

From: Herbert Xu <herbert@gondor.apana.org.au>

[ Upstream commit dfe085d8dcd0bb1fe20cc2327e81c8064cead441 ]

The xts module needs ecb to be present as it's meant to work
on top of ecb.  This patch adds a softdep so ecb can be included
automatically into the initramfs.

Reported-by: rftc <rftc@gmx.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 crypto/xts.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/crypto/xts.c b/crypto/xts.c
index 6c12f30dbdd6..63c85b9e64e0 100644
--- a/crypto/xts.c
+++ b/crypto/xts.c
@@ -466,3 +466,4 @@ MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("XTS block cipher mode");
 MODULE_ALIAS_CRYPTO("xts");
 MODULE_IMPORT_NS(CRYPTO_INTERNAL);
+MODULE_SOFTDEP("pre: ecb");
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
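
MODULE_SOFTDEP() only embeds a hint string in the module's .modinfo
section; modprobe reads it from user space and loads the listed modules
first, while the kernel itself never acts on the string. A minimal module
sketch of the mechanism (demonstration only, not the xts code):

#include <linux/module.h>

static int __init softdep_demo_init(void) { return 0; }
static void __exit softdep_demo_exit(void) { }

module_init(softdep_demo_init);
module_exit(softdep_demo_exit);

/* modprobe sees this via modinfo and inserts "ecb" before this module. */
MODULE_SOFTDEP("pre: ecb");
MODULE_DESCRIPTION("softdep demonstration only");
MODULE_LICENSE("GPL");

Once the patch is applied, `modinfo xts` should report a new
"softdep: pre: ecb" line.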

* [PATCH AUTOSEL 5.17 13/43] crypto: hisilicon/sec - not need to enable sm4 extra mode at HW V3
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (10 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 12/43] crypto: xts - Add softdep on ecb Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 14/43] block, bfq: don't move oom_bfqq Sasha Levin
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kai Ye, Herbert Xu, Sasha Levin, liulongfang, davem, linux-crypto

From: Kai Ye <yekai13@huawei.com>

[ Upstream commit f8a2652826444d13181061840b96a5d975d5b6c6 ]

There is no need to enable the sm4 extra mode at HW V3. Fix it by
enabling that mode only on hardware below V3.

Signed-off-by: Kai Ye <yekai13@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/crypto/hisilicon/sec2/sec_main.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/hisilicon/sec2/sec_main.c b/drivers/crypto/hisilicon/sec2/sec_main.c
index 26d3ab1d308b..89d4cc767d36 100644
--- a/drivers/crypto/hisilicon/sec2/sec_main.c
+++ b/drivers/crypto/hisilicon/sec2/sec_main.c
@@ -443,9 +443,11 @@ static int sec_engine_init(struct hisi_qm *qm)
 
 	writel(SEC_SAA_ENABLE, qm->io_base + SEC_SAA_EN_REG);
 
-	/* Enable sm4 extra mode, as ctr/ecb */
-	writel_relaxed(SEC_BD_ERR_CHK_EN0,
-		       qm->io_base + SEC_BD_ERR_CHK_EN_REG0);
+	/* HW V2 enable sm4 extra mode, as ctr/ecb */
+	if (qm->ver < QM_HW_V3)
+		writel_relaxed(SEC_BD_ERR_CHK_EN0,
+			       qm->io_base + SEC_BD_ERR_CHK_EN_REG0);
+
 	/* Enable sm4 xts mode multiple iv */
 	writel_relaxed(SEC_BD_ERR_CHK_EN1,
 		       qm->io_base + SEC_BD_ERR_CHK_EN_REG1);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 14/43] block, bfq: don't move oom_bfqq
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (11 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 13/43] crypto: hisilicon/sec - not need to enable sm4 extra mode at HW V3 Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 15/43] selinux: use correct type for context length Sasha Levin
                   ` (28 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Yu Kuai, Jan Kara, Paolo Valente, Jens Axboe, Sasha Levin, tj,
	cgroups, linux-block

From: Yu Kuai <yukuai3@huawei.com>

[ Upstream commit 8410f70977734f21b8ed45c37e925d311dfda2e7 ]

Our test reported a UAF:

[ 2073.019181] ==================================================================
[ 2073.019188] BUG: KASAN: use-after-free in __bfq_put_async_bfqq+0xa0/0x168
[ 2073.019191] Write of size 8 at addr ffff8000ccf64128 by task rmmod/72584
[ 2073.019192]
[ 2073.019196] CPU: 0 PID: 72584 Comm: rmmod Kdump: loaded Not tainted 4.19.90-yk #5
[ 2073.019198] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 2073.019200] Call trace:
[ 2073.019203]  dump_backtrace+0x0/0x310
[ 2073.019206]  show_stack+0x28/0x38
[ 2073.019210]  dump_stack+0xec/0x15c
[ 2073.019216]  print_address_description+0x68/0x2d0
[ 2073.019220]  kasan_report+0x238/0x2f0
[ 2073.019224]  __asan_store8+0x88/0xb0
[ 2073.019229]  __bfq_put_async_bfqq+0xa0/0x168
[ 2073.019233]  bfq_put_async_queues+0xbc/0x208
[ 2073.019236]  bfq_pd_offline+0x178/0x238
[ 2073.019240]  blkcg_deactivate_policy+0x1f0/0x420
[ 2073.019244]  bfq_exit_queue+0x128/0x178
[ 2073.019249]  blk_mq_exit_sched+0x12c/0x160
[ 2073.019252]  elevator_exit+0xc8/0xd0
[ 2073.019256]  blk_exit_queue+0x50/0x88
[ 2073.019259]  blk_cleanup_queue+0x228/0x3d8
[ 2073.019267]  null_del_dev+0xfc/0x1e0 [null_blk]
[ 2073.019274]  null_exit+0x90/0x114 [null_blk]
[ 2073.019278]  __arm64_sys_delete_module+0x358/0x5a0
[ 2073.019282]  el0_svc_common+0xc8/0x320
[ 2073.019287]  el0_svc_handler+0xf8/0x160
[ 2073.019290]  el0_svc+0x10/0x218
[ 2073.019291]
[ 2073.019294] Allocated by task 14163:
[ 2073.019301]  kasan_kmalloc+0xe0/0x190
[ 2073.019305]  kmem_cache_alloc_node_trace+0x1cc/0x418
[ 2073.019308]  bfq_pd_alloc+0x54/0x118
[ 2073.019313]  blkcg_activate_policy+0x250/0x460
[ 2073.019317]  bfq_create_group_hierarchy+0x38/0x110
[ 2073.019321]  bfq_init_queue+0x6d0/0x948
[ 2073.019325]  blk_mq_init_sched+0x1d8/0x390
[ 2073.019330]  elevator_switch_mq+0x88/0x170
[ 2073.019334]  elevator_switch+0x140/0x270
[ 2073.019338]  elv_iosched_store+0x1a4/0x2a0
[ 2073.019342]  queue_attr_store+0x90/0xe0
[ 2073.019348]  sysfs_kf_write+0xa8/0xe8
[ 2073.019351]  kernfs_fop_write+0x1f8/0x378
[ 2073.019359]  __vfs_write+0xe0/0x360
[ 2073.019363]  vfs_write+0xf0/0x270
[ 2073.019367]  ksys_write+0xdc/0x1b8
[ 2073.019371]  __arm64_sys_write+0x50/0x60
[ 2073.019375]  el0_svc_common+0xc8/0x320
[ 2073.019380]  el0_svc_handler+0xf8/0x160
[ 2073.019383]  el0_svc+0x10/0x218
[ 2073.019385]
[ 2073.019387] Freed by task 72584:
[ 2073.019391]  __kasan_slab_free+0x120/0x228
[ 2073.019394]  kasan_slab_free+0x10/0x18
[ 2073.019397]  kfree+0x94/0x368
[ 2073.019400]  bfqg_put+0x64/0xb0
[ 2073.019404]  bfqg_and_blkg_put+0x90/0xb0
[ 2073.019408]  bfq_put_queue+0x220/0x228
[ 2073.019413]  __bfq_put_async_bfqq+0x98/0x168
[ 2073.019416]  bfq_put_async_queues+0xbc/0x208
[ 2073.019420]  bfq_pd_offline+0x178/0x238
[ 2073.019424]  blkcg_deactivate_policy+0x1f0/0x420
[ 2073.019429]  bfq_exit_queue+0x128/0x178
[ 2073.019433]  blk_mq_exit_sched+0x12c/0x160
[ 2073.019437]  elevator_exit+0xc8/0xd0
[ 2073.019440]  blk_exit_queue+0x50/0x88
[ 2073.019443]  blk_cleanup_queue+0x228/0x3d8
[ 2073.019451]  null_del_dev+0xfc/0x1e0 [null_blk]
[ 2073.019459]  null_exit+0x90/0x114 [null_blk]
[ 2073.019462]  __arm64_sys_delete_module+0x358/0x5a0
[ 2073.019467]  el0_svc_common+0xc8/0x320
[ 2073.019471]  el0_svc_handler+0xf8/0x160
[ 2073.019474]  el0_svc+0x10/0x218
[ 2073.019475]
[ 2073.019479] The buggy address belongs to the object at ffff8000ccf63f00
 which belongs to the cache kmalloc-1024 of size 1024
[ 2073.019484] The buggy address is located 552 bytes inside of
 1024-byte region [ffff8000ccf63f00, ffff8000ccf64300)
[ 2073.019486] The buggy address belongs to the page:
[ 2073.019492] page:ffff7e000333d800 count:1 mapcount:0 mapping:ffff8000c0003a00 index:0x0 compound_mapcount: 0
[ 2073.020123] flags: 0x7ffff0000008100(slab|head)
[ 2073.020403] raw: 07ffff0000008100 ffff7e0003334c08 ffff7e00001f5a08 ffff8000c0003a00
[ 2073.020409] raw: 0000000000000000 00000000001c001c 00000001ffffffff 0000000000000000
[ 2073.020411] page dumped because: kasan: bad access detected
[ 2073.020412]
[ 2073.020414] Memory state around the buggy address:
[ 2073.020420]  ffff8000ccf64000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2073.020424]  ffff8000ccf64080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2073.020428] >ffff8000ccf64100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2073.020430]                                   ^
[ 2073.020434]  ffff8000ccf64180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2073.020438]  ffff8000ccf64200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 2073.020439] ==================================================================

The same problem exists in mainline as well.

This is because oom_bfqq is moved to a non-root group, so root_group
is freed earlier, while oom_bfqq still references it.

Fix the problem by never moving oom_bfqq.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20220129015924.3958918-4-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 block/bfq-cgroup.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 24a5c5329bcd..809bc612d96b 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -646,6 +646,12 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 {
 	struct bfq_entity *entity = &bfqq->entity;
 
+	/*
+	 * oom_bfqq is not allowed to move, oom_bfqq will hold ref to root_group
+	 * until elevator exit.
+	 */
+	if (bfqq == &bfqd->oom_bfqq)
+		return;
 	/*
 	 * Get extra reference to prevent bfqq from being freed in
 	 * next possible expire or deactivate.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 15/43] selinux: use correct type for context length
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (12 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 14/43] block, bfq: don't move oom_bfqq Sasha Levin
@ 2022-03-28 11:17 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction Sasha Levin
                   ` (27 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:17 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Christian Göttsche, Paul Moore, Sasha Levin,
	stephen.smalley.work, eparis, selinux

From: Christian Göttsche <cgzones@googlemail.com>

[ Upstream commit b97df7c098c531010e445da88d02b7bf7bf59ef6 ]

security_sid_to_context() expects a pointer to an u32 as the address
where to store the length of the computed context.

Reported by sparse:

    security/selinux/xfrm.c:359:39: warning: incorrect type in arg 4
                                    (different signedness)
    security/selinux/xfrm.c:359:39:    expected unsigned int
                                       [usertype] *scontext_len
    security/selinux/xfrm.c:359:39:    got int *

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
[PM: wrapped commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 security/selinux/xfrm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/xfrm.c b/security/selinux/xfrm.c
index 90697317895f..c576832febc6 100644
--- a/security/selinux/xfrm.c
+++ b/security/selinux/xfrm.c
@@ -347,7 +347,7 @@ int selinux_xfrm_state_alloc_acquire(struct xfrm_state *x,
 	int rc;
 	struct xfrm_sec_ctx *ctx;
 	char *ctx_str = NULL;
-	int str_len;
+	u32 str_len;
 
 	if (!polsec)
 		return 0;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread
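
The warning corresponds to a genuine pointer-type mismatch rather than
style: int * and u32 * point to types of different signedness, and the
same mistake draws a compiler diagnostic in user space too. A tiny
analogue (the function name is hypothetical):

#include <stdint.h>
#include <stdio.h>

static void context_len(uint32_t *out)
{
	*out = 42;
}

int main(void)
{
	/* int len; context_len(&len);
	 * -> incompatible-pointer-types (signedness) warning, the
	 *    user-space twin of the sparse report above */
	uint32_t len;

	context_len(&len);
	printf("%u\n", len);
	return 0;
}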

* [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (13 preceding siblings ...)
  2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 15/43] selinux: use correct type for context length Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 18:08   ` Eric Biggers
                     ` (2 more replies)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 17/43] random: remove batched entropy locking Sasha Levin
                   ` (26 subsequent siblings)
  41 siblings, 3 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jason A. Donenfeld, Theodore Ts'o, Dominik Brodowski,
	Eric Biggers, Greg Kroah-Hartman, Jean-Philippe Aumasson,
	Sasha Levin

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

[ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]

The current 4096-bit LFSR used for entropy collection had a few
desirable attributes for the context in which it was created. For
example, the state was huge, which meant that /dev/random would be able
to output quite a bit of accumulated entropy before blocking. It was
also, in its time, quite fast at accumulating entropy byte-by-byte,
which matters given the varying contexts in which mix_pool_bytes() is
called. And its diffusion was relatively high, which meant that changes
would ripple across several words of state rather quickly.

However, it also suffers from a few security vulnerabilities. In
particular, inputs learned by an attacker can be undone, but moreover,
if the state of the pool leaks, its contents can be controlled and
entirely zeroed out. I've demonstrated this attack with this SMT2
script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in
a matter of seconds on a single core of my laptop, resulting in little
proof of concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>.

For basically all recent formal models of RNGs, these attacks represent
a significant cryptographic flaw. But how does this manifest
practically? If an attacker has access to the system to such a degree
that he can learn the internal state of the RNG, arguably there are
other lower hanging vulnerabilities -- side-channel, infoleak, or
otherwise -- that might have higher priority. On the other hand, seed
files are frequently used on systems that have a hard time generating
much entropy on their own, and these seed files, being files, often leak
or are duplicated and distributed accidentally, or are even seeded over
the Internet intentionally, where their contents might be recorded or
tampered with. Seen this way, an otherwise quasi-implausible
vulnerability is a bit more practical than initially thought.

Another aspect of the current mix_pool_bytes() function is that, while
its performance was arguably competitive for the time in which it was
created, it's no longer considered so. This patch improves performance
significantly: on a high-end CPU, an i7-11850H, it improves performance
of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it
improves performance by 103%.

This commit replaces the LFSR of mix_pool_bytes() with a straight-
forward cryptographic hash function, BLAKE2s, which is already in use
for pool extraction. Universal hashing with a secret seed was considered
too, something along the lines of <https://eprint.iacr.org/2013/338>,
but the requirement for a secret seed makes for a chicken & egg problem.
Instead we go with a formally proven scheme using a computational hash
function, described in sections 5.1, 6.4, and B.1.8 of
<https://eprint.iacr.org/2019/198>.

BLAKE2s outputs 256 bits, which should give us an appropriate amount of
min-entropy accumulation, and a wide enough margin of collision
resistance against active attacks. mix_pool_bytes() becomes a simple
call to blake2s_update(), for accumulation, while the extraction step
becomes a blake2s_final() to generate a seed, with which we can then do
a HKDF-like or BLAKE2X-like expansion, the first part of which we fold
back as an init key for subsequent blake2s_update()s, and the rest we
return to the caller. This is then provided to our CRNG as usual. In
that expansion step, we make opportunistic use of 32 bytes of RDRAND
output, just as before. We also always reseed the crng with 32 bytes,
unconditionally, or not at all, rather than sometimes with 16 as before,
as we don't win anything by limiting beyond the 16 byte threshold.
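
The shape of that seed/re-key/expand construction, as a toy,
non-cryptographic model (H() stands in for keyed BLAKE2s and all values
are invented purely for illustration; the real code is in the diff
below):

#include <stdint.h>
#include <stdio.h>

/* Stand-in for HASHPRF(key, msg): NOT cryptographic, illustration only. */
static uint64_t H(uint64_t key, uint64_t msg)
{
        return (key ^ msg) * 0x9e3779b97f4a7c15ULL;
}

int main(void)
{
        uint64_t pool = 0x1234;         /* accumulated entropy (toy)            */
        uint64_t seed = H(pool, 0);     /* seed = HASHPRF(last_key, input)      */
        uint64_t next_key = H(seed, 0); /* next_key = HASHPRF(seed, rdrand||0)  */
        uint64_t ctr;

        pool = next_key;                /* fold one block back as the new key   */
        for (ctr = 1; ctr <= 4; ctr++)  /* output = HASHPRF(seed, rdrand||ctr)  */
                printf("block %llu: %016llx\n",
                       (unsigned long long)ctr,
                       (unsigned long long)H(seed, ctr));
        return 0;
}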

Going for a hash function as an entropy collector is a conservative,
proven approach. The result of all this is a much simpler and much less
bespoke construction than what's there now, which not only plugs a
vulnerability but also improves performance considerably.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/char/random.c | 304 ++++++++----------------------------------
 1 file changed, 55 insertions(+), 249 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 3404a91edf29..882f78829a24 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -42,61 +42,6 @@
  */
 
 /*
- * (now, with legal B.S. out of the way.....)
- *
- * This routine gathers environmental noise from device drivers, etc.,
- * and returns good random numbers, suitable for cryptographic use.
- * Besides the obvious cryptographic uses, these numbers are also good
- * for seeding TCP sequence numbers, and other places where it is
- * desirable to have numbers which are not only random, but hard to
- * predict by an attacker.
- *
- * Theory of operation
- * ===================
- *
- * Computers are very predictable devices.  Hence it is extremely hard
- * to produce truly random numbers on a computer --- as opposed to
- * pseudo-random numbers, which can easily generated by using a
- * algorithm.  Unfortunately, it is very easy for attackers to guess
- * the sequence of pseudo-random number generators, and for some
- * applications this is not acceptable.  So instead, we must try to
- * gather "environmental noise" from the computer's environment, which
- * must be hard for outside attackers to observe, and use that to
- * generate random numbers.  In a Unix environment, this is best done
- * from inside the kernel.
- *
- * Sources of randomness from the environment include inter-keyboard
- * timings, inter-interrupt timings from some interrupts, and other
- * events which are both (a) non-deterministic and (b) hard for an
- * outside observer to measure.  Randomness from these sources are
- * added to an "entropy pool", which is mixed using a CRC-like function.
- * This is not cryptographically strong, but it is adequate assuming
- * the randomness is not chosen maliciously, and it is fast enough that
- * the overhead of doing it on every interrupt is very reasonable.
- * As random bytes are mixed into the entropy pool, the routines keep
- * an *estimate* of how many bits of randomness have been stored into
- * the random number generator's internal state.
- *
- * When random bytes are desired, they are obtained by taking the BLAKE2s
- * hash of the contents of the "entropy pool".  The BLAKE2s hash avoids
- * exposing the internal state of the entropy pool.  It is believed to
- * be computationally infeasible to derive any useful information
- * about the input of BLAKE2s from its output.  Even if it is possible to
- * analyze BLAKE2s in some clever way, as long as the amount of data
- * returned from the generator is less than the inherent entropy in
- * the pool, the output data is totally unpredictable.  For this
- * reason, the routine decreases its internal estimate of how many
- * bits of "true randomness" are contained in the entropy pool as it
- * outputs random numbers.
- *
- * If this estimate goes to zero, the routine can still generate
- * random numbers; however, an attacker may (at least in theory) be
- * able to infer the future output of the generator from prior
- * outputs.  This requires successful cryptanalysis of BLAKE2s, which is
- * not believed to be feasible, but there is a remote possibility.
- * Nonetheless, these numbers should be useful for the vast majority
- * of purposes.
- *
  * Exported interfaces ---- output
  * ===============================
  *
@@ -298,23 +243,6 @@
  *
  *	mknod /dev/random c 1 8
  *	mknod /dev/urandom c 1 9
- *
- * Acknowledgements:
- * =================
- *
- * Ideas for constructing this random number generator were derived
- * from Pretty Good Privacy's random number generator, and from private
- * discussions with Phil Karn.  Colin Plumb provided a faster random
- * number generator, which speed up the mixing function of the entropy
- * pool, taken from PGPfone.  Dale Worley has also contributed many
- * useful ideas and suggestions to improve this driver.
- *
- * Any flaws in the design are solely my responsibility, and should
- * not be attributed to the Phil, Colin, or any of authors of PGP.
- *
- * Further background information on this topic may be obtained from
- * RFC 1750, "Randomness Recommendations for Security", by Donald
- * Eastlake, Steve Crocker, and Jeff Schiller.
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -358,79 +286,15 @@
 
 /* #define ADD_INTERRUPT_BENCH */
 
-/*
- * If the entropy count falls under this number of bits, then we
- * should wake up processes which are selecting or polling on write
- * access to /dev/random.
- */
-static int random_write_wakeup_bits = 28 * (1 << 5);
-
-/*
- * Originally, we used a primitive polynomial of degree .poolwords
- * over GF(2).  The taps for various sizes are defined below.  They
- * were chosen to be evenly spaced except for the last tap, which is 1
- * to get the twisting happening as fast as possible.
- *
- * For the purposes of better mixing, we use the CRC-32 polynomial as
- * well to make a (modified) twisted Generalized Feedback Shift
- * Register.  (See M. Matsumoto & Y. Kurita, 1992.  Twisted GFSR
- * generators.  ACM Transactions on Modeling and Computer Simulation
- * 2(3):179-194.  Also see M. Matsumoto & Y. Kurita, 1994.  Twisted
- * GFSR generators II.  ACM Transactions on Modeling and Computer
- * Simulation 4:254-266)
- *
- * Thanks to Colin Plumb for suggesting this.
- *
- * The mixing operation is much less sensitive than the output hash,
- * where we use BLAKE2s.  All that we want of mixing operation is that
- * it be a good non-cryptographic hash; i.e. it not produce collisions
- * when fed "random" data of the sort we expect to see.  As long as
- * the pool state differs for different inputs, we have preserved the
- * input entropy and done a good job.  The fact that an intelligent
- * attacker can construct inputs that will produce controlled
- * alterations to the pool's state is not important because we don't
- * consider such inputs to contribute any randomness.  The only
- * property we need with respect to them is that the attacker can't
- * increase his/her knowledge of the pool's state.  Since all
- * additions are reversible (knowing the final state and the input,
- * you can reconstruct the initial state), if an attacker has any
- * uncertainty about the initial state, he/she can only shuffle that
- * uncertainty about, but never cause any collisions (which would
- * decrease the uncertainty).
- *
- * Our mixing functions were analyzed by Lacharme, Roeck, Strubel, and
- * Videau in their paper, "The Linux Pseudorandom Number Generator
- * Revisited" (see: http://eprint.iacr.org/2012/251.pdf).  In their
- * paper, they point out that we are not using a true Twisted GFSR,
- * since Matsumoto & Kurita used a trinomial feedback polynomial (that
- * is, with only three taps, instead of the six that we are using).
- * As a result, the resulting polynomial is neither primitive nor
- * irreducible, and hence does not have a maximal period over
- * GF(2**32).  They suggest a slight change to the generator
- * polynomial which improves the resulting TGFSR polynomial to be
- * irreducible, which we have made here.
- */
 enum poolinfo {
-	POOL_WORDS = 128,
-	POOL_WORDMASK = POOL_WORDS - 1,
-	POOL_BYTES = POOL_WORDS * sizeof(u32),
-	POOL_BITS = POOL_BYTES * 8,
+	POOL_BITS = BLAKE2S_HASH_SIZE * 8,
 	POOL_BITSHIFT = ilog2(POOL_BITS),
 
 	/* To allow fractional bits to be tracked, the entropy_count field is
 	 * denominated in units of 1/8th bits. */
 	POOL_ENTROPY_SHIFT = 3,
 #define POOL_ENTROPY_BITS() (input_pool.entropy_count >> POOL_ENTROPY_SHIFT)
-	POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT,
-
-	/* x^128 + x^104 + x^76 + x^51 +x^25 + x + 1 */
-	POOL_TAP1 = 104,
-	POOL_TAP2 = 76,
-	POOL_TAP3 = 51,
-	POOL_TAP4 = 25,
-	POOL_TAP5 = 1,
-
-	EXTRACT_SIZE = BLAKE2S_HASH_SIZE / 2
+	POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT
 };
 
 /*
@@ -438,6 +302,12 @@ enum poolinfo {
  */
 static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);
 static struct fasync_struct *fasync;
+/*
+ * If the entropy count falls under this number of bits, then we
+ * should wake up processes which are selecting or polling on write
+ * access to /dev/random.
+ */
+static int random_write_wakeup_bits = POOL_BITS * 3 / 4;
 
 static DEFINE_SPINLOCK(random_ready_list_lock);
 static LIST_HEAD(random_ready_list);
@@ -493,73 +363,31 @@ MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression");
  *
  **********************************************************************/
 
-static u32 input_pool_data[POOL_WORDS] __latent_entropy;
-
 static struct {
+	struct blake2s_state hash;
 	spinlock_t lock;
-	u16 add_ptr;
-	u16 input_rotate;
 	int entropy_count;
 } input_pool = {
+	.hash.h = { BLAKE2S_IV0 ^ (0x01010000 | BLAKE2S_HASH_SIZE),
+		    BLAKE2S_IV1, BLAKE2S_IV2, BLAKE2S_IV3, BLAKE2S_IV4,
+		    BLAKE2S_IV5, BLAKE2S_IV6, BLAKE2S_IV7 },
+	.hash.outlen = BLAKE2S_HASH_SIZE,
 	.lock = __SPIN_LOCK_UNLOCKED(input_pool.lock),
 };
 
-static ssize_t extract_entropy(void *buf, size_t nbytes, int min);
-static ssize_t _extract_entropy(void *buf, size_t nbytes);
+static bool extract_entropy(void *buf, size_t nbytes, int min);
+static void _extract_entropy(void *buf, size_t nbytes);
 
 static void crng_reseed(struct crng_state *crng, bool use_input_pool);
 
-static const u32 twist_table[8] = {
-	0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
-	0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
-
 /*
  * This function adds bytes into the entropy "pool".  It does not
  * update the entropy estimate.  The caller should call
  * credit_entropy_bits if this is appropriate.
- *
- * The pool is stirred with a primitive polynomial of the appropriate
- * degree, and then twisted.  We twist by three bits at a time because
- * it's cheap to do so and helps slightly in the expected case where
- * the entropy is concentrated in the low-order bits.
  */
 static void _mix_pool_bytes(const void *in, int nbytes)
 {
-	unsigned long i;
-	int input_rotate;
-	const u8 *bytes = in;
-	u32 w;
-
-	input_rotate = input_pool.input_rotate;
-	i = input_pool.add_ptr;
-
-	/* mix one byte at a time to simplify size handling and churn faster */
-	while (nbytes--) {
-		w = rol32(*bytes++, input_rotate);
-		i = (i - 1) & POOL_WORDMASK;
-
-		/* XOR in the various taps */
-		w ^= input_pool_data[i];
-		w ^= input_pool_data[(i + POOL_TAP1) & POOL_WORDMASK];
-		w ^= input_pool_data[(i + POOL_TAP2) & POOL_WORDMASK];
-		w ^= input_pool_data[(i + POOL_TAP3) & POOL_WORDMASK];
-		w ^= input_pool_data[(i + POOL_TAP4) & POOL_WORDMASK];
-		w ^= input_pool_data[(i + POOL_TAP5) & POOL_WORDMASK];
-
-		/* Mix the result back in with a twist */
-		input_pool_data[i] = (w >> 3) ^ twist_table[w & 7];
-
-		/*
-		 * Normally, we add 7 bits of rotation to the pool.
-		 * At the beginning of the pool, add an extra 7 bits
-		 * rotation, so that successive passes spread the
-		 * input bits across the pool evenly.
-		 */
-		input_rotate = (input_rotate + (i ? 7 : 14)) & 31;
-	}
-
-	input_pool.input_rotate = input_rotate;
-	input_pool.add_ptr = i;
+	blake2s_update(&input_pool.hash, in, nbytes);
 }
 
 static void __mix_pool_bytes(const void *in, int nbytes)
@@ -953,15 +781,14 @@ static int crng_slow_load(const u8 *cp, size_t len)
 static void crng_reseed(struct crng_state *crng, bool use_input_pool)
 {
 	unsigned long flags;
-	int i, num;
+	int i;
 	union {
 		u8 block[CHACHA_BLOCK_SIZE];
 		u32 key[8];
 	} buf;
 
 	if (use_input_pool) {
-		num = extract_entropy(&buf, 32, 16);
-		if (num == 0)
+		if (!extract_entropy(&buf, 32, 16))
 			return;
 	} else {
 		_extract_crng(&primary_crng, buf.block);
@@ -1329,74 +1156,48 @@ static size_t account(size_t nbytes, int min)
 }
 
 /*
- * This function does the actual extraction for extract_entropy.
- *
- * Note: we assume that .poolwords is a multiple of 16 words.
+ * This is an HKDF-like construction for using the hashed collected entropy
+ * as a PRF key, that's then expanded block-by-block.
  */
-static void extract_buf(u8 *out)
+static void _extract_entropy(void *buf, size_t nbytes)
 {
-	struct blake2s_state state __aligned(__alignof__(unsigned long));
-	u8 hash[BLAKE2S_HASH_SIZE];
-	unsigned long *salt;
 	unsigned long flags;
-
-	blake2s_init(&state, sizeof(hash));
-
-	/*
-	 * If we have an architectural hardware random number
-	 * generator, use it for BLAKE2's salt & personal fields.
-	 */
-	for (salt = (unsigned long *)&state.h[4];
-	     salt < (unsigned long *)&state.h[8]; ++salt) {
-		unsigned long v;
-		if (!arch_get_random_long(&v))
-			break;
-		*salt ^= v;
+	u8 seed[BLAKE2S_HASH_SIZE], next_key[BLAKE2S_HASH_SIZE];
+	struct {
+		unsigned long rdrand[32 / sizeof(long)];
+		size_t counter;
+	} block;
+	size_t i;
+
+	for (i = 0; i < ARRAY_SIZE(block.rdrand); ++i) {
+		if (!arch_get_random_long(&block.rdrand[i]))
+			block.rdrand[i] = random_get_entropy();
 	}
 
-	/* Generate a hash across the pool */
 	spin_lock_irqsave(&input_pool.lock, flags);
-	blake2s_update(&state, (const u8 *)input_pool_data, POOL_BYTES);
-	blake2s_final(&state, hash); /* final zeros out state */
 
-	/*
-	 * We mix the hash back into the pool to prevent backtracking
-	 * attacks (where the attacker knows the state of the pool
-	 * plus the current outputs, and attempts to find previous
-	 * outputs), unless the hash function can be inverted. By
-	 * mixing at least a hash worth of hash data back, we make
-	 * brute-forcing the feedback as hard as brute-forcing the
-	 * hash.
-	 */
-	__mix_pool_bytes(hash, sizeof(hash));
-	spin_unlock_irqrestore(&input_pool.lock, flags);
+	/* seed = HASHPRF(last_key, entropy_input) */
+	blake2s_final(&input_pool.hash, seed);
 
-	/* Note that EXTRACT_SIZE is half of hash size here, because above
-	 * we've dumped the full length back into mixer. By reducing the
-	 * amount that we emit, we retain a level of forward secrecy.
-	 */
-	memcpy(out, hash, EXTRACT_SIZE);
-	memzero_explicit(hash, sizeof(hash));
-}
+	/* next_key = HASHPRF(seed, RDRAND || 0) */
+	block.counter = 0;
+	blake2s(next_key, (u8 *)&block, seed, sizeof(next_key), sizeof(block), sizeof(seed));
+	blake2s_init_key(&input_pool.hash, BLAKE2S_HASH_SIZE, next_key, sizeof(next_key));
 
-static ssize_t _extract_entropy(void *buf, size_t nbytes)
-{
-	ssize_t ret = 0, i;
-	u8 tmp[EXTRACT_SIZE];
+	spin_unlock_irqrestore(&input_pool.lock, flags);
+	memzero_explicit(next_key, sizeof(next_key));
 
 	while (nbytes) {
-		extract_buf(tmp);
-		i = min_t(int, nbytes, EXTRACT_SIZE);
-		memcpy(buf, tmp, i);
+		i = min_t(size_t, nbytes, BLAKE2S_HASH_SIZE);
+		/* output = HASHPRF(seed, RDRAND || ++counter) */
+		++block.counter;
+		blake2s(buf, (u8 *)&block, seed, i, sizeof(block), sizeof(seed));
 		nbytes -= i;
 		buf += i;
-		ret += i;
 	}
 
-	/* Wipe data just returned from memory */
-	memzero_explicit(tmp, sizeof(tmp));
-
-	return ret;
+	memzero_explicit(seed, sizeof(seed));
+	memzero_explicit(&block, sizeof(block));
 }
 
 /*
@@ -1404,13 +1205,18 @@ static ssize_t _extract_entropy(void *buf, size_t nbytes)
  * returns it in a buffer.
  *
  * The min parameter specifies the minimum amount we can pull before
- * failing to avoid races that defeat catastrophic reseeding.
+ * failing to avoid races that defeat catastrophic reseeding. If we
+ * have less than min entropy available, we return false and buf is
+ * not filled.
  */
-static ssize_t extract_entropy(void *buf, size_t nbytes, int min)
+static bool extract_entropy(void *buf, size_t nbytes, int min)
 {
 	trace_extract_entropy(nbytes, POOL_ENTROPY_BITS(), _RET_IP_);
-	nbytes = account(nbytes, min);
-	return _extract_entropy(buf, nbytes);
+	if (account(nbytes, min)) {
+		_extract_entropy(buf, nbytes);
+		return true;
+	}
+	return false;
 }
 
 #define warn_unseeded_randomness(previous) \
@@ -1674,7 +1480,7 @@ static void __init init_std_data(void)
 	unsigned long rv;
 
 	mix_pool_bytes(&now, sizeof(now));
-	for (i = POOL_BYTES; i > 0; i -= sizeof(rv)) {
+	for (i = BLAKE2S_BLOCK_SIZE; i > 0; i -= sizeof(rv)) {
 		if (!arch_get_random_seed_long(&rv) &&
 		    !arch_get_random_long(&rv))
 			rv = random_get_entropy();
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 17/43] random: remove batched entropy locking
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (14 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 18/43] random: absorb fast pool into input pool after fast load Sasha Levin
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jason A. Donenfeld, Sebastian Andrzej Siewior, Dominik Brodowski,
	Eric Biggers, Andy Lutomirski, Jonathan Neuschäfer,
	Sasha Levin, tytso

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

[ Upstream commit 77760fd7f7ae3dfd03668204e708d1568d75447d ]

Rather than use spinlocks to protect batched entropy, we can instead
disable interrupts locally, since we're dealing with per-cpu data, and
manage resets with a basic generation counter. At the same time, we
can't quite do this on PREEMPT_RT, where we still want spinlocks-as-
mutexes semantics. So we use a local_lock_t, which provides the right
behavior for each. Because this is a per-cpu lock, that generation
counter is still doing the necessary CPU-to-CPU communication.
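
The generation-counter scheme, reduced to a toy single-threaded model
(names loosely follow the patch; this is an illustration, not the
kernel code, and the refill loop stands in for extract_crng()):

#include <stdatomic.h>
#include <stdio.h>

static atomic_int batch_generation;

struct batched_entropy {
        int position;
        int generation;
        int entropy[4];
};

static int get_random(struct batched_entropy *batch)
{
        int next_gen = atomic_load(&batch_generation);

        /* Refill when the batch is exhausted OR the global generation
         * moved, i.e. the batch was invalidated from elsewhere. */
        if (batch->position % 4 == 0 || next_gen != batch->generation) {
                for (int i = 0; i < 4; i++)     /* stand-in for extract_crng() */
                        batch->entropy[i] = 100 * next_gen + i;
                batch->position = 0;
                batch->generation = next_gen;
        }
        return batch->entropy[batch->position++];
}

int main(void)
{
        struct batched_entropy batch = { 0 };

        printf("%d %d\n", get_random(&batch), get_random(&batch));
        atomic_fetch_add(&batch_generation, 1); /* invalidate_batched_entropy() */
        printf("%d\n", get_random(&batch));     /* refilled: generation moved   */
        return 0;
}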

This should improve performance a bit. It will also fix the linked splat
that Jonathan received with PROVE_RAW_LOCK_NESTING=y enabled.

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Tested-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/char/random.c | 55 ++++++++++++++++++++++---------------------
 1 file changed, 28 insertions(+), 27 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 882f78829a24..34ee34b30993 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1876,13 +1876,16 @@ static int __init random_sysctls_init(void)
 device_initcall(random_sysctls_init);
 #endif	/* CONFIG_SYSCTL */
 
+static atomic_t batch_generation = ATOMIC_INIT(0);
+
 struct batched_entropy {
 	union {
 		u64 entropy_u64[CHACHA_BLOCK_SIZE / sizeof(u64)];
 		u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
 	};
+	local_lock_t lock;
 	unsigned int position;
-	spinlock_t batch_lock;
+	int generation;
 };
 
 /*
@@ -1894,7 +1897,7 @@ struct batched_entropy {
  * point prior.
  */
 static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
-	.batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
+	.lock = INIT_LOCAL_LOCK(batched_entropy_u64.lock)
 };
 
 u64 get_random_u64(void)
@@ -1903,67 +1906,65 @@ u64 get_random_u64(void)
 	unsigned long flags;
 	struct batched_entropy *batch;
 	static void *previous;
+	int next_gen;
 
 	warn_unseeded_randomness(&previous);
 
+	local_lock_irqsave(&batched_entropy_u64.lock, flags);
 	batch = raw_cpu_ptr(&batched_entropy_u64);
-	spin_lock_irqsave(&batch->batch_lock, flags);
-	if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
+
+	next_gen = atomic_read(&batch_generation);
+	if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0 ||
+	    next_gen != batch->generation) {
 		extract_crng((u8 *)batch->entropy_u64);
 		batch->position = 0;
+		batch->generation = next_gen;
 	}
+
 	ret = batch->entropy_u64[batch->position++];
-	spin_unlock_irqrestore(&batch->batch_lock, flags);
+	local_unlock_irqrestore(&batched_entropy_u64.lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_u64);
 
 static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
-	.batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
+	.lock = INIT_LOCAL_LOCK(batched_entropy_u32.lock)
 };
+
 u32 get_random_u32(void)
 {
 	u32 ret;
 	unsigned long flags;
 	struct batched_entropy *batch;
 	static void *previous;
+	int next_gen;
 
 	warn_unseeded_randomness(&previous);
 
+	local_lock_irqsave(&batched_entropy_u32.lock, flags);
 	batch = raw_cpu_ptr(&batched_entropy_u32);
-	spin_lock_irqsave(&batch->batch_lock, flags);
-	if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
+
+	next_gen = atomic_read(&batch_generation);
+	if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0 ||
+	    next_gen != batch->generation) {
 		extract_crng((u8 *)batch->entropy_u32);
 		batch->position = 0;
+		batch->generation = next_gen;
 	}
+
 	ret = batch->entropy_u32[batch->position++];
-	spin_unlock_irqrestore(&batch->batch_lock, flags);
+	local_unlock_irqrestore(&batched_entropy_u32.lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL(get_random_u32);
 
 /* It's important to invalidate all potential batched entropy that might
  * be stored before the crng is initialized, which we can do lazily by
- * simply resetting the counter to zero so that it's re-extracted on the
- * next usage. */
+ * bumping the generation counter.
+ */
 static void invalidate_batched_entropy(void)
 {
-	int cpu;
-	unsigned long flags;
-
-	for_each_possible_cpu(cpu) {
-		struct batched_entropy *batched_entropy;
-
-		batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
-		spin_lock_irqsave(&batched_entropy->batch_lock, flags);
-		batched_entropy->position = 0;
-		spin_unlock(&batched_entropy->batch_lock);
-
-		batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
-		spin_lock(&batched_entropy->batch_lock);
-		batched_entropy->position = 0;
-		spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
-	}
+	atomic_inc(&batch_generation);
 }
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 18/43] random: absorb fast pool into input pool after fast load
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (15 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 17/43] random: remove batched entropy locking Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 19/43] powercap/dtpm_cpu: Reset per_cpu variable in the release function Sasha Levin
                   ` (24 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jason A. Donenfeld, Theodore Ts'o, Dominik Brodowski,
	Eric Biggers, Sasha Levin

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

[ Upstream commit c30c575db4858f0bbe5e315ff2e529c782f33a1f ]

During crng_init == 0, we never credit entropy in
add_interrupt_randomness(), but instead dump it directly into the
primary_crng. That's fine, except for the fact that we then wind up
throwing away that entropy later when we switch to extracting from the
input pool and xoring into (and later in this series overwriting) the
primary_crng key. The two other early init sites --
add_hwgenerator_randomness()'s use of crng_fast_load() and
add_device_randomness()'s use of crng_slow_load() -- always
additionally give their inputs to the input pool. But not
add_interrupt_randomness().

This commit fixes that shortcoming by calling mix_pool_bytes() after
crng_fast_load() in add_interrupt_randomness(). That's partially
verboten on PREEMPT_RT, where it implies taking spinlock_t from an IRQ
handler. But this also only happens during early boot and then never
again after that. Plus it's a trylock so it has the same considerations
as calling crng_fast_load(), which we're already using.

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Suggested-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/char/random.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 34ee34b30993..2f21c5473d86 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1075,6 +1075,10 @@ void add_interrupt_randomness(int irq)
 		    crng_fast_load((u8 *)fast_pool->pool, sizeof(fast_pool->pool)) > 0) {
 			fast_pool->count = 0;
 			fast_pool->last = now;
+			if (spin_trylock(&input_pool.lock)) {
+				_mix_pool_bytes(&fast_pool->pool, sizeof(fast_pool->pool));
+				spin_unlock(&input_pool.lock);
+			}
 		}
 		return;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 19/43] powercap/dtpm_cpu: Reset per_cpu variable in the release function
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (16 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 18/43] random: absorb fast pool into input pool after fast load Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 20/43] random: round-robin registers as ulong, not u32 Sasha Levin
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Daniel Lezcano, Ulf Hansson, Sasha Levin, daniel.lezcano, rafael,
	linux-pm

From: Daniel Lezcano <daniel.lezcano@linaro.org>

[ Upstream commit 0aea2e4ec2a2bfa2d7e8820e37ba5b5ce04f20a5 ]

The release function does not reset the per-cpu variable when it is
called. That will prevent a subsequent creation, as the variable will
already be set from the previous creation.

Fix this by resetting the per-cpu variables.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20220130210210.549877-2-daniel.lezcano@linaro.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/powercap/dtpm_cpu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
index b740866b228d..1e8cac699646 100644
--- a/drivers/powercap/dtpm_cpu.c
+++ b/drivers/powercap/dtpm_cpu.c
@@ -150,10 +150,17 @@ static int update_pd_power_uw(struct dtpm *dtpm)
 static void pd_release(struct dtpm *dtpm)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
+	struct cpufreq_policy *policy;
 
 	if (freq_qos_request_active(&dtpm_cpu->qos_req))
 		freq_qos_remove_request(&dtpm_cpu->qos_req);
 
+	policy = cpufreq_cpu_get(dtpm_cpu->cpu);
+	if (policy) {
+		for_each_cpu(dtpm_cpu->cpu, policy->related_cpus)
+			per_cpu(dtpm_per_cpu, dtpm_cpu->cpu) = NULL;
+	}
+
 	kfree(dtpm_cpu);
 }
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 20/43] random: round-robin registers as ulong, not u32
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (17 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 19/43] powercap/dtpm_cpu: Reset per_cpu variable in the release function Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 21/43] arm64: module: remove (NOLOAD) from linker script Sasha Levin
                   ` (22 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Jason A. Donenfeld, Theodore Ts'o, Dominik Brodowski, Sasha Levin

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

[ Upstream commit da3951ebdcd1cb1d5c750e08cd05aee7b0c04d9a ]

When the interrupt handler does not have a valid cycle counter, it calls
get_reg() to read a register from the irq stack, in round-robin.
Currently it does this assuming that registers are 32-bit. This is
_probably_ the case, and probably all platforms without cycle counters
are in fact 32-bit platforms. But maybe not, and either way, it's not
quite correct. This commit fixes that to deal with `unsigned long`
rather than `u32`.
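
A userspace toy showing the word-size-correct round-robin walk the
patch switches to (struct regs stands in for struct pt_regs; values
invented for illustration):

#include <stdio.h>

struct regs { unsigned long a, b, c; };         /* stand-in for pt_regs */

int main(void)
{
        struct regs r = { 1, 2, 3 };
        unsigned long *ptr = (unsigned long *)&r;
        unsigned int idx = 0;
        int i;

        /* Walk the register file in native-word steps, wrapping the
         * index as get_reg() does after the patch: */
        for (i = 0; i < 5; i++) {
                if (idx >= sizeof(struct regs) / sizeof(unsigned long))
                        idx = 0;
                printf("%lu\n", ptr[idx++]);    /* prints 1 2 3 1 2 */
        }
        return 0;
}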

Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/char/random.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 2f21c5473d86..d2ce6b1a229d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1032,15 +1032,15 @@ static void add_interrupt_bench(cycles_t start)
 #define add_interrupt_bench(x)
 #endif
 
-static u32 get_reg(struct fast_pool *f, struct pt_regs *regs)
+static unsigned long get_reg(struct fast_pool *f, struct pt_regs *regs)
 {
-	u32 *ptr = (u32 *)regs;
+	unsigned long *ptr = (unsigned long *)regs;
 	unsigned int idx;
 
 	if (regs == NULL)
 		return 0;
 	idx = READ_ONCE(f->reg_idx);
-	if (idx >= sizeof(struct pt_regs) / sizeof(u32))
+	if (idx >= sizeof(struct pt_regs) / sizeof(unsigned long))
 		idx = 0;
 	ptr += idx++;
 	WRITE_ONCE(f->reg_idx, idx);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 21/43] arm64: module: remove (NOLOAD) from linker script
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (18 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 20/43] random: round-robin registers as ulong, not u32 Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 22/43] selinux: allow FIOCLEX and FIONCLEX with policy capability Sasha Levin
                   ` (21 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Fangrui Song, Nathan Chancellor, Ard Biesheuvel, Will Deacon,
	Sasha Levin, catalin.marinas, ndesaulniers, mark.rutland, pcc,
	linux-arm-kernel, llvm

From: Fangrui Song <maskray@google.com>

[ Upstream commit 4013e26670c590944abdab56c4fa797527b74325 ]

On ELF, (NOLOAD) sets the section type to SHT_NOBITS[1]. It is
conceptually inappropriate for .plt and .text.* sections, which are
always SHT_PROGBITS.

In GNU ld, if PLT entries are needed, .plt will be SHT_PROGBITS anyway
and (NOLOAD) will be essentially ignored. In ld.lld, since
https://reviews.llvm.org/D118840 ("[ELF] Support (TYPE=<value>) to
customize the output section type"), ld.lld will report a `section type
mismatch` error. Just remove (NOLOAD) to fix the error.

[1] https://lld.llvm.org/ELF/linker_script.html

As of today, "The section should be marked as not loadable" on
https://sourceware.org/binutils/docs/ld/Output-Section-Type.html is
outdated for ELF.

Tested-by: Nathan Chancellor <nathan@kernel.org>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Fangrui Song <maskray@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20220218081209.354383-1-maskray@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/arm64/include/asm/module.lds.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/module.lds.h b/arch/arm64/include/asm/module.lds.h
index a11ccadd47d2..094701ec5500 100644
--- a/arch/arm64/include/asm/module.lds.h
+++ b/arch/arm64/include/asm/module.lds.h
@@ -1,8 +1,8 @@
 SECTIONS {
 #ifdef CONFIG_ARM64_MODULE_PLTS
-	.plt 0 (NOLOAD) : { BYTE(0) }
-	.init.plt 0 (NOLOAD) : { BYTE(0) }
-	.text.ftrace_trampoline 0 (NOLOAD) : { BYTE(0) }
+	.plt 0 : { BYTE(0) }
+	.init.plt 0 : { BYTE(0) }
+	.text.ftrace_trampoline 0 : { BYTE(0) }
 #endif
 
 #ifdef CONFIG_KASAN_SW_TAGS
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 22/43] selinux: allow FIOCLEX and FIONCLEX with policy capability
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (19 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 21/43] arm64: module: remove (NOLOAD) from linker script Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 23/43] loop: use sysfs_emit() in the sysfs xxx show() Sasha Levin
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Richard Haines, Demi Marie Obenour, Paul Moore, Sasha Levin,
	stephen.smalley.work, eparis, cgzones, ndesaulniers, selinux

From: Richard Haines <richard_c_haines@btinternet.com>

[ Upstream commit 65881e1db4e948614d9eb195b8e1197339822949 ]

These ioctls are equivalent to fcntl(fd, F_SETFD, flags), which SELinux
always allows too.  Furthermore, a failed FIOCLEX could result in a file
descriptor being leaked to a process that should not have access to it.
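
The equivalence, as a small userspace sketch (illustrative only):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
        int fd = dup(STDOUT_FILENO);

        /* Two equivalent ways to set close-on-exec: SELinux has always
         * allowed the fcntl() form, and with the ioctl_skip_cloexec
         * policy capability enabled, the ioctl form is no longer
         * checked either. */
        ioctl(fd, FIOCLEX);                     /* set the flag */
        fcntl(fd, F_SETFD, FD_CLOEXEC);         /* same effect  */

        printf("FD_CLOEXEC set: %d\n",
               (fcntl(fd, F_GETFD) & FD_CLOEXEC) != 0);
        close(fd);
        return 0;
}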

As this patch removes access controls, a policy capability needs to be
enabled in policy to always allow these ioctls.

Based-on-patch-by: Demi Marie Obenour <demiobenour@gmail.com>
Signed-off-by: Richard Haines <richard_c_haines@btinternet.com>
[PM: subject line tweak]
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 security/selinux/hooks.c                   | 6 ++++++
 security/selinux/include/policycap.h       | 1 +
 security/selinux/include/policycap_names.h | 3 ++-
 security/selinux/include/security.h        | 7 +++++++
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 371f67a37f9a..141220308307 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -3744,6 +3744,12 @@ static int selinux_file_ioctl(struct file *file, unsigned int cmd,
 					    CAP_OPT_NONE, true);
 		break;
 
+	case FIOCLEX:
+	case FIONCLEX:
+		if (!selinux_policycap_ioctl_skip_cloexec())
+			error = ioctl_has_perm(cred, file, FILE__IOCTL, (u16) cmd);
+		break;
+
 	/* default case assumes that the command will go
 	 * to the file's ioctl() function.
 	 */
diff --git a/security/selinux/include/policycap.h b/security/selinux/include/policycap.h
index 2ec038efbb03..a9e572ca4fd9 100644
--- a/security/selinux/include/policycap.h
+++ b/security/selinux/include/policycap.h
@@ -11,6 +11,7 @@ enum {
 	POLICYDB_CAPABILITY_CGROUPSECLABEL,
 	POLICYDB_CAPABILITY_NNP_NOSUID_TRANSITION,
 	POLICYDB_CAPABILITY_GENFS_SECLABEL_SYMLINKS,
+	POLICYDB_CAPABILITY_IOCTL_SKIP_CLOEXEC,
 	__POLICYDB_CAPABILITY_MAX
 };
 #define POLICYDB_CAPABILITY_MAX (__POLICYDB_CAPABILITY_MAX - 1)
diff --git a/security/selinux/include/policycap_names.h b/security/selinux/include/policycap_names.h
index b89289f092c9..ebd64afe1def 100644
--- a/security/selinux/include/policycap_names.h
+++ b/security/selinux/include/policycap_names.h
@@ -12,7 +12,8 @@ const char *selinux_policycap_names[__POLICYDB_CAPABILITY_MAX] = {
 	"always_check_network",
 	"cgroup_seclabel",
 	"nnp_nosuid_transition",
-	"genfs_seclabel_symlinks"
+	"genfs_seclabel_symlinks",
+	"ioctl_skip_cloexec"
 };
 
 #endif /* _SELINUX_POLICYCAP_NAMES_H_ */
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index ac0ece01305a..c0d966020ebd 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -219,6 +219,13 @@ static inline bool selinux_policycap_genfs_seclabel_symlinks(void)
 	return READ_ONCE(state->policycap[POLICYDB_CAPABILITY_GENFS_SECLABEL_SYMLINKS]);
 }
 
+static inline bool selinux_policycap_ioctl_skip_cloexec(void)
+{
+	struct selinux_state *state = &selinux_state;
+
+	return READ_ONCE(state->policycap[POLICYDB_CAPABILITY_IOCTL_SKIP_CLOEXEC]);
+}
+
 struct selinux_policy_convert_data;
 
 struct selinux_load_state {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 23/43] loop: use sysfs_emit() in the sysfs xxx show()
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (20 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 22/43] selinux: allow FIOCLEX and FIONCLEX with policy capability Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 24/43] Fix incorrect type in assignment of ipv6 port for audit Sasha Levin
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Chaitanya Kulkarni, Himanshu Madhani, Jens Axboe, Sasha Levin,
	linux-block

From: Chaitanya Kulkarni <kch@nvidia.com>

[ Upstream commit b27824d31f09ea7b4a6ba2c1b18bd328df3e8bed ]

sprintf() does not know the PAGE_SIZE maximum of the temporary buffer
used for outputting sysfs content, so it is possible to overrun the
PAGE_SIZE buffer length.

Use the generic sysfs_emit() function, which knows the size of the
temporary buffer and ensures that no overrun occurs, in the
loop_attr_[offset|sizelimit|autoclear|partscan|dio]_show() callbacks.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/block/loop.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 19fe19eaa50e..e65d1e24cab3 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -681,33 +681,33 @@ static ssize_t loop_attr_backing_file_show(struct loop_device *lo, char *buf)
 
 static ssize_t loop_attr_offset_show(struct loop_device *lo, char *buf)
 {
-	return sprintf(buf, "%llu\n", (unsigned long long)lo->lo_offset);
+	return sysfs_emit(buf, "%llu\n", (unsigned long long)lo->lo_offset);
 }
 
 static ssize_t loop_attr_sizelimit_show(struct loop_device *lo, char *buf)
 {
-	return sprintf(buf, "%llu\n", (unsigned long long)lo->lo_sizelimit);
+	return sysfs_emit(buf, "%llu\n", (unsigned long long)lo->lo_sizelimit);
 }
 
 static ssize_t loop_attr_autoclear_show(struct loop_device *lo, char *buf)
 {
 	int autoclear = (lo->lo_flags & LO_FLAGS_AUTOCLEAR);
 
-	return sprintf(buf, "%s\n", autoclear ? "1" : "0");
+	return sysfs_emit(buf, "%s\n", autoclear ? "1" : "0");
 }
 
 static ssize_t loop_attr_partscan_show(struct loop_device *lo, char *buf)
 {
 	int partscan = (lo->lo_flags & LO_FLAGS_PARTSCAN);
 
-	return sprintf(buf, "%s\n", partscan ? "1" : "0");
+	return sysfs_emit(buf, "%s\n", partscan ? "1" : "0");
 }
 
 static ssize_t loop_attr_dio_show(struct loop_device *lo, char *buf)
 {
 	int dio = (lo->lo_flags & LO_FLAGS_DIRECT_IO);
 
-	return sprintf(buf, "%s\n", dio ? "1" : "0");
+	return sysfs_emit(buf, "%s\n", dio ? "1" : "0");
 }
 
 LOOP_ATTR_RO(backing_file);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 24/43] Fix incorrect type in assignment of ipv6 port for audit
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (21 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 23/43] loop: use sysfs_emit() in the sysfs xxx show() Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 25/43] irqchip/qcom-pdc: Fix broken locking Sasha Levin
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Casey Schaufler, kernel test robot, Sasha Levin, jmorris, serge,
	linux-security-module

From: Casey Schaufler <casey@schaufler-ca.com>

[ Upstream commit a5cd1ab7ab679d252a6d2f483eee7d45ebf2040c ]

Remove inappropriate use of ntohs() and assign the
port value directly.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 security/smack/smack_lsm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 14b279cc75c9..6207762dbdb1 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -2510,7 +2510,7 @@ static int smk_ipv6_check(struct smack_known *subject,
 #ifdef CONFIG_AUDIT
 	smk_ad_init_net(&ad, __func__, LSM_AUDIT_DATA_NET, &net);
 	ad.a.u.net->family = PF_INET6;
-	ad.a.u.net->dport = ntohs(address->sin6_port);
+	ad.a.u.net->dport = address->sin6_port;
 	if (act == SMK_RECEIVING)
 		ad.a.u.net->v6info.saddr = address->sin6_addr;
 	else
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 25/43] irqchip/qcom-pdc: Fix broken locking
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (22 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 24/43] Fix incorrect type in assignment of ipv6 port for audit Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 26/43] irqchip/nvic: Release nvic_base upon failure Sasha Levin
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Marc Zyngier, Maulik Shah, Sasha Levin, agross, bjorn.andersson,
	tglx, linux-arm-msm

From: Marc Zyngier <maz@kernel.org>

[ Upstream commit a6aca2f460e203781dc41391913cc5b54f4bc0ce ]

pdc_enable_intr() serves as a primitive to qcom_pdc_gic_{en,dis}able,
and has a raw spinlock for mutual exclusion, which it takes with the
interruptible (non-irqsave) locking primitives.

This means that this critical section can itself be interrupted.
Should the interrupt also be a PDC interrupt, and the endpoint driver
perform an irq_disable() on that interrupt, we end up in a deadlock.

Fix this by using the irqsave/irqrestore variants of the locking
primitives.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Maulik Shah <quic_mkshah@quicinc.com>
Link: https://lore.kernel.org/r/20220224101226.88373-5-maz@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/irqchip/qcom-pdc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/irqchip/qcom-pdc.c b/drivers/irqchip/qcom-pdc.c
index 173e6520e06e..c0b457f26ec4 100644
--- a/drivers/irqchip/qcom-pdc.c
+++ b/drivers/irqchip/qcom-pdc.c
@@ -56,17 +56,18 @@ static u32 pdc_reg_read(int reg, u32 i)
 static void pdc_enable_intr(struct irq_data *d, bool on)
 {
 	int pin_out = d->hwirq;
+	unsigned long flags;
 	u32 index, mask;
 	u32 enable;
 
 	index = pin_out / 32;
 	mask = pin_out % 32;
 
-	raw_spin_lock(&pdc_lock);
+	raw_spin_lock_irqsave(&pdc_lock, flags);
 	enable = pdc_reg_read(IRQ_ENABLE_BANK, index);
 	enable = on ? ENABLE_INTR(enable, mask) : CLEAR_INTR(enable, mask);
 	pdc_reg_write(IRQ_ENABLE_BANK, index, enable);
-	raw_spin_unlock(&pdc_lock);
+	raw_spin_unlock_irqrestore(&pdc_lock, flags);
 }
 
 static void qcom_pdc_gic_disable(struct irq_data *d)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 26/43] irqchip/nvic: Release nvic_base upon failure
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (23 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 25/43] irqchip/qcom-pdc: Fix broken locking Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 27/43] fs/binfmt_elf: Fix AT_PHDR for unusual ELF files Sasha Levin
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Souptick Joarder (HPE),
	kernel test robot, Dan Carpenter, Marc Zyngier, Sasha Levin,
	tglx

From: "Souptick Joarder (HPE)" <jrdr.linux@gmail.com>

[ Upstream commit e414c25e3399b2b3d7337dc47abccab5c71b7c8f ]

The following smatch warning was reported:

smatch warnings:
drivers/irqchip/irq-nvic.c:131 nvic_of_init()
warn: 'nvic_base' not released on lines: 97.

Release nvic_base upon failure.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Souptick Joarder (HPE) <jrdr.linux@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220218163303.33344-1-jrdr.linux@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/irqchip/irq-nvic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/irqchip/irq-nvic.c b/drivers/irqchip/irq-nvic.c
index ba4759b3e269..94230306e0ee 100644
--- a/drivers/irqchip/irq-nvic.c
+++ b/drivers/irqchip/irq-nvic.c
@@ -107,6 +107,7 @@ static int __init nvic_of_init(struct device_node *node,
 
 	if (!nvic_irq_domain) {
 		pr_warn("Failed to allocate irq domain\n");
+		iounmap(nvic_base);
 		return -ENOMEM;
 	}
 
@@ -116,6 +117,7 @@ static int __init nvic_of_init(struct device_node *node,
 	if (ret) {
 		pr_warn("Failed to allocate irq chips\n");
 		irq_domain_remove(nvic_irq_domain);
+		iounmap(nvic_base);
 		return ret;
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 27/43] fs/binfmt_elf: Fix AT_PHDR for unusual ELF files
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (24 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 26/43] irqchip/nvic: Release nvic_base upon failure Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 28/43] hwrng: cavium - fix NULL but dereferenced coccicheck error Sasha Levin
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Akira Kawata, kernel test robot, Kees Cook, Sasha Levin, viro,
	linux-fsdevel, linux-mm

From: Akira Kawata <akirakawata1@gmail.com>

[ Upstream commit 0da1d5002745cdc721bc018b582a8a9704d56c42 ]

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=197921

As pointed out in the discussion at the BugLink above, we cannot
calculate AT_PHDR as the sum of load_addr and exec->e_phoff.

: The AT_PHDR of ELF auxiliary vectors should point to the memory address
: of program header. But binfmt_elf.c calculates this address as follows:
:
: NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff);
:
: which is wrong since e_phoff is the file offset of program header and
: load_addr is the memory base address from PT_LOAD entry.
:
: The ld.so uses AT_PHDR as the memory address of program header. In normal
: case, since the e_phoff is usually 64 and in the first PT_LOAD region, it
: is the correct program header address.
:
: But if the address of program header isn't equal to the first PT_LOAD
: address + e_phoff (e.g.  Put the program header in other non-consecutive
: PT_LOAD region), ld.so will try to read program header from wrong address
: then crash or use incorrect program header.

This is because exec->e_phoff is the offset of the PHDRs in the file,
and the address of the PHDRs in memory may differ from it. This patch
fixes the bug by calculating the address of the program headers from
the PT_LOAD entries directly.
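
A worked example with invented numbers, mirroring the calculation the
patch introduces:

#include <stdio.h>

int main(void)
{
        /* Hypothetical layout: the program header table sits at file
         * offset 0x40, inside a PT_LOAD that maps file offset 0x0 to
         * virtual address 0x400000. */
        unsigned long e_phoff   = 0x40;     /* file offset of PHDRs    */
        unsigned long p_offset  = 0x0;      /* PT_LOAD file offset     */
        unsigned long p_vaddr   = 0x400000; /* PT_LOAD virtual address */
        unsigned long load_bias = 0x1000;   /* e.g. PIE randomization  */

        /* AT_PHDR is derived from the PT_LOAD containing e_phoff,
         * not from load_addr + e_phoff: */
        unsigned long phdr_addr = e_phoff - p_offset + p_vaddr;

        printf("AT_PHDR = %#lx\n", phdr_addr + load_bias); /* 0x401040 */
        return 0;
}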

Signed-off-by: Akira Kawata <akirakawata1@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220127124014.338760-2-akirakawata1@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/binfmt_elf.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index d61543fbd652..af0965c10619 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -170,8 +170,8 @@ static int padzero(unsigned long elf_bss)
 
 static int
 create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
-		unsigned long load_addr, unsigned long interp_load_addr,
-		unsigned long e_entry)
+		unsigned long interp_load_addr,
+		unsigned long e_entry, unsigned long phdr_addr)
 {
 	struct mm_struct *mm = current->mm;
 	unsigned long p = bprm->p;
@@ -257,7 +257,7 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
 	NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP);
 	NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
 	NEW_AUX_ENT(AT_CLKTCK, CLOCKS_PER_SEC);
-	NEW_AUX_ENT(AT_PHDR, load_addr + exec->e_phoff);
+	NEW_AUX_ENT(AT_PHDR, phdr_addr);
 	NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr));
 	NEW_AUX_ENT(AT_PHNUM, exec->e_phnum);
 	NEW_AUX_ENT(AT_BASE, interp_load_addr);
@@ -823,7 +823,7 @@ static int parse_elf_properties(struct file *f, const struct elf_phdr *phdr,
 static int load_elf_binary(struct linux_binprm *bprm)
 {
 	struct file *interpreter = NULL; /* to shut gcc up */
- 	unsigned long load_addr = 0, load_bias = 0;
+	unsigned long load_addr, load_bias = 0, phdr_addr = 0;
 	int load_addr_set = 0;
 	unsigned long error;
 	struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL;
@@ -1180,6 +1180,17 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				reloc_func_desc = load_bias;
 			}
 		}
+
+		/*
+		 * Figure out which segment in the file contains the Program
+		 * Header table, and map to the associated memory address.
+		 */
+		if (elf_ppnt->p_offset <= elf_ex->e_phoff &&
+		    elf_ex->e_phoff < elf_ppnt->p_offset + elf_ppnt->p_filesz) {
+			phdr_addr = elf_ex->e_phoff - elf_ppnt->p_offset +
+				    elf_ppnt->p_vaddr;
+		}
+
 		k = elf_ppnt->p_vaddr;
 		if ((elf_ppnt->p_flags & PF_X) && k < start_code)
 			start_code = k;
@@ -1215,6 +1226,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 	}
 
 	e_entry = elf_ex->e_entry + load_bias;
+	phdr_addr += load_bias;
 	elf_bss += load_bias;
 	elf_brk += load_bias;
 	start_code += load_bias;
@@ -1278,8 +1290,8 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		goto out;
 #endif /* ARCH_HAS_SETUP_ADDITIONAL_PAGES */
 
-	retval = create_elf_tables(bprm, elf_ex,
-			  load_addr, interp_load_addr, e_entry);
+	retval = create_elf_tables(bprm, elf_ex, interp_load_addr,
+				   e_entry, phdr_addr);
 	if (retval < 0)
 		goto out;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 28/43] hwrng: cavium - fix NULL but dereferenced coccicheck error
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (25 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 27/43] fs/binfmt_elf: Fix AT_PHDR for unusual ELF files Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels Sasha Levin
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Wan Jiabing, Sunil Goutham, Herbert Xu, Sasha Levin, mpm, linux-crypto

From: Wan Jiabing <wanjiabing@vivo.com>

[ Upstream commit e6205ad58a7ac194abfb33897585b38687d797fa ]

Fix the following coccicheck warning:
./drivers/char/hw_random/cavium-rng-vf.c:182:17-20: ERROR:
pdev is NULL but dereferenced.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/char/hw_random/cavium-rng-vf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/cavium-rng-vf.c b/drivers/char/hw_random/cavium-rng-vf.c
index 6f66919652bf..7c55f4cf4a8b 100644
--- a/drivers/char/hw_random/cavium-rng-vf.c
+++ b/drivers/char/hw_random/cavium-rng-vf.c
@@ -179,7 +179,7 @@ static int cavium_map_pf_regs(struct cavium_rng *rng)
 	pdev = pci_get_device(PCI_VENDOR_ID_CAVIUM,
 			      PCI_DEVID_CAVIUM_RNG_PF, NULL);
 	if (!pdev) {
-		dev_err(&pdev->dev, "Cannot find RNG PF device\n");
+		pr_err("Cannot find RNG PF device\n");
 		return -EIO;
 	}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (26 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 28/43] hwrng: cavium - fix NULL but dereferenced coccicheck error Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 14:31   ` Eric W. Biederman
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 30/43] bfq: fix use-after-free in bfq_dispatch_request Sasha Levin
                   ` (13 subsequent siblings)
  41 siblings, 1 reply; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Oleg Nesterov, Steven Rostedt, Thomas Gleixner,
	Sebastian Andrzej Siewior, Sasha Levin, mingo, bp, dave.hansen,
	x86, peterz, juri.lelli, vincent.guittot, luto, frederic,
	mark.rutland, valentin.schneider, ebiederm, keescook, elver,
	legion

From: Oleg Nesterov <oleg@redhat.com>

[ Upstream commit bf9ad37dc8a30cce22ae95d6c2ca6abf8731d305 ]

On x86_64 we must disable preemption before we enable interrupts
for stack faults, int3 and debugging, because the current task is using
a per-CPU debug stack defined by the IST. If we schedule out, another task
can come in and use the same stack and cause the stack to be corrupted
and crash the kernel on return.

When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
one of these is the spin lock used in signal handling.

Some of the debug code (int3) causes do_trap() to send a signal.
This function takes a spinlock_t lock that has been converted to a
sleeping lock. If this happens, the above issue with the corrupted
stack is possible.

Instead of sending the signal right away, for PREEMPT_RT and x86,
the signal information is stored in the task's task_struct and
TIF_NOTIFY_RESUME is set. Then on exit from the trap, the signal resume
code sends the signal once preemption is enabled again.
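
Condensed, the control flow this introduces looks as follows (function
names taken from the diff below):

    /*
     * trap handler (atomic context)
     *   force_sig_info_to_task()
     *     force_sig_delayed()    stores the siginfo in t->forced_info
     *                            and sets TIF_NOTIFY_RESUME instead of
     *                            taking the now-sleeping siglock
     *
     * exit_to_user_mode_loop()   back in a preemptible context
     *   raise_delayed_signal()   calls force_sig_info() and clears
     *                            forced_info.si_signo
     */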

[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
  ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
[bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
[ tglx: Use a config option ]

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/Ygq5aBB/qMQw6aP5@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/Kconfig       |  1 +
 include/linux/sched.h  |  3 +++
 kernel/Kconfig.preempt | 12 +++++++++++-
 kernel/entry/common.c  | 14 ++++++++++++++
 kernel/signal.c        | 40 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9f5bd41bf660..d557ac29b6cd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -120,6 +120,7 @@ config X86
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_WANT_HUGE_PMD_SHARE
 	select ARCH_WANT_LD_ORPHAN_WARN
+	select ARCH_WANTS_RT_DELAYED_SIGNALS
 	select ARCH_WANTS_THP_SWAP		if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH
 	select BUILDTIME_TABLE_SORT
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75ba8aa60248..098e37fd770a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1087,6 +1087,9 @@ struct task_struct {
 	/* Restored if set_restore_sigmask() was used: */
 	sigset_t			saved_sigmask;
 	struct sigpending		pending;
+#ifdef CONFIG_RT_DELAYED_SIGNALS
+	struct kernel_siginfo		forced_info;
+#endif
 	unsigned long			sas_ss_sp;
 	size_t				sas_ss_size;
 	unsigned int			sas_ss_flags;
diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
index ce77f0265660..5644abd5f8a8 100644
--- a/kernel/Kconfig.preempt
+++ b/kernel/Kconfig.preempt
@@ -132,4 +132,14 @@ config SCHED_CORE
 	  which is the likely usage by Linux distributions, there should
 	  be no measurable impact on performance.
 
-
+config ARCH_WANTS_RT_DELAYED_SIGNALS
+	bool
+	help
+	  This option is selected by architectures where raising signals
+	  can happen in atomic contexts on PREEMPT_RT enabled kernels. This
+	  option delays raising the signal until the return to user space
+	  loop where it is also delivered. X86 requires this to deliver
+	  signals from trap handlers which run on IST stacks.
+
+config RT_DELAYED_SIGNALS
+	def_bool PREEMPT_RT && ARCH_WANTS_RT_DELAYED_SIGNALS
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index bad713684c2e..0543a2c92f20 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -148,6 +148,18 @@ static void handle_signal_work(struct pt_regs *regs, unsigned long ti_work)
 	arch_do_signal_or_restart(regs, ti_work & _TIF_SIGPENDING);
 }
 
+#ifdef CONFIG_RT_DELAYED_SIGNALS
+static inline void raise_delayed_signal(void)
+{
+	if (unlikely(current->forced_info.si_signo)) {
+		force_sig_info(&current->forced_info);
+		current->forced_info.si_signo = 0;
+	}
+}
+#else
+static inline void raise_delayed_signal(void) { }
+#endif
+
 static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 					    unsigned long ti_work)
 {
@@ -162,6 +174,8 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
 		if (ti_work & _TIF_NEED_RESCHED)
 			schedule();
 
+		raise_delayed_signal();
+
 		if (ti_work & _TIF_UPROBE)
 			uprobe_notify_resume(regs);
 
diff --git a/kernel/signal.c b/kernel/signal.c
index 9b04631acde8..e93de6daa188 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1307,6 +1307,43 @@ enum sig_handler {
 	HANDLER_EXIT,	 /* Only visible as the process exit code */
 };
 
+/*
+ * On some architectures, PREEMPT_RT has to delay sending a signal from a
+ * trap since it cannot enable preemption, and the signal code's
+ * spin_locks turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME
+ * which will send the signal on exit of the trap.
+ */
+#ifdef CONFIG_RT_DELAYED_SIGNALS
+static inline bool force_sig_delayed(struct kernel_siginfo *info,
+				     struct task_struct *t)
+{
+	if (!in_atomic())
+		return false;
+
+	if (WARN_ON_ONCE(t->forced_info.si_signo))
+		return true;
+
+	if (is_si_special(info)) {
+		WARN_ON_ONCE(info != SEND_SIG_PRIV);
+		t->forced_info.si_signo = info->si_signo;
+		t->forced_info.si_errno = 0;
+		t->forced_info.si_code = SI_KERNEL;
+		t->forced_info.si_pid = 0;
+		t->forced_info.si_uid = 0;
+	} else {
+		t->forced_info = *info;
+	}
+	set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
+	return true;
+}
+#else
+static inline bool force_sig_delayed(struct kernel_siginfo *info,
+				     struct task_struct *t)
+{
+	return false;
+}
+#endif
+
 /*
  * Force a signal that the process can't ignore: if necessary
  * we unblock the signal and change any SIG_IGN to SIG_DFL.
@@ -1327,6 +1364,9 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
 	struct k_sigaction *action;
 	int sig = info->si_signo;
 
+	if (force_sig_delayed(info, t))
+		return 0;
+
 	spin_lock_irqsave(&t->sighand->siglock, flags);
 	action = &t->sighand->action[sig-1];
 	ignored = action->sa.sa_handler == SIG_IGN;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 30/43] bfq: fix use-after-free in bfq_dispatch_request
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (27 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 31/43] ACPICA: Avoid walking the ACPI Namespace if it is not there Sasha Levin
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Zhang Wensheng, Hulk Robot, Jens Axboe, Sasha Levin,
	paolo.valente, linux-block

From: Zhang Wensheng <zhangwensheng5@huawei.com>

[ Upstream commit ab552fcb17cc9e4afe0e4ac4df95fc7b30e8490a ]

KASAN reports a use-after-free when running a normal scsi-mq test

[69832.239032] ==================================================================
[69832.241810] BUG: KASAN: use-after-free in bfq_dispatch_request+0x1045/0x44b0
[69832.243267] Read of size 8 at addr ffff88802622ba88 by task kworker/3:1H/155
[69832.244656]
[69832.245007] CPU: 3 PID: 155 Comm: kworker/3:1H Not tainted 5.10.0-10295-g576c6382529e #8
[69832.246626] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[69832.249069] Workqueue: kblockd blk_mq_run_work_fn
[69832.250022] Call Trace:
[69832.250541]  dump_stack+0x9b/0xce
[69832.251232]  ? bfq_dispatch_request+0x1045/0x44b0
[69832.252243]  print_address_description.constprop.6+0x3e/0x60
[69832.253381]  ? __cpuidle_text_end+0x5/0x5
[69832.254211]  ? vprintk_func+0x6b/0x120
[69832.254994]  ? bfq_dispatch_request+0x1045/0x44b0
[69832.255952]  ? bfq_dispatch_request+0x1045/0x44b0
[69832.256914]  kasan_report.cold.9+0x22/0x3a
[69832.257753]  ? bfq_dispatch_request+0x1045/0x44b0
[69832.258755]  check_memory_region+0x1c1/0x1e0
[69832.260248]  bfq_dispatch_request+0x1045/0x44b0
[69832.261181]  ? bfq_bfqq_expire+0x2440/0x2440
[69832.262032]  ? blk_mq_delay_run_hw_queues+0xf9/0x170
[69832.263022]  __blk_mq_do_dispatch_sched+0x52f/0x830
[69832.264011]  ? blk_mq_sched_request_inserted+0x100/0x100
[69832.265101]  __blk_mq_sched_dispatch_requests+0x398/0x4f0
[69832.266206]  ? blk_mq_do_dispatch_ctx+0x570/0x570
[69832.267147]  ? __switch_to+0x5f4/0xee0
[69832.267898]  blk_mq_sched_dispatch_requests+0xdf/0x140
[69832.268946]  __blk_mq_run_hw_queue+0xc0/0x270
[69832.269840]  blk_mq_run_work_fn+0x51/0x60
[69832.278170]  process_one_work+0x6d4/0xfe0
[69832.278984]  worker_thread+0x91/0xc80
[69832.279726]  ? __kthread_parkme+0xb0/0x110
[69832.280554]  ? process_one_work+0xfe0/0xfe0
[69832.281414]  kthread+0x32d/0x3f0
[69832.282082]  ? kthread_park+0x170/0x170
[69832.282849]  ret_from_fork+0x1f/0x30
[69832.283573]
[69832.283886] Allocated by task 7725:
[69832.284599]  kasan_save_stack+0x19/0x40
[69832.285385]  __kasan_kmalloc.constprop.2+0xc1/0xd0
[69832.286350]  kmem_cache_alloc_node+0x13f/0x460
[69832.287237]  bfq_get_queue+0x3d4/0x1140
[69832.287993]  bfq_get_bfqq_handle_split+0x103/0x510
[69832.289015]  bfq_init_rq+0x337/0x2d50
[69832.289749]  bfq_insert_requests+0x304/0x4e10
[69832.290634]  blk_mq_sched_insert_requests+0x13e/0x390
[69832.291629]  blk_mq_flush_plug_list+0x4b4/0x760
[69832.292538]  blk_flush_plug_list+0x2c5/0x480
[69832.293392]  io_schedule_prepare+0xb2/0xd0
[69832.294209]  io_schedule_timeout+0x13/0x80
[69832.295014]  wait_for_common_io.constprop.1+0x13c/0x270
[69832.296137]  submit_bio_wait+0x103/0x1a0
[69832.296932]  blkdev_issue_discard+0xe6/0x160
[69832.297794]  blk_ioctl_discard+0x219/0x290
[69832.298614]  blkdev_common_ioctl+0x50a/0x1750
[69832.304715]  blkdev_ioctl+0x470/0x600
[69832.305474]  block_ioctl+0xde/0x120
[69832.306232]  vfs_ioctl+0x6c/0xc0
[69832.306877]  __se_sys_ioctl+0x90/0xa0
[69832.307629]  do_syscall_64+0x2d/0x40
[69832.308362]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[69832.309382]
[69832.309701] Freed by task 155:
[69832.310328]  kasan_save_stack+0x19/0x40
[69832.311121]  kasan_set_track+0x1c/0x30
[69832.311868]  kasan_set_free_info+0x1b/0x30
[69832.312699]  __kasan_slab_free+0x111/0x160
[69832.313524]  kmem_cache_free+0x94/0x460
[69832.314367]  bfq_put_queue+0x582/0x940
[69832.315112]  __bfq_bfqd_reset_in_service+0x166/0x1d0
[69832.317275]  bfq_bfqq_expire+0xb27/0x2440
[69832.318084]  bfq_dispatch_request+0x697/0x44b0
[69832.318991]  __blk_mq_do_dispatch_sched+0x52f/0x830
[69832.319984]  __blk_mq_sched_dispatch_requests+0x398/0x4f0
[69832.321087]  blk_mq_sched_dispatch_requests+0xdf/0x140
[69832.322225]  __blk_mq_run_hw_queue+0xc0/0x270
[69832.323114]  blk_mq_run_work_fn+0x51/0x60
[69832.323942]  process_one_work+0x6d4/0xfe0
[69832.324772]  worker_thread+0x91/0xc80
[69832.325518]  kthread+0x32d/0x3f0
[69832.326205]  ret_from_fork+0x1f/0x30
[69832.326932]
[69832.338297] The buggy address belongs to the object at ffff88802622b968
[69832.338297]  which belongs to the cache bfq_queue of size 512
[69832.340766] The buggy address is located 288 bytes inside of
[69832.340766]  512-byte region [ffff88802622b968, ffff88802622bb68)
[69832.343091] The buggy address belongs to the page:
[69832.344097] page:ffffea0000988a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88802622a528 pfn:0x26228
[69832.346214] head:ffffea0000988a00 order:2 compound_mapcount:0 compound_pincount:0
[69832.347719] flags: 0x1fffff80010200(slab|head)
[69832.348625] raw: 001fffff80010200 ffffea0000dbac08 ffff888017a57650 ffff8880179fe840
[69832.354972] raw: ffff88802622a528 0000000000120008 00000001ffffffff 0000000000000000
[69832.356547] page dumped because: kasan: bad access detected
[69832.357652]
[69832.357970] Memory state around the buggy address:
[69832.358926]  ffff88802622b980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[69832.360358]  ffff88802622ba00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[69832.361810] >ffff88802622ba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[69832.363273]                       ^
[69832.363975]  ffff88802622bb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
[69832.375960]  ffff88802622bb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[69832.377405] ==================================================================

In the bfq_dispatch_request() function, the following call chain may occur:

bfq_dispatch_request
	__bfq_dispatch_request
		bfq_select_queue
			bfq_bfqq_expire
				__bfq_bfqd_reset_in_service
					bfq_put_queue
						kmem_cache_free
In this call chain, in_serv_queue can be expired and can meet the
conditions to be freed. Back in bfq_dispatch_request, the memory that
in_serv_queue points to has then already been released, yet the flags
value used to compute idle_timer_disabled is still read through that
stale pointer, which is the use-after-free.

Fix the problem by checking in_serv_queue == bfqd->in_service_queue and
computing idle_timer_disabled only when in_serv_queue is still equal to
bfqd->in_service_queue. If the memory that in_serv_queue points to has
been released, this check avoids the use-after-free. And if
in_serv_queue has been expired or finished, idle_timer_disabled remains
false, which has no effect on bfq_update_dispatch_stats.
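
Condensed, the fixed section looks like this (a sketch based on the
diff below, not the complete function):

    spin_lock_irq(&bfqd->lock);

    in_serv_queue = bfqd->in_service_queue;
    waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);

    rq = __bfq_dispatch_request(hctx);  /* may expire and free it */

    /* Dereference in_serv_queue only if it is still the in-service
     * queue; otherwise idle_timer_disabled stays false.
     */
    if (in_serv_queue == bfqd->in_service_queue)
        idle_timer_disabled =
            waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);

    spin_unlock_irq(&bfqd->lock);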

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Link: https://lore.kernel.org/r/20220303070334.3020168-1-zhangwensheng5@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 block/bfq-iosched.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 36a66e97e3c2..8735f075230f 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5181,7 +5181,7 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
 	struct request *rq;
 	struct bfq_queue *in_serv_queue;
-	bool waiting_rq, idle_timer_disabled;
+	bool waiting_rq, idle_timer_disabled = false;
 
 	spin_lock_irq(&bfqd->lock);
 
@@ -5189,14 +5189,15 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
 	waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
 
 	rq = __bfq_dispatch_request(hctx);
-
-	idle_timer_disabled =
-		waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
+	if (in_serv_queue == bfqd->in_service_queue) {
+		idle_timer_disabled =
+			waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
+	}
 
 	spin_unlock_irq(&bfqd->lock);
-
-	bfq_update_dispatch_stats(hctx->queue, rq, in_serv_queue,
-				  idle_timer_disabled);
+	bfq_update_dispatch_stats(hctx->queue, rq,
+			idle_timer_disabled ? in_serv_queue : NULL,
+				idle_timer_disabled);
 
 	return rq;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 31/43] ACPICA: Avoid walking the ACPI Namespace if it is not there
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (28 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 30/43] bfq: fix use-after-free in bfq_dispatch_request Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 32/43] ACPI / x86: Add skip i2c clients quirk for Nextbook Ares 8 Sasha Levin
                   ` (11 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Rafael J. Wysocki, Hans de Goede, Sasha Levin, robert.moore,
	linux-acpi, devel

From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

[ Upstream commit 0c9992315e738e7d6e927ef36839a466b080dba6 ]

ACPICA commit b1c3656ef4950098e530be68d4b589584f06cddc

Prevent acpi_ns_walk_namespace() from crashing when called with
start_node equal to ACPI_ROOT_OBJECT if the Namespace has not been
instantiated yet and acpi_gbl_root_node is NULL.

For instance, this can happen if the kernel is run with "acpi=off"
in the command line.

Link: https://github.com/acpica/acpica/commit/b1c3656ef4950098e530be68d4b589584f06cddc
Link: https://lore.kernel.org/linux-acpi/CAJZ5v0hJWW_vZ3wwajE7xT38aWjY7cZyvqMJpXHzUL98-SiCVQ@mail.gmail.com/
Reported-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/acpi/acpica/nswalk.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/acpi/acpica/nswalk.c b/drivers/acpi/acpica/nswalk.c
index 915c2433463d..e7c30ce06e18 100644
--- a/drivers/acpi/acpica/nswalk.c
+++ b/drivers/acpi/acpica/nswalk.c
@@ -169,6 +169,9 @@ acpi_ns_walk_namespace(acpi_object_type type,
 
 	if (start_node == ACPI_ROOT_OBJECT) {
 		start_node = acpi_gbl_root_node;
+		if (!start_node) {
+			return_ACPI_STATUS(AE_NO_NAMESPACE);
+		}
 	}
 
 	/* Null child means "get first node" */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 32/43] ACPI / x86: Add skip i2c clients quirk for Nextbook Ares 8
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (29 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 31/43] ACPICA: Avoid walking the ACPI Namespace if it is not there Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 33/43] ACPI / x86: Add skip i2c clients quirk for Lenovo Yoga Tablet 1050F/L Sasha Levin
                   ` (10 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hans de Goede, Rafael J . Wysocki, Sasha Levin, rafael,
	mario.limonciello, linux-acpi

From: Hans de Goede <hdegoede@redhat.com>

[ Upstream commit f38312c9b569322edf4baae467568206fe46d57b ]

The Nextbook Ares 8 is an x86 ACPI tablet which ships with Android x86
as factory OS. Its DSDT contains a bunch of I2C devices which are not
actually there, causing various resource conflicts (the Android x86
kernel fork ignores I2C devices described in the DSDT).

Add an ACPI_QUIRK_SKIP_I2C_CLIENTS quirk for the Nextbook Ares 8 to the
acpi_quirk_skip_dmi_ids table to work around this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/acpi/x86/utils.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/acpi/x86/utils.c b/drivers/acpi/x86/utils.c
index ffdeed5334d6..9b991294f1e5 100644
--- a/drivers/acpi/x86/utils.c
+++ b/drivers/acpi/x86/utils.c
@@ -284,6 +284,15 @@ static const struct dmi_system_id acpi_quirk_skip_dmi_ids[] = {
 		.driver_data = (void *)(ACPI_QUIRK_SKIP_I2C_CLIENTS |
 					ACPI_QUIRK_SKIP_ACPI_AC_AND_BATTERY),
 	},
+	{
+		/* Nextbook Ares 8 */
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Insyde"),
+			DMI_MATCH(DMI_PRODUCT_NAME, "M890BAP"),
+		},
+		.driver_data = (void *)(ACPI_QUIRK_SKIP_I2C_CLIENTS |
+					ACPI_QUIRK_SKIP_ACPI_AC_AND_BATTERY),
+	},
 	{
 		/* Whitelabel (sold as various brands) TM800A550L */
 		.matches = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 33/43] ACPI / x86: Add skip i2c clients quirk for Lenovo Yoga Tablet 1050F/L
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (30 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 32/43] ACPI / x86: Add skip i2c clients quirk for Nextbook Ares 8 Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 34/43] lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3 Sasha Levin
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hans de Goede, Rafael J . Wysocki, Sasha Levin, rafael,
	mario.limonciello, linux-acpi

From: Hans de Goede <hdegoede@redhat.com>

[ Upstream commit 4fecb1e93e4914fc0bc1fb467ca79741f9f94abb ]

The Yoga Tablet 1050F/L is an x86 ACPI tablet which ships with Android x86
as factory OS. Its DSDT contains a bunch of I2C devices which are not
actually there, causing various resource conflicts (the Android x86
kernel fork ignores I2C devices described in the DSDT).

Add an ACPI_QUIRK_SKIP_I2C_CLIENTS quirk for the Yoga Tablet 1050F/L to
the acpi_quirk_skip_dmi_ids table to work around this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/acpi/x86/utils.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/acpi/x86/utils.c b/drivers/acpi/x86/utils.c
index 9b991294f1e5..664070fc8349 100644
--- a/drivers/acpi/x86/utils.c
+++ b/drivers/acpi/x86/utils.c
@@ -284,6 +284,18 @@ static const struct dmi_system_id acpi_quirk_skip_dmi_ids[] = {
 		.driver_data = (void *)(ACPI_QUIRK_SKIP_I2C_CLIENTS |
 					ACPI_QUIRK_SKIP_ACPI_AC_AND_BATTERY),
 	},
+	{
+		/* Lenovo Yoga Tablet 1050F/L */
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Intel Corp."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "VALLEYVIEW C0 PLATFORM"),
+			DMI_MATCH(DMI_BOARD_NAME, "BYT-T FFD8"),
+			/* Partial match on beginning of BIOS version */
+			DMI_MATCH(DMI_BIOS_VERSION, "BLADE_21"),
+		},
+		.driver_data = (void *)(ACPI_QUIRK_SKIP_I2C_CLIENTS |
+					ACPI_QUIRK_SKIP_ACPI_AC_AND_BATTERY),
+	},
 	{
 		/* Nextbook Ares 8 */
 		.matches = {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 34/43] lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (31 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 33/43] ACPI / x86: Add skip i2c clients quirk for Lenovo Yoga Tablet 1050F/L Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 35/43] Revert "Revert "block, bfq: honor already-setup queue merges"" Sasha Levin
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Paul Menzel, Matt Brown, Song Liu, Sasha Levin, ast, daniel,
	andrii, netdev, bpf

From: Paul Menzel <pmenzel@molgen.mpg.de>

[ Upstream commit 633174a7046ec3b4572bec24ef98e6ee89bce14b ]

Building raid6test on Ubuntu 21.10 (ppc64le) with GNU Make 4.3 shows the
errors below:

    $ cd lib/raid6/test/
    $ make
    <stdin>:1:1: error: stray ‘\’ in program
    <stdin>:1:2: error: stray ‘#’ in program
    <stdin>:1:11: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ \
        before ‘<’ token

    [...]

The errors come from the HAS_ALTIVEC test, which fails, so the
POWER-optimized versions are not built. That’s also the reason nobody
noticed on the other architectures.

GNU Make 4.3 does not remove the backslash anymore. From the 4.3 release
announcement:

> * WARNING: Backward-incompatibility!
>   Number signs (#) appearing inside a macro reference or function invocation
>   no longer introduce comments and should not be escaped with backslashes:
>   thus a call such as:
>     foo := $(shell echo '#')
>   is legal.  Previously the number sign needed to be escaped, for example:
>     foo := $(shell echo '\#')
>   Now this latter will resolve to "\#".  If you want to write makefiles
>   portable to both versions, assign the number sign to a variable:
>     H := \#
>     foo := $(shell echo '$H')
>   This was claimed to be fixed in 3.81, but wasn't, for some reason.
>   To detect this change search for 'nocomment' in the .FEATURES variable.

So, do the same as commit 9564a8cf422d ("Kbuild: fix # escaping in .cmd
files for future Make") and commit 929bef467771 ("bpf: Use $(pound) instead
of \# in Makefiles") and define and use a $(pound) variable.

Reference for the change in make:
https://git.savannah.gnu.org/cgit/make.git/commit/?id=c6966b323811c37acedff05b57

Cc: Matt Brown <matthew.brown.dev@gmail.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 lib/raid6/test/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/raid6/test/Makefile b/lib/raid6/test/Makefile
index a4c7cd74cff5..4fb7700a741b 100644
--- a/lib/raid6/test/Makefile
+++ b/lib/raid6/test/Makefile
@@ -4,6 +4,8 @@
 # from userspace.
 #
 
+pound := \#
+
 CC	 = gcc
 OPTFLAGS = -O2			# Adjust as desired
 CFLAGS	 = -I.. -I ../../../include -g $(OPTFLAGS)
@@ -42,7 +44,7 @@ else ifeq ($(HAS_NEON),yes)
         OBJS   += neon.o neon1.o neon2.o neon4.o neon8.o recov_neon.o recov_neon_inner.o
         CFLAGS += -DCONFIG_KERNEL_MODE_NEON=1
 else
-        HAS_ALTIVEC := $(shell printf '\#include <altivec.h>\nvector int a;\n' |\
+        HAS_ALTIVEC := $(shell printf '$(pound)include <altivec.h>\nvector int a;\n' |\
                          gcc -c -x c - >/dev/null && rm ./-.o && echo yes)
         ifeq ($(HAS_ALTIVEC),yes)
                 CFLAGS += -I../../../arch/powerpc/include
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 35/43] Revert "Revert "block, bfq: honor already-setup queue merges""
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (32 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 34/43] lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3 Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 36/43] ACPI/APEI: Limit printable size of BERT table data Sasha Levin
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Paolo Valente, Holger Hoffstätte, Jens Axboe, Sasha Levin,
	linux-block

From: Paolo Valente <paolo.valente@linaro.org>

[ Upstream commit 15729ff8143f8135b03988a100a19e66d7cb7ecd ]

A crash [1] happened to be triggered in conjunction with commit
2d52c58b9c9b ("block, bfq: honor already-setup queue merges"). The
latter was then reverted by commit ebc69e897e17 ("Revert "block, bfq:
honor already-setup queue merges""). Yet, the reverted commit was not
the one introducing the bug. In fact, it actually triggered a UAF
introduced by a different commit, and now fixed by commit d29bd41428cf
("block, bfq: reset last_bfqq_created on group change").

So, there is no point in keeping commit 2d52c58b9c9b ("block, bfq:
honor already-setup queue merges") out. This commit restores it.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=214503

Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Link: https://lore.kernel.org/r/20211125181510.15004-1-paolo.valente@linaro.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 block/bfq-iosched.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 8735f075230f..1dff82d34b44 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -2782,6 +2782,15 @@ bfq_setup_merge(struct bfq_queue *bfqq, struct bfq_queue *new_bfqq)
 	 * are likely to increase the throughput.
 	 */
 	bfqq->new_bfqq = new_bfqq;
+	/*
+	 * The above assignment schedules the following redirections:
+	 * each time some I/O for bfqq arrives, the process that
+	 * generated that I/O is disassociated from bfqq and
+	 * associated with new_bfqq. Here we increase new_bfqq->ref
+	 * in advance, adding the number of processes that are
+	 * expected to be associated with new_bfqq as they happen to
+	 * issue I/O.
+	 */
 	new_bfqq->ref += process_refs;
 	return new_bfqq;
 }
@@ -2844,6 +2853,10 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 {
 	struct bfq_queue *in_service_bfqq, *new_bfqq;
 
+	/* if a merge has already been setup, then proceed with that first */
+	if (bfqq->new_bfqq)
+		return bfqq->new_bfqq;
+
 	/*
 	 * Check delayed stable merge for rotational or non-queueing
 	 * devs. For this branch to be executed, bfqq must not be
@@ -2945,9 +2958,6 @@ bfq_setup_cooperator(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 	if (bfq_too_late_for_merging(bfqq))
 		return NULL;
 
-	if (bfqq->new_bfqq)
-		return bfqq->new_bfqq;
-
 	if (!io_struct || unlikely(bfqq == &bfqd->oom_bfqq))
 		return NULL;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 36/43] ACPI/APEI: Limit printable size of BERT table data
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (33 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 35/43] Revert "Revert "block, bfq: honor already-setup queue merges"" Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 37/43] PM: core: keep irq flags in device_pm_check_callbacks() Sasha Levin
                   ` (6 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Darren Hart, Rafael J . Wysocki, Sasha Levin, rafael, rdunlap,
	ying.huang, linux-acpi

From: Darren Hart <darren@os.amperecomputing.com>

[ Upstream commit 3f8dec116210ca649163574ed5f8df1e3b837d07 ]

Platforms with large BERT table data can trigger soft lockup errors
while attempting to print the entire BERT table data to the console at
boot:

  watchdog: BUG: soft lockup - CPU#160 stuck for 23s! [swapper/0:1]

Observed on Ampere Altra systems with a single BERT record of ~250KB.

The original bert driver appears to have assumed relatively small table
data. Since it is impractical to reassemble large table data from
interwoven console messages, and the table data is available in

  /sys/firmware/acpi/tables/data/BERT

limit the size of table data printed to the console to 1024 bytes (for
no reason other than it seemed like a good place to kick off the
discussion; feedback from existing users on what size would maintain
their current usage model is welcome).

Alternatively, we could make printing a CONFIG option, use the
bert_disable boot arg (or something similar), or use a debug log level.
However, all those solutions require extra steps or change the existing
behavior for small table data. Limiting the size preserves existing
behavior on existing platforms with small table data, and eliminates the
soft lockups for platforms with large table data, while still making it
available.

Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/acpi/apei/bert.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/bert.c b/drivers/acpi/apei/bert.c
index 19e50fcbf4d6..ad8ab3f12cf3 100644
--- a/drivers/acpi/apei/bert.c
+++ b/drivers/acpi/apei/bert.c
@@ -29,6 +29,7 @@
 
 #undef pr_fmt
 #define pr_fmt(fmt) "BERT: " fmt
+#define ACPI_BERT_PRINT_MAX_LEN 1024
 
 static int bert_disable;
 
@@ -58,8 +59,11 @@ static void __init bert_print_all(struct acpi_bert_region *region,
 		}
 
 		pr_info_once("Error records from previous boot:\n");
-
-		cper_estatus_print(KERN_INFO HW_ERR, estatus);
+		if (region_len < ACPI_BERT_PRINT_MAX_LEN)
+			cper_estatus_print(KERN_INFO HW_ERR, estatus);
+		else
+			pr_info_once("Max print length exceeded, table data is available at:\n"
+				     "/sys/firmware/acpi/tables/data/BERT");
 
 		/*
 		 * Because the boot error source is "one-time polled" type,
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 37/43] PM: core: keep irq flags in device_pm_check_callbacks()
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (34 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 36/43] ACPI/APEI: Limit printable size of BERT table data Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 38/43] parisc: Fix non-access data TLB cache flush faults Sasha Levin
                   ` (5 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Dmitry Baryshkov, Rafael J . Wysocki, Sasha Levin, rafael, pavel,
	len.brown, gregkh, linux-pm

From: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

[ Upstream commit 524bb1da785a7ae43dd413cd392b5071c6c367f8 ]

The function device_pm_check_callbacks() can be called under a spin
lock (in the reported case it happens from genpd_add_device() ->
dev_pm_domain_set(), when the genpd uses spinlocks rather than
mutexes).

However, this function unconditionally uses spin_lock_irq() /
spin_unlock_irq(), thus not preserving the caller's IRQ flags. Use the
irqsave/irqrestore variants instead.
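
A minimal sketch of the difference (generic example, not taken from the
patch):

    unsigned long flags;

    /* spin_unlock_irq() unconditionally re-enables interrupts,
     * which is wrong if the caller entered with them disabled:
     */
    spin_lock_irq(&dev->power.lock);
    /* ... */
    spin_unlock_irq(&dev->power.lock);

    /* The irqsave variant saves and restores the caller's state: */
    spin_lock_irqsave(&dev->power.lock, flags);
    /* ... */
    spin_unlock_irqrestore(&dev->power.lock, flags);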

The backtrace for the reference:
[    2.752010] ------------[ cut here ]------------
[    2.756769] raw_local_irq_restore() called with IRQs enabled
[    2.762596] WARNING: CPU: 4 PID: 1 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x34/0x50
[    2.772338] Modules linked in:
[    2.775487] CPU: 4 PID: 1 Comm: swapper/0 Tainted: G S                5.17.0-rc6-00384-ge330d0d82eff-dirty #684
[    2.781384] Freeing initrd memory: 46024K
[    2.785839] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    2.785841] pc : warn_bogus_irq_restore+0x34/0x50
[    2.785844] lr : warn_bogus_irq_restore+0x34/0x50
[    2.785846] sp : ffff80000805b7d0
[    2.785847] x29: ffff80000805b7d0 x28: 0000000000000000 x27: 0000000000000002
[    2.785850] x26: ffffd40e80930b18 x25: ffff7ee2329192b8 x24: ffff7edfc9f60800
[    2.785853] x23: ffffd40e80930b18 x22: ffffd40e80930d30 x21: ffff7edfc0dffa00
[    2.785856] x20: ffff7edfc09e3768 x19: 0000000000000000 x18: ffffffffffffffff
[    2.845775] x17: 6572206f74206465 x16: 6c696166203a3030 x15: ffff80008805b4f7
[    2.853108] x14: 0000000000000000 x13: ffffd40e809550b0 x12: 00000000000003d8
[    2.860441] x11: 0000000000000148 x10: ffffd40e809550b0 x9 : ffffd40e809550b0
[    2.867774] x8 : 00000000ffffefff x7 : ffffd40e809ad0b0 x6 : ffffd40e809ad0b0
[    2.875107] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000
[    2.882440] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff7edfc03a8000
[    2.889774] Call trace:
[    2.892290]  warn_bogus_irq_restore+0x34/0x50
[    2.896770]  _raw_spin_unlock_irqrestore+0x94/0xa0
[    2.901690]  genpd_unlock_spin+0x20/0x30
[    2.905724]  genpd_add_device+0x100/0x2d0
[    2.909850]  __genpd_dev_pm_attach+0xa8/0x23c
[    2.914329]  genpd_dev_pm_attach_by_id+0xc4/0x190
[    2.919167]  genpd_dev_pm_attach_by_name+0x3c/0xd0
[    2.924086]  dev_pm_domain_attach_by_name+0x24/0x30
[    2.929102]  psci_dt_attach_cpu+0x24/0x90
[    2.933230]  psci_cpuidle_probe+0x2d4/0x46c
[    2.937534]  platform_probe+0x68/0xe0
[    2.941304]  really_probe.part.0+0x9c/0x2fc
[    2.945605]  __driver_probe_device+0x98/0x144
[    2.950085]  driver_probe_device+0x44/0x15c
[    2.954385]  __device_attach_driver+0xb8/0x120
[    2.958950]  bus_for_each_drv+0x78/0xd0
[    2.962896]  __device_attach+0xd8/0x180
[    2.966843]  device_initial_probe+0x14/0x20
[    2.971144]  bus_probe_device+0x9c/0xa4
[    2.975092]  device_add+0x380/0x88c
[    2.978679]  platform_device_add+0x114/0x234
[    2.983067]  platform_device_register_full+0x100/0x190
[    2.988344]  psci_idle_init+0x6c/0xb0
[    2.992113]  do_one_initcall+0x74/0x3a0
[    2.996060]  kernel_init_freeable+0x2fc/0x384
[    3.000543]  kernel_init+0x28/0x130
[    3.004132]  ret_from_fork+0x10/0x20
[    3.007817] irq event stamp: 319826
[    3.011404] hardirqs last  enabled at (319825): [<ffffd40e7eda0268>] __up_console_sem+0x78/0x84
[    3.020332] hardirqs last disabled at (319826): [<ffffd40e7fd6d9d8>] el1_dbg+0x24/0x8c
[    3.028458] softirqs last  enabled at (318312): [<ffffd40e7ec90410>] _stext+0x410/0x588
[    3.036678] softirqs last disabled at (318299): [<ffffd40e7ed1bf68>] __irq_exit_rcu+0x158/0x174
[    3.045607] ---[ end trace 0000000000000000 ]---

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/base/power/main.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 04ea92cbd9cf..08c8a69d7b81 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -2018,7 +2018,9 @@ static bool pm_ops_is_empty(const struct dev_pm_ops *ops)
 
 void device_pm_check_callbacks(struct device *dev)
 {
-	spin_lock_irq(&dev->power.lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&dev->power.lock, flags);
 	dev->power.no_pm_callbacks =
 		(!dev->bus || (pm_ops_is_empty(dev->bus->pm) &&
 		 !dev->bus->suspend && !dev->bus->resume)) &&
@@ -2027,7 +2029,7 @@ void device_pm_check_callbacks(struct device *dev)
 		(!dev->pm_domain || pm_ops_is_empty(&dev->pm_domain->ops)) &&
 		(!dev->driver || (pm_ops_is_empty(dev->driver->pm) &&
 		 !dev->driver->suspend && !dev->driver->resume));
-	spin_unlock_irq(&dev->power.lock);
+	spin_unlock_irqrestore(&dev->power.lock, flags);
 }
 
 bool dev_pm_skip_suspend(struct device *dev)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 38/43] parisc: Fix non-access data TLB cache flush faults
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (35 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 37/43] PM: core: keep irq flags in device_pm_check_callbacks() Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 39/43] parisc: Fix handling of probe non-access faults Sasha Levin
                   ` (4 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: John David Anglin, Helge Deller, Sasha Levin, James.Bottomley,
	svens, ira.weiny, akpm, linux-parisc

From: John David Anglin <dave.anglin@bell.net>

[ Upstream commit f839e5f1cef36ce268950c387129b1bfefdaebc9 ]

When a page is not present, we get non-access data TLB faults from
the fdc and fic instructions in flush_user_dcache_range_asm and
flush_user_icache_range_asm. When these occur, the cache line is
not invalidated and potentially we get memory corruption. The
problem was hidden by the nullification of the flush instructions.

These faults also affect performance. With pa8800/pa8900 processors,
there will be 32 faults per 4 KB page since the cache line is 128
bytes.  There will be more faults with earlier processors.

The problem is fixed by using flush_cache_pages(). It does the flush
using a tmp alias mapping.

The flush_cache_pages() call in flush_cache_range() flushed too
large a range.

V2: Remove unnecessary preempt_disable() and preempt_enable() calls.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/parisc/kernel/cache.c | 28 +---------------------------
 1 file changed, 1 insertion(+), 27 deletions(-)

diff --git a/arch/parisc/kernel/cache.c b/arch/parisc/kernel/cache.c
index 94150b91c96f..bce71cefe572 100644
--- a/arch/parisc/kernel/cache.c
+++ b/arch/parisc/kernel/cache.c
@@ -558,15 +558,6 @@ static void flush_cache_pages(struct vm_area_struct *vma, struct mm_struct *mm,
 	}
 }
 
-static void flush_user_cache_tlb(struct vm_area_struct *vma,
-				 unsigned long start, unsigned long end)
-{
-	flush_user_dcache_range_asm(start, end);
-	if (vma->vm_flags & VM_EXEC)
-		flush_user_icache_range_asm(start, end);
-	flush_tlb_range(vma, start, end);
-}
-
 void flush_cache_mm(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -581,17 +572,8 @@ void flush_cache_mm(struct mm_struct *mm)
 		return;
 	}
 
-	preempt_disable();
-	if (mm->context == mfsp(3)) {
-		for (vma = mm->mmap; vma; vma = vma->vm_next)
-			flush_user_cache_tlb(vma, vma->vm_start, vma->vm_end);
-		preempt_enable();
-		return;
-	}
-
 	for (vma = mm->mmap; vma; vma = vma->vm_next)
 		flush_cache_pages(vma, mm, vma->vm_start, vma->vm_end);
-	preempt_enable();
 }
 
 void flush_cache_range(struct vm_area_struct *vma,
@@ -605,15 +587,7 @@ void flush_cache_range(struct vm_area_struct *vma,
 		return;
 	}
 
-	preempt_disable();
-	if (vma->vm_mm->context == mfsp(3)) {
-		flush_user_cache_tlb(vma, start, end);
-		preempt_enable();
-		return;
-	}
-
-	flush_cache_pages(vma, vma->vm_mm, vma->vm_start, vma->vm_end);
-	preempt_enable();
+	flush_cache_pages(vma, vma->vm_mm, start, end);
 }
 
 void
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 39/43] parisc: Fix handling of probe non-access faults
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (36 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 38/43] parisc: Fix non-access data TLB cache flush faults Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 40/43] nvme-tcp: lockdep: annotate in-kernel sockets Sasha Levin
                   ` (3 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: John David Anglin, Helge Deller, Sasha Levin, James.Bottomley,
	svens, rmk+kernel, akpm, ebiederm, wangkefeng.wang, zhengqi.arch,
	linux-parisc

From: John David Anglin <dave.anglin@bell.net>

[ Upstream commit e00b0a2ab8ec019c344e53bfc76e31c18bb587b7 ]

Currently, the parisc kernel does not fully support non-access TLB
fault handling for probe instructions. In the fast path, we set the
target register to zero if it is not a shadowed register. The slow
path is not implemented, so we call do_page_fault. The architecture
indicates that non-access faults should not cause a page fault from
disk.

This change adds code to provide non-access fault support for
probe instructions. It also modifies the handling of faults on
userspace addresses so that if the address lies in a valid VMA and the
access type matches that of the VMA, the probe target register is set
to one. Otherwise, the target register is set to zero.

This was done to make probe instructions more useful for userspace.
Probe instructions are not very useful if they set the target register
to zero whenever a page is not present in memory. Nominally, the
purpose of the probe instruction is to determine whether read or write
access to a given address is allowed.

This fixes a problem in function pointer comparison noticed in the
glibc testsuite (stdio-common/tst-vfprintf-user-type). The same
problem is likely in glibc (_dl_lookup_address).

V2 adds flush and lpa instruction support to handle_nadtlb_fault.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/parisc/include/asm/traps.h |  1 +
 arch/parisc/kernel/traps.c      |  2 +
 arch/parisc/mm/fault.c          | 89 +++++++++++++++++++++++++++++++++
 3 files changed, 92 insertions(+)

diff --git a/arch/parisc/include/asm/traps.h b/arch/parisc/include/asm/traps.h
index 34619f010c63..0ccdb738a9a3 100644
--- a/arch/parisc/include/asm/traps.h
+++ b/arch/parisc/include/asm/traps.h
@@ -18,6 +18,7 @@ unsigned long parisc_acctyp(unsigned long code, unsigned int inst);
 const char *trap_name(unsigned long code);
 void do_page_fault(struct pt_regs *regs, unsigned long code,
 		unsigned long address);
+int handle_nadtlb_fault(struct pt_regs *regs);
 #endif
 
 #endif
diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c
index b6fdebddc8e9..39576a9245c7 100644
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -662,6 +662,8 @@ void notrace handle_interruption(int code, struct pt_regs *regs)
 			 by hand. Technically we need to emulate:
 			 fdc,fdce,pdc,"fic,4f",prober,probeir,probew, probeiw
 		*/
+		if (code == 17 && handle_nadtlb_fault(regs))
+			return;
 		fault_address = regs->ior;
 		fault_space = regs->isr;
 		break;
diff --git a/arch/parisc/mm/fault.c b/arch/parisc/mm/fault.c
index e9eabf8f14d7..f114e102aaf2 100644
--- a/arch/parisc/mm/fault.c
+++ b/arch/parisc/mm/fault.c
@@ -425,3 +425,92 @@ void do_page_fault(struct pt_regs *regs, unsigned long code,
 	}
 	pagefault_out_of_memory();
 }
+
+/* Handle non-access data TLB miss faults.
+ *
+ * For probe instructions, accesses to userspace are considered allowed
+ * if they lie in a valid VMA and the access type matches. We are not
+ * allowed to handle MM faults here so there may be situations where an
+ * actual access would fail even though a probe was successful.
+ */
+int
+handle_nadtlb_fault(struct pt_regs *regs)
+{
+	unsigned long insn = regs->iir;
+	int breg, treg, xreg, val = 0;
+	struct vm_area_struct *vma, *prev_vma;
+	struct task_struct *tsk;
+	struct mm_struct *mm;
+	unsigned long address;
+	unsigned long acc_type;
+
+	switch (insn & 0x380) {
+	case 0x280:
+		/* FDC instruction */
+		fallthrough;
+	case 0x380:
+		/* PDC and FIC instructions */
+		if (printk_ratelimit()) {
+			pr_warn("BUG: nullifying cache flush/purge instruction\n");
+			show_regs(regs);
+		}
+		if (insn & 0x20) {
+			/* Base modification */
+			breg = (insn >> 21) & 0x1f;
+			xreg = (insn >> 16) & 0x1f;
+			if (breg && xreg)
+				regs->gr[breg] += regs->gr[xreg];
+		}
+		regs->gr[0] |= PSW_N;
+		return 1;
+
+	case 0x180:
+		/* PROBE instruction */
+		treg = insn & 0x1f;
+		if (regs->isr) {
+			tsk = current;
+			mm = tsk->mm;
+			if (mm) {
+				/* Search for VMA */
+				address = regs->ior;
+				mmap_read_lock(mm);
+				vma = find_vma_prev(mm, address, &prev_vma);
+				mmap_read_unlock(mm);
+
+				/*
+				 * Check if access to the VMA is okay.
+				 * We don't allow for stack expansion.
+				 */
+				acc_type = (insn & 0x40) ? VM_WRITE : VM_READ;
+				if (vma
+				    && address >= vma->vm_start
+				    && (vma->vm_flags & acc_type) == acc_type)
+					val = 1;
+			}
+		}
+		if (treg)
+			regs->gr[treg] = val;
+		regs->gr[0] |= PSW_N;
+		return 1;
+
+	case 0x300:
+		/* LPA instruction */
+		if (insn & 0x20) {
+			/* Base modification */
+			breg = (insn >> 21) & 0x1f;
+			xreg = (insn >> 16) & 0x1f;
+			if (breg && xreg)
+				regs->gr[breg] += regs->gr[xreg];
+		}
+		treg = insn & 0x1f;
+		if (treg)
+			regs->gr[treg] = 0;
+		regs->gr[0] |= PSW_N;
+		return 1;
+
+	default:
+		break;
+	}
+
+	return 0;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 40/43] nvme-tcp: lockdep: annotate in-kernel sockets
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (37 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 39/43] parisc: Fix handling of probe non-access faults Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 41/43] spi: tegra20: Use of_device_get_match_data() Sasha Levin
                   ` (2 subsequent siblings)
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Chris Leech, Christoph Hellwig, Sasha Levin, kbusch, axboe, sagi,
	linux-nvme

From: Chris Leech <cleech@redhat.com>

[ Upstream commit 841aee4d75f18fdfb53935080b03de0c65e9b92c ]

Put NVMe/TCP sockets in their own class to avoid some lockdep warnings.
Sockets created by nvme-tcp are not exposed to user-space, and will not
trigger certain code paths that the general socket API exposes.

Lockdep complains about a circular dependency between the socket and
filesystem locks, because setsockopt can trigger a page fault with a
socket lock held, but nvme-tcp sends requests on the socket while file
system locks are held.
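
For reference, the generic shape of such a lockdep reclassification (a
sketch with made-up names; the patch itself uses the sock-specific
helper sock_lock_init_class_and_name(), which also renames the lock):

    static struct lock_class_key nvme_example_key;

    static void example_init(spinlock_t *lock)
    {
        spin_lock_init(lock);
        /* Give this lock its own lockdep class so dependency chains
         * through it are tracked separately from other locks of the
         * same type.
         */
        lockdep_set_class(lock, &nvme_example_key);
    }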

  ======================================================
  WARNING: possible circular locking dependency detected
  5.15.0-rc3 #1 Not tainted
  ------------------------------------------------------
  fio/1496 is trying to acquire lock:
  (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendpage+0x23/0x80

  but task is already holding lock:
  (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]

  which lock already depends on the new lock.

  other info that might help us debug this:

  chain exists of:
   sk_lock-AF_INET --> sb_internal --> &xfs_dir_ilock_class/5

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&xfs_dir_ilock_class/5);
                                lock(sb_internal);
                                lock(&xfs_dir_ilock_class/5);
   lock(sk_lock-AF_INET);

  *** DEADLOCK ***

  6 locks held by fio/1496:
   #0: (sb_writers#13){.+.+}-{0:0}, at: path_openat+0x9fc/0xa20
   #1: (&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at: path_openat+0x296/0xa20
   #2: (sb_internal){.+.+}-{0:0}, at: xfs_trans_alloc_icreate+0x41/0xd0 [xfs]
   #3: (&xfs_dir_ilock_class/5){+.+.}-{3:3}, at: xfs_ilock+0xcf/0x290 [xfs]
   #4: (hctx->srcu){....}-{0:0}, at: hctx_lock+0x51/0xd0
   #5: (&queue->send_mutex){+.+.}-{3:3}, at: nvme_tcp_queue_rq+0x33e/0x380 [nvme_tcp]

This annotation lets lockdep analyze nvme-tcp controlled sockets
independently of what the user-space sockets API does.

Link: https://lore.kernel.org/linux-nvme/CAHj4cs9MDYLJ+q+2_GXUK9HxFizv2pxUryUR0toX974M040z7g@mail.gmail.com/

Signed-off-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/nvme/host/tcp.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 65e00c64a588..d66e2de044e0 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -30,6 +30,44 @@ static int so_priority;
 module_param(so_priority, int, 0644);
 MODULE_PARM_DESC(so_priority, "nvme tcp socket optimize priority");
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+/* lockdep can detect a circular dependency of the form
+ *   sk_lock -> mmap_lock (page fault) -> fs locks -> sk_lock
+ * because dependencies are tracked for both nvme-tcp and user contexts. Using
+ * a separate class prevents lockdep from conflating nvme-tcp socket use with
+ * user-space socket API use.
+ */
+static struct lock_class_key nvme_tcp_sk_key[2];
+static struct lock_class_key nvme_tcp_slock_key[2];
+
+static void nvme_tcp_reclassify_socket(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+
+	if (WARN_ON_ONCE(!sock_allow_reclassification(sk)))
+		return;
+
+	switch (sk->sk_family) {
+	case AF_INET:
+		sock_lock_init_class_and_name(sk, "slock-AF_INET-NVME",
+					      &nvme_tcp_slock_key[0],
+					      "sk_lock-AF_INET-NVME",
+					      &nvme_tcp_sk_key[0]);
+		break;
+	case AF_INET6:
+		sock_lock_init_class_and_name(sk, "slock-AF_INET6-NVME",
+					      &nvme_tcp_slock_key[1],
+					      "sk_lock-AF_INET6-NVME",
+					      &nvme_tcp_sk_key[1]);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+#else
+static void nvme_tcp_reclassify_socket(struct socket *sock) { }
+#endif
+
 enum nvme_tcp_send_state {
 	NVME_TCP_SEND_CMD_PDU = 0,
 	NVME_TCP_SEND_H2C_PDU,
@@ -1469,6 +1507,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl,
 		goto err_destroy_mutex;
 	}
 
+	nvme_tcp_reclassify_socket(queue->sock);
+
 	/* Single syn retry */
 	tcp_sock_set_syncnt(queue->sock->sk, 1);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 41/43] spi: tegra20: Use of_device_get_match_data()
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (38 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 40/43] nvme-tcp: lockdep: annotate in-kernel sockets Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag" Sasha Levin
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 43/43] spi: fsi: Implement a timeout for polling status Sasha Levin
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Minghao Chi, Zeal Robot, Mark Brown, Sasha Levin, ldewangan,
	thierry.reding, jonathanh, linux-spi, linux-tegra

From: Minghao Chi <chi.minghao@zte.com.cn>

[ Upstream commit c9839acfcbe20ce43d363c2a9d0772472d9921c0 ]

Use of_device_get_match_data() to simplify the code.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Link: https://lore.kernel.org/r/20220315023138.2118293-1-chi.minghao@zte.com.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/spi/spi-tegra20-slink.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/spi/spi-tegra20-slink.c b/drivers/spi/spi-tegra20-slink.c
index 2a03739a0c60..80c3787deea9 100644
--- a/drivers/spi/spi-tegra20-slink.c
+++ b/drivers/spi/spi-tegra20-slink.c
@@ -1006,14 +1006,8 @@ static int tegra_slink_probe(struct platform_device *pdev)
 	struct resource		*r;
 	int ret, spi_irq;
 	const struct tegra_slink_chip_data *cdata = NULL;
-	const struct of_device_id *match;
 
-	match = of_match_device(tegra_slink_of_match, &pdev->dev);
-	if (!match) {
-		dev_err(&pdev->dev, "Error: No device match found\n");
-		return -ENODEV;
-	}
-	cdata = match->data;
+	cdata = of_device_get_match_data(&pdev->dev);
 
 	master = spi_alloc_master(&pdev->dev, sizeof(*tspi));
 	if (!master) {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (39 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 41/43] spi: tegra20: Use of_device_get_match_data() Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  2022-07-07 21:30   ` Tom Crossland
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 43/43] spi: fsi: Implement a timeout for polling status Sasha Levin
  41 siblings, 1 reply; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Rafael J. Wysocki, Mario Limonciello, Mario Limonciello,
	Huang Rui, Mika Westerberg, Sasha Levin, rafael, linux-acpi

From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

[ Upstream commit 2ca8e6285250c07a2e5a22ecbfd59b5a4ef73484 ]

Revert commit 159d8c274fd9 ("ACPI: Pass the same capabilities to the
_OSC regardless of the query flag") which caused legitimate usage
scenarios (when the platform firmware does not want the OS to control
certain platform features controlled by the system bus scope _OSC) to
break and was misguided by some misleading language in the _OSC
definition in the ACPI specification (in particular, Section 6.2.11.1.3
"Sequence of _OSC Calls" that contradicts other perts of the _OSC
definition).

Link: https://lore.kernel.org/linux-acpi/CAJZ5v0iStA0JmO0H3z+VgQsVuQONVjKPpw0F5HKfiq=Gb6B5yw@mail.gmail.com
Reported-by: Mario Limonciello <Mario.Limonciello@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/acpi/bus.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 07f604832fd6..079b952ab59f 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -332,21 +332,32 @@ static void acpi_bus_osc_negotiate_platform_control(void)
 	if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
 		return;
 
-	kfree(context.ret.pointer);
+	capbuf_ret = context.ret.pointer;
+	if (context.ret.length <= OSC_SUPPORT_DWORD) {
+		kfree(context.ret.pointer);
+		return;
+	}
 
-	/* Now run _OSC again with query flag clear */
+	/*
+	 * Now run _OSC again with query flag clear and with the caps
+	 * supported by both the OS and the platform.
+	 */
 	capbuf[OSC_QUERY_DWORD] = 0;
+	capbuf[OSC_SUPPORT_DWORD] = capbuf_ret[OSC_SUPPORT_DWORD];
+	kfree(context.ret.pointer);
 
 	if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
 		return;
 
 	capbuf_ret = context.ret.pointer;
-	osc_sb_apei_support_acked =
-		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
-	osc_pc_lpi_support_confirmed =
-		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
-	osc_sb_native_usb4_support_confirmed =
-		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
+	if (context.ret.length > OSC_SUPPORT_DWORD) {
+		osc_sb_apei_support_acked =
+			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
+		osc_pc_lpi_support_confirmed =
+			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
+		osc_sb_native_usb4_support_confirmed =
+			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
+	}
 
 	kfree(context.ret.pointer);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* [PATCH AUTOSEL 5.17 43/43] spi: fsi: Implement a timeout for polling status
  2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
                   ` (40 preceding siblings ...)
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag" Sasha Levin
@ 2022-03-28 11:18 ` Sasha Levin
  41 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-28 11:18 UTC (permalink / raw)
  To: linux-kernel, stable; +Cc: Eddie James, Mark Brown, Sasha Levin, linux-spi

From: Eddie James <eajames@linux.ibm.com>

[ Upstream commit 89b35e3f28514087d3f1e28e8f5634fbfd07c554 ]

The data transfer routines must poll the status register to
determine when more data can be shifted in or out. If the hardware
gets into a bad state, these polling loops may never exit. Prevent
this by returning an error if a timeout is exceeded.
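
The deadline pattern used in the hunks below can be modeled in a short
standalone sketch; hardware_ready() here is a hypothetical stand-in for
fsi_spi_status(), and CLOCK_MONOTONIC stands in for jiffies:

	#include <errno.h>
	#include <stdbool.h>
	#include <time.h>

	/* Hypothetical stand-in for reading the hardware status register. */
	extern bool hardware_ready(void);

	static long long now_ms(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return ts.tv_sec * 1000LL + ts.tv_nsec / 1000000;
	}

	/* Poll until the hardware is ready, or the deadline passes. */
	static int wait_ready(long long timeout_ms)
	{
		long long end = now_ms() + timeout_ms;

		do {
			if (hardware_ready())
				return 0;
		} while (now_ms() <= end);

		return -ETIMEDOUT;
	}

As in the patch, the deadline is computed once before entering each
polling loop, and at least one status read is always performed before a
timeout can be reported, so a thread scheduled out past the deadline
still gets a final look at the hardware state.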

Signed-off-by: Eddie James <eajames@linux.ibm.com>
Link: https://lore.kernel.org/r/20220317211426.38940-1-eajames@linux.ibm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/spi/spi-fsi.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/spi/spi-fsi.c b/drivers/spi/spi-fsi.c
index b6c7467f0b59..d403a7a3021d 100644
--- a/drivers/spi/spi-fsi.c
+++ b/drivers/spi/spi-fsi.c
@@ -25,6 +25,7 @@
 
 #define SPI_FSI_BASE			0x70000
 #define SPI_FSI_INIT_TIMEOUT_MS		1000
+#define SPI_FSI_STATUS_TIMEOUT_MS	100
 #define SPI_FSI_MAX_RX_SIZE		8
 #define SPI_FSI_MAX_TX_SIZE		40
 
@@ -299,6 +300,7 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx,
 				 struct spi_transfer *transfer)
 {
 	int rc = 0;
+	unsigned long end;
 	u64 status = 0ULL;
 
 	if (transfer->tx_buf) {
@@ -315,10 +317,14 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx,
 			if (rc)
 				return rc;
 
+			end = jiffies + msecs_to_jiffies(SPI_FSI_STATUS_TIMEOUT_MS);
 			do {
 				rc = fsi_spi_status(ctx, &status, "TX");
 				if (rc)
 					return rc;
+
+				if (time_after(jiffies, end))
+					return -ETIMEDOUT;
 			} while (status & SPI_FSI_STATUS_TDR_FULL);
 
 			sent += nb;
@@ -329,10 +335,14 @@ static int fsi_spi_transfer_data(struct fsi_spi *ctx,
 		u8 *rx = transfer->rx_buf;
 
 		while (transfer->len > recv) {
+			end = jiffies + msecs_to_jiffies(SPI_FSI_STATUS_TIMEOUT_MS);
 			do {
 				rc = fsi_spi_status(ctx, &status, "RX");
 				if (rc)
 					return rc;
+
+				if (time_after(jiffies, end))
+					return -ETIMEDOUT;
 			} while (!(status & SPI_FSI_STATUS_RDR_FULL));
 
 			rc = fsi_spi_read_reg(ctx, SPI_FSI_DATA_RX, &in);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels Sasha Levin
@ 2022-03-28 14:31   ` Eric W. Biederman
  2022-03-28 16:35     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 63+ messages in thread
From: Eric W. Biederman @ 2022-03-28 14:31 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Oleg Nesterov, Steven Rostedt,
	Thomas Gleixner, Sebastian Andrzej Siewior, mingo, bp,
	dave.hansen, x86, peterz, juri.lelli, vincent.guittot, luto,
	frederic, mark.rutland, valentin.schneider, keescook, elver,
	legion


Thank you for cc'ing me.  You probably want to hold off on back-porting
this patch.  The appropriate fix requires some more conversation.

At a minimum this patch should not be using TIF_NOTIFY_RESUME.

Eric



Sasha Levin <sashal@kernel.org> writes:

> From: Oleg Nesterov <oleg@redhat.com>
>
> [ Upstream commit bf9ad37dc8a30cce22ae95d6c2ca6abf8731d305 ]
>
> On x86_64 we must disable preemption before we enable interrupts
> for stack faults, int3 and debugging, because the current task is using
> a per CPU debug stack defined by the IST. If we schedule out, another task
> can come in and use the same stack and cause the stack to be corrupted
> and crash the kernel on return.
>
> When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
> one of these is the spin lock used in signal handling.
>
> Some of the debug code (int3) causes do_trap() to send a signal.
> This function calls a spinlock_t lock that has been converted to a
> sleeping lock. If this happens, the above issues with the corrupted
> stack is possible.
>
> Instead of calling the signal right away, for PREEMPT_RT and x86,
> the signal information is stored on the stacks task_struct and
> TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume
> code will send the signal when preemption is enabled.
>
> [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
>   ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
> [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
> [ tglx: Use a config option ]
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Link: https://lore.kernel.org/r/Ygq5aBB/qMQw6aP5@linutronix.de
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  arch/x86/Kconfig       |  1 +
>  include/linux/sched.h  |  3 +++
>  kernel/Kconfig.preempt | 12 +++++++++++-
>  kernel/entry/common.c  | 14 ++++++++++++++
>  kernel/signal.c        | 40 ++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 69 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 9f5bd41bf660..d557ac29b6cd 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -120,6 +120,7 @@ config X86
>  	select ARCH_WANTS_NO_INSTR
>  	select ARCH_WANT_HUGE_PMD_SHARE
>  	select ARCH_WANT_LD_ORPHAN_WARN
> +	select ARCH_WANTS_RT_DELAYED_SIGNALS
>  	select ARCH_WANTS_THP_SWAP		if X86_64
>  	select ARCH_HAS_PARANOID_L1D_FLUSH
>  	select BUILDTIME_TABLE_SORT
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 75ba8aa60248..098e37fd770a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1087,6 +1087,9 @@ struct task_struct {
>  	/* Restored if set_restore_sigmask() was used: */
>  	sigset_t			saved_sigmask;
>  	struct sigpending		pending;
> +#ifdef CONFIG_RT_DELAYED_SIGNALS
> +	struct kernel_siginfo		forced_info;
> +#endif
>  	unsigned long			sas_ss_sp;
>  	size_t				sas_ss_size;
>  	unsigned int			sas_ss_flags;
> diff --git a/kernel/Kconfig.preempt b/kernel/Kconfig.preempt
> index ce77f0265660..5644abd5f8a8 100644
> --- a/kernel/Kconfig.preempt
> +++ b/kernel/Kconfig.preempt
> @@ -132,4 +132,14 @@ config SCHED_CORE
>  	  which is the likely usage by Linux distributions, there should
>  	  be no measurable impact on performance.
>  
> -
> +config ARCH_WANTS_RT_DELAYED_SIGNALS
> +	bool
> +	help
> +	  This option is selected by architectures where raising signals
> +	  can happen in atomic contexts on PREEMPT_RT enabled kernels. This
> +	  option delays raising the signal until the return to user space
> +	  loop where it is also delivered. X86 requires this to deliver
> +	  signals from trap handlers which run on IST stacks.
> +
> +config RT_DELAYED_SIGNALS
> +	def_bool PREEMPT_RT && ARCH_WANTS_RT_DELAYED_SIGNALS
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index bad713684c2e..0543a2c92f20 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -148,6 +148,18 @@ static void handle_signal_work(struct pt_regs *regs, unsigned long ti_work)
>  	arch_do_signal_or_restart(regs, ti_work & _TIF_SIGPENDING);
>  }
>  
> +#ifdef CONFIG_RT_DELAYED_SIGNALS
> +static inline void raise_delayed_signal(void)
> +{
> +	if (unlikely(current->forced_info.si_signo)) {
> +		force_sig_info(&current->forced_info);
> +		current->forced_info.si_signo = 0;
> +	}
> +}
> +#else
> +static inline void raise_delayed_signal(void) { }
> +#endif
> +
>  static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>  					    unsigned long ti_work)
>  {
> @@ -162,6 +174,8 @@ static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
>  		if (ti_work & _TIF_NEED_RESCHED)
>  			schedule();
>  
> +		raise_delayed_signal();
> +
>  		if (ti_work & _TIF_UPROBE)
>  			uprobe_notify_resume(regs);
>  
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 9b04631acde8..e93de6daa188 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -1307,6 +1307,43 @@ enum sig_handler {
>  	HANDLER_EXIT,	 /* Only visible as the process exit code */
>  };
>  
> +/*
> + * On some archictectures, PREEMPT_RT has to delay sending a signal from a
> + * trap since it cannot enable preemption, and the signal code's
> + * spin_locks turn into mutexes. Instead, it must set TIF_NOTIFY_RESUME
> + * which will send the signal on exit of the trap.
> + */
> +#ifdef CONFIG_RT_DELAYED_SIGNALS
> +static inline bool force_sig_delayed(struct kernel_siginfo *info,
> +				     struct task_struct *t)
> +{
> +	if (!in_atomic())
> +		return false;
> +
> +	if (WARN_ON_ONCE(t->forced_info.si_signo))
> +		return true;
> +
> +	if (is_si_special(info)) {
> +		WARN_ON_ONCE(info != SEND_SIG_PRIV);
> +		t->forced_info.si_signo = info->si_signo;
> +		t->forced_info.si_errno = 0;
> +		t->forced_info.si_code = SI_KERNEL;
> +		t->forced_info.si_pid = 0;
> +		t->forced_info.si_uid = 0;
> +	} else {
> +		t->forced_info = *info;
> +	}
> +	set_tsk_thread_flag(t, TIF_NOTIFY_RESUME);
> +	return true;
> +}
> +#else
> +static inline bool force_sig_delayed(struct kernel_siginfo *info,
> +				     struct task_struct *t)
> +{
> +	return false;
> +}
> +#endif
> +
>  /*
>   * Force a signal that the process can't ignore: if necessary
>   * we unblock the signal and change any SIG_IGN to SIG_DFL.
> @@ -1327,6 +1364,9 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
>  	struct k_sigaction *action;
>  	int sig = info->si_signo;
>  
> +	if (force_sig_delayed(info, t))
> +		return 0;
> +
>  	spin_lock_irqsave(&t->sighand->siglock, flags);
>  	action = &t->sighand->action[sig-1];
>  	ignored = action->sa.sa_handler == SIG_IGN;

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels
  2022-03-28 14:31   ` Eric W. Biederman
@ 2022-03-28 16:35     ` Sebastian Andrzej Siewior
  2022-03-31 16:59       ` Sasha Levin
  0 siblings, 1 reply; 63+ messages in thread
From: Sebastian Andrzej Siewior @ 2022-03-28 16:35 UTC (permalink / raw)
  To: Eric W. Biederman, Sasha Levin
  Cc: linux-kernel, stable, Oleg Nesterov, Steven Rostedt,
	Thomas Gleixner, mingo, bp, dave.hansen, x86, peterz, juri.lelli,
	vincent.guittot, luto, frederic, mark.rutland,
	valentin.schneider, keescook, elver, legion

On 2022-03-28 09:31:51 [-0500], Eric W. Biederman wrote:
> 
> Thank you for cc'ing me.  You probably want to hold off on back-porting
> this patch.  The appropriate fix requires some more conversation.
> 
> At a mininum this patch should not be using TIF_NOTIFY_RESUME.

Sasha,

could you please drop this patch from the stable backports (5.15, 5.16, 5.17).

> Eric

Sebastian

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction Sasha Levin
@ 2022-03-28 18:08   ` Eric Biggers
  2022-03-28 18:34     ` Michael Brooks
                       ` (2 more replies)
       [not found]   ` <CAOnCY6RUN+CSwjsD6Vg-MDi7ERAj2kKLorMLGp1jE8dTZ+3cpQ@mail.gmail.com>
  2022-03-30 16:08   ` Michael Brooks
  2 siblings, 3 replies; 63+ messages in thread
From: Eric Biggers @ 2022-03-28 18:08 UTC (permalink / raw)
  To: Sasha Levin
  Cc: linux-kernel, stable, Jason A. Donenfeld, Theodore Ts'o,
	Dominik Brodowski, Greg Kroah-Hartman, Jean-Philippe Aumasson

On Mon, Mar 28, 2022 at 07:18:00AM -0400, Sasha Levin wrote:
> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> 
> [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
> 

I don't think it's a good idea to start backporting random commits to random.c
that weren't marked for stable.  There were a lot of changes in v5.18, and
sometimes they relate to each other in subtle ways, so the individual commits
aren't necessarily safe to pick.

IMO, you shouldn't backport any non-stable-Cc'ed commits to random.c unless
Jason explicitly reviews the exact sequence of commits that you're backporting.

- Eric

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-28 18:08   ` Eric Biggers
@ 2022-03-28 18:34     ` Michael Brooks
  2022-03-29  5:31     ` Jason A. Donenfeld
  2022-03-29 15:38     ` Theodore Ts'o
  2 siblings, 0 replies; 63+ messages in thread
From: Michael Brooks @ 2022-03-28 18:34 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Sasha Levin, linux-kernel, stable, Jason A. Donenfeld,
	Theodore Ts'o, Dominik Brodowski, Greg Kroah-Hartman,
	Jean-Philippe Aumasson

[-- Attachment #1: Type: text/plain, Size: 798 bytes --]

This patch is not ready, but I think it solves the issues Jason has brought up.

-Mike

On Mon, Mar 28, 2022 at 11:08 AM Eric Biggers <ebiggers@google.com> wrote:
>
> On Mon, Mar 28, 2022 at 07:18:00AM -0400, Sasha Levin wrote:
> > From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> >
> > [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
> >
>
> I don't think it's a good idea to start backporting random commits to random.c
> that weren't marked for stable.  There were a lot of changes in v5.18, and
> sometimes they relate to each other in subtle ways, so the individual commits
> aren't necessarily safe to pick.
>
> IMO, you shouldn't backport any non-stable-Cc'ed commits to random.c unless
> Jason explicitly reviews the exact sequence of commits that you're backporting.
>
> - Eric

[-- Attachment #2: keypoolrandom.patch --]
[-- Type: application/octet-stream, Size: 59858 bytes --]

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 40107f8b9e9e..95f334a824ba 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -7,7 +7,7 @@
  * This driver produces cryptographically secure pseudorandom data. It is divided
  * into roughly six sections, each with a section header:
  *
- *   - Initialization and readiness waiting.
+ *   - Initialization and readiness waiting.
  *   - Fast key erasure RNG, the "crng".
  *   - Entropy accumulation and extraction routines.
  *   - Entropy collection routines.
@@ -60,6 +60,7 @@
 #include <asm/irq_regs.h>
 #include <asm/io.h>
 
+ 
 /*********************************************************************
  *
  * Initialization and readiness waiting.
@@ -95,6 +96,517 @@ static int ratelimit_disable __read_mostly;
 module_param_named(ratelimit_disable, ratelimit_disable, int, 0644);
 MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression");
 
+////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
+//  CONSTANTS
+////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
+#define KEYPOOL_SIZE        1024
+// BLOCK_SIZE is 256, 
+// but maybe 128 is better for what we are doing.
+#define POOL_SIZE           (BLOCK_SIZE * 4)
+#define POOL_SIZE_BITS      (BLOCK_SIZE * 8)
+#define OUTPUT_POOL_SHIFT	10
+#define INPUT_POOL_WORDS	(1 << (INPUT_POOL_SHIFT-5))
+#define OUTPUT_POOL_WORDS	(1 << (OUTPUT_POOL_SHIFT-5))
+
+//Global runtime entropy
+uint8_t runtime_entropy[POOL_SIZE] __latent_entropy;
+
+// All primes less than 128, used in jump-table multiplication.
+// I don't think we need primes up to 1024 because our pool size isn't that large.
+// But this is debatable.
+const int keypool_primes[] = {3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127};//, 131, 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193, 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 257, 263, 269, 271, 277, 281, 283, 293, 307, 311, 313, 317, 331, 337, 347, 349, 353, 359, 367, 373, 379, 383, 389, 397, 401, 409, 419, 421, 431, 433, 439, 443, 449, 457, 461, 463, 467, 479, 487, 491, 499, 503, 509, 521, 523, 541, 547, 557, 563, 569, 571, 577, 587, 593, 599, 601, 607, 613, 617, 619, 631, 641, 643, 647, 653, 659, 661, 673, 677, 683, 691, 701, 709, 719, 727, 733, 739, 743, 751, 757, 761, 769, 773, 787, 797, 809, 811, 821, 823, 827, 829, 839, 853, 857, 859, 863, 877, 881, 883, 887, 907, 911, 919, 929, 937, 941, 947, 953, 967, 971, 977, 983, 991, 997, 1009, 1013, 1019, 1021};
+
+static ssize_t extract_crng_user(uint8_t *__user_buf, size_t nbytes);
+static void crng_reseed(uint8_t *crng_pool, size_t nbytes);
+void _unique_iid(u64 uu_key[], u64 gatekey, size_t nbytes, int rotate);
+u64 _alternate_rand(void);
+
+// By the pigeonhole principle, no two callers can have the same gatekey.
+// If you have the same caller address and the same jiffies - then you are the same source.
+// Any construction for a gatekey is valid so long as the principle above holds, because it substitutes for the need for a mutex.
+#ifndef __make_gatekey
+  #define __make_gatekey(new_key)((u64)jiffies ^ (u64)_RET_IP_ ^ (u64)new_key ^ (u64)_THIS_IP_) // ^ _alternate_rand()
+  //todo: 32bit?
+  //#else
+  //#define _gatekey(new_key)((u32)_RET_IP_ << 32 | ((u32)&new_key ^ (u32)_THIS_IP_))|((u32)_THIS_IP_ << 32 | (u32)&new_key);
+  //#endif
+#endif
+
+
+// Rotate bits left.
+// Bits are not lost, so there is no loss of uniqueness.
+uint64_t rotl64(uint64_t x, int8_t r)
+{
+  r &= 63;
+  // Mask the right-shift count as well: shifting a 64-bit value by 64 is undefined.
+  return (x << r) | (x >> ((64 - r) & 63));
+}
+
+
+/* Add two buffers to compound uncertainty
+ *
+ * _add_unique() will spray bytes across the pool evenly to create a field of possibilities.
+ * With each jump more uncertainty is introduced to this field.
+ * With this shuffling strategy an attacker is forced to work harder,
+ * and it is O(n) to copy bytes using a jump table or with a linear copy.
+ *
+ * This shuffling strategy was built to support the volume of writes created by handle_irq_event_percpu().
+ * There is no lock over the keypool, and all writes are simple XOR operations.
+ * Even if a write was lost due to a race condition, it would be difficult to determine what was kept and what wasn't.
+ * Any effect of a race condition would make it even harder to reconstruct the keypool state.
+ *
+ */
+void _add_unique(uint8_t keypool[], int keypool_size, uint64_t gatekey, uint8_t unique[], int unique_size, int nbytes)
+{
+  int step;
+  // Write in the first byte that is read by _get_unique() which is in 64 bits.
+  int next_jump = (gatekey * 8) % (keypool_size / 8);
+  // Copy bytes with a jump table - O(n)
+  for(step = 0; step < nbytes; step++)
+  {
+    // Check if there is something to add.
+    if(unique[step] != 0){
+      // Every byte within keypool_size can be written to at the same time without losing a write.
+      keypool[next_jump] ^= unique[step];
+      // Derive the next jump address from the byte we just mixed in.
+      next_jump ^= keypool[next_jump];
+      // Circular buffer: keep the jump inside the pool.
+      next_jump = next_jump % keypool_size;
+    }
+  }
+  // Leave no trace
+  gatekey = 0;
+  next_jump = 0;
+}
+
+/*
+ * The goal of any _get_unique_csprng* is to produce an unpredictable I.I.D. stream.
+ * _get_unique() is meant to be as difficult to predict as possible, but
+ * it is not fully I.I.D. - and it doesn't need to be.
+ */
+// Todo: this works but needs to be ported to AES-NI.
+// AES-NI massively outperforms chacha20 in an `openssl speed` test.
+// Also, AES-NI is unlikely to have the kind of backdoor that would undermine this construction.
+// Paranoid users can configure the kernel to use pure software to avoid any ideological battles.
+// Handset makers will obviously prefer AES-NI because it will provide better battery life.
+/*
+void _get_unique_csprng_aes_ni(u8 uu_key[], u64 gatekey, size_t nbytes, int rotate)
+{
+  struct AES_ctx ctx;
+  uint8_t aes_key_material[BLOCK_SIZE * 3] __latent_entropy;
+  uint8_t *aes_key = aes_key_material;
+  uint8_t *aes_iv = aes_key_material + BLOCK_SIZE;
+  uint8_t *aes_block = aes_key_material + BLOCK_SIZE * 2;
+  uint64_t *aes_block_rotate = (uint64_t *)aes_block;
+  uint64_t *jump_rotate = (uint64_t *) &runtime_entropy;
+  size_t jump_rotate_size = KEYPOOL_SIZE / 8;
+  size_t amount_left = nbytes;
+  size_t chunk = 0;
+  // Get a new key, iv and preimage from the entropy pool:
+  _get_unique(runtime_entropy, KEYPOOL_SIZE, gatekey, aes_key_material, sizeof(aes_key_material));
+  // Cover our tracks
+  // Make sure this gatekey + entry location can never be reused:
+  // No two accessors can generate the same gatekey so this is threadsafe.
+  _add_unique(runtime_entropy, POOL_SIZE, gatekey, aes_block_rotate, sizeof(gatekey), sizeof(gatekey));
+  // Pull 64bits at a time out of the ring function
+  while( amount_left > 0 )
+  {
+    // Account for sizes that are not evenly divisible by BLOCK_SIZE.
+    chunk = __min(amount_left, BLOCK_SIZE);
+    // Populate our cipher struct
+    AES_init_ctx_iv(&ctx, aes_key, aes_iv);
+    // Encrypt one block with AES-CBC-128:
+    AES_CBC_encrypt_buffer(&ctx, aes_block, BLOCK_SIZE);
+    // Copy the first 64bits to the user:
+    memcpy(uu_key, aes_block, chunk);
+    amount_left -= chunk;
+    if(amount_left > 0)
+    {
+      // move our copy destination
+      uu_key += chunk;
+      if(rotate)
+      {
+        // Rotate the key material with the output so that similar keys are never reused:
+        _add_unique(aes_key_material, BLOCK_SIZE*3, gatekey, aes_block, BLOCK_SIZE, BLOCK_SIZE);
+      }
+      // The ciphertext from the previous call to aes() is the plaintext for the next invocation.
+    }
+  }
+  // Cleanup the secrets used
+  memzero_explicit(&aes_key_material, BLOCK_SIZE*3);
+  gatekey ^= gatekey;
+}
+*/
+
+/*
+ * Obtain uniqueness from the keypool
+ *
+ * A lock isn't needed because no two threads will be able to follow the same path.
+ * We assume this holds true due to the pigeonhole principle behind the gatekey generation.
+ *
+ * This method is linear O(n), and we want to force our attacker into an exponential search.
+ * KEYPOOL_SIZE * 8 bits gives the possible entry points (1024*8).
+ * We take four combinations of these; (1024*8)^4
+ *  - making a total of 2^52 possible combinations for any given keypool.
+ *
+ * The gatekey and the state of the keypool are used to derive 4 distinct jump points.
+ * It is like taking two MRI scans of a sand castle, then putting them in an XOR kaleidoscope.
+ *
+ * Constraints:
+ *   Each of the four layers must be unique, to prevent a^a=0
+ *   Make sure our jump path to choose layers is distinct from other parallel invocations
+ *   To prevent future repeats of a jump path we overwrite our jump index
+ *
+ */
+void _get_unique(uint8_t *keypool, int keypool_size, u64 gatekey, uint8_t *unique, size_t nbytes)
+{
+  size_t step;
+  uint64_t *keyspace = (uint64_t *) keypool;
+  uint64_t *product = (uint64_t *) unique;
+  // We extract out 64bits at a time for performance.
+  int64_t keypool_size_64 = keypool_size / 8;
+  uint8_t gate_position = (uint8_t) gatekey % keypool_size_64;
+  uint8_t  jump_offset;
+  // We need to seed the process with our first jump location
+  product[0] ^= gatekey;
+  // A prime is used to maximize the number of reads without repeat
+  jump_offset = keypool_primes[product[1] % ARRAY_SIZE(keypool_primes)];
+  // Pull 64bits at a time out of the ring function
+  for(step = 0; step < nbytes/8; step++)
+  {
+    // Pull the next 64bits from the entropy source:
+    product[step] ^= keyspace[gate_position];
+    // A shift rotate will make our reads less predictable without losing uniqueness.
+    // Here we rotate by an uncertain degree, making our local state more unique
+    product[step] = rotl64(product[step], unique[step]%64);    
+    // Pick another 64bit chunk that is somewhere else in the pool and doesn't overlap
+    gate_position = (gate_position + jump_offset) % keypool_size_64;
+    product[step] ^= keyspace[gate_position];
+    // Assume that 'keyspace' is shared, so we add a local rotation
+    product[step] = rotl64(product[step], unique[step+1]%64);
+    // Find another point to read from that is distinct.
+    gate_position = (gate_position + jump_offset) % keypool_size_64;
+  }
+}
+
+
+/*
+ * This generates a ChaCha block using the provided key, and then
+ * immediately overwrites that key with half the block. It returns
+ * the resultant ChaCha state to the user, along with the second
+ * half of the block containing 32 bytes of random data that may
+ * be used; random_data_len may not be greater than 32.
+ */
+static void crng_fast_key_erasure(u8 key[CHACHA_KEY_SIZE],
+				  u32 chacha_state[CHACHA_STATE_WORDS],
+				  u8 *random_data, size_t random_data_len)
+{
+	u8 first_block[CHACHA_BLOCK_SIZE];
+
+	BUG_ON(random_data_len > 32);
+
+	chacha_init_consts(chacha_state);
+	memcpy(&chacha_state[4], key, CHACHA_KEY_SIZE);
+	memset(&chacha_state[12], 0, sizeof(u32) * 4);
+	chacha20_block(chacha_state, first_block);
+
+	memcpy(key, first_block, CHACHA_KEY_SIZE);
+	memcpy(random_data, first_block + CHACHA_KEY_SIZE, random_data_len);
+	memzero_explicit(first_block, sizeof(first_block));
+}
+
+/*
+ * The goal is to produce a very secure source of I.I.D.
+ * (Independent and identically distributed)
+ * This is a wrapper to dispatch to whatever primitive is best
+ */
+void _unique_iid(u64 uu_key[], u64 gatekey, size_t nbytes, int rotate)
+{
+  // The state just needs to be distinct here - the key is the important bit.
+  u32 chacha_state[CHACHA_STATE_WORDS] __latent_entropy;
+  // We could make secret_key into __latent_entropy, but I would prefer speed - the keypool should be enough.
+  u8 secret_key[CHACHA_KEY_SIZE];
+  // Let's get ourselves a key from the keypool:
+  _get_unique(runtime_entropy, KEYPOOL_SIZE, __make_gatekey(uu_key), secret_key, sizeof(secret_key));
+  // pass that key to a primitive which can provide an I.I.D. stream
+  crng_fast_key_erasure(secret_key, chacha_state, (u8 *)uu_key, nbytes);
+}
+
+/*
+ * The goal here is to be fast.
+ * The user needs less than one block; they only need two words.
+ * Let's fill the request as quickly as we can.
+ * We add __latent_entropy because we are called early in execution;
+ * it is good to have all the sources we can get.
+ */
+u64 get_random_u64(void)
+{
+  u64 anvil;
+  _unique_iid((u64 *)&anvil, __make_gatekey(&anvil), sizeof(anvil), 0);
+  return anvil;
+}
+EXPORT_SYMBOL(get_random_u64);
+
+/* 
+ * We want to return just one word as quickly as possible.
+ * There is no use in using a 128- or 256-bit cipher for 32 bits.
+ * __make_gatekey is plenty unique for this purpose.
+ * get_random_u32 is for kernel-land consumers.
+ */
+u32 get_random_u32(void)
+{
+  u64 anvil;
+  _unique_iid(&anvil, __make_gatekey(&anvil), sizeof(anvil), 0);
+  return (u32)anvil;
+}
+EXPORT_SYMBOL(get_random_u32);
+
+/*
+ * There are many times when we need another opinion. 
+ * Ideally that would come from another source, such as arch_get_random_seed_long()
+ * When we don't have an arch_get_random_seed_long(), we'll use ourselves as a source.
+ * 
+ * Failure is not an option - and this output is untrusted.
+ * The output should be XOR'ed with a random value from a different source.
+ */
+u64 _alternate_rand()
+{
+  //Need a source that isn't GCC's latent_entropy or time.
+  u64 anvil = 0;
+  //Try every source we know of, taken from random.c:
+  if(!arch_get_random_seed_long((unsigned long *)&anvil))
+  {
+      if(!arch_get_random_long((unsigned long *)&anvil))
+      {
+         anvil = random_get_entropy();
+      }
+  }
+  // anvil might still be zero -  
+  // We can't tell the difference between a zero-roll and a hardware error code.
+  // Worst case, we are missing everything above
+  if(anvil == 0)
+  {
+    // We cannot fail, in this case we pull from the pool
+    // This output is used to make a gatekey, so time is used
+    // No two calls can use the exact same jiffies + &anvil due to the pigeonhole principle
+    // todo: 32bit?
+    u64 alternate_gatekey __latent_entropy;
+    alternate_gatekey ^= (u64)jiffies ^ (u64)&anvil;
+    _unique_iid(&anvil, alternate_gatekey, sizeof(anvil), 0);
+    // 'anvil' is a small jump table entropy pool that we can further enrich
+    _add_unique((char *)&anvil, sizeof(anvil), alternate_gatekey, (uint8_t *)&alternate_gatekey, sizeof(alternate_gatekey), sizeof(alternate_gatekey));
+    // cleanup
+    alternate_gatekey = 0;
+  }
+  return anvil;
+}
+
+/*
+ * Public function to provide CRNG
+ *
+ *  - Generate some very hard to guess key material
+ *  - Use the fastest cryptographic primitive available
+ *  - Return CRNG back to the user as quickly as we can
+ *  - Cleanup so we can do this all over again
+ * 
+ * This is where users get their entropy from the random.c 
+ * device driver (i.e. reading /dev/random)
+ */
+static ssize_t extract_crng_user(uint8_t *__user_buf, size_t nbytes){  
+    //If we only need a few bytes these two are the best source.
+    if(nbytes <= 0){
+      return nbytes;
+    } else {
+      // Fill the request - no rotate
+      _unique_iid((u64 *)__user_buf, __make_gatekey(__user_buf), nbytes, 0);  
+    }     
+    //at this point it should not be possible to re-create any part of the PRNG stream used.
+    return nbytes;
+}
+
+// This is the /dev/urandom variant.
+// It is similar to the algorithm above, but more time is spent procuring stronger key material.
+// The user is willing to wait, so we'll do our very best.
+// When this method completes, the keypool as a whole is better off, as it will be re-scheduled.
+ /*
+ * Be an _unlimited_ random source
+ * Speed is not an issue
+ * Provide the very best source possible
+ * 
+ * Rolling accumulator keys
+ * Key, IV, and Image accumulate entropy with each operation
+ * They are never overwritten, only XOR'ed with the previous value
+ */
+
+static ssize_t extract_crng_user_unlimited(uint8_t *__user_buf, size_t nbytes)
+{
+    //If we only need a few bytes these two are the best source.
+    if(nbytes <= 0){
+      return nbytes;
+    } else {
+      // Fill the request - rotate key material:
+      _unique_iid((u64 *)__user_buf, __make_gatekey(__user_buf), nbytes, 1);  
+    }     
+    //at this point it should not be possible to re-create any part of the PRNG stream used.
+    return nbytes;
+}
+
+static unsigned long get_reg(u16 reg_idx, struct pt_regs *regs)
+{
+	unsigned long *ptr = (unsigned long *)regs;
+	unsigned int idx;
+
+	if (regs == NULL)
+		return 0;
+	idx = READ_ONCE(reg_idx);
+	if (idx >= sizeof(struct pt_regs) / sizeof(unsigned long))
+		idx = 0;
+	ptr += idx++;
+	WRITE_ONCE(reg_idx, idx);
+	return *ptr;
+}
+
+/* This function is in fact called more times than I have ever used a phone.
+ * Let's keep this function as light as possible, and move more weight to extract_crng_user().
+ * If we need to add more computation, then the user requesting the PRNG should pay the price;
+ * any logic added here means the entire system pays a price.
+ * Choose your operations wisely.
+ *
+ * fast_mix() is fast in name only - mixing can also be handled with encryption.
+ *
+ */
+//If there is one function to make lockless, this is the one
+void add_interrupt_randomness(int irq)
+{
+  //Globally unique gatekey
+  uint64_t gatekey __latent_entropy;
+  u64  temp_pool[5] __latent_entropy;
+  struct pt_regs    *regs = get_irq_regs();
+  // todo: whoops, flags was removed from this branch...
+  // Personally I liked adding flags, because we should collect as much uniqueness as we can.
+
+  //irq_flags contains a few bits, and every bit counts.
+  //cycles_t    cycles = irq_flags;
+  __u32     c_high, j_high;
+  __u64     ip = _RET_IP_;
+
+  //todo: whoops, we don't use 'cycles' anymore in this branch - is there a better source of uniqueness?
+  //This code is adapted from the old random.c - all O(1) operations
+  //The interrupt + time gives us 4 bytes.
+  //if (cycles == 0)
+  //  cycles = get_reg(fast_pool, regs);
+  //c_high = (sizeof(cycles) > 4) ? cycles >> 32 : 0;
+  j_high = (sizeof(jiffies) > 4) ? jiffies >> 32 : 0;
+  //fast_pool[0] ^= cycles ^ j_high ^ irq;
+  temp_pool[1] ^= jiffies ^ c_high;
+  temp_pool[2] ^= ip;
+  temp_pool[3] ^= (sizeof(ip) > 4) ? ip >> 32 :
+    get_reg((int *)temp_pool, regs);
+
+  // A gatekey will have some hardware randomness when available
+  // It will be XOR'ed with __latent_entropy to prevent outsider control
+  gatekey ^= __make_gatekey(&irq);
+  // Add this unique value to the pool
+  temp_pool[4] ^= gatekey;
+  //A single O(1) XOR operation is the best we can get to drip the entropy back into the pool
+  _add_unique(runtime_entropy, POOL_SIZE, gatekey, (uint8_t *)temp_pool, sizeof(temp_pool), sizeof(temp_pool));
+
+  //Cleanup
+  gatekey = 0;
+}
+EXPORT_SYMBOL_GPL(add_interrupt_randomness);
+
+/*
+ * Getting entropy on a fresh system is a hard thing to do.
+ * So we will start with latent_entropy; although it isn't required, it doesn't hurt.
+ * Then let's take addresses we know about and add them to the mix.
+ * todo: Fire up the debugger, and look for regions of memory with good data.
+ * The zero page has hardware identifiers that can be hard to guess.
+ * Then derive a key as best we can given the degraded state of the pool.
+ * 
+ * find_more_entropy_in_memory() is called when extract_crng_user can't be used.
+ * get_random_u32() and get_random_u64() can't be used.
+ *
+ */
+static void find_more_entropy_in_memory(uint8_t *crng_pool, int nbytes_needed)
+{
+  uint8_t    *anvil;
+  // This is early in boot, __latent_entropy is helpful
+  u64        gatekey __latent_entropy;
+  gatekey  ^= __make_gatekey(&anvil);
+  int index;
+  // Pull unique hardware information populated by the bios.
+  int ZERO_PAGE = 0;
+
+  // A place to forge some entropy - the allocation has a unique offset.
+  anvil = (uint8_t *)kmalloc(nbytes_needed, __GFP_HIGH);
+
+  // Let's add as many easily accessible unknowns as we can:
+  // Even without ASLR some addresses can be more difficult to guess than others.
+  // With ASLR, this would be partially feedback noise, with offsets.
+  // Add any addresses that are unknown under POOL_SIZE
+  // 16 addresses for 64-bit is ideal, 32 should use 32 addresses to make 1024 bits.
+  // Todo: use a debugger to find the 32 hardest to guess addresses.
+  void *points_of_interest[] = {
+      ZERO_PAGE,
+      _RET_IP_,
+      _THIS_IP_,
+      anvil,
+      gatekey
+	  // todo... add to this list.
+	  // maybe pull in from the bss, dss and text memory segments? 
+  };
+
+  //Gather Runtime Entropy
+  //  - Data from the zero page
+  //  - Memory addresses from the stack and heap and 'anvil' points to the heap.
+  //  - Unset memory on the heap that may contain noise
+  //  - Unallocated memory that maybe have used or in use
+  //Copy from the zero page, contains HW IDs from the bios
+  for(index = 0; index < ARRAY_SIZE(points_of_interest); index++){
+    void *readPoint = points_of_interest[index];
+    // Grab the uniqueness found at this address:
+    _add_unique(crng_pool, POOL_SIZE, gatekey, (void *)&readPoint, nbytes_needed, nbytes_needed);
+    // Pull in uniqueness from this page in memory:
+    // Todo - read the values at this address - we want the contents of the zero page:
+    //_add_unique(crng_pool, POOL_SIZE, gatekey, readPoint, nbytes_needed, nbytes_needed);
+  }
+
+  // Twigs, when wrapped together, can become load-bearing;
+  // a samurai sword has many layers of steel.
+  // I.I.D. means Independent and Identically Distributed.
+  // _unique_iid() might not be safe at this point
+  // - but it is unique enough as a seed.
+  _unique_iid((u64 *)anvil, gatekey, nbytes_needed, 1);
+  _add_unique(crng_pool, POOL_SIZE, gatekey, anvil, nbytes_needed, nbytes_needed);
+  
+  //Clean up our tracks so another process cannot see our source material
+  memzero_explicit(anvil, nbytes_needed);
+  gatekey = 0;
+  kfree(anvil);
+}
+
+static ssize_t
+_random_read(int nonblock, char __user *buf, size_t nbytes)
+{
+  return extract_crng_user(buf, nbytes);
+}
+
+// no blocking
+static ssize_t
+random_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
+{
+  return extract_crng_user(buf+*ppos, nbytes-*ppos);
+}
+
+static ssize_t
+urandom_read(struct file *file, char __user *buf, size_t nbytes, loff_t *ppos)
+{
+  //This is a non-blocking device so we are not going to wait for the pool to fill. 
+  //We will respect the user's wishes, and spend time to produce the best output.
+  return extract_crng_user_unlimited(buf+*ppos, nbytes-*ppos);
+}
+
 /*
  * Returns whether or not the input pool has been seeded and thus guaranteed
  * to supply cryptographically secure random numbers. This applies to: the
@@ -110,30 +622,19 @@ bool rng_is_initialized(void)
 }
 EXPORT_SYMBOL(rng_is_initialized);
 
-/* Used by wait_for_random_bytes(), and considered an entropy collector, below. */
-static void try_to_generate_entropy(void);
-
 /*
- * Wait for the input pool to be seeded and thus guaranteed to supply
- * cryptographically secure random numbers. This applies to: the /dev/urandom
- * device, the get_random_bytes function, and the get_random_{u32,u64,int,long}
- * family of functions. Using any of these functions without first calling
- * this function forfeits the guarantee of security.
- *
- * Returns: 0 if the input pool has been seeded.
- *          -ERESTARTSYS if the function was interrupted by a signal.
+ * This function is invoked when we are told that we need to find more entropy;
+ * this must be fast - we can't wait around, because that will slow down the entire show.
  */
 int wait_for_random_bytes(void)
 {
-	while (!crng_ready()) {
-		int ret;
-
-		try_to_generate_entropy();
-		ret = wait_event_interruptible_timeout(crng_init_wait, crng_ready(), HZ);
-		if (ret)
-			return ret > 0 ? 0 : ret;
-	}
-	return 0;
+	// People are waiting on us; there is no "try" - you need to go find what we need.
+	// Because we simply cannot afford to wait, we are using __GFP_HIGH to access uniqueness in the heap
+	// as well as other sources.
+	// Whoops, todo - this is a good idea but it's unstable and I need help here:
+	//find_more_entropy_in_memory(runtime_entropy, POOL_SIZE);
+	// todo: if find_more_entropy_in_memory() fails, then we have a bigger problem on our hands.
+	return 1;
 }
 EXPORT_SYMBOL(wait_for_random_bytes);
 
@@ -173,38 +674,6 @@ int unregister_random_ready_notifier(struct notifier_block *nb)
 	return ret;
 }
 
-static void process_random_ready_list(void)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&random_ready_chain_lock, flags);
-	raw_notifier_call_chain(&random_ready_chain, 0, NULL);
-	spin_unlock_irqrestore(&random_ready_chain_lock, flags);
-}
-
-#define warn_unseeded_randomness(previous) \
-	_warn_unseeded_randomness(__func__, (void *)_RET_IP_, (previous))
-
-static void _warn_unseeded_randomness(const char *func_name, void *caller, void **previous)
-{
-#ifdef CONFIG_WARN_ALL_UNSEEDED_RANDOM
-	const bool print_once = false;
-#else
-	static bool print_once __read_mostly;
-#endif
-
-	if (print_once || crng_ready() ||
-	    (previous && (caller == READ_ONCE(*previous))))
-		return;
-	WRITE_ONCE(*previous, caller);
-#ifndef CONFIG_WARN_ALL_UNSEEDED_RANDOM
-	print_once = true;
-#endif
-	if (__ratelimit(&unseeded_warning))
-		printk_deferred(KERN_NOTICE "random: %s called from %pS with crng_init=%d\n",
-				func_name, caller, crng_init);
-}
-
 
 /*********************************************************************
  *
@@ -256,260 +725,6 @@ static DEFINE_PER_CPU(struct crng, crngs) = {
 	.lock = INIT_LOCAL_LOCK(crngs.lock),
 };
 
-/* Used by crng_reseed() to extract a new seed from the input pool. */
-static bool drain_entropy(void *buf, size_t nbytes, bool force);
-
-/*
- * This extracts a new crng key from the input pool, but only if there is a
- * sufficient amount of entropy available or force is true, in order to
- * mitigate bruteforcing of newly added bits.
- */
-static void crng_reseed(bool force)
-{
-	unsigned long flags;
-	unsigned long next_gen;
-	u8 key[CHACHA_KEY_SIZE];
-	bool finalize_init = false;
-
-	/* Only reseed if we can, to prevent brute forcing a small amount of new bits. */
-	if (!drain_entropy(key, sizeof(key), force))
-		return;
-
-	/*
-	 * We copy the new key into the base_crng, overwriting the old one,
-	 * and update the generation counter. We avoid hitting ULONG_MAX,
-	 * because the per-cpu crngs are initialized to ULONG_MAX, so this
-	 * forces new CPUs that come online to always initialize.
-	 */
-	spin_lock_irqsave(&base_crng.lock, flags);
-	memcpy(base_crng.key, key, sizeof(base_crng.key));
-	next_gen = base_crng.generation + 1;
-	if (next_gen == ULONG_MAX)
-		++next_gen;
-	WRITE_ONCE(base_crng.generation, next_gen);
-	WRITE_ONCE(base_crng.birth, jiffies);
-	if (!crng_ready()) {
-		crng_init = 2;
-		finalize_init = true;
-	}
-	spin_unlock_irqrestore(&base_crng.lock, flags);
-	memzero_explicit(key, sizeof(key));
-	if (finalize_init) {
-		process_random_ready_list();
-		wake_up_interruptible(&crng_init_wait);
-		kill_fasync(&fasync, SIGIO, POLL_IN);
-		pr_notice("crng init done\n");
-		if (unseeded_warning.missed) {
-			pr_notice("%d get_random_xx warning(s) missed due to ratelimiting\n",
-				  unseeded_warning.missed);
-			unseeded_warning.missed = 0;
-		}
-		if (urandom_warning.missed) {
-			pr_notice("%d urandom warning(s) missed due to ratelimiting\n",
-				  urandom_warning.missed);
-			urandom_warning.missed = 0;
-		}
-	}
-}
-
-/*
- * This generates a ChaCha block using the provided key, and then
- * immediately overwites that key with half the block. It returns
- * the resultant ChaCha state to the user, along with the second
- * half of the block containing 32 bytes of random data that may
- * be used; random_data_len may not be greater than 32.
- */
-static void crng_fast_key_erasure(u8 key[CHACHA_KEY_SIZE],
-				  u32 chacha_state[CHACHA_STATE_WORDS],
-				  u8 *random_data, size_t random_data_len)
-{
-	u8 first_block[CHACHA_BLOCK_SIZE];
-
-	BUG_ON(random_data_len > 32);
-
-	chacha_init_consts(chacha_state);
-	memcpy(&chacha_state[4], key, CHACHA_KEY_SIZE);
-	memset(&chacha_state[12], 0, sizeof(u32) * 4);
-	chacha20_block(chacha_state, first_block);
-
-	memcpy(key, first_block, CHACHA_KEY_SIZE);
-	memcpy(random_data, first_block + CHACHA_KEY_SIZE, random_data_len);
-	memzero_explicit(first_block, sizeof(first_block));
-}
-
-/*
- * Return whether the crng seed is considered to be sufficiently
- * old that a reseeding might be attempted. This happens if the last
- * reseeding was CRNG_RESEED_INTERVAL ago, or during early boot, at
- * an interval proportional to the uptime.
- */
-static bool crng_has_old_seed(void)
-{
-	static bool early_boot = true;
-	unsigned long interval = CRNG_RESEED_INTERVAL;
-
-	if (unlikely(READ_ONCE(early_boot))) {
-		time64_t uptime = ktime_get_seconds();
-		if (uptime >= CRNG_RESEED_INTERVAL / HZ * 2)
-			WRITE_ONCE(early_boot, false);
-		else
-			interval = max_t(unsigned int, 5 * HZ,
-					 (unsigned int)uptime / 2 * HZ);
-	}
-	return time_after(jiffies, READ_ONCE(base_crng.birth) + interval);
-}
-
-/*
- * This function returns a ChaCha state that you may use for generating
- * random data. It also returns up to 32 bytes on its own of random data
- * that may be used; random_data_len may not be greater than 32.
- */
-static void crng_make_state(u32 chacha_state[CHACHA_STATE_WORDS],
-			    u8 *random_data, size_t random_data_len)
-{
-	unsigned long flags;
-	struct crng *crng;
-
-	BUG_ON(random_data_len > 32);
-
-	/*
-	 * For the fast path, we check whether we're ready, unlocked first, and
-	 * then re-check once locked later. In the case where we're really not
-	 * ready, we do fast key erasure with the base_crng directly, because
-	 * this is what crng_pre_init_inject() mutates during early init.
-	 */
-	if (!crng_ready()) {
-		bool ready;
-
-		spin_lock_irqsave(&base_crng.lock, flags);
-		ready = crng_ready();
-		if (!ready)
-			crng_fast_key_erasure(base_crng.key, chacha_state,
-					      random_data, random_data_len);
-		spin_unlock_irqrestore(&base_crng.lock, flags);
-		if (!ready)
-			return;
-	}
-
-	/*
-	 * If the base_crng is old enough, we try to reseed, which in turn
-	 * bumps the generation counter that we check below.
-	 */
-	if (unlikely(crng_has_old_seed()))
-		crng_reseed(false);
-
-	local_lock_irqsave(&crngs.lock, flags);
-	crng = raw_cpu_ptr(&crngs);
-
-	/*
-	 * If our per-cpu crng is older than the base_crng, then it means
-	 * somebody reseeded the base_crng. In that case, we do fast key
-	 * erasure on the base_crng, and use its output as the new key
-	 * for our per-cpu crng. This brings us up to date with base_crng.
-	 */
-	if (unlikely(crng->generation != READ_ONCE(base_crng.generation))) {
-		spin_lock(&base_crng.lock);
-		crng_fast_key_erasure(base_crng.key, chacha_state,
-				      crng->key, sizeof(crng->key));
-		crng->generation = base_crng.generation;
-		spin_unlock(&base_crng.lock);
-	}
-
-	/*
-	 * Finally, when we've made it this far, our per-cpu crng has an up
-	 * to date key, and we can do fast key erasure with it to produce
-	 * some random data and a ChaCha state for the caller. All other
-	 * branches of this function are "unlikely", so most of the time we
-	 * should wind up here immediately.
-	 */
-	crng_fast_key_erasure(crng->key, chacha_state, random_data, random_data_len);
-	local_unlock_irqrestore(&crngs.lock, flags);
-}
-
-/*
- * This function is for crng_init == 0 only. It loads entropy directly
- * into the crng's key, without going through the input pool. It is,
- * generally speaking, not very safe, but we use this only at early
- * boot time when it's better to have something there rather than
- * nothing.
- *
- * If account is set, then the crng_init_cnt counter is incremented.
- * This shouldn't be set by functions like add_device_randomness(),
- * where we can't trust the buffer passed to it is guaranteed to be
- * unpredictable (so it might not have any entropy at all).
- *
- * Returns the number of bytes processed from input, which is bounded
- * by CRNG_INIT_CNT_THRESH if account is true.
- */
-static size_t crng_pre_init_inject(const void *input, size_t len, bool account)
-{
-	static int crng_init_cnt = 0;
-	struct blake2s_state hash;
-	unsigned long flags;
-
-	blake2s_init(&hash, sizeof(base_crng.key));
-
-	spin_lock_irqsave(&base_crng.lock, flags);
-	if (crng_init != 0) {
-		spin_unlock_irqrestore(&base_crng.lock, flags);
-		return 0;
-	}
-
-	if (account)
-		len = min_t(size_t, len, CRNG_INIT_CNT_THRESH - crng_init_cnt);
-
-	blake2s_update(&hash, base_crng.key, sizeof(base_crng.key));
-	blake2s_update(&hash, input, len);
-	blake2s_final(&hash, base_crng.key);
-
-	if (account) {
-		crng_init_cnt += len;
-		if (crng_init_cnt >= CRNG_INIT_CNT_THRESH) {
-			++base_crng.generation;
-			crng_init = 1;
-		}
-	}
-
-	spin_unlock_irqrestore(&base_crng.lock, flags);
-
-	if (crng_init == 1)
-		pr_notice("fast init done\n");
-
-	return len;
-}
-
-static void _get_random_bytes(void *buf, size_t nbytes)
-{
-	u32 chacha_state[CHACHA_STATE_WORDS];
-	u8 tmp[CHACHA_BLOCK_SIZE];
-	size_t len;
-
-	if (!nbytes)
-		return;
-
-	len = min_t(size_t, 32, nbytes);
-	crng_make_state(chacha_state, buf, len);
-	nbytes -= len;
-	buf += len;
-
-	while (nbytes) {
-		if (nbytes < CHACHA_BLOCK_SIZE) {
-			chacha20_block(chacha_state, tmp);
-			memcpy(buf, tmp, nbytes);
-			memzero_explicit(tmp, sizeof(tmp));
-			break;
-		}
-
-		chacha20_block(chacha_state, buf);
-		if (unlikely(chacha_state[12] == 0))
-			++chacha_state[13];
-		nbytes -= CHACHA_BLOCK_SIZE;
-		buf += CHACHA_BLOCK_SIZE;
-	}
-
-	memzero_explicit(chacha_state, sizeof(chacha_state));
-}
-
 /*
  * This function is the exported kernel interface.  It returns some
  * number of good random numbers, suitable for key generation, seeding
@@ -522,59 +737,10 @@ static void _get_random_bytes(void *buf, size_t nbytes)
  */
 void get_random_bytes(void *buf, size_t nbytes)
 {
-	static void *previous;
-
-	warn_unseeded_randomness(&previous);
-	_get_random_bytes(buf, nbytes);
+	extract_crng_user(buf, nbytes);
 }
 EXPORT_SYMBOL(get_random_bytes);
 
-static ssize_t get_random_bytes_user(void __user *buf, size_t nbytes)
-{
-	bool large_request = nbytes > 256;
-	ssize_t ret = 0;
-	size_t len;
-	u32 chacha_state[CHACHA_STATE_WORDS];
-	u8 output[CHACHA_BLOCK_SIZE];
-
-	if (!nbytes)
-		return 0;
-
-	len = min_t(size_t, 32, nbytes);
-	crng_make_state(chacha_state, output, len);
-
-	if (copy_to_user(buf, output, len))
-		return -EFAULT;
-	nbytes -= len;
-	buf += len;
-	ret += len;
-
-	while (nbytes) {
-		if (large_request && need_resched()) {
-			if (signal_pending(current))
-				break;
-			schedule();
-		}
-
-		chacha20_block(chacha_state, output);
-		if (unlikely(chacha_state[12] == 0))
-			++chacha_state[13];
-
-		len = min_t(size_t, nbytes, CHACHA_BLOCK_SIZE);
-		if (copy_to_user(buf, output, len)) {
-			ret = -EFAULT;
-			break;
-		}
-
-		nbytes -= len;
-		buf += len;
-		ret += len;
-	}
-
-	memzero_explicit(chacha_state, sizeof(chacha_state));
-	memzero_explicit(output, sizeof(output));
-	return ret;
-}
 
 /*
  * Batched entropy returns random integers. The quality of the random
@@ -605,68 +771,12 @@ static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
 	.position = UINT_MAX
 };
 
-u64 get_random_u64(void)
-{
-	u64 ret;
-	unsigned long flags;
-	struct batched_entropy *batch;
-	static void *previous;
-	unsigned long next_gen;
-
-	warn_unseeded_randomness(&previous);
-
-	local_lock_irqsave(&batched_entropy_u64.lock, flags);
-	batch = raw_cpu_ptr(&batched_entropy_u64);
-
-	next_gen = READ_ONCE(base_crng.generation);
-	if (batch->position >= ARRAY_SIZE(batch->entropy_u64) ||
-	    next_gen != batch->generation) {
-		_get_random_bytes(batch->entropy_u64, sizeof(batch->entropy_u64));
-		batch->position = 0;
-		batch->generation = next_gen;
-	}
-
-	ret = batch->entropy_u64[batch->position];
-	batch->entropy_u64[batch->position] = 0;
-	++batch->position;
-	local_unlock_irqrestore(&batched_entropy_u64.lock, flags);
-	return ret;
-}
-EXPORT_SYMBOL(get_random_u64);
 
 static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
 	.lock = INIT_LOCAL_LOCK(batched_entropy_u32.lock),
 	.position = UINT_MAX
 };
 
-u32 get_random_u32(void)
-{
-	u32 ret;
-	unsigned long flags;
-	struct batched_entropy *batch;
-	static void *previous;
-	unsigned long next_gen;
-
-	warn_unseeded_randomness(&previous);
-
-	local_lock_irqsave(&batched_entropy_u32.lock, flags);
-	batch = raw_cpu_ptr(&batched_entropy_u32);
-
-	next_gen = READ_ONCE(base_crng.generation);
-	if (batch->position >= ARRAY_SIZE(batch->entropy_u32) ||
-	    next_gen != batch->generation) {
-		_get_random_bytes(batch->entropy_u32, sizeof(batch->entropy_u32));
-		batch->position = 0;
-		batch->generation = next_gen;
-	}
-
-	ret = batch->entropy_u32[batch->position];
-	batch->entropy_u32[batch->position] = 0;
-	++batch->position;
-	local_unlock_irqrestore(&batched_entropy_u32.lock, flags);
-	return ret;
-}
-EXPORT_SYMBOL(get_random_u32);
 
 #ifdef CONFIG_SMP
 /*
@@ -749,22 +859,8 @@ EXPORT_SYMBOL(get_random_bytes_arch);
 
 /**********************************************************************
  *
- * Entropy accumulation and extraction routines.
- *
- * Callers may add entropy via:
- *
- *     static void mix_pool_bytes(const void *in, size_t nbytes)
- *
- * After which, if added entropy should be credited:
- *
- *     static void credit_entropy_bits(size_t nbits)
- *
- * Finally, extract entropy via these two, with the latter one
- * setting the entropy count to zero and extracting only if there
- * is POOL_MIN_BITS entropy credited prior or force is true:
- *
- *     static void extract_entropy(void *buf, size_t nbytes)
- *     static bool drain_entropy(void *buf, size_t nbytes, bool force)
+ * Entropy cannot be extracted from this new key-scheduling primitive;
+ * this construction is purely additive. Enjoy.
  *
  **********************************************************************/
 
@@ -788,108 +884,6 @@ static struct {
 	.lock = __SPIN_LOCK_UNLOCKED(input_pool.lock),
 };
 
-static void _mix_pool_bytes(const void *in, size_t nbytes)
-{
-	blake2s_update(&input_pool.hash, in, nbytes);
-}
-
-/*
- * This function adds bytes into the entropy "pool".  It does not
- * update the entropy estimate.  The caller should call
- * credit_entropy_bits if this is appropriate.
- */
-static void mix_pool_bytes(const void *in, size_t nbytes)
-{
-	unsigned long flags;
-
-	spin_lock_irqsave(&input_pool.lock, flags);
-	_mix_pool_bytes(in, nbytes);
-	spin_unlock_irqrestore(&input_pool.lock, flags);
-}
-
-static void credit_entropy_bits(size_t nbits)
-{
-	unsigned int entropy_count, orig, add;
-
-	if (!nbits)
-		return;
-
-	add = min_t(size_t, nbits, POOL_BITS);
-
-	do {
-		orig = READ_ONCE(input_pool.entropy_count);
-		entropy_count = min_t(unsigned int, POOL_BITS, orig + add);
-	} while (cmpxchg(&input_pool.entropy_count, orig, entropy_count) != orig);
-
-	if (!crng_ready() && entropy_count >= POOL_MIN_BITS)
-		crng_reseed(false);
-}
-
-/*
- * This is an HKDF-like construction for using the hashed collected entropy
- * as a PRF key, that's then expanded block-by-block.
- */
-static void extract_entropy(void *buf, size_t nbytes)
-{
-	unsigned long flags;
-	u8 seed[BLAKE2S_HASH_SIZE], next_key[BLAKE2S_HASH_SIZE];
-	struct {
-		unsigned long rdseed[32 / sizeof(long)];
-		size_t counter;
-	} block;
-	size_t i;
-
-	for (i = 0; i < ARRAY_SIZE(block.rdseed); ++i) {
-		if (!arch_get_random_seed_long(&block.rdseed[i]) &&
-		    !arch_get_random_long(&block.rdseed[i]))
-			block.rdseed[i] = random_get_entropy();
-	}
-
-	spin_lock_irqsave(&input_pool.lock, flags);
-
-	/* seed = HASHPRF(last_key, entropy_input) */
-	blake2s_final(&input_pool.hash, seed);
-
-	/* next_key = HASHPRF(seed, RDSEED || 0) */
-	block.counter = 0;
-	blake2s(next_key, (u8 *)&block, seed, sizeof(next_key), sizeof(block), sizeof(seed));
-	blake2s_init_key(&input_pool.hash, BLAKE2S_HASH_SIZE, next_key, sizeof(next_key));
-
-	spin_unlock_irqrestore(&input_pool.lock, flags);
-	memzero_explicit(next_key, sizeof(next_key));
-
-	while (nbytes) {
-		i = min_t(size_t, nbytes, BLAKE2S_HASH_SIZE);
-		/* output = HASHPRF(seed, RDSEED || ++counter) */
-		++block.counter;
-		blake2s(buf, (u8 *)&block, seed, i, sizeof(block), sizeof(seed));
-		nbytes -= i;
-		buf += i;
-	}
-
-	memzero_explicit(seed, sizeof(seed));
-	memzero_explicit(&block, sizeof(block));
-}
-
-/*
- * First we make sure we have POOL_MIN_BITS of entropy in the pool unless force
- * is true, and then we set the entropy count to zero (but don't actually touch
- * any data). Only then can we extract a new key with extract_entropy().
- */
-static bool drain_entropy(void *buf, size_t nbytes, bool force)
-{
-	unsigned int entropy_count;
-	do {
-		entropy_count = READ_ONCE(input_pool.entropy_count);
-		if (!force && entropy_count < POOL_MIN_BITS)
-			return false;
-	} while (cmpxchg(&input_pool.entropy_count, entropy_count, 0) != entropy_count);
-	extract_entropy(buf, nbytes);
-	wake_up_interruptible(&random_write_wait);
-	kill_fasync(&fasync, SIGIO, POLL_OUT);
-	return true;
-}
-
 
 /**********************************************************************
  *
@@ -970,23 +964,11 @@ early_param("random.trust_bootloader", parse_trust_bootloader);
  */
 int __init rand_initialize(void)
 {
-	size_t i;
-	ktime_t now = ktime_get_real();
 	bool arch_init = true;
-	unsigned long rv;
 
-	for (i = 0; i < BLAKE2S_BLOCK_SIZE; i += sizeof(rv)) {
-		if (!arch_get_random_seed_long_early(&rv) &&
-		    !arch_get_random_long_early(&rv)) {
-			rv = random_get_entropy();
-			arch_init = false;
-		}
-		_mix_pool_bytes(&rv, sizeof(rv));
-	}
-	_mix_pool_bytes(&now, sizeof(now));
-	_mix_pool_bytes(utsname(), sizeof(*(utsname())));
+	crng_reseed(runtime_entropy, sizeof(runtime_entropy));
 
-	extract_entropy(base_crng.key, sizeof(base_crng.key));
+	extract_crng_user(base_crng.key, sizeof(base_crng.key));
 	++base_crng.generation;
 
 	if (arch_init && trust_cpu && !crng_ready()) {
@@ -1011,17 +993,10 @@ int __init rand_initialize(void)
  */
 void add_device_randomness(const void *buf, size_t size)
 {
-	cycles_t cycles = random_get_entropy();
-	unsigned long flags, now = jiffies;
-
-	if (crng_init == 0 && size)
-		crng_pre_init_inject(buf, size, false);
-
-	spin_lock_irqsave(&input_pool.lock, flags);
-	_mix_pool_bytes(&cycles, sizeof(cycles));
-	_mix_pool_bytes(&now, sizeof(now));
-	_mix_pool_bytes(buf, size);
-	spin_unlock_irqrestore(&input_pool.lock, flags);
+	// Unique gatekey made with the caller's unique address.
+	uint64_t gatekey = __make_gatekey(buf);
+	// Locks are not needed because our gatekey cannot be shared with any other caller.
+	_add_unique(runtime_entropy, POOL_SIZE, gatekey, buf, size, size);
 }
 EXPORT_SYMBOL(add_device_randomness);
 
@@ -1031,86 +1006,37 @@ struct timer_rand_state {
 	long last_delta, last_delta2;
 };
 
-/*
- * This function adds entropy to the entropy "pool" by using timing
- * delays.  It uses the timer_rand_state structure to make an estimate
- * of how many bits of entropy this call has added to the pool.
- *
- * The number "num" is also added to the pool - it should somehow describe
- * the type of event which just happened.  This is currently 0-255 for
- * keyboard scan codes, and 256 upwards for interrupts.
- */
-static void add_timer_randomness(struct timer_rand_state *state, unsigned int num)
-{
-	cycles_t cycles = random_get_entropy();
-	unsigned long flags, now = jiffies;
-	long delta, delta2, delta3;
-
-	spin_lock_irqsave(&input_pool.lock, flags);
-	_mix_pool_bytes(&cycles, sizeof(cycles));
-	_mix_pool_bytes(&now, sizeof(now));
-	_mix_pool_bytes(&num, sizeof(num));
-	spin_unlock_irqrestore(&input_pool.lock, flags);
-
-	/*
-	 * Calculate number of bits of randomness we probably added.
-	 * We take into account the first, second and third-order deltas
-	 * in order to make our estimate.
-	 */
-	delta = now - READ_ONCE(state->last_time);
-	WRITE_ONCE(state->last_time, now);
-
-	delta2 = delta - READ_ONCE(state->last_delta);
-	WRITE_ONCE(state->last_delta, delta);
-
-	delta3 = delta2 - READ_ONCE(state->last_delta2);
-	WRITE_ONCE(state->last_delta2, delta2);
-
-	if (delta < 0)
-		delta = -delta;
-	if (delta2 < 0)
-		delta2 = -delta2;
-	if (delta3 < 0)
-		delta3 = -delta3;
-	if (delta > delta2)
-		delta = delta2;
-	if (delta > delta3)
-		delta = delta3;
-
-	/*
-	 * delta is now minimum absolute delta.
-	 * Round down by 1 bit on general principles,
-	 * and limit entropy estimate to 12 bits.
-	 */
-	credit_entropy_bits(min_t(unsigned int, fls(delta >> 1), 11));
-}
-
+// todo: this could be better - this function is called a lot, so we need to improve performance here;
+// ideally we would call _add_unique() once per invocation and call it a day.
 void add_input_randomness(unsigned int type, unsigned int code,
 			  unsigned int value)
 {
-	static unsigned char last_value;
-	static struct timer_rand_state input_timer_state = { INITIAL_JIFFIES };
-
-	/* Ignore autorepeat and the like. */
-	if (value == last_value)
-		return;
-
-	last_value = value;
-	add_timer_randomness(&input_timer_state,
-			     (type << 4) ^ code ^ (code >> 4) ^ value);
+	// Unique gatekey made from the caller's uniqueness. Ideally we would
+	// use the caller's address, but these arguments are passed by value,
+	// so this is not an ideal gatekey - it will do the trick, though.
+	uint64_t gatekey = __make_gatekey(&type);
+	// Add the uniqueness of the code that we were passed.
+	_add_unique(runtime_entropy, POOL_SIZE, gatekey, &code, sizeof(code), sizeof(code));
+	// Then add the uniqueness of the value; bump the gatekey so we write to a
+	// different spot. Overlap isn't a big deal because an attacker cannot guess what we are up to.
+	_add_unique(runtime_entropy, POOL_SIZE, gatekey + 1, &value, sizeof(value), sizeof(value));
+	// The type is also unique, so add it as well.
+	_add_unique(runtime_entropy, POOL_SIZE, gatekey + 2, &type, sizeof(type), sizeof(type));
 }
 EXPORT_SYMBOL_GPL(add_input_randomness);
 
 #ifdef CONFIG_BLOCK
 void add_disk_randomness(struct gendisk *disk)
 {
+	// check if there is work to be done.
 	if (!disk || !disk->random)
 		return;
-	/* First major is 1, so we get >= 0x200 here. */
-	add_timer_randomness(disk->random, 0x100 + disk_devt(disk));
+	// Locks are not needed because our gatekey cannot be shared with any other caller.
+	_add_unique(runtime_entropy, POOL_SIZE, __make_gatekey(disk), (uint8_t *)disk->random, sizeof(*disk->random), sizeof(*disk->random));
 }
 EXPORT_SYMBOL_GPL(add_disk_randomness);
 
+// todo: not sure how to integrate this one.
 void rand_initialize_disk(struct gendisk *disk)
 {
 	struct timer_rand_state *state;
@@ -1129,33 +1055,15 @@ void rand_initialize_disk(struct gendisk *disk)
 
 /*
  * Interface for in-kernel drivers of true hardware RNGs.
- * Those devices may produce endless random bits and will be throttled
- * when our pool is full.
+ * No throttling and no locks - add it ASAP; callers are waiting.
  */
 void add_hwgenerator_randomness(const void *buffer, size_t count,
 				size_t entropy)
 {
-	if (unlikely(crng_init == 0 && entropy < POOL_MIN_BITS)) {
-		size_t ret = crng_pre_init_inject(buffer, count, true);
-		mix_pool_bytes(buffer, ret);
-		count -= ret;
-		buffer += ret;
-		if (!count || crng_init == 0)
-			return;
-	}
-
-	/*
-	 * Throttle writing if we're above the trickle threshold.
-	 * We'll be woken up again once below POOL_MIN_BITS, when
-	 * the calling thread is about to terminate, or once
-	 * CRNG_RESEED_INTERVAL has elapsed.
-	 */
-	wait_event_interruptible_timeout(random_write_wait,
-			!system_wq || kthread_should_stop() ||
-			input_pool.entropy_count < POOL_MIN_BITS,
-			CRNG_RESEED_INTERVAL);
-	mix_pool_bytes(buffer, count);
-	credit_entropy_bits(entropy);
+	// Make sure this caller has a unique entry point into the jump table.
+	uint64_t gatekey = __make_gatekey(buffer);
+	// Add this buffer to the jump table.
+	_add_unique(runtime_entropy, POOL_SIZE, gatekey, buffer, count, entropy);
 }
 EXPORT_SYMBOL_GPL(add_hwgenerator_randomness);
 
@@ -1184,12 +1092,10 @@ static BLOCKING_NOTIFIER_HEAD(vmfork_chain);
  */
 void add_vmfork_randomness(const void *unique_vm_id, size_t size)
 {
-	add_device_randomness(unique_vm_id, size);
-	if (crng_ready()) {
-		crng_reseed(true);
-		pr_notice("crng reseeded due to virtual machine fork\n");
-	}
-	blocking_notifier_call_chain(&vmfork_chain, 0, NULL);
+	// Make sure this caller has a unique entry point into the jump table.
+	uint64_t gatekey = __make_gatekey(unique_vm_id);
+	// Add this buffer to the jump table.
+	_add_unique(runtime_entropy, POOL_SIZE, gatekey, unique_vm_id, size, size);
+	// Keep notifying vmfork listeners, as the old code did.
+	blocking_notifier_call_chain(&vmfork_chain, 0, NULL);
 }
 #if IS_MODULE(CONFIG_VMGENID)
 EXPORT_SYMBOL_GPL(add_vmfork_randomness);
@@ -1208,51 +1114,6 @@ int unregister_random_vmfork_notifier(struct notifier_block *nb)
 EXPORT_SYMBOL_GPL(unregister_random_vmfork_notifier);
 #endif
 
-struct fast_pool {
-	struct work_struct mix;
-	unsigned long pool[4];
-	unsigned long last;
-	unsigned int count;
-	u16 reg_idx;
-};
-
-static DEFINE_PER_CPU(struct fast_pool, irq_randomness) = {
-#ifdef CONFIG_64BIT
-	/* SipHash constants */
-	.pool = { 0x736f6d6570736575UL, 0x646f72616e646f6dUL,
-		  0x6c7967656e657261UL, 0x7465646279746573UL }
-#else
-	/* HalfSipHash constants */
-	.pool = { 0, 0, 0x6c796765U, 0x74656462U }
-#endif
-};
-
-/*
- * This is [Half]SipHash-1-x, starting from an empty key. Because
- * the key is fixed, it assumes that its inputs are non-malicious,
- * and therefore this has no security on its own. s represents the
- * 128 or 256-bit SipHash state, while v represents a 128-bit input.
- */
-static void fast_mix(unsigned long s[4], const unsigned long *v)
-{
-	size_t i;
-
-	for (i = 0; i < 16 / sizeof(long); ++i) {
-		s[3] ^= v[i];
-#ifdef CONFIG_64BIT
-		s[0] += s[1]; s[1] = rol64(s[1], 13); s[1] ^= s[0]; s[0] = rol64(s[0], 32);
-		s[2] += s[3]; s[3] = rol64(s[3], 16); s[3] ^= s[2];
-		s[0] += s[3]; s[3] = rol64(s[3], 21); s[3] ^= s[0];
-		s[2] += s[1]; s[1] = rol64(s[1], 17); s[1] ^= s[2]; s[2] = rol64(s[2], 32);
-#else
-		s[0] += s[1]; s[1] = rol32(s[1],  5); s[1] ^= s[0]; s[0] = rol32(s[0], 16);
-		s[2] += s[3]; s[3] = rol32(s[3],  8); s[3] ^= s[2];
-		s[0] += s[3]; s[3] = rol32(s[3],  7); s[3] ^= s[0];
-		s[2] += s[1]; s[1] = rol32(s[1], 13); s[1] ^= s[2]; s[2] = rol32(s[2], 16);
-#endif
-		s[0] ^= v[i];
-	}
-}
 
 #ifdef CONFIG_SMP
 /*
@@ -1261,175 +1122,12 @@ static void fast_mix(unsigned long s[4], const unsigned long *v)
  */
 int random_online_cpu(unsigned int cpu)
 {
-	/*
-	 * During CPU shutdown and before CPU onlining, add_interrupt_
-	 * randomness() may schedule mix_interrupt_randomness(), and
-	 * set the MIX_INFLIGHT flag. However, because the worker can
-	 * be scheduled on a different CPU during this period, that
-	 * flag will never be cleared. For that reason, we zero out
-	 * the flag here, which runs just after workqueues are onlined
-	 * for the CPU again. This also has the effect of setting the
-	 * irq randomness count to zero so that new accumulated irqs
-	 * are fresh.
-	 */
-	per_cpu_ptr(&irq_randomness, cpu)->count = 0;
+	// todo: I need help here - the per-CPU fast pool this reset belonged to is gone.
+	// per_cpu_ptr(&irq_randomness, cpu)->count = 0;
 	return 0;
 }
 #endif
 
-static unsigned long get_reg(struct fast_pool *f, struct pt_regs *regs)
-{
-	unsigned long *ptr = (unsigned long *)regs;
-	unsigned int idx;
-
-	if (regs == NULL)
-		return 0;
-	idx = READ_ONCE(f->reg_idx);
-	if (idx >= sizeof(struct pt_regs) / sizeof(unsigned long))
-		idx = 0;
-	ptr += idx++;
-	WRITE_ONCE(f->reg_idx, idx);
-	return *ptr;
-}
-
-static void mix_interrupt_randomness(struct work_struct *work)
-{
-	struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
-	/*
-	 * The size of the copied stack pool is explicitly 16 bytes so that we
-	 * tax mix_pool_byte()'s compression function the same amount on all
-	 * platforms. This means on 64-bit we copy half the pool into this,
-	 * while on 32-bit we copy all of it. The entropy is supposed to be
-	 * sufficiently dispersed between bits that in the sponge-like
-	 * half case, on average we don't wind up "losing" some.
-	 */
-	u8 pool[16];
-
-	/* Check to see if we're running on the wrong CPU due to hotplug. */
-	local_irq_disable();
-	if (fast_pool != this_cpu_ptr(&irq_randomness)) {
-		local_irq_enable();
-		return;
-	}
-
-	/*
-	 * Copy the pool to the stack so that the mixer always has a
-	 * consistent view, before we reenable irqs again.
-	 */
-	memcpy(pool, fast_pool->pool, sizeof(pool));
-	fast_pool->count = 0;
-	fast_pool->last = jiffies;
-	local_irq_enable();
-
-	if (unlikely(crng_init == 0)) {
-		crng_pre_init_inject(pool, sizeof(pool), true);
-		mix_pool_bytes(pool, sizeof(pool));
-	} else {
-		mix_pool_bytes(pool, sizeof(pool));
-		credit_entropy_bits(1);
-	}
-
-	memzero_explicit(pool, sizeof(pool));
-}
-
-void add_interrupt_randomness(int irq)
-{
-	enum { MIX_INFLIGHT = 1U << 31 };
-	cycles_t cycles = random_get_entropy();
-	unsigned long now = jiffies;
-	struct fast_pool *fast_pool = this_cpu_ptr(&irq_randomness);
-	struct pt_regs *regs = get_irq_regs();
-	unsigned int new_count;
-	union {
-		u32 u32[4];
-		u64 u64[2];
-		unsigned long longs[16 / sizeof(long)];
-	} irq_data;
-
-	if (cycles == 0)
-		cycles = get_reg(fast_pool, regs);
-
-	if (sizeof(cycles) == 8)
-		irq_data.u64[0] = cycles ^ rol64(now, 32) ^ irq;
-	else {
-		irq_data.u32[0] = cycles ^ irq;
-		irq_data.u32[1] = now;
-	}
-
-	if (sizeof(unsigned long) == 8)
-		irq_data.u64[1] = regs ? instruction_pointer(regs) : _RET_IP_;
-	else {
-		irq_data.u32[2] = regs ? instruction_pointer(regs) : _RET_IP_;
-		irq_data.u32[3] = get_reg(fast_pool, regs);
-	}
-
-	fast_mix(fast_pool->pool, irq_data.longs);
-	new_count = ++fast_pool->count;
-
-	if (new_count & MIX_INFLIGHT)
-		return;
-
-	if (new_count < 64 && (!time_after(now, fast_pool->last + HZ) ||
-			       unlikely(crng_init == 0)))
-		return;
-
-	if (unlikely(!fast_pool->mix.func))
-		INIT_WORK(&fast_pool->mix, mix_interrupt_randomness);
-	fast_pool->count |= MIX_INFLIGHT;
-	queue_work_on(raw_smp_processor_id(), system_highpri_wq, &fast_pool->mix);
-}
-EXPORT_SYMBOL_GPL(add_interrupt_randomness);
-
-/*
- * Each time the timer fires, we expect that we got an unpredictable
- * jump in the cycle counter. Even if the timer is running on another
- * CPU, the timer activity will be touching the stack of the CPU that is
- * generating entropy..
- *
- * Note that we don't re-arm the timer in the timer itself - we are
- * happy to be scheduled away, since that just makes the load more
- * complex, but we do not want the timer to keep ticking unless the
- * entropy loop is running.
- *
- * So the re-arming always happens in the entropy loop itself.
- */
-static void entropy_timer(struct timer_list *t)
-{
-	credit_entropy_bits(1);
-}
-
-/*
- * If we have an actual cycle counter, see if we can
- * generate enough entropy with timing noise
- */
-static void try_to_generate_entropy(void)
-{
-	struct {
-		cycles_t cycles;
-		struct timer_list timer;
-	} stack;
-
-	stack.cycles = random_get_entropy();
-
-	/* Slow counter - or none. Don't even bother */
-	if (stack.cycles == random_get_entropy())
-		return;
-
-	timer_setup_on_stack(&stack.timer, entropy_timer, 0);
-	while (!crng_ready() && !signal_pending(current)) {
-		if (!timer_pending(&stack.timer))
-			mod_timer(&stack.timer, jiffies + 1);
-		mix_pool_bytes(&stack.cycles, sizeof(stack.cycles));
-		schedule();
-		stack.cycles = random_get_entropy();
-	}
-
-	del_timer_sync(&stack.timer);
-	destroy_timer_on_stack(&stack.timer);
-	mix_pool_bytes(&stack.cycles, sizeof(stack.cycles));
-}
-
-
 /**********************************************************************
  *
  * Userspace reader/writer interfaces.
@@ -1461,29 +1159,8 @@ static void try_to_generate_entropy(void)
 SYSCALL_DEFINE3(getrandom, char __user *, buf, size_t, count, unsigned int,
 		flags)
 {
-	if (flags & ~(GRND_NONBLOCK | GRND_RANDOM | GRND_INSECURE))
-		return -EINVAL;
-
-	/*
-	 * Requesting insecure and blocking randomness at the same time makes
-	 * no sense.
-	 */
-	if ((flags & (GRND_INSECURE | GRND_RANDOM)) == (GRND_INSECURE | GRND_RANDOM))
-		return -EINVAL;
-
-	if (count > INT_MAX)
-		count = INT_MAX;
-
-	if (!(flags & GRND_INSECURE) && !crng_ready()) {
-		int ret;
-
-		if (flags & GRND_NONBLOCK)
-			return -EAGAIN;
-		ret = wait_for_random_bytes();
-		if (unlikely(ret))
-			return ret;
-	}
-	return get_random_bytes_user(buf, count);
+	if (flags & ~(GRND_NONBLOCK | GRND_RANDOM | GRND_INSECURE))
+		return -EINVAL;
+	// Born ready - the keypool never needs to block.
+	return extract_crng_user(buf, count);
 }
 
 static __poll_t random_poll(struct file *file, poll_table *wait)
@@ -1502,25 +1179,14 @@ static __poll_t random_poll(struct file *file, poll_table *wait)
 
 static int write_pool(const char __user *ubuf, size_t count)
 {
-	size_t len;
-	int ret = 0;
-	u8 block[BLAKE2S_BLOCK_SIZE];
-
-	while (count) {
-		len = min(count, sizeof(block));
-		if (copy_from_user(block, ubuf, len)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		count -= len;
-		ubuf += len;
-		mix_pool_bytes(block, len);
-		cond_resched();
-	}
-
-out:
-	memzero_explicit(block, sizeof(block));
-	return ret;
+	// Unique gatekey made using the time and address of the caller.
+	// No two invocations of write_pool() can ever produce the same gatekey.
+	uint64_t gatekey = __make_gatekey(ubuf);
+	// Drop it into the pool! (todo: ubuf is a __user pointer, so it should
+	// be copied in with copy_from_user() before mixing.)
+	_add_unique(runtime_entropy, POOL_SIZE, gatekey, ubuf, count, count);
+	// todo: broader cleanup needed; if _add_unique() fails we have bigger problems.
+	return 0;
 }
 
 static ssize_t random_write(struct file *file, const char __user *buffer,
@@ -1535,32 +1201,6 @@ static ssize_t random_write(struct file *file, const char __user *buffer,
 	return (ssize_t)count;
 }
 
-static ssize_t urandom_read(struct file *file, char __user *buf, size_t nbytes,
-			    loff_t *ppos)
-{
-	static int maxwarn = 10;
-
-	if (!crng_ready() && maxwarn > 0) {
-		maxwarn--;
-		if (__ratelimit(&urandom_warning))
-			pr_notice("%s: uninitialized urandom read (%zd bytes read)\n",
-				  current->comm, nbytes);
-	}
-
-	return get_random_bytes_user(buf, nbytes);
-}
-
-static ssize_t random_read(struct file *file, char __user *buf, size_t nbytes,
-			   loff_t *ppos)
-{
-	int ret;
-
-	ret = wait_for_random_bytes();
-	if (ret != 0)
-		return ret;
-	return get_random_bytes_user(buf, nbytes);
-}
-
 static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 {
 	int size, ent_count;
@@ -1580,7 +1220,6 @@ static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 			return -EFAULT;
 		if (ent_count < 0)
 			return -EINVAL;
-		credit_entropy_bits(ent_count);
 		return 0;
 	case RNDADDENTROPY:
 		if (!capable(CAP_SYS_ADMIN))
@@ -1594,7 +1233,6 @@ static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 		retval = write_pool((const char __user *)p, size);
 		if (retval < 0)
 			return retval;
-		credit_entropy_bits(ent_count);
 		return 0;
 	case RNDZAPENTCNT:
 	case RNDCLEARPOOL:
@@ -1614,7 +1252,7 @@ static long random_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 			return -EPERM;
 		if (!crng_ready())
 			return -ENODATA;
-		crng_reseed(false);
+		crng_reseed(runtime_entropy, sizeof(runtime_entropy));
 		return 0;
 	default:
 		return -EINVAL;
@@ -1685,6 +1323,8 @@ static int sysctl_random_write_wakeup_bits = POOL_MIN_BITS;
 static int sysctl_poolsize = POOL_BITS;
 static u8 sysctl_bootid[UUID_SIZE];
 
+
+// todo: not sure how to fix proc_do_uuid(); it shouldn't need a spin lock with the new key-scheduling primitive.
 /*
  * This function is used to return both the bootid UUID, and random
  * UUID. The difference is in whether table->data is NULL; if it is,
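
A note for reviewers: __make_gatekey() itself is not shown in the hunks
above.  Going by the comments ("made using the time and address of the
caller"), a plausible sketch - an assumption about intent, not code from
the patch - might look like this:

	/* Hypothetical sketch only: derive a per-caller "gatekey" from the
	 * caller's address, the cycle counter and jiffies, so that no two
	 * invocations share an entry point into the pool. */
	static inline uint64_t __make_gatekey(const void *caller)
	{
		return (uint64_t)(uintptr_t)caller ^
		       (uint64_t)random_get_entropy() ^ jiffies;
	}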

^ permalink raw reply related	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
       [not found]   ` <CAOnCY6RUN+CSwjsD6Vg-MDi7ERAj2kKLorMLGp1jE8dTZ+3cpQ@mail.gmail.com>
@ 2022-03-28 19:33     ` Michael Brooks
  0 siblings, 0 replies; 63+ messages in thread
From: Michael Brooks @ 2022-03-28 19:33 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dominik Brodowski, Eric Biggers, Greg Kroah-Hartman,
	Jason A. Donenfeld, Jean-Philippe Aumasson, Theodore Ts'o,
	linux-kernel, stable

I should have said 'predicate', not 'axiom'.  Please review the
rigorous proof of correctness as listed on my GitHub.  Another issue
I'd like to bring up in 5.17 is the apparent lack of mathematical
proofs and the overuse of the word "entropy" where we should be using
the word "uniqueness" - do we really want to go live with something
that isn't documented at all and just trust the word of any one
individual?

My team put together a mathematical proof of correctness; the onus of
proving a defect in the implementation is on the individuals who run
the code and observe a failure - that is the point of code review.
The proposed changes to the pool mean it no longer needs to be
"drained" or "accumulated"; however, we may still need a lock to make
sure the /dev/random device driver is ready for use prior to access.
I am actually not entirely convinced that this lock is needed, though,
because it directly impacts boot time due to KASLR.  We could lock and
wait for the slow long-term storage to wake up and feed us the seed -
but is that really necessary?  The alternate_rand() function in the
provided patch should be sufficient for the vast majority of
non-paranoid users; I don't see how a would-be adversary could
undermine this construction - sure, it isn't as ideally secure as a
fully loaded keypool, but it's good enough to thwart a memory
corruption exploit.  A fast boot option is preferable IMHO.
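
To make this concrete, here is a minimal sketch of that kind of
non-blocking fallback (illustrative only - the real alternate_rand() is
in the patch; this sketch assumes nothing beyond arch_get_random_long()
and random_get_entropy(), which are existing kernel helpers):

	/* Hypothetical sketch of a non-blocking fallback seed source;
	 * not the alternate_rand() from the patch. */
	static unsigned long fallback_seed(void)
	{
		unsigned long v;

		/* Prefer the architectural RNG (e.g. RDRAND) when present. */
		if (!arch_get_random_long(&v))
			v = random_get_entropy(); /* else the cycle counter */
		/* Fold in jiffies so back-to-back calls still differ. */
		return v ^ jiffies;
	}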

Regards,
Mike

On Mon, Mar 28, 2022 at 9:16 AM Michael Brooks <m@sweetwater.ai> wrote:
>
> Jason,
>
> I think this is an astute observation - the current design of the pool has this defect of being emptied leading to deadlocks, and in addition, access to it undermines the predictability of the pool.  This needs to be fundamentally addressed.
>
> What if access to the pool could never undermine the predictability of the pool state?  If this axiom holds, then there would never be a reason to keep a global counter for how much “entropy” or uniqueness is in the pool - and that would remove the lock that handle_irq_event_percpu() is fighting over: a single lock battled over by every irq event, which only gets worse as the machine gains additional CPUs.  This lock is so bad it is basically a GIL, and we need to have a serious discussion on how to remove it.
>
> This may sound far-fetched, but there is actually some clever cryptography that can provide a good solution here.  What if the pool was not a linear FILO structure, but actually a jump table?  What if access to this jump table was randomized, so that no two callers could take the same path through the table?  What if, in addition, upon read the reader "covers their tracks" and modifies their entrypoint into the table to prevent any other caller from being able to follow the same path?  This is exactly what keypoolrandom does.
>
> Now, if the output of this jump table is used as an input to cryptographic primitives such as an ideal block cipher or hash function, then even a single bit flip causes an avalanche which dramatically affects the output - and also obscures how it was generated.  Access using this method can therefore never undermine the pool state, so this structure never "drains" but rather becomes less and less predictable over time.
>
> This kind of jump table has been written and is called https://github.com/TheRook/KeypoolRandom.  It took three years to write, has been peer reviewed, and has exquisite documentation; if you take the time to read over the code and the docs, I think you'll enjoy it.  This project is very easy to compile and to run - it is the software equivalent of a breakout board, and doesn't require a full kernel compile to see how it functions.
>
> Regards,
> Michael Brooks
>
>
> On Mon, Mar 28, 2022 at 4:20 AM Sasha Levin <sashal@kernel.org> wrote:
>>
>> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
>>
>> [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
>>
>> The current 4096-bit LFSR used for entropy collection had a few
>> desirable attributes for the context in which it was created. For
>> example, the state was huge, which meant that /dev/random would be able
>> to output quite a bit of accumulated entropy before blocking. It was
>> also, in its time, quite fast at accumulating entropy byte-by-byte,
>> which matters given the varying contexts in which mix_pool_bytes() is
>> called. And its diffusion was relatively high, which meant that changes
>> would ripple across several words of state rather quickly.
>>
>> However, it also suffers from a few security vulnerabilities. In
>> particular, inputs learned by an attacker can be undone, but moreover,
>> if the state of the pool leaks, its contents can be controlled and
>> entirely zeroed out. I've demonstrated this attack with this SMT2
>> script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in
>> a matter of seconds on a single core of my laptop, resulting in little
>> proof of concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>.
>>
>> For basically all recent formal models of RNGs, these attacks represent
>> a significant cryptographic flaw. But how does this manifest
>> practically? If an attacker has access to the system to such a degree
>> that he can learn the internal state of the RNG, arguably there are
>> other lower hanging vulnerabilities -- side-channel, infoleak, or
>> otherwise -- that might have higher priority. On the other hand, seed
>> files are frequently used on systems that have a hard time generating
>> much entropy on their own, and these seed files, being files, often leak
>> or are duplicated and distributed accidentally, or are even seeded over
>> the Internet intentionally, where their contents might be recorded or
>> tampered with. Seen this way, an otherwise quasi-implausible
>> vulnerability is a bit more practical than initially thought.
>>
>> Another aspect of the current mix_pool_bytes() function is that, while
>> its performance was arguably competitive for the time in which it was
>> created, it's no longer considered so. This patch improves performance
>> significantly: on a high-end CPU, an i7-11850H, it improves performance
>> of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it
>> improves performance by 103%.
>>
>> This commit replaces the LFSR of mix_pool_bytes() with a straight-
>> forward cryptographic hash function, BLAKE2s, which is already in use
>> for pool extraction. Universal hashing with a secret seed was considered
>> too, something along the lines of <https://eprint.iacr.org/2013/338>,
>> but the requirement for a secret seed makes for a chicken & egg problem.
>> Instead we go with a formally proven scheme using a computational hash
>> function, described in sections 5.1, 6.4, and B.1.8 of
>> <https://eprint.iacr.org/2019/198>.
>>
>> BLAKE2s outputs 256 bits, which should give us an appropriate amount of
>> min-entropy accumulation, and a wide enough margin of collision
>> resistance against active attacks. mix_pool_bytes() becomes a simple
>> call to blake2s_update(), for accumulation, while the extraction step
>> becomes a blake2s_final() to generate a seed, with which we can then do
>> a HKDF-like or BLAKE2X-like expansion, the first part of which we fold
>> back as an init key for subsequent blake2s_update()s, and the rest we
>> produce to the caller. This then is provided to our CRNG like usual. In
>> that expansion step, we make opportunistic use of 32 bytes of RDRAND
>> output, just as before. We also always reseed the crng with 32 bytes,
>> unconditionally, or not at all, rather than sometimes with 16 as before,
>> as we don't win anything by limiting beyond the 16 byte threshold.
>>
>> Going for a hash function as an entropy collector is a conservative,
>> proven approach. The result of all this is a much simpler and much less
>> bespoke construction than what's there now, which not only plugs a
>> vulnerability but also improves performance considerably.
>>
>> Cc: Theodore Ts'o <tytso@mit.edu>
>> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
>> Reviewed-by: Eric Biggers <ebiggers@google.com>
>> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
>> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>> ---
>>  drivers/char/random.c | 304 ++++++++----------------------------------
>>  1 file changed, 55 insertions(+), 249 deletions(-)
>>
>> diff --git a/drivers/char/random.c b/drivers/char/random.c
>> index 3404a91edf29..882f78829a24 100644
>> --- a/drivers/char/random.c
>> +++ b/drivers/char/random.c
>> @@ -42,61 +42,6 @@
>>   */
>>
>>  /*
>> - * (now, with legal B.S. out of the way.....)
>> - *
>> - * This routine gathers environmental noise from device drivers, etc.,
>> - * and returns good random numbers, suitable for cryptographic use.
>> - * Besides the obvious cryptographic uses, these numbers are also good
>> - * for seeding TCP sequence numbers, and other places where it is
>> - * desirable to have numbers which are not only random, but hard to
>> - * predict by an attacker.
>> - *
>> - * Theory of operation
>> - * ===================
>> - *
>> - * Computers are very predictable devices.  Hence it is extremely hard
>> - * to produce truly random numbers on a computer --- as opposed to
>> - * pseudo-random numbers, which can easily generated by using a
>> - * algorithm.  Unfortunately, it is very easy for attackers to guess
>> - * the sequence of pseudo-random number generators, and for some
>> - * applications this is not acceptable.  So instead, we must try to
>> - * gather "environmental noise" from the computer's environment, which
>> - * must be hard for outside attackers to observe, and use that to
>> - * generate random numbers.  In a Unix environment, this is best done
>> - * from inside the kernel.
>> - *
>> - * Sources of randomness from the environment include inter-keyboard
>> - * timings, inter-interrupt timings from some interrupts, and other
>> - * events which are both (a) non-deterministic and (b) hard for an
>> - * outside observer to measure.  Randomness from these sources are
>> - * added to an "entropy pool", which is mixed using a CRC-like function.
>> - * This is not cryptographically strong, but it is adequate assuming
>> - * the randomness is not chosen maliciously, and it is fast enough that
>> - * the overhead of doing it on every interrupt is very reasonable.
>> - * As random bytes are mixed into the entropy pool, the routines keep
>> - * an *estimate* of how many bits of randomness have been stored into
>> - * the random number generator's internal state.
>> - *
>> - * When random bytes are desired, they are obtained by taking the BLAKE2s
>> - * hash of the contents of the "entropy pool".  The BLAKE2s hash avoids
>> - * exposing the internal state of the entropy pool.  It is believed to
>> - * be computationally infeasible to derive any useful information
>> - * about the input of BLAKE2s from its output.  Even if it is possible to
>> - * analyze BLAKE2s in some clever way, as long as the amount of data
>> - * returned from the generator is less than the inherent entropy in
>> - * the pool, the output data is totally unpredictable.  For this
>> - * reason, the routine decreases its internal estimate of how many
>> - * bits of "true randomness" are contained in the entropy pool as it
>> - * outputs random numbers.
>> - *
>> - * If this estimate goes to zero, the routine can still generate
>> - * random numbers; however, an attacker may (at least in theory) be
>> - * able to infer the future output of the generator from prior
>> - * outputs.  This requires successful cryptanalysis of BLAKE2s, which is
>> - * not believed to be feasible, but there is a remote possibility.
>> - * Nonetheless, these numbers should be useful for the vast majority
>> - * of purposes.
>> - *
>>   * Exported interfaces ---- output
>>   * ===============================
>>   *
>> @@ -298,23 +243,6 @@
>>   *
>>   *     mknod /dev/random c 1 8
>>   *     mknod /dev/urandom c 1 9
>> - *
>> - * Acknowledgements:
>> - * =================
>> - *
>> - * Ideas for constructing this random number generator were derived
>> - * from Pretty Good Privacy's random number generator, and from private
>> - * discussions with Phil Karn.  Colin Plumb provided a faster random
>> - * number generator, which speed up the mixing function of the entropy
>> - * pool, taken from PGPfone.  Dale Worley has also contributed many
>> - * useful ideas and suggestions to improve this driver.
>> - *
>> - * Any flaws in the design are solely my responsibility, and should
>> - * not be attributed to the Phil, Colin, or any of authors of PGP.
>> - *
>> - * Further background information on this topic may be obtained from
>> - * RFC 1750, "Randomness Recommendations for Security", by Donald
>> - * Eastlake, Steve Crocker, and Jeff Schiller.
>>   */
>>
>>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> @@ -358,79 +286,15 @@
>>
>>  /* #define ADD_INTERRUPT_BENCH */
>>
>> -/*
>> - * If the entropy count falls under this number of bits, then we
>> - * should wake up processes which are selecting or polling on write
>> - * access to /dev/random.
>> - */
>> -static int random_write_wakeup_bits = 28 * (1 << 5);
>> -
>> -/*
>> - * Originally, we used a primitive polynomial of degree .poolwords
>> - * over GF(2).  The taps for various sizes are defined below.  They
>> - * were chosen to be evenly spaced except for the last tap, which is 1
>> - * to get the twisting happening as fast as possible.
>> - *
>> - * For the purposes of better mixing, we use the CRC-32 polynomial as
>> - * well to make a (modified) twisted Generalized Feedback Shift
>> - * Register.  (See M. Matsumoto & Y. Kurita, 1992.  Twisted GFSR
>> - * generators.  ACM Transactions on Modeling and Computer Simulation
>> - * 2(3):179-194.  Also see M. Matsumoto & Y. Kurita, 1994.  Twisted
>> - * GFSR generators II.  ACM Transactions on Modeling and Computer
>> - * Simulation 4:254-266)
>> - *
>> - * Thanks to Colin Plumb for suggesting this.
>> - *
>> - * The mixing operation is much less sensitive than the output hash,
>> - * where we use BLAKE2s.  All that we want of mixing operation is that
>> - * it be a good non-cryptographic hash; i.e. it not produce collisions
>> - * when fed "random" data of the sort we expect to see.  As long as
>> - * the pool state differs for different inputs, we have preserved the
>> - * input entropy and done a good job.  The fact that an intelligent
>> - * attacker can construct inputs that will produce controlled
>> - * alterations to the pool's state is not important because we don't
>> - * consider such inputs to contribute any randomness.  The only
>> - * property we need with respect to them is that the attacker can't
>> - * increase his/her knowledge of the pool's state.  Since all
>> - * additions are reversible (knowing the final state and the input,
>> - * you can reconstruct the initial state), if an attacker has any
>> - * uncertainty about the initial state, he/she can only shuffle that
>> - * uncertainty about, but never cause any collisions (which would
>> - * decrease the uncertainty).
>> - *
>> - * Our mixing functions were analyzed by Lacharme, Roeck, Strubel, and
>> - * Videau in their paper, "The Linux Pseudorandom Number Generator
>> - * Revisited" (see: http://eprint.iacr.org/2012/251.pdf).  In their
>> - * paper, they point out that we are not using a true Twisted GFSR,
>> - * since Matsumoto & Kurita used a trinomial feedback polynomial (that
>> - * is, with only three taps, instead of the six that we are using).
>> - * As a result, the resulting polynomial is neither primitive nor
>> - * irreducible, and hence does not have a maximal period over
>> - * GF(2**32).  They suggest a slight change to the generator
>> - * polynomial which improves the resulting TGFSR polynomial to be
>> - * irreducible, which we have made here.
>> - */
>>  enum poolinfo {
>> -       POOL_WORDS = 128,
>> -       POOL_WORDMASK = POOL_WORDS - 1,
>> -       POOL_BYTES = POOL_WORDS * sizeof(u32),
>> -       POOL_BITS = POOL_BYTES * 8,
>> +       POOL_BITS = BLAKE2S_HASH_SIZE * 8,
>>         POOL_BITSHIFT = ilog2(POOL_BITS),
>>
>>         /* To allow fractional bits to be tracked, the entropy_count field is
>>          * denominated in units of 1/8th bits. */
>>         POOL_ENTROPY_SHIFT = 3,
>>  #define POOL_ENTROPY_BITS() (input_pool.entropy_count >> POOL_ENTROPY_SHIFT)
>> -       POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT,
>> -
>> -       /* x^128 + x^104 + x^76 + x^51 +x^25 + x + 1 */
>> -       POOL_TAP1 = 104,
>> -       POOL_TAP2 = 76,
>> -       POOL_TAP3 = 51,
>> -       POOL_TAP4 = 25,
>> -       POOL_TAP5 = 1,
>> -
>> -       EXTRACT_SIZE = BLAKE2S_HASH_SIZE / 2
>> +       POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT
>>  };
>>
>>  /*
>> @@ -438,6 +302,12 @@ enum poolinfo {
>>   */
>>  static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);
>>  static struct fasync_struct *fasync;
>> +/*
>> + * If the entropy count falls under this number of bits, then we
>> + * should wake up processes which are selecting or polling on write
>> + * access to /dev/random.
>> + */
>> +static int random_write_wakeup_bits = POOL_BITS * 3 / 4;
>>
>>  static DEFINE_SPINLOCK(random_ready_list_lock);
>>  static LIST_HEAD(random_ready_list);
>> @@ -493,73 +363,31 @@ MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression");
>>   *
>>   **********************************************************************/
>>
>> -static u32 input_pool_data[POOL_WORDS] __latent_entropy;
>> -
>>  static struct {
>> +       struct blake2s_state hash;
>>         spinlock_t lock;
>> -       u16 add_ptr;
>> -       u16 input_rotate;
>>         int entropy_count;
>>  } input_pool = {
>> +       .hash.h = { BLAKE2S_IV0 ^ (0x01010000 | BLAKE2S_HASH_SIZE),
>> +                   BLAKE2S_IV1, BLAKE2S_IV2, BLAKE2S_IV3, BLAKE2S_IV4,
>> +                   BLAKE2S_IV5, BLAKE2S_IV6, BLAKE2S_IV7 },
>> +       .hash.outlen = BLAKE2S_HASH_SIZE,
>>         .lock = __SPIN_LOCK_UNLOCKED(input_pool.lock),
>>  };
>>
>> -static ssize_t extract_entropy(void *buf, size_t nbytes, int min);
>> -static ssize_t _extract_entropy(void *buf, size_t nbytes);
>> +static bool extract_entropy(void *buf, size_t nbytes, int min);
>> +static void _extract_entropy(void *buf, size_t nbytes);
>>
>>  static void crng_reseed(struct crng_state *crng, bool use_input_pool);
>>
>> -static const u32 twist_table[8] = {
>> -       0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
>> -       0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
>> -
>>  /*
>>   * This function adds bytes into the entropy "pool".  It does not
>>   * update the entropy estimate.  The caller should call
>>   * credit_entropy_bits if this is appropriate.
>> - *
>> - * The pool is stirred with a primitive polynomial of the appropriate
>> - * degree, and then twisted.  We twist by three bits at a time because
>> - * it's cheap to do so and helps slightly in the expected case where
>> - * the entropy is concentrated in the low-order bits.
>>   */
>>  static void _mix_pool_bytes(const void *in, int nbytes)
>>  {
>> -       unsigned long i;
>> -       int input_rotate;
>> -       const u8 *bytes = in;
>> -       u32 w;
>> -
>> -       input_rotate = input_pool.input_rotate;
>> -       i = input_pool.add_ptr;
>> -
>> -       /* mix one byte at a time to simplify size handling and churn faster */
>> -       while (nbytes--) {
>> -               w = rol32(*bytes++, input_rotate);
>> -               i = (i - 1) & POOL_WORDMASK;
>> -
>> -               /* XOR in the various taps */
>> -               w ^= input_pool_data[i];
>> -               w ^= input_pool_data[(i + POOL_TAP1) & POOL_WORDMASK];
>> -               w ^= input_pool_data[(i + POOL_TAP2) & POOL_WORDMASK];
>> -               w ^= input_pool_data[(i + POOL_TAP3) & POOL_WORDMASK];
>> -               w ^= input_pool_data[(i + POOL_TAP4) & POOL_WORDMASK];
>> -               w ^= input_pool_data[(i + POOL_TAP5) & POOL_WORDMASK];
>> -
>> -               /* Mix the result back in with a twist */
>> -               input_pool_data[i] = (w >> 3) ^ twist_table[w & 7];
>> -
>> -               /*
>> -                * Normally, we add 7 bits of rotation to the pool.
>> -                * At the beginning of the pool, add an extra 7 bits
>> -                * rotation, so that successive passes spread the
>> -                * input bits across the pool evenly.
>> -                */
>> -               input_rotate = (input_rotate + (i ? 7 : 14)) & 31;
>> -       }
>> -
>> -       input_pool.input_rotate = input_rotate;
>> -       input_pool.add_ptr = i;
>> +       blake2s_update(&input_pool.hash, in, nbytes);
>>  }
>>
>>  static void __mix_pool_bytes(const void *in, int nbytes)
>> @@ -953,15 +781,14 @@ static int crng_slow_load(const u8 *cp, size_t len)
>>  static void crng_reseed(struct crng_state *crng, bool use_input_pool)
>>  {
>>         unsigned long flags;
>> -       int i, num;
>> +       int i;
>>         union {
>>                 u8 block[CHACHA_BLOCK_SIZE];
>>                 u32 key[8];
>>         } buf;
>>
>>         if (use_input_pool) {
>> -               num = extract_entropy(&buf, 32, 16);
>> -               if (num == 0)
>> +               if (!extract_entropy(&buf, 32, 16))
>>                         return;
>>         } else {
>>                 _extract_crng(&primary_crng, buf.block);
>> @@ -1329,74 +1156,48 @@ static size_t account(size_t nbytes, int min)
>>  }
>>
>>  /*
>> - * This function does the actual extraction for extract_entropy.
>> - *
>> - * Note: we assume that .poolwords is a multiple of 16 words.
>> + * This is an HKDF-like construction for using the hashed collected entropy
>> + * as a PRF key, that's then expanded block-by-block.
>>   */
>> -static void extract_buf(u8 *out)
>> +static void _extract_entropy(void *buf, size_t nbytes)
>>  {
>> -       struct blake2s_state state __aligned(__alignof__(unsigned long));
>> -       u8 hash[BLAKE2S_HASH_SIZE];
>> -       unsigned long *salt;
>>         unsigned long flags;
>> -
>> -       blake2s_init(&state, sizeof(hash));
>> -
>> -       /*
>> -        * If we have an architectural hardware random number
>> -        * generator, use it for BLAKE2's salt & personal fields.
>> -        */
>> -       for (salt = (unsigned long *)&state.h[4];
>> -            salt < (unsigned long *)&state.h[8]; ++salt) {
>> -               unsigned long v;
>> -               if (!arch_get_random_long(&v))
>> -                       break;
>> -               *salt ^= v;
>> +       u8 seed[BLAKE2S_HASH_SIZE], next_key[BLAKE2S_HASH_SIZE];
>> +       struct {
>> +               unsigned long rdrand[32 / sizeof(long)];
>> +               size_t counter;
>> +       } block;
>> +       size_t i;
>> +
>> +       for (i = 0; i < ARRAY_SIZE(block.rdrand); ++i) {
>> +               if (!arch_get_random_long(&block.rdrand[i]))
>> +                       block.rdrand[i] = random_get_entropy();
>>         }
>>
>> -       /* Generate a hash across the pool */
>>         spin_lock_irqsave(&input_pool.lock, flags);
>> -       blake2s_update(&state, (const u8 *)input_pool_data, POOL_BYTES);
>> -       blake2s_final(&state, hash); /* final zeros out state */
>>
>> -       /*
>> -        * We mix the hash back into the pool to prevent backtracking
>> -        * attacks (where the attacker knows the state of the pool
>> -        * plus the current outputs, and attempts to find previous
>> -        * outputs), unless the hash function can be inverted. By
>> -        * mixing at least a hash worth of hash data back, we make
>> -        * brute-forcing the feedback as hard as brute-forcing the
>> -        * hash.
>> -        */
>> -       __mix_pool_bytes(hash, sizeof(hash));
>> -       spin_unlock_irqrestore(&input_pool.lock, flags);
>> +       /* seed = HASHPRF(last_key, entropy_input) */
>> +       blake2s_final(&input_pool.hash, seed);
>>
>> -       /* Note that EXTRACT_SIZE is half of hash size here, because above
>> -        * we've dumped the full length back into mixer. By reducing the
>> -        * amount that we emit, we retain a level of forward secrecy.
>> -        */
>> -       memcpy(out, hash, EXTRACT_SIZE);
>> -       memzero_explicit(hash, sizeof(hash));
>> -}
>> +       /* next_key = HASHPRF(seed, RDRAND || 0) */
>> +       block.counter = 0;
>> +       blake2s(next_key, (u8 *)&block, seed, sizeof(next_key), sizeof(block), sizeof(seed));
>> +       blake2s_init_key(&input_pool.hash, BLAKE2S_HASH_SIZE, next_key, sizeof(next_key));
>>
>> -static ssize_t _extract_entropy(void *buf, size_t nbytes)
>> -{
>> -       ssize_t ret = 0, i;
>> -       u8 tmp[EXTRACT_SIZE];
>> +       spin_unlock_irqrestore(&input_pool.lock, flags);
>> +       memzero_explicit(next_key, sizeof(next_key));
>>
>>         while (nbytes) {
>> -               extract_buf(tmp);
>> -               i = min_t(int, nbytes, EXTRACT_SIZE);
>> -               memcpy(buf, tmp, i);
>> +               i = min_t(size_t, nbytes, BLAKE2S_HASH_SIZE);
>> +               /* output = HASHPRF(seed, RDRAND || ++counter) */
>> +               ++block.counter;
>> +               blake2s(buf, (u8 *)&block, seed, i, sizeof(block), sizeof(seed));
>>                 nbytes -= i;
>>                 buf += i;
>> -               ret += i;
>>         }
>>
>> -       /* Wipe data just returned from memory */
>> -       memzero_explicit(tmp, sizeof(tmp));
>> -
>> -       return ret;
>> +       memzero_explicit(seed, sizeof(seed));
>> +       memzero_explicit(&block, sizeof(block));
>>  }
>>
>>  /*
>> @@ -1404,13 +1205,18 @@ static ssize_t _extract_entropy(void *buf, size_t nbytes)
>>   * returns it in a buffer.
>>   *
>>   * The min parameter specifies the minimum amount we can pull before
>> - * failing to avoid races that defeat catastrophic reseeding.
>> + * failing to avoid races that defeat catastrophic reseeding. If we
>> + * have less than min entropy available, we return false and buf is
>> + * not filled.
>>   */
>> -static ssize_t extract_entropy(void *buf, size_t nbytes, int min)
>> +static bool extract_entropy(void *buf, size_t nbytes, int min)
>>  {
>>         trace_extract_entropy(nbytes, POOL_ENTROPY_BITS(), _RET_IP_);
>> -       nbytes = account(nbytes, min);
>> -       return _extract_entropy(buf, nbytes);
>> +       if (account(nbytes, min)) {
>> +               _extract_entropy(buf, nbytes);
>> +               return true;
>> +       }
>> +       return false;
>>  }
>>
>>  #define warn_unseeded_randomness(previous) \
>> @@ -1674,7 +1480,7 @@ static void __init init_std_data(void)
>>         unsigned long rv;
>>
>>         mix_pool_bytes(&now, sizeof(now));
>> -       for (i = POOL_BYTES; i > 0; i -= sizeof(rv)) {
>> +       for (i = BLAKE2S_BLOCK_SIZE; i > 0; i -= sizeof(rv)) {
>>                 if (!arch_get_random_seed_long(&rv) &&
>>                     !arch_get_random_long(&rv))
>>                         rv = random_get_entropy();
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-28 18:08   ` Eric Biggers
  2022-03-28 18:34     ` Michael Brooks
@ 2022-03-29  5:31     ` Jason A. Donenfeld
  2022-04-05 22:10       ` Jason A. Donenfeld
  2022-03-29 15:38     ` Theodore Ts'o
  2 siblings, 1 reply; 63+ messages in thread
From: Jason A. Donenfeld @ 2022-03-29  5:31 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Sasha Levin, LKML, stable, Theodore Ts'o, Dominik Brodowski,
	Greg Kroah-Hartman, Jean-Philippe Aumasson

Hi Sasha,

On Mon, Mar 28, 2022 at 2:08 PM Eric Biggers <ebiggers@google.com> wrote:
>
> On Mon, Mar 28, 2022 at 07:18:00AM -0400, Sasha Levin wrote:
> > From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> >
> > [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
> >
>
> I don't think it's a good idea to start backporting random commits to random.c
> that weren't marked for stable.  There were a lot of changes in v5.18, and
> sometimes they relate to each other in subtle ways, so the individual commits
> aren't necessarily safe to pick.
>
> IMO, you shouldn't backport any non-stable-Cc'ed commits to random.c unless
> Jason explicitly reviews the exact sequence of commits that you're backporting.

I'm inclined to agree with Eric here that you should be a bit careful
about autosel'ing from 5.18, given how extensive the changes were. In
theory they should all be properly sequenced so that nothing breaks,
but I'd still be cautious. However, if you want, maybe we can work out
some plan for backporting. I'll take a look and maybe will ping you on
IRC about it.

Jason

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-28 18:08   ` Eric Biggers
  2022-03-28 18:34     ` Michael Brooks
  2022-03-29  5:31     ` Jason A. Donenfeld
@ 2022-03-29 15:38     ` Theodore Ts'o
  2022-03-29 17:34       ` Michael Brooks
  2 siblings, 1 reply; 63+ messages in thread
From: Theodore Ts'o @ 2022-03-29 15:38 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Sasha Levin, linux-kernel, stable, Jason A. Donenfeld,
	Dominik Brodowski, Greg Kroah-Hartman, Jean-Philippe Aumasson

On Mon, Mar 28, 2022 at 06:08:26PM +0000, Eric Biggers wrote:
> On Mon, Mar 28, 2022 at 07:18:00AM -0400, Sasha Levin wrote:
> > From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > 
> > [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
> > 
> 
> I don't think it's a good idea to start backporting random commits to random.c
> that weren't marked for stable.  There were a lot of changes in v5.18, and
> sometimes they relate to each other in subtle ways, so the individual commits
> aren't necessarily safe to pick.
> 
> IMO, you shouldn't backport any non-stable-Cc'ed commits to random.c unless
> Jason explicitly reviews the exact sequence of commits that you're backporting.

Especially this commit, which makes a fundamental change in how we
extract entropy.  We should be very careful about taking such changes
into stable; a release or two of additional "soak" time would be a
good idea before these go into the LTS releases in particular.

     	      	     	  	       	  - Ted

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-29 15:38     ` Theodore Ts'o
@ 2022-03-29 17:34       ` Michael Brooks
  2022-03-29 18:28         ` Theodore Ts'o
  0 siblings, 1 reply; 63+ messages in thread
From: Michael Brooks @ 2022-03-29 17:34 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Eric Biggers, Sasha Levin, linux-kernel, stable,
	Jason A. Donenfeld, Dominik Brodowski, Greg Kroah-Hartman,
	Jean-Philippe Aumasson

I agree with Ted; this patch is just to start the discussion on how
we can safely remove these locks for the improvement of safety and
security.  Both the boot and interrupt benchmarks stand to benefit
from a patch like this, so it is worth a deep dive.

Feedback is welcome; I am always looking for ways I can be a better
engineer, a better hacker, and a better person. And we are all here
to make the very best kernel.

Regards,
Michael

On Tue, Mar 29, 2022 at 8:39 AM Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Mon, Mar 28, 2022 at 06:08:26PM +0000, Eric Biggers wrote:
> > On Mon, Mar 28, 2022 at 07:18:00AM -0400, Sasha Levin wrote:
> > > From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> > >
> > > [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
> > >
> >
> > I don't think it's a good idea to start backporting random commits to random.c
> > that weren't marked for stable.  There were a lot of changes in v5.18, and
> > sometimes they relate to each other in subtle ways, so the individual commits
> > aren't necessarily safe to pick.
> >
> > IMO, you shouldn't backport any non-stable-Cc'ed commits to random.c unless
> > Jason explicitly reviews the exact sequence of commits that you're backporting.
>
> Especially this commit in general, which is making a fundamental
> change in how we extract entropy.  We should be very careful about
> taking such changes into stable; a release or two of additonal "soak"
> time would be a good idea before these go into the LTS releases in particular.
>
>                                           - Ted

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-29 17:34       ` Michael Brooks
@ 2022-03-29 18:28         ` Theodore Ts'o
  0 siblings, 0 replies; 63+ messages in thread
From: Theodore Ts'o @ 2022-03-29 18:28 UTC (permalink / raw)
  To: Michael Brooks
  Cc: Eric Biggers, Sasha Levin, linux-kernel, stable,
	Jason A. Donenfeld, Dominik Brodowski, Greg Kroah-Hartman,
	Jean-Philippe Aumasson

On Tue, Mar 29, 2022 at 10:34:49AM -0700, Michael Brooks wrote:
> I agree with Ted,  this patch is just to start the discussion on how
> we can safely remove these locks for the improvement of safety and
> security.  Both boot and interrupt benchmarks stand to benefit from a
> patch like this, so it is worth a deep dive.
> 
> Feedback welcome, I am always looking for ways I can be a better
> engineer, and a better hacker and a better person. And we are all here
> to make the very best kernel.

I think you're talking about a different patch than the one mentioned
in the subject line (which is upstream commit 6e8ec2552c7d, authored
by Jason)?

						- Ted

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction Sasha Levin
  2022-03-28 18:08   ` Eric Biggers
       [not found]   ` <CAOnCY6RUN+CSwjsD6Vg-MDi7ERAj2kKLorMLGp1jE8dTZ+3cpQ@mail.gmail.com>
@ 2022-03-30 16:08   ` Michael Brooks
  2022-03-30 16:49     ` David Laight
  2 siblings, 1 reply; 63+ messages in thread
From: Michael Brooks @ 2022-03-30 16:08 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dominik Brodowski, Eric Biggers, Greg Kroah-Hartman,
	Jason A. Donenfeld, Jean-Philippe Aumasson, Theodore Ts'o,
	linux-kernel, stable

I think I understand the bug that Jason is referring to, but really he
is referring to the tip of a larger issue - and correct me if I am
wrong.

Let’s say an attacker is a user resident on the machine and can
internally drain the pool via `cat /dev/urandom > /dev/null` - this
will take the high-quality random sources and throw them away, forcing
the kernel to fall back on the degraded try_to_generate_entropy()
method.
In this state, the kernel performance will be affected, this is a
moderate DoS attack - it won’t bring the system down, but you are
going to have a bad time.  What is serious is the state of the pool -
what happens is that the ratio of “good” to “bad” entropy sources
becomes more and more undesirable.  Sure,  mixing jiffies will fool
die-harder, but that’s easy to do.  There is a deeper issue here that
die-harder cannot test for - and that is a more intentional parallel
construction.  Die-Harder is attempting a parallel construction
without knowing how the RNG is designed.

It is safe to assume that a user who is resident on the machine not
only has a wrist watch but can also query jiffies directly - it is
not a hidden value.  Now, this codebase uses the word “entropy” a
whole lot, and entropy is a physics term. In physical space, entropy
has a lot of emptiness that expands over time, and it also contains
known values - these attributes do not aid in the creation of a
secure /dev/random device driver.  Sure, jiffies provides entropy,
but it is a bit like pointing at the moon: it is a value that
everyone can see, and even something that a remote attacker could
determine using something similar to a vector clock.  The physics
definition of “entropy” is simply not useful for our needs.  What we
really need is to collect as many “unknown values” as we can.
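
To make the point concrete, here is a userspace sketch of estimating
jiffies without any privileges (my illustration; it assumes
CONFIG_HZ=250 and ignores the fixed INITIAL_JIFFIES offset, both of
which an attacker can trivially brute-force):

	#include <stdio.h>
	#include <time.h>

	int main(void)
	{
		struct timespec ts;
		const long hz = 250; /* assumed CONFIG_HZ; try 100/250/300/1000 */

		/* Uptime with nanosecond resolution, no privileges needed. */
		clock_gettime(CLOCK_MONOTONIC, &ts);
		/* jiffies advances HZ times per second since boot. */
		printf("estimated jiffies since boot: %ld\n",
		       ts.tv_sec * hz + ts.tv_nsec / (1000000000L / hz));
		return 0;
	}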

I’d like to describe this bug using mathematics, because that is how I
work - I am the kind of person that appreciates rigor.  In this case,
let's use inductive reasoning to illuminate this issue.

Now, in this attack scenario let “p” be the overall pool state, let
“n” be the good unknown values added to the pool, and let “k” be the
known values - such as jiffies.  If we describe the ratio of unknown
to known uniqueness as p = n/k, then as k grows the pool becomes ever
more predictable; p only grows without bound as k approaches zero.
For example, a pool fed 8 unknown bytes against 256 known jiffies
samples is far easier to model than one fed 256 unknown bytes.  Now
consider a parallel pool, call it p’ (pronounced “p-prime” for those
who don’t get the notation), with p’ = n’/k’.  The attacker has no
hope of constructing n’, but they can construct k’ - therefore the
attacker’s parallel model of the pool, p’, becomes more accurate as
the attack persists, leading to p’ = p as time -> ∞.

Q.E.D.

In summation, it is not entropy that we are after, and
mix_pool_bytes() does not impact the ratio above. What we are really
after is ‘hidden uniqueness’: we need unique values that are not
known to any would-be attacker, and the more uncertainty we can add
to our pool, the harder it will be for an outsider to reconstruct the
internal state - which is why find_uniqueness_in_memory() is valuable.

There is a way to solve the predictability of the internal pool
without melting the ice caps - the “jumptable” and “gatekey” do this
by allowing access to the pool without revealing ANY internal
structure. Furthermore, if we are using an ideal block cipher or hash
function in our CSPRNG, then these primitives do a great job of
obscuring the pre-image, because that is what they were built for.
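
To sketch what that access pattern might look like in code - with
__make_gatekey(), runtime_entropy[] and POOL_SIZE assumed from the
patch earlier in this thread, and blake2s() as the kernel's hash -
something like the following; this is my illustration, not the
KeypoolRandom implementation:

	/* Assumes len <= BLAKE2S_HASH_SIZE (32). */
	static void keypool_read(u8 *out, size_t len)
	{
		u64 gatekey = __make_gatekey(out);	/* unique per caller */
		size_t entry = gatekey % (POOL_SIZE - len);

		/* Hash outward from the entry point: the caller only ever
		 * sees the BLAKE2s image, never the pool bytes themselves. */
		blake2s(out, &runtime_entropy[entry], NULL, len, len, 0);

		/* "Cover our tracks": perturb the entry point so no other
		 * caller can follow the same path through the table. */
		runtime_entropy[entry] ^= (u8)gatekey;
	}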

TL;DR: this ain't good - yes, Jason is right that the branch is
currently vulnerable. So should we roll out this code? Or should we do
it right?

-Michael Brooks


On Mon, Mar 28, 2022 at 4:20 AM Sasha Levin <sashal@kernel.org> wrote:
>
> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
>
> [ Upstream commit 6e8ec2552c7d13991148e551e3325a624d73fac6 ]
>
> The current 4096-bit LFSR used for entropy collection had a few
> desirable attributes for the context in which it was created. For
> example, the state was huge, which meant that /dev/random would be able
> to output quite a bit of accumulated entropy before blocking. It was
> also, in its time, quite fast at accumulating entropy byte-by-byte,
> which matters given the varying contexts in which mix_pool_bytes() is
> called. And its diffusion was relatively high, which meant that changes
> would ripple across several words of state rather quickly.
>
> However, it also suffers from a few security vulnerabilities. In
> particular, inputs learned by an attacker can be undone, but moreover,
> if the state of the pool leaks, its contents can be controlled and
> entirely zeroed out. I've demonstrated this attack with this SMT2
> script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in
> a matter of seconds on a single core of my laptop, resulting in little
> proof of concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>.
>
> For basically all recent formal models of RNGs, these attacks represent
> a significant cryptographic flaw. But how does this manifest
> practically? If an attacker has access to the system to such a degree
> that he can learn the internal state of the RNG, arguably there are
> other lower hanging vulnerabilities -- side-channel, infoleak, or
> otherwise -- that might have higher priority. On the other hand, seed
> files are frequently used on systems that have a hard time generating
> much entropy on their own, and these seed files, being files, often leak
> or are duplicated and distributed accidentally, or are even seeded over
> the Internet intentionally, where their contents might be recorded or
> tampered with. Seen this way, an otherwise quasi-implausible
> vulnerability is a bit more practical than initially thought.
>
> Another aspect of the current mix_pool_bytes() function is that, while
> its performance was arguably competitive for the time in which it was
> created, it's no longer considered so. This patch improves performance
> significantly: on a high-end CPU, an i7-11850H, it improves performance
> of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it
> improves performance by 103%.
>
> This commit replaces the LFSR of mix_pool_bytes() with a straight-
> forward cryptographic hash function, BLAKE2s, which is already in use
> for pool extraction. Universal hashing with a secret seed was considered
> too, something along the lines of <https://eprint.iacr.org/2013/338>,
> but the requirement for a secret seed makes for a chicken & egg problem.
> Instead we go with a formally proven scheme using a computational hash
> function, described in sections 5.1, 6.4, and B.1.8 of
> <https://eprint.iacr.org/2019/198>.
>
> BLAKE2s outputs 256 bits, which should give us an appropriate amount of
> min-entropy accumulation, and a wide enough margin of collision
> resistance against active attacks. mix_pool_bytes() becomes a simple
> call to blake2s_update(), for accumulation, while the extraction step
> becomes a blake2s_final() to generate a seed, with which we can then do
> a HKDF-like or BLAKE2X-like expansion, the first part of which we fold
> back as an init key for subsequent blake2s_update()s, and the rest we
> produce to the caller. This then is provided to our CRNG like usual. In
> that expansion step, we make opportunistic use of 32 bytes of RDRAND
> output, just as before. We also always reseed the crng with 32 bytes,
> unconditionally, or not at all, rather than sometimes with 16 as before,
> as we don't win anything by limiting beyond the 16 byte threshold.
>
> Going for a hash function as an entropy collector is a conservative,
> proven approach. The result of all this is a much simpler and much less
> bespoke construction than what's there now, which not only plugs a
> vulnerability but also improves performance considerably.
>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Cc: Dominik Brodowski <linux@dominikbrodowski.net>
> Reviewed-by: Eric Biggers <ebiggers@google.com>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Reviewed-by: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>  drivers/char/random.c | 304 ++++++++----------------------------------
>  1 file changed, 55 insertions(+), 249 deletions(-)
>
> diff --git a/drivers/char/random.c b/drivers/char/random.c
> index 3404a91edf29..882f78829a24 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -42,61 +42,6 @@
>   */
>
>  /*
> - * (now, with legal B.S. out of the way.....)
> - *
> - * This routine gathers environmental noise from device drivers, etc.,
> - * and returns good random numbers, suitable for cryptographic use.
> - * Besides the obvious cryptographic uses, these numbers are also good
> - * for seeding TCP sequence numbers, and other places where it is
> - * desirable to have numbers which are not only random, but hard to
> - * predict by an attacker.
> - *
> - * Theory of operation
> - * ===================
> - *
> - * Computers are very predictable devices.  Hence it is extremely hard
> - * to produce truly random numbers on a computer --- as opposed to
> - * pseudo-random numbers, which can easily generated by using a
> - * algorithm.  Unfortunately, it is very easy for attackers to guess
> - * the sequence of pseudo-random number generators, and for some
> - * applications this is not acceptable.  So instead, we must try to
> - * gather "environmental noise" from the computer's environment, which
> - * must be hard for outside attackers to observe, and use that to
> - * generate random numbers.  In a Unix environment, this is best done
> - * from inside the kernel.
> - *
> - * Sources of randomness from the environment include inter-keyboard
> - * timings, inter-interrupt timings from some interrupts, and other
> - * events which are both (a) non-deterministic and (b) hard for an
> - * outside observer to measure.  Randomness from these sources are
> - * added to an "entropy pool", which is mixed using a CRC-like function.
> - * This is not cryptographically strong, but it is adequate assuming
> - * the randomness is not chosen maliciously, and it is fast enough that
> - * the overhead of doing it on every interrupt is very reasonable.
> - * As random bytes are mixed into the entropy pool, the routines keep
> - * an *estimate* of how many bits of randomness have been stored into
> - * the random number generator's internal state.
> - *
> - * When random bytes are desired, they are obtained by taking the BLAKE2s
> - * hash of the contents of the "entropy pool".  The BLAKE2s hash avoids
> - * exposing the internal state of the entropy pool.  It is believed to
> - * be computationally infeasible to derive any useful information
> - * about the input of BLAKE2s from its output.  Even if it is possible to
> - * analyze BLAKE2s in some clever way, as long as the amount of data
> - * returned from the generator is less than the inherent entropy in
> - * the pool, the output data is totally unpredictable.  For this
> - * reason, the routine decreases its internal estimate of how many
> - * bits of "true randomness" are contained in the entropy pool as it
> - * outputs random numbers.
> - *
> - * If this estimate goes to zero, the routine can still generate
> - * random numbers; however, an attacker may (at least in theory) be
> - * able to infer the future output of the generator from prior
> - * outputs.  This requires successful cryptanalysis of BLAKE2s, which is
> - * not believed to be feasible, but there is a remote possibility.
> - * Nonetheless, these numbers should be useful for the vast majority
> - * of purposes.
> - *
>   * Exported interfaces ---- output
>   * ===============================
>   *
> @@ -298,23 +243,6 @@
>   *
>   *     mknod /dev/random c 1 8
>   *     mknod /dev/urandom c 1 9
> - *
> - * Acknowledgements:
> - * =================
> - *
> - * Ideas for constructing this random number generator were derived
> - * from Pretty Good Privacy's random number generator, and from private
> - * discussions with Phil Karn.  Colin Plumb provided a faster random
> - * number generator, which speed up the mixing function of the entropy
> - * pool, taken from PGPfone.  Dale Worley has also contributed many
> - * useful ideas and suggestions to improve this driver.
> - *
> - * Any flaws in the design are solely my responsibility, and should
> - * not be attributed to the Phil, Colin, or any of authors of PGP.
> - *
> - * Further background information on this topic may be obtained from
> - * RFC 1750, "Randomness Recommendations for Security", by Donald
> - * Eastlake, Steve Crocker, and Jeff Schiller.
>   */
>
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> @@ -358,79 +286,15 @@
>
>  /* #define ADD_INTERRUPT_BENCH */
>
> -/*
> - * If the entropy count falls under this number of bits, then we
> - * should wake up processes which are selecting or polling on write
> - * access to /dev/random.
> - */
> -static int random_write_wakeup_bits = 28 * (1 << 5);
> -
> -/*
> - * Originally, we used a primitive polynomial of degree .poolwords
> - * over GF(2).  The taps for various sizes are defined below.  They
> - * were chosen to be evenly spaced except for the last tap, which is 1
> - * to get the twisting happening as fast as possible.
> - *
> - * For the purposes of better mixing, we use the CRC-32 polynomial as
> - * well to make a (modified) twisted Generalized Feedback Shift
> - * Register.  (See M. Matsumoto & Y. Kurita, 1992.  Twisted GFSR
> - * generators.  ACM Transactions on Modeling and Computer Simulation
> - * 2(3):179-194.  Also see M. Matsumoto & Y. Kurita, 1994.  Twisted
> - * GFSR generators II.  ACM Transactions on Modeling and Computer
> - * Simulation 4:254-266)
> - *
> - * Thanks to Colin Plumb for suggesting this.
> - *
> - * The mixing operation is much less sensitive than the output hash,
> - * where we use BLAKE2s.  All that we want of mixing operation is that
> - * it be a good non-cryptographic hash; i.e. it not produce collisions
> - * when fed "random" data of the sort we expect to see.  As long as
> - * the pool state differs for different inputs, we have preserved the
> - * input entropy and done a good job.  The fact that an intelligent
> - * attacker can construct inputs that will produce controlled
> - * alterations to the pool's state is not important because we don't
> - * consider such inputs to contribute any randomness.  The only
> - * property we need with respect to them is that the attacker can't
> - * increase his/her knowledge of the pool's state.  Since all
> - * additions are reversible (knowing the final state and the input,
> - * you can reconstruct the initial state), if an attacker has any
> - * uncertainty about the initial state, he/she can only shuffle that
> - * uncertainty about, but never cause any collisions (which would
> - * decrease the uncertainty).
> - *
> - * Our mixing functions were analyzed by Lacharme, Roeck, Strubel, and
> - * Videau in their paper, "The Linux Pseudorandom Number Generator
> - * Revisited" (see: http://eprint.iacr.org/2012/251.pdf).  In their
> - * paper, they point out that we are not using a true Twisted GFSR,
> - * since Matsumoto & Kurita used a trinomial feedback polynomial (that
> - * is, with only three taps, instead of the six that we are using).
> - * As a result, the resulting polynomial is neither primitive nor
> - * irreducible, and hence does not have a maximal period over
> - * GF(2**32).  They suggest a slight change to the generator
> - * polynomial which improves the resulting TGFSR polynomial to be
> - * irreducible, which we have made here.
> - */
>  enum poolinfo {
> -       POOL_WORDS = 128,
> -       POOL_WORDMASK = POOL_WORDS - 1,
> -       POOL_BYTES = POOL_WORDS * sizeof(u32),
> -       POOL_BITS = POOL_BYTES * 8,
> +       POOL_BITS = BLAKE2S_HASH_SIZE * 8,
>         POOL_BITSHIFT = ilog2(POOL_BITS),
>
>         /* To allow fractional bits to be tracked, the entropy_count field is
>          * denominated in units of 1/8th bits. */
>         POOL_ENTROPY_SHIFT = 3,
>  #define POOL_ENTROPY_BITS() (input_pool.entropy_count >> POOL_ENTROPY_SHIFT)
> -       POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT,
> -
> -       /* x^128 + x^104 + x^76 + x^51 +x^25 + x + 1 */
> -       POOL_TAP1 = 104,
> -       POOL_TAP2 = 76,
> -       POOL_TAP3 = 51,
> -       POOL_TAP4 = 25,
> -       POOL_TAP5 = 1,
> -
> -       EXTRACT_SIZE = BLAKE2S_HASH_SIZE / 2
> +       POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT
>  };
>
>  /*
> @@ -438,6 +302,12 @@ enum poolinfo {
>   */
>  static DECLARE_WAIT_QUEUE_HEAD(random_write_wait);
>  static struct fasync_struct *fasync;
> +/*
> + * If the entropy count falls under this number of bits, then we
> + * should wake up processes which are selecting or polling on write
> + * access to /dev/random.
> + */
> +static int random_write_wakeup_bits = POOL_BITS * 3 / 4;
>
>  static DEFINE_SPINLOCK(random_ready_list_lock);
>  static LIST_HEAD(random_ready_list);
> @@ -493,73 +363,31 @@ MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit suppression");
>   *
>   **********************************************************************/
>
> -static u32 input_pool_data[POOL_WORDS] __latent_entropy;
> -
>  static struct {
> +       struct blake2s_state hash;
>         spinlock_t lock;
> -       u16 add_ptr;
> -       u16 input_rotate;
>         int entropy_count;
>  } input_pool = {
> +       .hash.h = { BLAKE2S_IV0 ^ (0x01010000 | BLAKE2S_HASH_SIZE),
> +                   BLAKE2S_IV1, BLAKE2S_IV2, BLAKE2S_IV3, BLAKE2S_IV4,
> +                   BLAKE2S_IV5, BLAKE2S_IV6, BLAKE2S_IV7 },
> +       .hash.outlen = BLAKE2S_HASH_SIZE,
>         .lock = __SPIN_LOCK_UNLOCKED(input_pool.lock),
>  };
>
> -static ssize_t extract_entropy(void *buf, size_t nbytes, int min);
> -static ssize_t _extract_entropy(void *buf, size_t nbytes);
> +static bool extract_entropy(void *buf, size_t nbytes, int min);
> +static void _extract_entropy(void *buf, size_t nbytes);
>
>  static void crng_reseed(struct crng_state *crng, bool use_input_pool);
>
> -static const u32 twist_table[8] = {
> -       0x00000000, 0x3b6e20c8, 0x76dc4190, 0x4db26158,
> -       0xedb88320, 0xd6d6a3e8, 0x9b64c2b0, 0xa00ae278 };
> -
>  /*
>   * This function adds bytes into the entropy "pool".  It does not
>   * update the entropy estimate.  The caller should call
>   * credit_entropy_bits if this is appropriate.
> - *
> - * The pool is stirred with a primitive polynomial of the appropriate
> - * degree, and then twisted.  We twist by three bits at a time because
> - * it's cheap to do so and helps slightly in the expected case where
> - * the entropy is concentrated in the low-order bits.
>   */
>  static void _mix_pool_bytes(const void *in, int nbytes)
>  {
> -       unsigned long i;
> -       int input_rotate;
> -       const u8 *bytes = in;
> -       u32 w;
> -
> -       input_rotate = input_pool.input_rotate;
> -       i = input_pool.add_ptr;
> -
> -       /* mix one byte at a time to simplify size handling and churn faster */
> -       while (nbytes--) {
> -               w = rol32(*bytes++, input_rotate);
> -               i = (i - 1) & POOL_WORDMASK;
> -
> -               /* XOR in the various taps */
> -               w ^= input_pool_data[i];
> -               w ^= input_pool_data[(i + POOL_TAP1) & POOL_WORDMASK];
> -               w ^= input_pool_data[(i + POOL_TAP2) & POOL_WORDMASK];
> -               w ^= input_pool_data[(i + POOL_TAP3) & POOL_WORDMASK];
> -               w ^= input_pool_data[(i + POOL_TAP4) & POOL_WORDMASK];
> -               w ^= input_pool_data[(i + POOL_TAP5) & POOL_WORDMASK];
> -
> -               /* Mix the result back in with a twist */
> -               input_pool_data[i] = (w >> 3) ^ twist_table[w & 7];
> -
> -               /*
> -                * Normally, we add 7 bits of rotation to the pool.
> -                * At the beginning of the pool, add an extra 7 bits
> -                * rotation, so that successive passes spread the
> -                * input bits across the pool evenly.
> -                */
> -               input_rotate = (input_rotate + (i ? 7 : 14)) & 31;
> -       }
> -
> -       input_pool.input_rotate = input_rotate;
> -       input_pool.add_ptr = i;
> +       blake2s_update(&input_pool.hash, in, nbytes);
>  }
>
>  static void __mix_pool_bytes(const void *in, int nbytes)
> @@ -953,15 +781,14 @@ static int crng_slow_load(const u8 *cp, size_t len)
>  static void crng_reseed(struct crng_state *crng, bool use_input_pool)
>  {
>         unsigned long flags;
> -       int i, num;
> +       int i;
>         union {
>                 u8 block[CHACHA_BLOCK_SIZE];
>                 u32 key[8];
>         } buf;
>
>         if (use_input_pool) {
> -               num = extract_entropy(&buf, 32, 16);
> -               if (num == 0)
> +               if (!extract_entropy(&buf, 32, 16))
>                         return;
>         } else {
>                 _extract_crng(&primary_crng, buf.block);
> @@ -1329,74 +1156,48 @@ static size_t account(size_t nbytes, int min)
>  }
>
>  /*
> - * This function does the actual extraction for extract_entropy.
> - *
> - * Note: we assume that .poolwords is a multiple of 16 words.
> + * This is an HKDF-like construction for using the hashed collected entropy
> + * as a PRF key, that's then expanded block-by-block.
>   */
> -static void extract_buf(u8 *out)
> +static void _extract_entropy(void *buf, size_t nbytes)
>  {
> -       struct blake2s_state state __aligned(__alignof__(unsigned long));
> -       u8 hash[BLAKE2S_HASH_SIZE];
> -       unsigned long *salt;
>         unsigned long flags;
> -
> -       blake2s_init(&state, sizeof(hash));
> -
> -       /*
> -        * If we have an architectural hardware random number
> -        * generator, use it for BLAKE2's salt & personal fields.
> -        */
> -       for (salt = (unsigned long *)&state.h[4];
> -            salt < (unsigned long *)&state.h[8]; ++salt) {
> -               unsigned long v;
> -               if (!arch_get_random_long(&v))
> -                       break;
> -               *salt ^= v;
> +       u8 seed[BLAKE2S_HASH_SIZE], next_key[BLAKE2S_HASH_SIZE];
> +       struct {
> +               unsigned long rdrand[32 / sizeof(long)];
> +               size_t counter;
> +       } block;
> +       size_t i;
> +
> +       for (i = 0; i < ARRAY_SIZE(block.rdrand); ++i) {
> +               if (!arch_get_random_long(&block.rdrand[i]))
> +                       block.rdrand[i] = random_get_entropy();
>         }
>
> -       /* Generate a hash across the pool */
>         spin_lock_irqsave(&input_pool.lock, flags);
> -       blake2s_update(&state, (const u8 *)input_pool_data, POOL_BYTES);
> -       blake2s_final(&state, hash); /* final zeros out state */
>
> -       /*
> -        * We mix the hash back into the pool to prevent backtracking
> -        * attacks (where the attacker knows the state of the pool
> -        * plus the current outputs, and attempts to find previous
> -        * outputs), unless the hash function can be inverted. By
> -        * mixing at least a hash worth of hash data back, we make
> -        * brute-forcing the feedback as hard as brute-forcing the
> -        * hash.
> -        */
> -       __mix_pool_bytes(hash, sizeof(hash));
> -       spin_unlock_irqrestore(&input_pool.lock, flags);
> +       /* seed = HASHPRF(last_key, entropy_input) */
> +       blake2s_final(&input_pool.hash, seed);
>
> -       /* Note that EXTRACT_SIZE is half of hash size here, because above
> -        * we've dumped the full length back into mixer. By reducing the
> -        * amount that we emit, we retain a level of forward secrecy.
> -        */
> -       memcpy(out, hash, EXTRACT_SIZE);
> -       memzero_explicit(hash, sizeof(hash));
> -}
> +       /* next_key = HASHPRF(seed, RDRAND || 0) */
> +       block.counter = 0;
> +       blake2s(next_key, (u8 *)&block, seed, sizeof(next_key), sizeof(block), sizeof(seed));
> +       blake2s_init_key(&input_pool.hash, BLAKE2S_HASH_SIZE, next_key, sizeof(next_key));
>
> -static ssize_t _extract_entropy(void *buf, size_t nbytes)
> -{
> -       ssize_t ret = 0, i;
> -       u8 tmp[EXTRACT_SIZE];
> +       spin_unlock_irqrestore(&input_pool.lock, flags);
> +       memzero_explicit(next_key, sizeof(next_key));
>
>         while (nbytes) {
> -               extract_buf(tmp);
> -               i = min_t(int, nbytes, EXTRACT_SIZE);
> -               memcpy(buf, tmp, i);
> +               i = min_t(size_t, nbytes, BLAKE2S_HASH_SIZE);
> +               /* output = HASHPRF(seed, RDRAND || ++counter) */
> +               ++block.counter;
> +               blake2s(buf, (u8 *)&block, seed, i, sizeof(block), sizeof(seed));
>                 nbytes -= i;
>                 buf += i;
> -               ret += i;
>         }
>
> -       /* Wipe data just returned from memory */
> -       memzero_explicit(tmp, sizeof(tmp));
> -
> -       return ret;
> +       memzero_explicit(seed, sizeof(seed));
> +       memzero_explicit(&block, sizeof(block));
>  }
>
>  /*
> @@ -1404,13 +1205,18 @@ static ssize_t _extract_entropy(void *buf, size_t nbytes)
>   * returns it in a buffer.
>   *
>   * The min parameter specifies the minimum amount we can pull before
> - * failing to avoid races that defeat catastrophic reseeding.
> + * failing to avoid races that defeat catastrophic reseeding. If we
> + * have less than min entropy available, we return false and buf is
> + * not filled.
>   */
> -static ssize_t extract_entropy(void *buf, size_t nbytes, int min)
> +static bool extract_entropy(void *buf, size_t nbytes, int min)
>  {
>         trace_extract_entropy(nbytes, POOL_ENTROPY_BITS(), _RET_IP_);
> -       nbytes = account(nbytes, min);
> -       return _extract_entropy(buf, nbytes);
> +       if (account(nbytes, min)) {
> +               _extract_entropy(buf, nbytes);
> +               return true;
> +       }
> +       return false;
>  }
>
>  #define warn_unseeded_randomness(previous) \
> @@ -1674,7 +1480,7 @@ static void __init init_std_data(void)
>         unsigned long rv;
>
>         mix_pool_bytes(&now, sizeof(now));
> -       for (i = POOL_BYTES; i > 0; i -= sizeof(rv)) {
> +       for (i = BLAKE2S_BLOCK_SIZE; i > 0; i -= sizeof(rv)) {
>                 if (!arch_get_random_seed_long(&rv) &&
>                     !arch_get_random_long(&rv))
>                         rv = random_get_entropy();
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-30 16:08   ` Michael Brooks
@ 2022-03-30 16:49     ` David Laight
  2022-03-30 17:10       ` Michael Brooks
  0 siblings, 1 reply; 63+ messages in thread
From: David Laight @ 2022-03-30 16:49 UTC (permalink / raw)
  To: 'Michael Brooks', Sasha Levin
  Cc: Dominik Brodowski, Eric Biggers, Greg Kroah-Hartman,
	Jason A. Donenfeld, Jean-Philippe Aumasson, Theodore Ts'o,
	linux-kernel, stable

From: Michael Brooks
> Sent: 30 March 2022 17:08
...
> I’d like to describe this bug using mathematics, because that is how I
> work - I am the kind of person who appreciates rigor.  In this case,
> let's use inductive reasoning to illuminate this issue.
> 
> Now, in this attack scenario let “p” be the overall pool state and let
> “n” be the good unknown values added to the pool.  Finally, let “k” be
> the known values - such as jiffies.  If we describe the ratio of
> unknown to known uniqueness as p = n/k, then as k grows the ratio
> shrinks and the pool becomes more and more predictable; p diverges
> only as k approaches zero.  Now take a parallel pool - let's call it
> p’ (pronounced “p-prime” for those who don’t get the notation) - with
> p’ = n’/k’.  In this case the attacker has no hope of constructing n’,
> but they can construct k’ - therefore the attacker’s parallel model of
> the pool p’ will become more accurate as the attack persists, leading
> to p’ = p as time → ∞.
> 
> Q.E.D.

That rather depends on how the (not) 'randomness' is added to the pool.
If there are 'r' bits of randomness in the pool and you 'stir in' a pile
of known bits, there can still be 'r' bits of randomness in the pool.

The whole thing really relies on the non-reversibility of the final prng.
Otherwise if you have 'r' bits of randomness in the pool and 'p' bits
in the prng you only need to request 'r + p' bits of output to be able
to solve the 'p + r' simultaneous equations in 'p + r' unknowns
(I think that is over the field {0, 1}).
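
(Spelled out, this is just linear algebra over GF(2), nothing
kernel-specific: with a fully linear generator an attacker who sees
the output bits y is solving

\[
A\,x = y \quad \text{over } \mathbb{F}_2, \qquad x \in \mathbb{F}_2^{\,p+r},
\]

where A is the publicly known linear map given by the LFSR taps and
the output function - plain Gaussian elimination, no cryptanalysis
required.)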

The old kernel random number generator that used xor to combine the
outputs of several LFSRs is trivially reversible.
It will leak whatever it was seeded with.
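
That reversibility is easy to demonstrate on a toy version of such a
mix.  The three-word "pool" below is invented for illustration - it is
not the actual kernel LFSR - but it has the same structure: every step
is a bijection, so final state plus known inputs gives back the
initial state:

#include <stdint.h>
#include <stdio.h>

/* Toy "pool": one step XORs input into word 0 and rotates. */
static void mix_step(uint32_t pool[3], uint32_t in)
{
	uint32_t t = pool[0] ^ in ^ pool[1] ^ pool[2];

	pool[0] = pool[1];
	pool[1] = pool[2];
	pool[2] = (t << 1) | (t >> 31);	/* rotate left 1 */
}

/* Exact inverse: undo the rotate, then the shifts, then the XORs. */
static void unmix_step(uint32_t pool[3], uint32_t in)
{
	uint32_t t = (pool[2] >> 1) | (pool[2] << 31);	/* rotate right 1 */

	pool[2] = pool[1];
	pool[1] = pool[0];
	pool[0] = t ^ in ^ pool[1] ^ pool[2];
}

int main(void)
{
	uint32_t pool[3] = { 0xdeadbeef, 0x12345678, 0x0badf00d };

	mix_step(pool, 42);	/* attacker-known input, e.g. jiffies */
	mix_step(pool, 1000);
	unmix_step(pool, 1000);	/* knowing inputs + final state ... */
	unmix_step(pool, 42);	/* ... recovers the initial state */
	printf("%08x %08x %08x\n", pool[0], pool[1], pool[2]);
	return 0;
}

This prints the original seed words - exactly the "undo the inputs"
property the BLAKE2s patch is closing off.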

The non-reversibility of the pool isn't as important since you need
to reverse the prng as well.

So while, in some sense, removing 'p' bits from the entropy pool
to seed the prng leaves only 'r - p' bits, that is only true if
you think the prng is reversible.
Provided 'r > p' (preferably 'r >> p') you can reseed the prng
again (provided you take reasonably random bits) without
really exposing any more state to an attacker.

Someone doing cat /dev/urandom >/dev/null should just keep reading
values out of the entropy pool.
But if they are discarding the values, that shouldn't help them
recover the state of the entropy pool or the prng - even if only
constant values are being added to the pool.

Really what you mustn't do is delete the bits used to seed the prng
from the entropy pool.
About the only way to actually reduce the randomness of the entropy
pool is if you've discovered what is actually in it, know the
'stirring' algorithm and feed in data that exactly cancels out
bits that are present already.
I suspect that anything with root access can manage that!
(Although they can just overwrite the entropy pool itself,
and the prng for that matter.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-30 16:49     ` David Laight
@ 2022-03-30 17:10       ` Michael Brooks
  2022-03-30 18:33         ` Michael Brooks
  0 siblings, 1 reply; 63+ messages in thread
From: Michael Brooks @ 2022-03-30 17:10 UTC (permalink / raw)
  To: David Laight
  Cc: Sasha Levin, Dominik Brodowski, Eric Biggers, Greg Kroah-Hartman,
	Jason A. Donenfeld, Jean-Philippe Aumasson, Theodore Ts'o,
	linux-kernel, stable

Of course I am assuming local, non-root user access.  One does not need
to reverse the mix operations in order to form a parallel construction
- a one-way function is sufficient for such a construct, as both sides
will operate on the data in the same manner.

This attack scenario is simply a non-issue in keypoolrandom.
https://github.com/TheRook/KeypoolRandom

On Wed, Mar 30, 2022 at 9:49 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Michael Brooks
> > Sent: 30 March 2022 17:08
> ...
> > I’d like to describe this bug using mathematics, because that is how I
> > work - I am the kind of person who appreciates rigor.  In this case,
> > let's use inductive reasoning to illuminate this issue.
> >
> > Now, in this attack scenario let “p” be the overall pool state and let
> > “n” be the good unknown values added to the pool.  Finally, let “k” be
> > the known values - such as jiffies.  If we describe the ratio of
> > unknown to known uniqueness as p = n/k, then as k grows the ratio
> > shrinks and the pool becomes more and more predictable; p diverges
> > only as k approaches zero.  Now take a parallel pool - let's call it
> > p’ (pronounced “p-prime” for those who don’t get the notation) - with
> > p’ = n’/k’.  In this case the attacker has no hope of constructing n’,
> > but they can construct k’ - therefore the attacker’s parallel model of
> > the pool p’ will become more accurate as the attack persists, leading
> > to p’ = p as time → ∞.
> >
> > Q.E.D.
>
> That rather depends on how the (not) 'randomness' is added to the pool.
> If there are 'r' bits of randomness in the pool and you 'stir in' a pile
> of known bits, there can still be 'r' bits of randomness in the pool.
>
> The whole thing really relies on the non-reversibility of the final prng.
> Otherwise if you have 'r' bits of randomness in the pool and 'p' bits
> in the prng you only need to request 'r + p' bits of output to be able
> to solve the 'p + r' simultaneous equations in 'p + r' unknowns
> (I think that is over the field {0, 1}).
>
> The old kernel random number generator that used xor to combine the
> outputs of several LFSRs is trivially reversible.
> It will leak whatever it was seeded with.
>
> The non-reversibility of the pool isn't as important since you need
> to reverse the prng as well.
>
> So while, in some sense, removing 'p' bits from the entropy pool
> to seed the prng leaves only 'r - p' bits, that is only true if
> you think the prng is reversible.
> Provided 'r > p' (preferably 'r >> p') you can reseed the prng
> again (provided you take reasonably random bits) without
> really exposing any more state to an attacker.
>
> Someone doing cat /dev/urandom >/dev/null should just keep reading
> values out of the entropy pool.
> But if they are discarding the values, that shouldn't help them
> recover the state of the entropy pool or the prng - even if only
> constant values are being added to the pool.
>
> Really what you mustn't do is delete the bits used to seed the prng
> from the entropy pool.
> About the only way to actually reduce the randomness of the entropy
> pool is if you've discovered what is actually in it, know the
> 'stirring' algorithm and feed in data that exactly cancels out
> bits that are present already.
> I suspect that anything with root access can manage that!
> (Although they can just overwrite the entropy pool itself,
> and the prng for that matter.)
>
>         David
>

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-30 17:10       ` Michael Brooks
@ 2022-03-30 18:33         ` Michael Brooks
  2022-03-30 19:01           ` Theodore Y. Ts'o
  0 siblings, 1 reply; 63+ messages in thread
From: Michael Brooks @ 2022-03-30 18:33 UTC (permalink / raw)
  To: David Laight
  Cc: Sasha Levin, Dominik Brodowski, Eric Biggers, Greg Kroah-Hartman,
	Jason A. Donenfeld, Jean-Philippe Aumasson, Theodore Ts'o,
	linux-kernel, stable

The /dev/random device driver need not concern itself with root
adversaries, as this type of user has permission to read and overwrite
memory - this user even possesses permission to replace the kernel ELF
binary with a copy of /dev/random that always returns the number 0 -
that is their right.

This whole issue of leaks exists because we are relying upon fast but
insecure hash-function-like methods to extract data from a sensitive
pool.  LFSRs and CRC32 (which was used in an earlier version of
/dev/random) have similarities to secure hash functions, but they
"aren't good enough for military work" - which is why we are even
discussing the topic of leakage.

If we use a secure primitive in the right way - preferably one that is
faster than SHA-1, such as ChaCha20, or faster still AES-NI (both in
a feedback mode) - then the leakage is stopped.  If the pool cannot
leak, then we do not need an entropy counter, which is the mutex that
handle_irq_event_percpu() fights over.  This reduces interrupt
latency, reduces the computational load of all interrupts, and helps
make the pool less predictable, because now an outside construct has
to account for race conditions.
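
The locking point can be sketched in miniature.  This is a userspace
approximation using per-thread state - the names and the flush
threshold are invented, and the kernel's real fast-pool code differs -
but it shows the hot path staying lock-free with only an occasional
flush contending on the shared lock:

#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t input_pool;		/* shared, lock-protected pool */

/* Per-thread stand-in for a per-cpu fast pool. */
static _Thread_local uint64_t fast_pool;
static _Thread_local unsigned int fast_count;

/* Hot path (think interrupt handler): no shared lock taken. */
static void add_event(uint64_t cycles)
{
	/* rotate-and-xor accumulate, purely illustrative */
	fast_pool = ((fast_pool << 7) | (fast_pool >> 57)) ^ cycles;

	/* Cold path: only every 64th event contends on the lock. */
	if (++fast_count == 64) {
		pthread_mutex_lock(&pool_lock);
		input_pool ^= fast_pool;
		pthread_mutex_unlock(&pool_lock);
		fast_count = 0;
	}
}

int main(void)
{
	for (uint64_t i = 1; i <= 1000; i++)
		add_event(i * 2654435761ULL);	/* fake timestamps */
	return 0;
}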

-Michael

On Wed, Mar 30, 2022 at 10:10 AM Michael Brooks <m@sweetwater.ai> wrote:
>
> Of course I am assuming local, non-root user access.  One does not need
> to reverse the mix operations in order to form a parallel construction
> - a one-way function is sufficient for such a construct, as both sides
> will operate on the data in the same manner.
>
> This attack scenario is simply a non-issue in keypoolrandom.
> https://github.com/TheRook/KeypoolRandom
>
> On Wed, Mar 30, 2022 at 9:49 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > [snip]

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-30 18:33         ` Michael Brooks
@ 2022-03-30 19:01           ` Theodore Y. Ts'o
  2022-03-30 19:08             ` Michael Brooks
  0 siblings, 1 reply; 63+ messages in thread
From: Theodore Y. Ts'o @ 2022-03-30 19:01 UTC (permalink / raw)
  To: Michael Brooks
  Cc: David Laight, Sasha Levin, Dominik Brodowski, Eric Biggers,
	Greg Kroah-Hartman, Jason A. Donenfeld, Jean-Philippe Aumasson,
	linux-kernel, stable

On Wed, Mar 30, 2022 at 11:33:21AM -0700, Michael Brooks wrote:
> The /dev/random device driver need not concern itself with root
> adversaries as this type of user has permissions to read and overwrite
> memory - this user even possesses permission to replace the kernel elf
> binary with a copy of /dev/random that always returns the number 0 -
> that is their right.

The design consideration that random number generators do concern
themselves with is recovery after pool exposure.  This could happen
in any number of ways; maybe someone got hold of the suspended
image after a hibernation, or maybe a VM is getting hibernated, and
then replicated, etc.

One can argue whether or not it's "reasonable" that these sorts of
attacks could happen, or whether they are equivalent to full root
access where you can overwrite the pool.  The point remains that it
is *possible* to have situations where the internal state of the RNG
might have gotten exposed, and a design criterion is how quickly or
reliably you can recover from that situation over time.

See the Yarrow paper and its discussion of the iterative guessing attack
for an explanation of why cryptographers like John Kelsey, Bruce
Schneier, and Niels Ferguson think it is important.  And please don't
argue with me on this point while discussing which patches should be
backported to stable kernels --- argue with them.  :-)
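
For the curious, here is a toy model of that iterative guessing
attack.  Nothing below comes from the Yarrow paper or the kernel: the
mixing function is invented, and the extractor is assumed reversible,
so each output effectively reveals the state:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Made-up mixing step; assume the attacker learns the new state
 * each round because extraction is reversible. */
static uint64_t step(uint64_t state, uint32_t input)
{
	return (state * 6364136223846793005ULL + input) ^ (state >> 29);
}

int main(void)
{
	uint64_t real = 0x123456789abcdef0ULL;	/* state, exposed once */
	uint64_t guess = real;			/* attacker's copy */

	for (int round = 0; round < 8; round++) {
		/* Only 8 bits of fresh entropy trickle in per reseed. */
		uint32_t secret = (uint32_t)(rand() & 0xff);

		real = step(real, secret);

		/* Attacker brute-forces the small input and keeps up. */
		for (uint32_t g = 0; g < 256; g++) {
			if (step(guess, g) == real) {
				guess = step(guess, g);
				break;
			}
		}
	}
	printf("attacker still tracking state: %s\n",
	       guess == real ? "yes" : "no");
	return 0;
}

Batching entropy up and reseeding with a large chunk at once
("catastrophic reseeding") is exactly what defeats this loop, which
is the design criterion described above.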

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-30 19:01           ` Theodore Y. Ts'o
@ 2022-03-30 19:08             ` Michael Brooks
  0 siblings, 0 replies; 63+ messages in thread
From: Michael Brooks @ 2022-03-30 19:08 UTC (permalink / raw)
  To: Theodore Y. Ts'o
  Cc: David Laight, Sasha Levin, Dominik Brodowski, Eric Biggers,
	Greg Kroah-Hartman, Jason A. Donenfeld, Jean-Philippe Aumasson,
	linux-kernel, stable, nicholas Lyons

Good point Ted, I agree we should have a defense-in-depth design that
plans on failure. I expect keypoolrandom to be resistant against this
attack as well.

In this threat model, p' is a Laplacian demon.  Any parallel
construction is aided by whatever source of information the attacker
can come by.  Using jiffies as a known preimage aids Laplace's demon,
as does a memory disclosure vulnerability.  Laplace's demon can simply
"get lucky" and report both false positives and false negatives, but
in this model it should get more lucky over time.  Now we get into the
realm of statistics and predictive mathematics, which leads into
Langevin dynamics and Einstein's early work describing Brownian
motion.

Looping in a fellow cryptographer, Nicholas, who has a passion for
Laplace's work.

Regards,
Michael

On Wed, Mar 30, 2022 at 12:01 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> On Wed, Mar 30, 2022 at 11:33:21AM -0700, Michael Brooks wrote:
> > The /dev/random device driver need not concern itself with root
> > adversaries as this type of user has permissions to read and overwrite
> > memory - this user even possesses permission to replace the kernel elf
> > binary with a copy of /dev/random that always returns the number 0 -
> > that is their right.
>
> The design consideration that random number generators do concern
> themselves with is recovery after pool exposure.  This could happen
> in any number of ways; maybe someone got hold of the suspended
> image after a hibernation, or maybe a VM is getting hibernated, and
> then replicated, etc.
>
> One can argue whether or not it's "reasonable" that these sorts of
> attacks could happen, or whether they are equivalent to full root
> access where you can overwrite the pool.  The point remains that it
> is *possible* to have situations where the internal state of the RNG
> might have gotten exposed, and a design criterion is how quickly or
> reliably you can recover from that situation over time.
>
> See the Yarrow paper and its discussion of the iterative guessing attack
> for an explanation of why cryptographers like John Kelsey, Bruce
> Schneier, and Niels Ferguson think it is important.  And please don't
> argue with me on this point while discussing which patches should be
> backported to stable kernels --- argue with them.  :-)
>
> Cheers,
>
>                                                 - Ted

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels
  2022-03-28 16:35     ` Sebastian Andrzej Siewior
@ 2022-03-31 16:59       ` Sasha Levin
  0 siblings, 0 replies; 63+ messages in thread
From: Sasha Levin @ 2022-03-31 16:59 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Eric W. Biederman, linux-kernel, stable, Oleg Nesterov,
	Steven Rostedt, Thomas Gleixner, mingo, bp, dave.hansen, x86,
	peterz, juri.lelli, vincent.guittot, luto, frederic,
	mark.rutland, valentin.schneider, keescook, elver, legion

On Mon, Mar 28, 2022 at 06:35:00PM +0200, Sebastian Andrzej Siewior wrote:
>On 2022-03-28 09:31:51 [-0500], Eric W. Biederman wrote:
>>
>> Thank you for cc'ing me.  You probably want to hold off on back-porting
>> this patch.  The appropriate fix requires some more conversation.
>>
>> At a minimum this patch should not be using TIF_NOTIFY_RESUME.
>
>Sasha,
>
>could you please drop this patch from the stable backports (5.15, 5.16, 5.17).

Will do, thanks.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction
  2022-03-29  5:31     ` Jason A. Donenfeld
@ 2022-04-05 22:10       ` Jason A. Donenfeld
  0 siblings, 0 replies; 63+ messages in thread
From: Jason A. Donenfeld @ 2022-04-05 22:10 UTC (permalink / raw)
  To: Sasha Levin, Greg Kroah-Hartman
  Cc: LKML, stable, Theodore Ts'o, Dominik Brodowski, Eric Biggers

Hi Greg, Sasha,

On Tue, Mar 29, 2022 at 7:31 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> I'm inclined to agree with Eric here that you might be a bit careful
> about autosel'ing 5.18, given how extensive the changes were. In
> theory they should all be properly sequenced so that nothing breaks,
> but I'd still be cautious. However, if you want, maybe we can work out
> some plan for backporting.

It's still way too early to backport these, but I'll maintain these
two branches for a little while:

https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/log/?h=stable/linux-5.15.y
https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/log/?h=stable/linux-5.17.y

So that when or if it does ever make sense to do that, it's been
maintained incrementally while the knowledge is fresh. I'm omitting
from those branches new feature development, such as vmgenid.

Jason

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"
  2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag" Sasha Levin
@ 2022-07-07 21:30   ` Tom Crossland
  2022-07-07 21:36     ` Limonciello, Mario
  0 siblings, 1 reply; 63+ messages in thread
From: Tom Crossland @ 2022-07-07 21:30 UTC (permalink / raw)
  To: Sasha Levin, linux-kernel, stable
  Cc: Rafael J. Wysocki, Mario Limonciello, Huang Rui, Mika Westerberg,
	rafael, linux-acpi

Hi, I'm observing the issue described here which I think is due to a 
recent regression:

https://github.com/intel/linux-intel-lts/issues/22

sudo dmesg -t -l err

ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC], 
AE_NOT_FOUND (20211217/psargs-330)
ACPI Error: Aborting method \_PR.PR01._CPC due to previous error 
(AE_NOT_FOUND) (20211217/psparse-529)
ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC], 
AE_NOT_FOUND (20211217/psargs-330)
ACPI Error: Aborting method \_PR.PR02._CPC due to previous error 
(AE_NOT_FOUND) (20211217/psparse-529)
ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC], 
AE_NOT_FOUND (20211217/psargs-330)
ACPI Error: Aborting method \_PR.PR03._CPC due to previous error 
(AE_NOT_FOUND) (20211217/psparse-529)

System:
   Kernel: 5.18.9-arch1-1 arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
     parameters: initrd=\intel-ucode.img initrd=\initramfs-linux.img
     root=xxx intel_iommu=on iommu=pt
  Machine:
   Type: Desktop Mobo: Intel model: NUC7i5BNB v: J31144-304 serial: <filter>
     UEFI: Intel v: BNKBL357.86A.0088.2022.0125.1102 date: 01/25/2022

I hope this is the correct forum to report the issue. Apologies if not.

On 28/03/2022 13.18, Sasha Levin wrote:
> From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
>
> [ Upstream commit 2ca8e6285250c07a2e5a22ecbfd59b5a4ef73484 ]
>
> Revert commit 159d8c274fd9 ("ACPI: Pass the same capabilities to the
> _OSC regardless of the query flag") which caused legitimate usage
> scenarios (when the platform firmware does not want the OS to control
> certain platform features controlled by the system bus scope _OSC) to
> break and was misguided by some misleading language in the _OSC
> definition in the ACPI specification (in particular, Section 6.2.11.1.3
> "Sequence of _OSC Calls" that contradicts other perts of the _OSC
> definition).
>
> Link: https://lore.kernel.org/linux-acpi/CAJZ5v0iStA0JmO0H3z+VgQsVuQONVjKPpw0F5HKfiq=Gb6B5yw@mail.gmail.com
> Reported-by: Mario Limonciello <Mario.Limonciello@amd.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Tested-by: Mario Limonciello <mario.limonciello@amd.com>
> Acked-by: Huang Rui <ray.huang@amd.com>
> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> Signed-off-by: Sasha Levin <sashal@kernel.org>
> ---
>   drivers/acpi/bus.c | 27 +++++++++++++++++++--------
>   1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 07f604832fd6..079b952ab59f 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -332,21 +332,32 @@ static void acpi_bus_osc_negotiate_platform_control(void)
>   	if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
>   		return;
>   
> -	kfree(context.ret.pointer);
> +	capbuf_ret = context.ret.pointer;
> +	if (context.ret.length <= OSC_SUPPORT_DWORD) {
> +		kfree(context.ret.pointer);
> +		return;
> +	}
>   
> -	/* Now run _OSC again with query flag clear */
> +	/*
> +	 * Now run _OSC again with query flag clear and with the caps
> +	 * supported by both the OS and the platform.
> +	 */
>   	capbuf[OSC_QUERY_DWORD] = 0;
> +	capbuf[OSC_SUPPORT_DWORD] = capbuf_ret[OSC_SUPPORT_DWORD];
> +	kfree(context.ret.pointer);
>   
>   	if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
>   		return;
>   
>   	capbuf_ret = context.ret.pointer;
> -	osc_sb_apei_support_acked =
> -		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
> -	osc_pc_lpi_support_confirmed =
> -		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
> -	osc_sb_native_usb4_support_confirmed =
> -		capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
> +	if (context.ret.length > OSC_SUPPORT_DWORD) {
> +		osc_sb_apei_support_acked =
> +			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
> +		osc_pc_lpi_support_confirmed =
> +			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
> +		osc_sb_native_usb4_support_confirmed =
> +			capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
> +	}
>   
>   	kfree(context.ret.pointer);
>   }

^ permalink raw reply	[flat|nested] 63+ messages in thread

* RE: [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"
  2022-07-07 21:30   ` Tom Crossland
@ 2022-07-07 21:36     ` Limonciello, Mario
  2022-07-08  9:22       ` Tom Crossland
  0 siblings, 1 reply; 63+ messages in thread
From: Limonciello, Mario @ 2022-07-07 21:36 UTC (permalink / raw)
  To: Tom Crossland, Sasha Levin, linux-kernel, stable
  Cc: Rafael J. Wysocki, Huang, Ray, Mika Westerberg, rafael, linux-acpi

[Public]



> -----Original Message-----
> From: Tom Crossland <tomc@fortu.net>
> Sent: Thursday, July 7, 2022 16:31
> To: Sasha Levin <sashal@kernel.org>; linux-kernel@vger.kernel.org;
> stable@vger.kernel.org
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>; Limonciello, Mario
> <Mario.Limonciello@amd.com>; Huang, Ray <Ray.Huang@amd.com>; Mika
> Westerberg <mika.westerberg@linux.intel.com>; rafael@kernel.org; linux-
> acpi@vger.kernel.org
> Subject: Re: [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same
> capabilities to the _OSC regardless of the query flag"
> 
> Hi, I'm observing the issue described here which I think is due to a
> recent regression:
> 
> https://github.com/intel/linux-intel-lts/issues/22
> 
> sudo dmesg -t -l err
> 
> ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> AE_NOT_FOUND (20211217/psargs-330)
> ACPI Error: Aborting method \_PR.PR01._CPC due to previous error
> (AE_NOT_FOUND) (20211217/psparse-529)
> ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> AE_NOT_FOUND (20211217/psargs-330)
> ACPI Error: Aborting method \_PR.PR02._CPC due to previous error
> (AE_NOT_FOUND) (20211217/psparse-529)
> ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> AE_NOT_FOUND (20211217/psargs-330)
> ACPI Error: Aborting method \_PR.PR03._CPC due to previous error
> (AE_NOT_FOUND) (20211217/psparse-529)
> 
> System:
>    Kernel: 5.18.9-arch1-1 arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
>      parameters: initrd=\intel-ucode.img initrd=\initramfs-linux.img
>      root=xxx intel_iommu=on iommu=pt
>   Machine:
>    Type: Desktop Mobo: Intel model: NUC7i5BNB v: J31144-304 serial: <filter>
>      UEFI: Intel v: BNKBL357.86A.0088.2022.0125.1102 date: 01/25/2022
> 
> I hope this is the correct forum to report the issue. Apologies if not.
> 

This is the fix for it:

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=7feec7430edddb87c24b0a86b08a03d0b496a755


> On 28/03/2022 13.18, Sasha Levin wrote:
> > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> >
> > [ Upstream commit 2ca8e6285250c07a2e5a22ecbfd59b5a4ef73484 ]
> >
> > Revert commit 159d8c274fd9 ("ACPI: Pass the same capabilities to the
> > _OSC regardless of the query flag") which caused legitimate usage
> > scenarios (when the platform firmware does not want the OS to control
> > certain platform features controlled by the system bus scope _OSC) to
> > break and was misguided by some misleading language in the _OSC
> > definition in the ACPI specification (in particular, Section 6.2.11.1.3
> > "Sequence of _OSC Calls" that contradicts other perts of the _OSC
> > definition).
> >
> > Link: https://lore.kernel.org/linux-acpi/CAJZ5v0iStA0JmO0H3z+VgQsVuQONVjKPpw0F5HKfiq=Gb6B5yw@mail.gmail.com
> > Reported-by: Mario Limonciello <Mario.Limonciello@amd.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Tested-by: Mario Limonciello <mario.limonciello@amd.com>
> > Acked-by: Huang Rui <ray.huang@amd.com>
> > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > Signed-off-by: Sasha Levin <sashal@kernel.org>
> > ---
> >   drivers/acpi/bus.c | 27 +++++++++++++++++++--------
> >   1 file changed, 19 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > index 07f604832fd6..079b952ab59f 100644
> > --- a/drivers/acpi/bus.c
> > +++ b/drivers/acpi/bus.c
> > @@ -332,21 +332,32 @@ static void
> acpi_bus_osc_negotiate_platform_control(void)
> >   	if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
> >   		return;
> >
> > -	kfree(context.ret.pointer);
> > +	capbuf_ret = context.ret.pointer;
> > +	if (context.ret.length <= OSC_SUPPORT_DWORD) {
> > +		kfree(context.ret.pointer);
> > +		return;
> > +	}
> >
> > -	/* Now run _OSC again with query flag clear */
> > +	/*
> > +	 * Now run _OSC again with query flag clear and with the caps
> > +	 * supported by both the OS and the platform.
> > +	 */
> >   	capbuf[OSC_QUERY_DWORD] = 0;
> > +	capbuf[OSC_SUPPORT_DWORD] =
> capbuf_ret[OSC_SUPPORT_DWORD];
> > +	kfree(context.ret.pointer);
> >
> >   	if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
> >   		return;
> >
> >   	capbuf_ret = context.ret.pointer;
> > -	osc_sb_apei_support_acked =
> > -		capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_APEI_SUPPORT;
> > -	osc_pc_lpi_support_confirmed =
> > -		capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_PCLPI_SUPPORT;
> > -	osc_sb_native_usb4_support_confirmed =
> > -		capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_NATIVE_USB4_SUPPORT;
> > +	if (context.ret.length > OSC_SUPPORT_DWORD) {
> > +		osc_sb_apei_support_acked =
> > +			capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_APEI_SUPPORT;
> > +		osc_pc_lpi_support_confirmed =
> > +			capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_PCLPI_SUPPORT;
> > +		osc_sb_native_usb4_support_confirmed =
> > +			capbuf_ret[OSC_SUPPORT_DWORD] &
> OSC_SB_NATIVE_USB4_SUPPORT;
> > +	}
> >
> >   	kfree(context.ret.pointer);
> >   }

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"
  2022-07-07 21:36     ` Limonciello, Mario
@ 2022-07-08  9:22       ` Tom Crossland
  0 siblings, 0 replies; 63+ messages in thread
From: Tom Crossland @ 2022-07-08  9:22 UTC (permalink / raw)
  To: Limonciello, Mario
  Cc: Sasha Levin, linux-kernel, stable, Rafael J. Wysocki, Huang, Ray,
	Mika Westerberg, rafael, linux-acpi

I can confirm that the ACPI BIOS Errors no longer appear in the kernel
log using mainline 5.19.0-rc5 with the patch applied.

Many thanks

On Thu, Jul 7, 2022 at 11:36 PM Limonciello, Mario
<Mario.Limonciello@amd.com> wrote:
>
> [Public]
>
>
>
> > -----Original Message-----
> > From: Tom Crossland <tomc@fortu.net>
> > Sent: Thursday, July 7, 2022 16:31
> > To: Sasha Levin <sashal@kernel.org>; linux-kernel@vger.kernel.org;
> > stable@vger.kernel.org
> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>; Limonciello, Mario
> > <Mario.Limonciello@amd.com>; Huang, Ray <Ray.Huang@amd.com>; Mika
> > Westerberg <mika.westerberg@linux.intel.com>; rafael@kernel.org; linux-
> > acpi@vger.kernel.org
> > Subject: Re: [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same
> > capabilities to the _OSC regardless of the query flag"
> >
> > Hi, I'm observing the issue described here which I think is due to a
> > recent regression:
> >
> > https://github.com/intel/linux-intel-lts/issues/22
> >
> > sudo dmesg -t -l err
> >
> > ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> > AE_NOT_FOUND (20211217/psargs-330)
> > ACPI Error: Aborting method \_PR.PR01._CPC due to previous error
> > (AE_NOT_FOUND) (20211217/psparse-529)
> > ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> > AE_NOT_FOUND (20211217/psargs-330)
> > ACPI Error: Aborting method \_PR.PR02._CPC due to previous error
> > (AE_NOT_FOUND) (20211217/psparse-529)
> > ACPI BIOS Error (bug): Could not resolve symbol [\_PR.PR00._CPC],
> > AE_NOT_FOUND (20211217/psargs-330)
> > ACPI Error: Aborting method \_PR.PR03._CPC due to previous error
> > (AE_NOT_FOUND) (20211217/psparse-529)
> >
> > System:
> >    Kernel: 5.18.9-arch1-1 arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
> >      parameters: initrd=\intel-ucode.img initrd=\initramfs-linux.img
> >      root=xxx intel_iommu=on iommu=pt
> >   Machine:
> >    Type: Desktop Mobo: Intel model: NUC7i5BNB v: J31144-304 serial: <filter>
> >      UEFI: Intel v: BNKBL357.86A.0088.2022.0125.1102 date: 01/25/2022
> >
> > I hope this is the correct forum to report the issue. Apologies if not.
> >
>
> This is the fix for it:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=7feec7430edddb87c24b0a86b08a03d0b496a755
>
>
> > On 28/03/2022 13.18, Sasha Levin wrote:
> > > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> > >
> > > [ Upstream commit 2ca8e6285250c07a2e5a22ecbfd59b5a4ef73484 ]
> > >
> > > Revert commit 159d8c274fd9 ("ACPI: Pass the same capabilities to the
> > > _OSC regardless of the query flag") which caused legitimate usage
> > > scenarios (when the platform firmware does not want the OS to control
> > > certain platform features controlled by the system bus scope _OSC) to
> > > break and was misguided by some misleading language in the _OSC
> > > definition in the ACPI specification (in particular, Section 6.2.11.1.3
> > > "Sequence of _OSC Calls" that contradicts other perts of the _OSC
> > > definition).
> > >
> > > Link: https://lore.kernel.org/linux-acpi/CAJZ5v0iStA0JmO0H3z+VgQsVuQONVjKPpw0F5HKfiq=Gb6B5yw@mail.gmail.com
> > > Reported-by: Mario Limonciello <Mario.Limonciello@amd.com>
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Tested-by: Mario Limonciello <mario.limonciello@amd.com>
> > > Acked-by: Huang Rui <ray.huang@amd.com>
> > > Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > Signed-off-by: Sasha Levin <sashal@kernel.org>
> > > ---
> > >   drivers/acpi/bus.c | 27 +++++++++++++++++++--------
> > >   1 file changed, 19 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > > index 07f604832fd6..079b952ab59f 100644
> > > --- a/drivers/acpi/bus.c
> > > +++ b/drivers/acpi/bus.c
> > > @@ -332,21 +332,32 @@ static void acpi_bus_osc_negotiate_platform_control(void)
> > >     if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
> > >             return;
> > >
> > > -   kfree(context.ret.pointer);
> > > +   capbuf_ret = context.ret.pointer;
> > > +   if (context.ret.length <= OSC_SUPPORT_DWORD) {
> > > +           kfree(context.ret.pointer);
> > > +           return;
> > > +   }
> > >
> > > -   /* Now run _OSC again with query flag clear */
> > > +   /*
> > > +    * Now run _OSC again with query flag clear and with the caps
> > > +    * supported by both the OS and the platform.
> > > +    */
> > >     capbuf[OSC_QUERY_DWORD] = 0;
> > > +   capbuf[OSC_SUPPORT_DWORD] = capbuf_ret[OSC_SUPPORT_DWORD];
> > > +   kfree(context.ret.pointer);
> > >
> > >     if (ACPI_FAILURE(acpi_run_osc(handle, &context)))
> > >             return;
> > >
> > >     capbuf_ret = context.ret.pointer;
> > > -   osc_sb_apei_support_acked =
> > > -           capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
> > > -   osc_pc_lpi_support_confirmed =
> > > -           capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
> > > -   osc_sb_native_usb4_support_confirmed =
> > > -           capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
> > > +   if (context.ret.length > OSC_SUPPORT_DWORD) {
> > > +           osc_sb_apei_support_acked =
> > > +                   capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_APEI_SUPPORT;
> > > +           osc_pc_lpi_support_confirmed =
> > > +                   capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_PCLPI_SUPPORT;
> > > +           osc_sb_native_usb4_support_confirmed =
> > > +                   capbuf_ret[OSC_SUPPORT_DWORD] & OSC_SB_NATIVE_USB4_SUPPORT;
> > > +   }
> > >
> > >     kfree(context.ret.pointer);
> > >   }
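
Stepping back, the restored logic is the classic two-pass _OSC
handshake: a first call with the query flag set asks the platform
which capabilities it will grant, and a second call with the flag
clear commits to exactly that granted subset. A self-contained toy
model of that control flow (run_osc below is a hypothetical stand-in
for the real firmware call in drivers/acpi/bus.c):

    #include <stdint.h>
    #include <stdio.h>

    #define OSC_QUERY_DWORD   0
    #define OSC_SUPPORT_DWORD 1
    #define OSC_QUERY_ENABLE  0x1

    /* Toy platform: grants only the low two capability bits requested. */
    static void run_osc(const uint32_t capbuf[2], uint32_t reply[2])
    {
        reply[OSC_QUERY_DWORD] = capbuf[OSC_QUERY_DWORD];
        reply[OSC_SUPPORT_DWORD] = capbuf[OSC_SUPPORT_DWORD] & 0x3;
    }

    int main(void)
    {
        uint32_t capbuf[2], reply[2];

        /* Pass 1: query flag set; ask what the platform will grant. */
        capbuf[OSC_QUERY_DWORD] = OSC_QUERY_ENABLE;
        capbuf[OSC_SUPPORT_DWORD] = 0xff;  /* everything the OS supports */
        run_osc(capbuf, reply);

        /* Pass 2: query flag clear; commit to the granted subset only. */
        capbuf[OSC_QUERY_DWORD] = 0;
        capbuf[OSC_SUPPORT_DWORD] = reply[OSC_SUPPORT_DWORD];
        run_osc(capbuf, reply);

        printf("negotiated support mask: 0x%x\n",
               (unsigned int)reply[OSC_SUPPORT_DWORD]);
        return 0;
    }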

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2022-07-08  9:22 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-28 11:17 [PATCH AUTOSEL 5.17 01/43] LSM: general protection fault in legacy_parse_param Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 02/43] regulator: rpi-panel: Handle I2C errors/timing to the Atmel Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 03/43] crypto: hisilicon/qm - cleanup warning in qm_vf_read_qos Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 04/43] crypto: octeontx2 - CN10K CPT to RNM workaround Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 05/43] gcc-plugins/stackleak: Exactly match strings instead of prefixes Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 06/43] rcu: Kill rnp->ofl_seq and use only rcu_state.ofl_lock for exclusion Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 07/43] pinctrl: npcm: Fix broken references to chip->parent_device Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 08/43] rcu: Mark writes to the rcu_segcblist structure's ->flags field Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 09/43] block: throttle split bio in case of iops limit Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 10/43] memstick/mspro_block: fix handling of read-only devices Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 11/43] block/bfq_wf2q: correct weight to ioprio Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 12/43] crypto: xts - Add softdep on ecb Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 13/43] crypto: hisilicon/sec - not need to enable sm4 extra mode at HW V3 Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 14/43] block, bfq: don't move oom_bfqq Sasha Levin
2022-03-28 11:17 ` [PATCH AUTOSEL 5.17 15/43] selinux: use correct type for context length Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 16/43] random: use computational hash for entropy extraction Sasha Levin
2022-03-28 18:08   ` Eric Biggers
2022-03-28 18:34     ` Michael Brooks
2022-03-29  5:31     ` Jason A. Donenfeld
2022-04-05 22:10       ` Jason A. Donenfeld
2022-03-29 15:38     ` Theodore Ts'o
2022-03-29 17:34       ` Michael Brooks
2022-03-29 18:28         ` Theodore Ts'o
     [not found]   ` <CAOnCY6RUN+CSwjsD6Vg-MDi7ERAj2kKLorMLGp1jE8dTZ+3cpQ@mail.gmail.com>
2022-03-28 19:33     ` Michael Brooks
2022-03-30 16:08   ` Michael Brooks
2022-03-30 16:49     ` David Laight
2022-03-30 17:10       ` Michael Brooks
2022-03-30 18:33         ` Michael Brooks
2022-03-30 19:01           ` Theodore Y. Ts'o
2022-03-30 19:08             ` Michael Brooks
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 17/43] random: remove batched entropy locking Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 18/43] random: absorb fast pool into input pool after fast load Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 19/43] powercap/dtpm_cpu: Reset per_cpu variable in the release function Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 20/43] random: round-robin registers as ulong, not u32 Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 21/43] arm64: module: remove (NOLOAD) from linker script Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 22/43] selinux: allow FIOCLEX and FIONCLEX with policy capability Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 23/43] loop: use sysfs_emit() in the sysfs xxx show() Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 24/43] Fix incorrect type in assignment of ipv6 port for audit Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 25/43] irqchip/qcom-pdc: Fix broken locking Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 26/43] irqchip/nvic: Release nvic_base upon failure Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 27/43] fs/binfmt_elf: Fix AT_PHDR for unusual ELF files Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 28/43] hwrng: cavium - fix NULL but dereferenced coccicheck error Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 29/43] signal, x86: Delay calling signals in atomic on RT enabled kernels Sasha Levin
2022-03-28 14:31   ` Eric W. Biederman
2022-03-28 16:35     ` Sebastian Andrzej Siewior
2022-03-31 16:59       ` Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 30/43] bfq: fix use-after-free in bfq_dispatch_request Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 31/43] ACPICA: Avoid walking the ACPI Namespace if it is not there Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 32/43] ACPI / x86: Add skip i2c clients quirk for Nextbook Ares 8 Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 33/43] ACPI / x86: Add skip i2c clients quirk for Lenovo Yoga Tablet 1050F/L Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 34/43] lib/raid6/test/Makefile: Use $(pound) instead of \# for Make 4.3 Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 35/43] Revert "Revert "block, bfq: honor already-setup queue merges"" Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 36/43] ACPI/APEI: Limit printable size of BERT table data Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 37/43] PM: core: keep irq flags in device_pm_check_callbacks() Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 38/43] parisc: Fix non-access data TLB cache flush faults Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 39/43] parisc: Fix handling off probe non-access faults Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 40/43] nvme-tcp: lockdep: annotate in-kernel sockets Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 41/43] spi: tegra20: Use of_device_get_match_data() Sasha Levin
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 42/43] Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag" Sasha Levin
2022-07-07 21:30   ` Tom Crossland
2022-07-07 21:36     ` Limonciello, Mario
2022-07-08  9:22       ` Tom Crossland
2022-03-28 11:18 ` [PATCH AUTOSEL 5.17 43/43] spi: fsi: Implement a timeout for polling status Sasha Levin

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).