linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT pull] irq/urgent for v5.16-rc1
@ 2021-11-14 13:30 Thomas Gleixner
  2021-11-14 13:30 ` [GIT pull] locking/urgent " Thomas Gleixner
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Thomas Gleixner @ 2021-11-14 13:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest irq/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq-urgent-2021-11-14

up to:  979292af5b51: Merge tag 'irqchip-fixes-5.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

A set of fixes for the interrupt subsystem:

  - Core code:

    A regression fix for the Open Firmware interrupt mapping code where a
    interrupt controller property in a node caused a map property in the
    same node to be ignored.

  - Interrupt chip drivers:

    - Workaround a limitation in SiFive PLIC interrupt chip which silently
      ignores an EOI when the interrupt line is masked.

    - Provide the missing mask/unmask implementation for the CSKY MP
      interrupt controller.

  - PCI/MSI:

    - Prevent a use after free when PCI/MSI interrupts are released by
      destroying the sysfs entries before freeing the memory which is
      accessed in the sysfs show() function.

    - Implement a mask quirk for the Nvidia ION AHCI chip which does not
      advertise masking capability despite implementing it. Even worse the
      chip comes out of reset with all MSI entries masked, which due to the
      missing masking capability never get unmasked.

    - Move the check which prevents accessing the MSI[X] masking for XEN
      back into the low level accessors. The recent consolidation missed
      that these accessors can be invoked from places which do not have
      that check which broke XEN. Move them back to he original place
      instead of sprinkling tons of these checks all over the code.

   
Thanks,

	tglx

------------------>
Guo Ren (2):
      irqchip/csky-mpintc: Fixup mask/unmask implementation
      irqchip/sifive-plic: Fixup EOI failed when masked

Marc Zyngier (3):
      PCI/MSI: Deal with devices lying about their MSI mask capability
      PCI: Add MSI masking quirk for Nvidia ION AHCI
      of/irq: Don't ignore interrupt-controller when interrupt-map failed

Thomas Gleixner (2):
      PCI/MSI: Move non-mask check back into low level accessors
      PCI/MSI: Destroy sysfs before freeing entries


 drivers/irqchip/irq-csky-mpintc.c |  8 ++++----
 drivers/irqchip/irq-sifive-plic.c |  8 +++++++-
 drivers/of/irq.c                  | 19 ++++++++++++++++---
 drivers/pci/msi.c                 | 39 ++++++++++++++++++++++-----------------
 drivers/pci/quirks.c              |  6 ++++++
 include/linux/msi.h               |  2 +-
 include/linux/pci.h               |  2 ++
 kernel/irq/msi.c                  |  4 ++--
 8 files changed, 60 insertions(+), 28 deletions(-)

diff --git a/drivers/irqchip/irq-csky-mpintc.c b/drivers/irqchip/irq-csky-mpintc.c
index cb403c960ac0..4aebd67d4f8f 100644
--- a/drivers/irqchip/irq-csky-mpintc.c
+++ b/drivers/irqchip/irq-csky-mpintc.c
@@ -78,7 +78,7 @@ static void csky_mpintc_handler(struct pt_regs *regs)
 		readl_relaxed(reg_base + INTCL_RDYIR));
 }
 
-static void csky_mpintc_enable(struct irq_data *d)
+static void csky_mpintc_unmask(struct irq_data *d)
 {
 	void __iomem *reg_base = this_cpu_read(intcl_reg);
 
@@ -87,7 +87,7 @@ static void csky_mpintc_enable(struct irq_data *d)
 	writel_relaxed(d->hwirq, reg_base + INTCL_SENR);
 }
 
-static void csky_mpintc_disable(struct irq_data *d)
+static void csky_mpintc_mask(struct irq_data *d)
 {
 	void __iomem *reg_base = this_cpu_read(intcl_reg);
 
@@ -164,8 +164,8 @@ static int csky_irq_set_affinity(struct irq_data *d,
 static struct irq_chip csky_irq_chip = {
 	.name           = "C-SKY SMP Intc",
 	.irq_eoi	= csky_mpintc_eoi,
-	.irq_enable	= csky_mpintc_enable,
-	.irq_disable	= csky_mpintc_disable,
+	.irq_unmask	= csky_mpintc_unmask,
+	.irq_mask	= csky_mpintc_mask,
 	.irq_set_type	= csky_mpintc_set_type,
 #ifdef CONFIG_SMP
 	.irq_set_affinity = csky_irq_set_affinity,
diff --git a/drivers/irqchip/irq-sifive-plic.c b/drivers/irqchip/irq-sifive-plic.c
index cf74cfa82045..259065d271ef 100644
--- a/drivers/irqchip/irq-sifive-plic.c
+++ b/drivers/irqchip/irq-sifive-plic.c
@@ -163,7 +163,13 @@ static void plic_irq_eoi(struct irq_data *d)
 {
 	struct plic_handler *handler = this_cpu_ptr(&plic_handlers);
 
-	writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+	if (irqd_irq_masked(d)) {
+		plic_irq_unmask(d);
+		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+		plic_irq_mask(d);
+	} else {
+		writel(d->hwirq, handler->hart_base + CONTEXT_CLAIM);
+	}
 }
 
 static struct irq_chip plic_chip = {
diff --git a/drivers/of/irq.c b/drivers/of/irq.c
index 32be5a03951f..b10f015b2e37 100644
--- a/drivers/of/irq.c
+++ b/drivers/of/irq.c
@@ -161,9 +161,10 @@ int of_irq_parse_raw(const __be32 *addr, struct of_phandle_args *out_irq)
 		 * if it is then we are done, unless there is an
 		 * interrupt-map which takes precedence.
 		 */
+		bool intc = of_property_read_bool(ipar, "interrupt-controller");
+
 		imap = of_get_property(ipar, "interrupt-map", &imaplen);
-		if (imap == NULL &&
-		    of_property_read_bool(ipar, "interrupt-controller")) {
+		if (imap == NULL && intc) {
 			pr_debug(" -> got it !\n");
 			return 0;
 		}
@@ -244,8 +245,20 @@ int of_irq_parse_raw(const __be32 *addr, struct of_phandle_args *out_irq)
 
 			pr_debug(" -> imaplen=%d\n", imaplen);
 		}
-		if (!match)
+		if (!match) {
+			if (intc) {
+				/*
+				 * The PASEMI Nemo is a known offender, so
+				 * let's only warn for anyone else.
+				 */
+				WARN(!IS_ENABLED(CONFIG_PPC_PASEMI),
+				     "%pOF interrupt-map failed, using interrupt-controller\n",
+				     ipar);
+				return 0;
+			}
+
 			goto fail;
+		}
 
 		/*
 		 * Successfully parsed an interrrupt-map translation; copy new
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 12e296d634eb..48e3f4e47b29 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -148,6 +148,9 @@ static noinline void pci_msi_update_mask(struct msi_desc *desc, u32 clear, u32 s
 	raw_spinlock_t *lock = &desc->dev->msi_lock;
 	unsigned long flags;
 
+	if (!desc->msi_attrib.can_mask)
+		return;
+
 	raw_spin_lock_irqsave(lock, flags);
 	desc->msi_mask &= ~clear;
 	desc->msi_mask |= set;
@@ -181,7 +184,8 @@ static void pci_msix_write_vector_ctrl(struct msi_desc *desc, u32 ctrl)
 {
 	void __iomem *desc_addr = pci_msix_desc_addr(desc);
 
-	writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
+	if (desc->msi_attrib.can_mask)
+		writel(ctrl, desc_addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 }
 
 static inline void pci_msix_mask(struct msi_desc *desc)
@@ -200,23 +204,17 @@ static inline void pci_msix_unmask(struct msi_desc *desc)
 
 static void __pci_msi_mask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
-		return;
-
 	if (desc->msi_attrib.is_msix)
 		pci_msix_mask(desc);
-	else if (desc->msi_attrib.maskbit)
+	else
 		pci_msi_mask(desc, mask);
 }
 
 static void __pci_msi_unmask_desc(struct msi_desc *desc, u32 mask)
 {
-	if (pci_msi_ignore_mask || desc->msi_attrib.is_virtual)
-		return;
-
 	if (desc->msi_attrib.is_msix)
 		pci_msix_unmask(desc);
-	else if (desc->msi_attrib.maskbit)
+	else
 		pci_msi_unmask(desc, mask);
 }
 
@@ -370,6 +368,11 @@ static void free_msi_irqs(struct pci_dev *dev)
 			for (i = 0; i < entry->nvec_used; i++)
 				BUG_ON(irq_has_action(entry->irq + i));
 
+	if (dev->msi_irq_groups) {
+		msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
+		dev->msi_irq_groups = NULL;
+	}
+
 	pci_msi_teardown_msi_irqs(dev);
 
 	list_for_each_entry_safe(entry, tmp, msi_list, list) {
@@ -381,11 +384,6 @@ static void free_msi_irqs(struct pci_dev *dev)
 		list_del(&entry->list);
 		free_msi_entry(entry);
 	}
-
-	if (dev->msi_irq_groups) {
-		msi_destroy_sysfs(&dev->dev, dev->msi_irq_groups);
-		dev->msi_irq_groups = NULL;
-	}
 }
 
 static void pci_intx_for_msi(struct pci_dev *dev, int enable)
@@ -479,12 +477,16 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
 		goto out;
 
 	pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, &control);
+	/* Lies, damned lies, and MSIs */
+	if (dev->dev_flags & PCI_DEV_FLAGS_HAS_MSI_MASKING)
+		control |= PCI_MSI_FLAGS_MASKBIT;
 
 	entry->msi_attrib.is_msix	= 0;
 	entry->msi_attrib.is_64		= !!(control & PCI_MSI_FLAGS_64BIT);
 	entry->msi_attrib.is_virtual    = 0;
 	entry->msi_attrib.entry_nr	= 0;
-	entry->msi_attrib.maskbit	= !!(control & PCI_MSI_FLAGS_MASKBIT);
+	entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
+					  !!(control & PCI_MSI_FLAGS_MASKBIT);
 	entry->msi_attrib.default_irq	= dev->irq;	/* Save IOAPIC IRQ */
 	entry->msi_attrib.multi_cap	= (control & PCI_MSI_FLAGS_QMASK) >> 1;
 	entry->msi_attrib.multiple	= ilog2(__roundup_pow_of_two(nvec));
@@ -495,7 +497,7 @@ msi_setup_entry(struct pci_dev *dev, int nvec, struct irq_affinity *affd)
 		entry->mask_pos = dev->msi_cap + PCI_MSI_MASK_32;
 
 	/* Save the initial mask status */
-	if (entry->msi_attrib.maskbit)
+	if (entry->msi_attrib.can_mask)
 		pci_read_config_dword(dev, entry->mask_pos, &entry->msi_mask);
 
 out:
@@ -639,10 +641,13 @@ static int msix_setup_entries(struct pci_dev *dev, void __iomem *base,
 		entry->msi_attrib.is_virtual =
 			entry->msi_attrib.entry_nr >= vec_count;
 
+		entry->msi_attrib.can_mask	= !pci_msi_ignore_mask &&
+						  !entry->msi_attrib.is_virtual;
+
 		entry->msi_attrib.default_irq	= dev->irq;
 		entry->mask_base		= base;
 
-		if (!entry->msi_attrib.is_virtual) {
+		if (entry->msi_attrib.can_mask) {
 			addr = pci_msix_desc_addr(entry);
 			entry->msix_ctrl = readl(addr + PCI_MSIX_ENTRY_VECTOR_CTRL);
 		}
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index aedb78c86ddc..003950c738d2 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5851,3 +5851,9 @@ DECLARE_PCI_FIXUP_ENABLE(PCI_VENDOR_ID_PERICOM, 0x2303,
 			 pci_fixup_pericom_acs_store_forward);
 DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_PERICOM, 0x2303,
 			 pci_fixup_pericom_acs_store_forward);
+
+static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
+{
+	pdev->dev_flags |= PCI_DEV_FLAGS_HAS_MSI_MASKING;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 49cf6eb222e7..e616f94c7c58 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -148,7 +148,7 @@ struct msi_desc {
 				u8	is_msix		: 1;
 				u8	multiple	: 3;
 				u8	multi_cap	: 3;
-				u8	maskbit		: 1;
+				u8	can_mask	: 1;
 				u8	is_64		: 1;
 				u8	is_virtual	: 1;
 				u16	entry_nr;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c8afbee5da4b..d0dba7fd9848 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -233,6 +233,8 @@ enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_FLR_RESET = (__force pci_dev_flags_t) (1 << 10),
 	/* Don't use Relaxed Ordering for TLPs directed at this device */
 	PCI_DEV_FLAGS_NO_RELAXED_ORDERING = (__force pci_dev_flags_t) (1 << 11),
+	/* Device does honor MSI masking despite saying otherwise */
+	PCI_DEV_FLAGS_HAS_MSI_MASKING = (__force pci_dev_flags_t) (1 << 12),
 };
 
 enum pci_irq_reroute_variant {
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 6a5ecee6e567..7f350ae59c5f 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -529,10 +529,10 @@ static bool msi_check_reservation_mode(struct irq_domain *domain,
 
 	/*
 	 * Checking the first MSI descriptor is sufficient. MSIX supports
-	 * masking and MSI does so when the maskbit is set.
+	 * masking and MSI does so when the can_mask attribute is set.
 	 */
 	desc = first_msi_entry(dev);
-	return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit;
+	return desc->msi_attrib.is_msix || desc->msi_attrib.can_mask;
 }
 
 int __msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [GIT pull] locking/urgent for v5.16-rc1
  2021-11-14 13:30 [GIT pull] irq/urgent for v5.16-rc1 Thomas Gleixner
@ 2021-11-14 13:30 ` Thomas Gleixner
  2021-11-14 19:10   ` pr-tracker-bot
  2021-11-14 13:30 ` [GIT pull] timers/urgent " Thomas Gleixner
  2021-11-14 19:10 ` [GIT pull] irq/urgent " pr-tracker-bot
  2 siblings, 1 reply; 11+ messages in thread
From: Thomas Gleixner @ 2021-11-14 13:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest locking/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking-urgent-2021-11-14

up to:  2105a92748e8: static_call,x86: Robustify trampoline patching

A single fix for static calls to make the trampoline patching more robust
by placing explicit signature bytes after the call trampoline to prevent
patching random other jumps like the CFI jump table entries.

Thanks,

	tglx

------------------>
Peter Zijlstra (1):
      static_call,x86: Robustify trampoline patching


 arch/x86/include/asm/static_call.h |  1 +
 arch/x86/kernel/static_call.c      | 14 ++++++++++----
 tools/objtool/check.c              |  3 +++
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/static_call.h b/arch/x86/include/asm/static_call.h
index cbb67b6030f9..39ebe0511869 100644
--- a/arch/x86/include/asm/static_call.h
+++ b/arch/x86/include/asm/static_call.h
@@ -27,6 +27,7 @@
 	    ".globl " STATIC_CALL_TRAMP_STR(name) "		\n"	\
 	    STATIC_CALL_TRAMP_STR(name) ":			\n"	\
 	    insns "						\n"	\
+	    ".byte 0x53, 0x43, 0x54				\n"	\
 	    ".type " STATIC_CALL_TRAMP_STR(name) ", @function	\n"	\
 	    ".size " STATIC_CALL_TRAMP_STR(name) ", . - " STATIC_CALL_TRAMP_STR(name) " \n" \
 	    ".popsection					\n")
diff --git a/arch/x86/kernel/static_call.c b/arch/x86/kernel/static_call.c
index ea028e736831..9c407a33a774 100644
--- a/arch/x86/kernel/static_call.c
+++ b/arch/x86/kernel/static_call.c
@@ -56,10 +56,15 @@ static void __ref __static_call_transform(void *insn, enum insn_type type, void
 	text_poke_bp(insn, code, size, emulate);
 }
 
-static void __static_call_validate(void *insn, bool tail)
+static void __static_call_validate(void *insn, bool tail, bool tramp)
 {
 	u8 opcode = *(u8 *)insn;
 
+	if (tramp && memcmp(insn+5, "SCT", 3)) {
+		pr_err("trampoline signature fail");
+		BUG();
+	}
+
 	if (tail) {
 		if (opcode == JMP32_INSN_OPCODE ||
 		    opcode == RET_INSN_OPCODE)
@@ -74,7 +79,8 @@ static void __static_call_validate(void *insn, bool tail)
 	/*
 	 * If we ever trigger this, our text is corrupt, we'll probably not live long.
 	 */
-	WARN_ONCE(1, "unexpected static_call insn opcode 0x%x at %pS\n", opcode, insn);
+	pr_err("unexpected static_call insn opcode 0x%x at %pS\n", opcode, insn);
+	BUG();
 }
 
 static inline enum insn_type __sc_insn(bool null, bool tail)
@@ -97,12 +103,12 @@ void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
 	mutex_lock(&text_mutex);
 
 	if (tramp) {
-		__static_call_validate(tramp, true);
+		__static_call_validate(tramp, true, true);
 		__static_call_transform(tramp, __sc_insn(!func, true), func);
 	}
 
 	if (IS_ENABLED(CONFIG_HAVE_STATIC_CALL_INLINE) && site) {
-		__static_call_validate(site, tail);
+		__static_call_validate(site, tail, false);
 		__static_call_transform(site, __sc_insn(!func, tail), func);
 	}
 
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index add39902166d..21735829b860 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3310,6 +3310,9 @@ static bool ignore_unreachable_insn(struct objtool_file *file, struct instructio
 	if (!insn->func)
 		return false;
 
+	if (insn->func->static_call_tramp)
+		return true;
+
 	/*
 	 * CONFIG_UBSAN_TRAP inserts a UD2 when it sees
 	 * __builtin_unreachable().  The BUG() macro has an unreachable() after


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [GIT pull] timers/urgent for v5.16-rc1
  2021-11-14 13:30 [GIT pull] irq/urgent for v5.16-rc1 Thomas Gleixner
  2021-11-14 13:30 ` [GIT pull] locking/urgent " Thomas Gleixner
@ 2021-11-14 13:30 ` Thomas Gleixner
  2021-11-14 19:02   ` Linus Torvalds
  2021-11-14 19:10   ` pr-tracker-bot
  2021-11-14 19:10 ` [GIT pull] irq/urgent " pr-tracker-bot
  2 siblings, 2 replies; 11+ messages in thread
From: Thomas Gleixner @ 2021-11-14 13:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, x86

Linus,

please pull the latest timers/urgent branch from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-2021-11-14

up to:  ca7752caeaa7: posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()

A single fix for POSIX CPU timers to address a problem where POSIX CPU
timer delivery stops working for a new child task because copy_process()
copies state information which is only valid for the parent task.

Thanks,

	tglx

------------------>
Michael Pratt (1):
      posix-cpu-timers: Clear task::posix_cputimers_work in copy_process()


 include/linux/posix-timers.h   |  2 ++
 kernel/fork.c                  |  1 +
 kernel/time/posix-cpu-timers.c | 19 +++++++++++++++++--
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 00fef0064355..5bbcd280bfd2 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -184,8 +184,10 @@ static inline void posix_cputimers_group_init(struct posix_cputimers *pct,
 #endif
 
 #ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
+void clear_posix_cputimers_work(struct task_struct *p);
 void posix_cputimers_init_work(void);
 #else
+static inline void clear_posix_cputimers_work(struct task_struct *p) { }
 static inline void posix_cputimers_init_work(void) { }
 #endif
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 8e9feeef555e..8269ae2e5d7c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2279,6 +2279,7 @@ static __latent_entropy struct task_struct *copy_process(
 	p->pdeath_signal = 0;
 	INIT_LIST_HEAD(&p->thread_group);
 	p->task_works = NULL;
+	clear_posix_cputimers_work(p);
 
 #ifdef CONFIG_KRETPROBES
 	p->kretprobe_instances.first = NULL;
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 643d412ac623..96b4e7810426 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1158,14 +1158,29 @@ static void posix_cpu_timers_work(struct callback_head *work)
 	handle_posix_cpu_timers(current);
 }
 
+/*
+ * Clear existing posix CPU timers task work.
+ */
+void clear_posix_cputimers_work(struct task_struct *p)
+{
+	/*
+	 * A copied work entry from the old task is not meaningful, clear it.
+	 * N.B. init_task_work will not do this.
+	 */
+	memset(&p->posix_cputimers_work.work, 0,
+	       sizeof(p->posix_cputimers_work.work));
+	init_task_work(&p->posix_cputimers_work.work,
+		       posix_cpu_timers_work);
+	p->posix_cputimers_work.scheduled = false;
+}
+
 /*
  * Initialize posix CPU timers task work in init task. Out of line to
  * keep the callback static and to avoid header recursion hell.
  */
 void __init posix_cputimers_init_work(void)
 {
-	init_task_work(&current->posix_cputimers_work.work,
-		       posix_cpu_timers_work);
+	clear_posix_cputimers_work(current);
 }
 
 /*


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [GIT pull] timers/urgent for v5.16-rc1
  2021-11-14 13:30 ` [GIT pull] timers/urgent " Thomas Gleixner
@ 2021-11-14 19:02   ` Linus Torvalds
  2021-11-14 19:24     ` Thomas Gleixner
  2021-11-15 15:51     ` Peter Zijlstra
  2021-11-14 19:10   ` pr-tracker-bot
  1 sibling, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2021-11-14 19:02 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Nov 14, 2021 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> +       /*
> +        * A copied work entry from the old task is not meaningful, clear it.
> +        * N.B. init_task_work will not do this.
> +        */
> +       memset(&p->posix_cputimers_work.work, 0,
> +              sizeof(p->posix_cputimers_work.work));
> +       init_task_work(&p->posix_cputimers_work.work,
> +                      posix_cpu_timers_work);

Ugh.

Instead of the added four lines of comment, and two lines of
"memset()", maybe this should just have made init_task_work() DTRT?

Yes,. I see this:

        /* Protect against double add, see task_tick_numa and task_numa_work */
        p->numa_work.next               = &p->numa_work;
        ...
        init_task_work(&p->numa_work, task_numa_work);

but I think that one is so subtle and such a special case that it
should have been updated - just make that magic special flag happen
after the init_task_work.

A lot of the other cases seem to zero-initialize things elsewhere
(generally with kzalloc()), but I note that at least
io_ring_exit_work() seems to have this:

        struct io_tctx_exit exit;
        ...
        init_task_work(&exit.task_work, io_tctx_exit_cb);

and the ->next pointer is never set to NULL.

Now, in 99% of all cases the ->next pointer simply doesn't matter,
because task_work_add() will only set it, not caring about the old
value.

But apparently it matters for posix_cputimers_work and for numa_work,
and so I think it's very illogical that init_task_work() will not
actually initialize it properly.

Hmm?

I've pulled this, but it really looks like the wrong solution to the
whole "uninitialized data".

And that task_tick_numa() special case is truly horrendous, and really
should go after the init_task_work() regardless, exactly because you'd
expect that init_task_work() to initialize the work even if it doesn't
happen to right now.

Or is somebody doing init_task_work() to only change the work-function
on an already initialized work entry? Becuase that sounds both racy
and broken to me, and none of the things I looked at from a quick grep
looked like that at all.

              Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GIT pull] irq/urgent for v5.16-rc1
  2021-11-14 13:30 [GIT pull] irq/urgent for v5.16-rc1 Thomas Gleixner
  2021-11-14 13:30 ` [GIT pull] locking/urgent " Thomas Gleixner
  2021-11-14 13:30 ` [GIT pull] timers/urgent " Thomas Gleixner
@ 2021-11-14 19:10 ` pr-tracker-bot
  2 siblings, 0 replies; 11+ messages in thread
From: pr-tracker-bot @ 2021-11-14 19:10 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 14 Nov 2021 14:30:56 +0100 (CET):

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq-urgent-2021-11-14

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c36e33e2f477052b0f4b1da45af403f98d3f7eb4

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GIT pull] locking/urgent for v5.16-rc1
  2021-11-14 13:30 ` [GIT pull] locking/urgent " Thomas Gleixner
@ 2021-11-14 19:10   ` pr-tracker-bot
  0 siblings, 0 replies; 11+ messages in thread
From: pr-tracker-bot @ 2021-11-14 19:10 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 14 Nov 2021 14:30:58 +0100 (CET):

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git locking-urgent-2021-11-14

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/218cc8b860a255ce7f1a03ff3ec70953c423d27d

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GIT pull] timers/urgent for v5.16-rc1
  2021-11-14 13:30 ` [GIT pull] timers/urgent " Thomas Gleixner
  2021-11-14 19:02   ` Linus Torvalds
@ 2021-11-14 19:10   ` pr-tracker-bot
  1 sibling, 0 replies; 11+ messages in thread
From: pr-tracker-bot @ 2021-11-14 19:10 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Linus Torvalds, linux-kernel, x86

The pull request you sent on Sun, 14 Nov 2021 14:30:59 +0100 (CET):

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-2021-11-14

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/622c72b651c85cb55bae147debc1a2fae0189b53

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GIT pull] timers/urgent for v5.16-rc1
  2021-11-14 19:02   ` Linus Torvalds
@ 2021-11-14 19:24     ` Thomas Gleixner
  2021-11-15 15:51     ` Peter Zijlstra
  1 sibling, 0 replies; 11+ messages in thread
From: Thomas Gleixner @ 2021-11-14 19:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers, Peter Zijlstra

On Sun, Nov 14 2021 at 11:02, Linus Torvalds wrote:
> On Sun, Nov 14, 2021 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>
>> +       /*
>> +        * A copied work entry from the old task is not meaningful, clear it.
>> +        * N.B. init_task_work will not do this.
>> +        */
>> +       memset(&p->posix_cputimers_work.work, 0,
>> +              sizeof(p->posix_cputimers_work.work));
>> +       init_task_work(&p->posix_cputimers_work.work,
>> +                      posix_cpu_timers_work);
>
> Ugh.
>
> Instead of the added four lines of comment, and two lines of
> "memset()", maybe this should just have made init_task_work() DTRT?
>
> Yes,. I see this:
>
>         /* Protect against double add, see task_tick_numa and task_numa_work */
>         p->numa_work.next               = &p->numa_work;
>         ...
>         init_task_work(&p->numa_work, task_numa_work);
>
> but I think that one is so subtle and such a special case that it
> should have been updated - just make that magic special flag happen
> after the init_task_work.
>
> A lot of the other cases seem to zero-initialize things elsewhere
> (generally with kzalloc()), but I note that at least
> io_ring_exit_work() seems to have this:
>
>         struct io_tctx_exit exit;
>         ...
>         init_task_work(&exit.task_work, io_tctx_exit_cb);
>
> and the ->next pointer is never set to NULL.
>
> Now, in 99% of all cases the ->next pointer simply doesn't matter,
> because task_work_add() will only set it, not caring about the old
> value.
>
> But apparently it matters for posix_cputimers_work and for numa_work,
> and so I think it's very illogical that init_task_work() will not
> actually initialize it properly.
>
> Hmm?
>
> I've pulled this, but it really looks like the wrong solution to the
> whole "uninitialized data".
>
> And that task_tick_numa() special case is truly horrendous, and really
> should go after the init_task_work() regardless, exactly because you'd
> expect that init_task_work() to initialize the work even if it doesn't
> happen to right now.
>
> Or is somebody doing init_task_work() to only change the work-function
> on an already initialized work entry? Becuase that sounds both racy
> and broken to me, and none of the things I looked at from a quick grep
> looked like that at all.

I'll have a deeper look at that tomorrow.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GIT pull] timers/urgent for v5.16-rc1
  2021-11-14 19:02   ` Linus Torvalds
  2021-11-14 19:24     ` Thomas Gleixner
@ 2021-11-15 15:51     ` Peter Zijlstra
  2021-11-15 16:07       ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2021-11-15 15:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Linux Kernel Mailing List, the arch/x86 maintainers

On Sun, Nov 14, 2021 at 11:02:31AM -0800, Linus Torvalds wrote:
> On Sun, Nov 14, 2021 at 5:31 AM Thomas Gleixner <tglx@linutronix.de> wrote:

> But apparently it matters for posix_cputimers_work and for numa_work,
> and so I think it's very illogical that init_task_work() will not
> actually initialize it properly.

The problem with the posix timers thing seems to be that it can race
against fork() but afaict it can't actually mis-behave if it has garbage
in ->next, so the clearing here is pure paranoia.

> And that task_tick_numa() special case is truly horrendous, and really
> should go after the init_task_work() regardless, exactly because you'd
> expect that init_task_work() to initialize the work even if it doesn't
> happen to right now.

Yeah, it's a wee bit 'special' allright :-)

> Or is somebody doing init_task_work() to only change the work-function
> on an already initialized work entry? Becuase that sounds both racy
> and broken to me, and none of the things I looked at from a quick grep
> looked like that at all.

The worst I found is someone sharing an rcu_head between task_work and
call_rcu (supposedly at different stages in the life-time).

I couldn't find any other weird cases.

---
 include/linux/task_work.h      | 1 +
 kernel/sched/fair.c            | 4 ++--
 kernel/time/posix-cpu-timers.c | 2 --
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 5b8a93f288bb..fbbc9aa8e4ae 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -10,6 +10,7 @@ typedef void (*task_work_func_t)(struct callback_head *);
 static inline void
 init_task_work(struct callback_head *twork, task_work_func_t func)
 {
+	twork->next = NULL;
 	twork->func = func;
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e476f6d9435..d03dacdecf73 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2823,14 +2823,14 @@ void init_numa_balancing(unsigned long clone_flags, struct task_struct *p)
 	p->node_stamp			= 0;
 	p->numa_scan_seq		= mm ? mm->numa_scan_seq : 0;
 	p->numa_scan_period		= sysctl_numa_balancing_scan_delay;
-	/* Protect against double add, see task_tick_numa and task_numa_work */
-	p->numa_work.next		= &p->numa_work;
 	p->numa_faults			= NULL;
 	RCU_INIT_POINTER(p->numa_group, NULL);
 	p->last_task_numa_placement	= 0;
 	p->last_sum_exec_runtime	= 0;
 
 	init_task_work(&p->numa_work, task_numa_work);
+	/* Protect against double add, see task_tick_numa and task_numa_work */
+	p->numa_work.next		= &p->numa_work;
 
 	/* New address space, reset the preferred nid */
 	if (!(clone_flags & CLONE_VM)) {
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index 96b4e7810426..3352759e6916 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -1167,8 +1167,6 @@ void clear_posix_cputimers_work(struct task_struct *p)
 	 * A copied work entry from the old task is not meaningful, clear it.
 	 * N.B. init_task_work will not do this.
 	 */
-	memset(&p->posix_cputimers_work.work, 0,
-	       sizeof(p->posix_cputimers_work.work));
 	init_task_work(&p->posix_cputimers_work.work,
 		       posix_cpu_timers_work);
 	p->posix_cputimers_work.scheduled = false;

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [GIT pull] timers/urgent for v5.16-rc1
  2021-11-15 15:51     ` Peter Zijlstra
@ 2021-11-15 16:07       ` Peter Zijlstra
  2021-11-15 17:55         ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2021-11-15 16:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Linux Kernel Mailing List, the arch/x86 maintainers

On Mon, Nov 15, 2021 at 04:51:34PM +0100, Peter Zijlstra wrote:
> diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
> index 96b4e7810426..3352759e6916 100644
> --- a/kernel/time/posix-cpu-timers.c
> +++ b/kernel/time/posix-cpu-timers.c
> @@ -1167,8 +1167,6 @@ void clear_posix_cputimers_work(struct task_struct *p)
>  	 * A copied work entry from the old task is not meaningful, clear it.
>  	 * N.B. init_task_work will not do this.
>  	 */
> -	memset(&p->posix_cputimers_work.work, 0,
> -	       sizeof(p->posix_cputimers_work.work));

Also, clearly that comments needs to go..

>  	init_task_work(&p->posix_cputimers_work.work,
>  		       posix_cpu_timers_work);
>  	p->posix_cputimers_work.scheduled = false;

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [GIT pull] timers/urgent for v5.16-rc1
  2021-11-15 16:07       ` Peter Zijlstra
@ 2021-11-15 17:55         ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2021-11-15 17:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Thomas Gleixner, Linux Kernel Mailing List, the arch/x86 maintainers

On Mon, Nov 15, 2021 at 8:07 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Also, clearly that comments needs to go..

Yeah, with that your patch looks good to me.

Thanks,
          Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-11-16  1:26 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-14 13:30 [GIT pull] irq/urgent for v5.16-rc1 Thomas Gleixner
2021-11-14 13:30 ` [GIT pull] locking/urgent " Thomas Gleixner
2021-11-14 19:10   ` pr-tracker-bot
2021-11-14 13:30 ` [GIT pull] timers/urgent " Thomas Gleixner
2021-11-14 19:02   ` Linus Torvalds
2021-11-14 19:24     ` Thomas Gleixner
2021-11-15 15:51     ` Peter Zijlstra
2021-11-15 16:07       ` Peter Zijlstra
2021-11-15 17:55         ` Linus Torvalds
2021-11-14 19:10   ` pr-tracker-bot
2021-11-14 19:10 ` [GIT pull] irq/urgent " pr-tracker-bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).