All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] genirq: Avoid race between cpu hot plug and irq_desc() allocation paths
@ 2018-09-05 17:05 Prakruthi Deepak Heragu
  2018-09-05 18:23 ` Thomas Gleixner
  0 siblings, 1 reply; 4+ messages in thread
From: Prakruthi Deepak Heragu @ 2018-09-05 17:05 UTC (permalink / raw)
  To: tglx
  Cc: linux-kernel, ckadabi, tsoni, bryanh, Prasad Sodagudi,
	Prakruthi Deepak Heragu

One of the cores might have just allocated irq_desc() and other core
might be doing irq migration in the hot plug path. In the hot plug path
during the IRQ migration, for_each_active_irq macro is trying to get
irqs whose bits are set in allocated_irqs bit map but there is no return
value check after irq_to_desc for desc validity.

[   24.566381] msm_thermal:do_core_control Set Offline: CPU4 Temp: 73
[   24.568821] Unable to handle kernel NULL pointer dereference at virtual address 000000a4
[   24.568931] pgd = ffffffc002184000
[   24.568995] [000000a4] *pgd=0000000178df5003, *pud=0000000178df5003, *pmd=0000000178df6003, *pte=0060000017a00707
[   24.569153] ------------[ cut here ]------------
[   24.569228] Kernel BUG at ffffffc0000f3060 [verbose debug info unavailable]
[   24.569334] Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
[   24.569422] Modules linked in:
[   24.569486] CPU: 4 PID: 28 Comm: migration/4 Tainted: G        W       4.4.8-perf-9407222-eng #1
[   24.569684] task: ffffffc0f28f0e80 ti: ffffffc0f2a84000 task.ti: ffffffc0f2a84000
[   24.569785] PC is at do_raw_spin_lock+0x20/0x160
[   24.569859] LR is at _raw_spin_lock+0x34/0x40
[   24.569931] pc : [<ffffffc0000f3060>] lr : [<ffffffc001023dec>] pstate: 200001c5
[   24.570029] sp : ffffffc0f2a87bc0
[   24.570091] x29: ffffffc0f2a87bc0 x28: ffffffc001033988
[   24.570174] x27: ffffffc001adebb0 x26: 0000000000000000
[   24.570257] x25: 00000000000000a0 x24: 0000000000000020
[   24.570339] x23: ffffffc001033978 x22: ffffffc001adeb80
[   24.570421] x21: 000000000000027e x20: 0000000000000000
[   24.570502] x19: 00000000000000a0 x18: 000000000000000d
[   24.570584] x17: 0000000000005f00 x16: 0000000000000003
[   24.570666] x15: 000000000000bd39 x14: 0ffffffffffffffe
[   24.570748] x13: 0000000000000000 x12: 0000000000000018
[   24.570829] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[   24.570911] x9 : fefefefeff332e6d x8 : 7f7f7f7f7f7f7f7f
[   24.570993] x7 : ffffffc00344f6a0 x6 : 0000000000000000
[   24.571075] x5 : 0000000000000001 x4 : ffffffc00344f488
[   24.571157] x3 : 0000000000000000 x2 : 0000000000000000
[   24.571238] x1 : ffffffc0f2a84000 x0 : 0000000000004ead
...
...
...
[   24.581324] Call trace:
[   24.581379] [<ffffffc0000f3060>] do_raw_spin_lock+0x20/0x160
[   24.581463] [<ffffffc001023dec>] _raw_spin_lock+0x34/0x40
[   24.581546] [<ffffffc000103f10>] irq_migrate_all_off_this_cpu+0x84/0x1c4
[   24.581641] [<ffffffc00008ec84>] __cpu_disable+0x54/0x74
[   24.581722] [<ffffffc0000a3368>] take_cpu_down+0x1c/0x58
[   24.581803] [<ffffffc00013ac08>] multi_cpu_stop+0xb0/0x10c
[   24.581885] [<ffffffc00013ad60>] cpu_stopper_thread+0xb8/0x184
[   24.581972] [<ffffffc0000c4920>] smpboot_thread_fn+0x26c/0x290
[   24.582057] [<ffffffc0000c0f84>] kthread+0x100/0x108
[   24.582135] Code: aa0003f3 aa1e03e0 d503201f 5289d5a0 (b9400661)
[   24.582224] ---[ end trace 609f38584306f5d9 ]---

Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Prakruthi Deepak Heragu <pheragu@codeaurora.org>
---
 kernel/irq/cpuhotplug.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 5b1072e..17c5e71 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -160,6 +160,9 @@ void irq_migrate_all_off_this_cpu(void)
 		bool affinity_broken;
 
 		desc = irq_to_desc(irq);
+		if (!desc)
+			continue;
+
 		raw_spin_lock(&desc->lock);
 		affinity_broken = migrate_one_irq(desc);
 		raw_spin_unlock(&desc->lock);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] genirq: Avoid race between cpu hot plug and irq_desc() allocation paths
  2018-09-05 17:05 [PATCH] genirq: Avoid race between cpu hot plug and irq_desc() allocation paths Prakruthi Deepak Heragu
@ 2018-09-05 18:23 ` Thomas Gleixner
  2018-09-05 23:27   ` pheragu
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Gleixner @ 2018-09-05 18:23 UTC (permalink / raw)
  To: Prakruthi Deepak Heragu
  Cc: linux-kernel, ckadabi, tsoni, bryanh, Prasad Sodagudi

On Wed, 5 Sep 2018, Prakruthi Deepak Heragu wrote:

> One of the cores might have just allocated irq_desc() and other core
> might be doing irq migration in the hot plug path. In the hot plug path
> during the IRQ migration, for_each_active_irq macro is trying to get
> irqs whose bits are set in allocated_irqs bit map but there is no return
> value check after irq_to_desc for desc validity.

Confused. All parts involved, irq allocation/deallocation and the CPU
hotplug code take sparse_irq_lock to prevent exavtly that.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] genirq: Avoid race between cpu hot plug and irq_desc() allocation paths
  2018-09-05 18:23 ` Thomas Gleixner
@ 2018-09-05 23:27   ` pheragu
  2018-09-06  7:56     ` Thomas Gleixner
  0 siblings, 1 reply; 4+ messages in thread
From: pheragu @ 2018-09-05 23:27 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, ckadabi, tsoni, bryanh, Prasad Sodagudi

On 2018-09-05 11:23, Thomas Gleixner wrote:
> On Wed, 5 Sep 2018, Prakruthi Deepak Heragu wrote:
> 
>> One of the cores might have just allocated irq_desc() and other core
>> might be doing irq migration in the hot plug path. In the hot plug 
>> path
>> during the IRQ migration, for_each_active_irq macro is trying to get
>> irqs whose bits are set in allocated_irqs bit map but there is no 
>> return
>> value check after irq_to_desc for desc validity.
> 
> Confused. All parts involved, irq allocation/deallocation and the CPU
> hotplug code take sparse_irq_lock to prevent exavtly that.
> 
Removing the NULL pointer check and adding this sparse_irq_lock
that you suggested will solve this issue. The code looks like
this now. Is this okay?
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 9409b55..f2ef76e 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -185,14 +185,10 @@ void irq_migrate_all_off_this_cpu(void)
  {
  	struct irq_desc *desc;
  	unsigned int irq;
-
+	irq_lock_sparse();
  	for_each_active_irq(irq) {
  		bool affinity_broken;
-
  		desc = irq_to_desc(irq);
-		if (!desc)
-			continue;
-
  		raw_spin_lock(&desc->lock);
  		affinity_broken = migrate_one_irq(desc);
  		raw_spin_unlock(&desc->lock);
@@ -202,6 +198,7 @@ void irq_migrate_all_off_this_cpu(void)
  					    irq, smp_processor_id());
  		}
  	}
+	irq_unlock_sparse();
  }

  static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned 
int cpu)

> Thanks,
> 
> 	tglx

Thanks,
Prakruthi Deepak Heragu

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] genirq: Avoid race between cpu hot plug and irq_desc() allocation paths
  2018-09-05 23:27   ` pheragu
@ 2018-09-06  7:56     ` Thomas Gleixner
  0 siblings, 0 replies; 4+ messages in thread
From: Thomas Gleixner @ 2018-09-06  7:56 UTC (permalink / raw)
  To: pheragu; +Cc: linux-kernel, ckadabi, tsoni, bryanh, Prasad Sodagudi

On Wed, 5 Sep 2018, pheragu@codeaurora.org wrote:

> On 2018-09-05 11:23, Thomas Gleixner wrote:
> > On Wed, 5 Sep 2018, Prakruthi Deepak Heragu wrote:
> > 
> > > One of the cores might have just allocated irq_desc() and other core
> > > might be doing irq migration in the hot plug path. In the hot plug path
> > > during the IRQ migration, for_each_active_irq macro is trying to get
> > > irqs whose bits are set in allocated_irqs bit map but there is no return
> > > value check after irq_to_desc for desc validity.
> > 
> > Confused. All parts involved, irq allocation/deallocation and the CPU
> > hotplug code take sparse_irq_lock to prevent exavtly that.
> > 
> Removing the NULL pointer check and adding this sparse_irq_lock
> that you suggested will solve this issue. The code looks like
> this now. Is this okay?

No, it's not okay at all. You _CANNOT_ take sparse_irq_lock() there.

I wrote that _ALL_ parts take that lock already. Care to look at the code?

takedown_cpu()
{
 	/*
	 * Prevent irq alloc/free while the dying cpu reorganizes the
         * interrupt affinities.
         */
        irq_lock_sparse();

        /*
         * So now all preempt/rcu users must observe !cpu_active().
         */
	err = stop_machine_cpuslocked(take_cpu_down, NULL, cpumask_of(cpu));

and __cpu_disable() which in turn calls irq_migrate_all_off_this_cpu() is
called from take_cpu_down(). So this deadlocks in any case.

I have no idea what kind of modifications you have in your kernel which
make

   1) The thing explode in the first place

   2) Not immediately deadlock with that hack you just sent

and honestly I don't want to know at all.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-09-06  7:56 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-05 17:05 [PATCH] genirq: Avoid race between cpu hot plug and irq_desc() allocation paths Prakruthi Deepak Heragu
2018-09-05 18:23 ` Thomas Gleixner
2018-09-05 23:27   ` pheragu
2018-09-06  7:56     ` Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.