All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/9] TRACE_IRQFLAGS wreckage
@ 2020-08-20  7:30 Peter Zijlstra
  2020-08-20  7:30 ` [PATCH 1/9] lockdep: Use raw_cpu_*() for per-cpu variables Peter Zijlstra
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Peter Zijlstra @ 2020-08-20  7:30 UTC (permalink / raw)
  To: linux-kernel, mingo, will
  Cc: npiggin, elver, jgross, paulmck, rostedt, rjw, joel, svens, tglx, peterz


TRACE_IRQFLAGS

local_irq_*() keeps a software state that mirrors the hardware state,
used for lockdep, includes tracepoints.

raw_local_irq_*() does not update the software state, no tracing.

---

Problem 1:

	raw_local_irq_save(); // software state on
	local_irq_save(); // software state off
	...
	local_irq_restore(); // software state still off, because we don't enable IRQs
	raw_local_irq_restore(); // software state still off, *whoopsie*

existing instances:

 - lock_acquire()
     raw_local_irq_save()
     __lock_acquire()
       arch_spin_lock(&graph_lock)
         pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
           local_irq_save()

 - trace_clock_global()
     raw_local_irq_save()
     arch_spin_lock()
       pv_wait() := kvm_wait()
	 local_irq_save()

 - apic_retrigger_irq()
     raw_local_irq_save()
     apic->send_IPI() := default_send_IPI_single_phys()
       local_irq_save()

Possible solutions:

 A) make it work by enabling the tracing inside raw_*()
 B) make it work by keeping tracing disabled inside raw_*()
 C) call it broken and clean it up now

Now, given that the only reason to use the raw_* variant is because you don't
want tracing. Therefore A) seems like a weird option (although it can be done).
C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just
because there is one raw user, this strips the validation/tracing off for all
the other users.

So we pick B) and declare any code that ends up doing:

	raw_local_irq_save()
	local_irq_save()
	lockdep_assert_irqs_disabled();

broken. AFAICT this problem has existed forever, the only reason it came
up is because I changed IRQ tracing vs lockdep recursion and the first
instance is fairly common, the other cases hardly ever happen.

---

Problem 2:

	raw_local_irq_save(); // software state on
	trace_*()
          ...
          perf_tp_event()
            ...
            perf_callchain()
	    <#PF>
	      trace_hardirqs_off(); // software state off
	      ...
	      if (regs_irqs_disabled(regs)) // false
	        trace_hardirqs_on();
	    </#PF>
	raw_local_irq_restore(); // software state stays off, *whoopsie*

existing instances:

 - lock_acquire() / lock_release()
     raw_local_irq_save()
     trace_lock_acquire() / trace_lock_release()

 - function tracing

Possible solutions:

 A) fix every architecture's entry code
 B) only fix kernel/entry/common.c
 C) fix lockdep tracepoints and pray

This series does C, AFAICT this problem has existed forever.

---

Problem 3:

	raw_local_irq_save(); // software state on
	<#NMI>
	  trace_hardirqs_off(); // software state off
	  ...
	  if (regs_irqs_disabled(regs)) // false
	    trace_hardirqs_on();
	</#NMI>
	raw_local_irq_restore(); // software state stays off, *whoopsie*

Possible solutions:

This *should* not be a problem if an architecture has it's entry ordering
right. In particular we rely on the architecture doing nmi_enter() before
trace_hardirqs_off().

In that case, in_nmi() will be true, and lockdep_hardirqs_*() should NO-OP,
except if CONFIG_TRACE_IRQFLAGS_NMI (x86).

There might be a problem with using lockdep_assert_irqs_disabled() from NMI
context, if so, those needs a little TLC.

---

The patches in this series do (in reverse order):

 - 2C
 - 1B
 - fix fallout in idle due to the trace_lock_*() tracepoints suddenly
   being visible to rcu-lockdep.

---
 arch/arm/mach-omap2/pm34xx.c      |    4 --
 arch/arm64/kernel/process.c       |    2 -
 arch/nds32/include/asm/irqflags.h |    5 ++
 arch/powerpc/include/asm/hw_irq.h |   11 ++---
 arch/s390/kernel/idle.c           |    3 -
 arch/x86/entry/thunk_32.S         |    5 --
 arch/x86/include/asm/mmu.h        |    1 
 arch/x86/kernel/process.c         |    4 --
 arch/x86/mm/tlb.c                 |   13 +-----
 drivers/cpuidle/cpuidle.c         |   19 +++++++--
 drivers/idle/intel_idle.c         |   16 --------
 include/linux/cpuidle.h           |   13 +++---
 include/linux/irqflags.h          |   73 ++++++++++++++++++++------------------
 include/linux/lockdep.h           |   18 ++++++---
 include/linux/mmu_context.h       |    5 ++
 kernel/locking/lockdep.c          |   18 +++++----
 kernel/sched/idle.c               |   21 ++++------
 17 files changed, 112 insertions(+), 119 deletions(-)




^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-08-27  7:56 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-20  7:30 [PATCH 0/9] TRACE_IRQFLAGS wreckage Peter Zijlstra
2020-08-20  7:30 ` [PATCH 1/9] lockdep: Use raw_cpu_*() for per-cpu variables Peter Zijlstra
2020-08-20 13:54   ` Steven Rostedt
2020-08-20  7:30 ` [PATCH 2/9] sched,idle,rcu: Push rcu_idle deeper into the idle path Peter Zijlstra
2020-08-20 13:58   ` Steven Rostedt
2020-08-20  7:30 ` [PATCH 3/9] cpuidle: Make CPUIDLE_FLAG_TLB_FLUSHED generic Peter Zijlstra
2020-08-20  7:30 ` [PATCH 4/9] cpuidle: Move trace_cpu_idle() into generic code Peter Zijlstra
2020-08-20  7:30 ` [PATCH 5/9] x86/entry: Remove unused THUNKs Peter Zijlstra
2020-08-20  7:30 ` [PATCH 6/9] locking/lockdep: Cleanup Peter Zijlstra
2020-08-20  7:30 ` [PATCH 7/9] nds32: Implement arch_irqs_disabled() Peter Zijlstra
2020-08-20  7:30 ` [PATCH 8/9] lockdep: Only trace IRQ edges Peter Zijlstra
2020-08-20  7:30 ` [PATCH 9/9] lockdep,trace: Expose tracepoints Peter Zijlstra
2020-08-20 14:36 ` [PATCH 0/9] TRACE_IRQFLAGS wreckage Steven Rostedt
2020-08-20 14:39   ` Steven Rostedt
2020-08-20 14:49   ` Marco Elver
2020-08-20 14:58   ` peterz
2020-08-20 16:53     ` Steven Rostedt
2020-08-20 17:20     ` Marco Elver
2020-08-20 19:59       ` Steven Rostedt
2020-08-21  6:37         ` Marco Elver
2020-08-21  6:54       ` peterz
2020-08-21  7:05         ` Marco Elver
2020-08-27  7:54       ` [tip: sched/urgent] sched: Use __always_inline on is_idle_task() tip-bot2 for Marco Elver

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.