All of lore.kernel.org
 help / color / mirror / Atom feed
* irq_desc use-after-free in smp_irq_move_cleanup_interrupt
@ 2015-11-24  0:59 Joe Lawrence
  2015-11-25 18:16 ` Thomas Gleixner
  0 siblings, 1 reply; 20+ messages in thread
From: Joe Lawrence @ 2015-11-24  0:59 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner, Joe Lawrence

Hi Thomas,

I've been chasing down a use-after-free on an irq_desc structure during
repeated device removal testing that crashes 4.3 in
smp_irq_move_cleanup_interrupt.  So far I have a bunch of crashes and some
data gleaned from instrumenting the kernel with trace logging.  Details to
follow, but in short, I think I'm hitting the use-after-free because of
a782a7e46bb5 "x86/irq: Store irq descriptor in vector array".

During the tests, I usually see RCU stalls (sometimes a crash later) --
when looking at a vmcore, the CPU stall is here:

  crash> bt ffff881038ee4200
  PID: 0      TASK: ffff881038ee4200  CPU: 1   COMMAND: "swapper/1"
      [exception RIP: _raw_spin_lock+0x10]
      RIP: ffffffff81672a10  RSP: ffff88103f843f70  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: 0000000000000091  RCX: 0000000000000001
      RDX: 0000000000000001  RSI: 0000000000000001  RDI: ffffffff81d36a88
      RBP: ffff88103f843fa8   R8: 0000000000000101   R9: 0000000000000f78
      R10: 0000000000100000  R11: ffff8810384ccf8c  R12: 0000000000000091
      R13: ffff88102a819794  R14: 000000000000b5e8  R15: ffff88102a8196f8
      CS: 0010  SS: 0018
   #0 [ffff88103f843f70] smp_irq_move_cleanup_interrupt at ffffffff8105319b
   #1 [ffff88103f843fb0] irq_move_cleanup_interrupt at ffffffff81673a04
  --- <IRQ stack> ---
   #2 [ffff881038f0fd90] irq_move_cleanup_interrupt at ffffffff81673a04
      [exception RIP: unknown or invalid address]
      RIP: 000000000000001d  RSP: 0000000000000000  RFLAGS: ffff881038f10000
      RAX: 0000000000000001  RBX: ffff881038f0c000  RCX: 0000000000000005
      RDX: 00000000000a087a  RSI: 0000000000051ca2  RDI: 0000000000000018
      RBP: ffff88103f84f100   R8: ffff881038f0fea0   R9: ffffe8f000042378
      R10: 0000000000000002  R11: 000000ee954efd02  R12: ffffffff810e5060
      R13: ffff881038f0fdc8  R14: ffff88103f84f500  R15: ffff88103f84f100
      ORIG_RAX: ffff88103f856c40  CS: 8000a044  SS: ffffffffffffffdf
  bt: WARNING: possibly bogus exception frame
   #3 [ffff881038f0fe38] cpuidle_enter_state at ffffffff8151e278
   #4 [ffff881038f0fea8] cpuidle_enter at ffffffff8151e417
   #5 [ffff881038f0feb8] call_cpuidle at ffffffff810bef42
   #6 [ffff881038f0fed0] cpu_startup_entry at ffffffff810bf1dc
   #7 [ffff881038f0ff28] start_secondary at ffffffff8104e9e0

The irq_desc is in R15: ffff88102a8196f8

  crash> kmem ffff88102a8196f8
  CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
  ffff88103f408c80 kmalloc-512              512       5292      5460    140    32k
    SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
    ffffea0040aa0600  ffff88102a818000     0     39         17    22
    FREE / [ALLOCATED]
     ffff88102a8196f8  

        PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
  ffffea0040aa0640 102a819000                0        0  0 2fffff80008000 tail

This irq_desc is no longer allocated, it's been filled with the slub debug
poison pattern (hence the spinlock is stuck):

  crash> rd ffff88102a8196f8 0x28
  ffff88102a8196f8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819708:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819718:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819728:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819738:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819748:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819758:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819768:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819778:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819788:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819798:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197a8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197b8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197c8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197d8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197e8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a8197f8:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819808:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819818:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk
  ffff88102a819828:  6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b   kkkkkkkkkkkkkkkk

The irq vector is in RBX: 0000000000000091

  crash> p/d 0x91
  $1 = 145

The irq_desc_tree entry for vector 0x91 appears to have been updated to
something new (the device re-add):

  crash> tree -t radix irq_desc_tree -s irq_desc | grep 'irq = 0x91' -B11 | grep -e '^ffff' -e 'irq ='
  ffff88102fb144e8
      irq = 0x91,

But from a custom crash gdb script, the freed irq_desc can still be found
in CPU 1's vector_irq[145]:

  cpu[1] vector_irq[145] irq_desc @ 0xffff88102a8196f8

Sifting through a heavily grepped and abbreviated trace log:

  - The irq moved from CPU 1, vector 145 to CPU 44, vector 134 
  - Both CPUs skip cleaning up their vector_irq[] entries for this irq
    because data->move_in_progress is set (is this normal?)
  - The irq moves again to CPU 34, vector 123
  - The underlying device is removed @ 593.106514 jiffies
  - The irq_desc is freed @ 593.239879 jiffies
  - CPU 1 is last heard from @ 1022.830083 jiffies

  [001]    22.936764: __assign_irq_vector : cpu 44 : vector=134 -> 0xffff88102a8196f8
  [044]    35.209945: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
  [001]    35.212370: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [044]    35.674114: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
  [001]    35.675395: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [044]    36.265716: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
  [044]    36.517785: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
  [001]    36.520115: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [001]    54.636651: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [001]    54.636722: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [044]    61.670267: __assign_irq_vector : cpu 34 : vector=123 -> 0xffff88102a8196f8
  [001]    61.670979: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
  [044]    61.696120: smp_irq_move_cleanup_interrupt : cpu (this) : vector=134 0xffff88102a8196f8 -> (nil)
  [034]    69.086107: smp_irq_move_cleanup_interrupt : vector == data->cfg.vector && ... : vector=123 0xffff88102a8196f8
  [000]   593.239811: clear_irq_vector : 1 : cpu 34 : vector=123 0xffff88102a8196f8 -> (nil)
  [000]   593.239879: free_desc : desc @ 0xffff88102a8196f8
  [001]  1022.830083: smp_irq_move_cleanup_interrupt : IS_ERR_OR_NULL : vector=144 (nil)

  crash> log | grep 'mpt3sas[0-9]*: _scsih_remove$'
  [  291.658410] mpt3sas0: _scsih_remove
  [  593.106514] mpt3sas1: _scsih_remove           << free_desc
  [  874.345239] mpt3sas3: _scsih_remove

Prior to a782a7e46bb5 "x86/irq: Store irq descriptor in vector array", the
vector_irq array held irq values, those interested in the irq_desc would
have to perform their own irq_to_desc() (ie, a radix_tree_lookup of the
irq_desc_tree).  The radix tree is updated in free_desc(), so any
subsequent lookups would fail.

After the change though, I'm not so sure that direct access to the
irq_desc is safe -- I couldn't figure out anything preventing
use-after-free (the sparse_irq_lock and the vector_lock seem to protect
their respective data-structures, but not the dual-use of the irq_desc's
by both the irq_desc_tree and the per-cpu vector_irq[].)

Hopefully this is on the right track -- I can modify the trace logging or
run other tests if there is additional information that would be helpful.

Regards,

-- Joe

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-12-10 18:43 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-24  0:59 irq_desc use-after-free in smp_irq_move_cleanup_interrupt Joe Lawrence
2015-11-25 18:16 ` Thomas Gleixner
2015-11-25 19:31   ` Thomas Gleixner
2015-11-25 21:12     ` Thomas Gleixner
2015-11-25 22:02       ` Joe Lawrence
2015-11-27  8:06       ` Jiang Liu
2015-11-27  8:25         ` Thomas Gleixner
2015-11-30  8:09           ` [Bugfix 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer Jiang Liu
2015-11-30  8:09             ` [Bugfix 2/5] x86/irq: Enhance __assign_irq_vector() to rollback in case of failure Jiang Liu
2015-12-10 18:39               ` [tip:x86/urgent] " tip-bot for Jiang Liu
2015-11-30  8:09             ` [Bugfix 3/5] x86/irq: Fix a race window in x86_vector_free_irqs() Jiang Liu
2015-12-10 18:40               ` [tip:x86/urgent] " tip-bot for Jiang Liu
2015-11-30  8:09             ` [Bugfix 4/5] x86/irq: Fix a race condition between vector assigning and cleanup Jiang Liu
2015-12-01 22:46               ` Joe Lawrence
2015-12-08  0:29                 ` Joe Lawrence
2015-12-08 21:31                   ` Thomas Gleixner
2015-12-10 18:40               ` [tip:x86/urgent] " tip-bot for Jiang Liu
2015-11-30  8:09             ` [Bugfix 5/5] x86/irq: Trivial cleanups for x86 vector allocation code Jiang Liu
2015-12-10 18:42               ` [tip:x86/apic] " tip-bot for Jiang Liu
2015-12-10 18:39             ` [tip:x86/urgent] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer tip-bot for Jiang Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.