All of lore.kernel.org
 help / color / mirror / Atom feed
* virt_blk BUG: sleeping function called from invalid context
@ 2014-06-27 11:57 Josh Boyer
  2014-06-29  8:26   ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Josh Boyer @ 2014-06-27 11:57 UTC (permalink / raw)
  To: Rusty Russell, Michael S. Tsirkin
  Cc: virtualization, Linux-Kernel@Vger. Kernel. Org, Brian Lane

Hi All,

We've had a report[1] of the virt_blk driver causing a lot of spew
because it's calling a sleeping function from an invalid context.  The
backtrace is below.  This is with kernel v3.16-rc2-69-gd91d66e88ea9.

The reporter is on CC and can give you relevant details.

josh

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1113805

[drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:02.0 on minor 0
 virtio-pci 0000:00:05.0: irq 40 for MSI/MSI-X
 virtio-pci 0000:00:05.0: irq 41 for MSI/MSI-X
  vda: vda1 vda2
 virtio-pci 0000:00:06.0: irq 42 for MSI/MSI-X
 virtio-pci 0000:00:06.0: irq 43 for MSI/MSI-X
  vdb: vdb1
 tsc: Refined TSC clocksource calibration: 3392.169 MHz
 md: bind<vdb1>
 mdadm (350) used greatest stack depth: 12384 bytes left
 md: bind<vda2>
 md: raid1 personality registered for level 1
 md/raid1:md127: active with 2 out of 2 mirrors
 created bitmap (1 pages) for device md127
 md127: bitmap initialized from disk: read 1 pages, set 0 of 153 bits
 md127: detected capacity change from 0 to 10203693056
  md127: unknown partition table
 systemd-cryptsetup[358]: Set cipher aes, mode xts-plain64, key size
512 bits for device
/dev/disk/by-uuid/6972a564-542d-498b-b3a0-7d71c2e966a2.
  md127: unknown partition table
  md127: unknown partition table
 dracut-initqueue[296]: Scanning devices dm-0  for LVM logical volumes
fedora-foo1/root fedora-foo1/swap fedora-foo1/root fedora-foo1/swap
 dracut-initqueue[296]: inactive '/dev/fedora-foo1/root' [7.42 GiB] inherit
 dracut-initqueue[296]: inactive '/dev/fedora-foo1/swap' [2.00 GiB] inherit
 systemd-fsck[662]: /dev/mapper/fedora--foo1-root: clean, 22215/486720
files, 222526/1944576 blocks
 BUG: sleeping function called from invalid context at
kernel/locking/mutex.c:586
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 2 locks held by swapper/1/0:
  #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>]
virtblk_done+0x42/0xe0 [virtio_blk]
  #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at:
[<ffffffff81633718>] bitmap_endwrite+0x68/0x240
 irq event stamp: 33518
 hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
 hardirqs last disabled at (33516): [<ffffffff818122ed>]
common_interrupt+0x6d/0x72
 softirqs last  enabled at (33518): [<ffffffff810a1272>]
_local_bh_enable+0x22/0x50
 softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
  0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
  0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
 Call Trace:
  <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
  [<ffffffff810d4f14>] __might_sleep+0x184/0x240
  [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
  [<ffffffff810e1de6>] ? local_clock+0x16/0x30
  [<ffffffff810fc23f>] ? lock_release_holdtime.part.28+0xf/0x200
  [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
  [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
  [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
  [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
  [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
  [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
  [<ffffffff813b46c4>] blk_update_request+0x94/0x420
  [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
  [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
  [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
  [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
  [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
  [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
  [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
  [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
  [<ffffffff81119436>] handle_edge_irq+0x66/0x130
  [<ffffffff8101c3e4>] handle_irq+0x84/0x150
  [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
  [<ffffffff818122f2>] common_interrupt+0x72/0x72
  <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
  [<ffffffff81025454>] default_idle+0x24/0x230
  [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
  [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
  [<ffffffff8104df1b>] start_secondary+0x25b/0x300

 =================================
 [ INFO: inconsistent lock state ]
 3.16.0-0.rc2.git2.1.fc21.x86_64 #1 Not tainted
 ---------------------------------
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 swapper/1/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
  (kernfs_mutex){?.+.+.}, at: [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
 {HARDIRQ-ON-W} state was registered at:
   [<ffffffff811004b2>] __lock_acquire+0x942/0x1ca0
   [<ffffffff811020f4>] lock_acquire+0xa4/0x1d0
   [<ffffffff8180cb35>] mutex_lock_nested+0x85/0x440
   [<ffffffff812d6d6f>] kernfs_activate+0x1f/0xf0
   [<ffffffff812d7140>] kernfs_create_root+0xf0/0x110
   [<ffffffff821f1ddc>] sysfs_init+0x13/0x51
   [<ffffffff821ef5af>] mnt_init+0x107/0x231
   [<ffffffff821ef213>] vfs_caches_init+0x98/0x106
   [<ffffffff821bdfb0>] start_kernel+0x44f/0x4bc
   [<ffffffff821bd4d7>] x86_64_start_reservations+0x2a/0x2c
   [<ffffffff821bd626>] x86_64_start_kernel+0x14d/0x170
 irq event stamp: 33518
 hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
 hardirqs last disabled at (33516): [<ffffffff818122ed>]
common_interrupt+0x6d/0x72
 softirqs last  enabled at (33518): [<ffffffff810a1272>]
_local_bh_enable+0x22/0x50
 softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80

                                              other info that might
help us debug this:
  Possible unsafe locking scenario:
        CPU0
        ----
   lock(kernfs_mutex);
   <Interrupt>
     lock(kernfs_mutex);

                                               *** DEADLOCK ***
 2 locks held by swapper/1/0:
  #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>]
virtblk_done+0x42/0xe0 [virtio_blk]
  #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at:
[<ffffffff81633718>] bitmap_endwrite+0x68/0x240

                                              stack backtrace:
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000000 f90db13964f4ee05 ffff88007d4039d0 ffffffff81807b4c
  ffff88007bcb19d0 ffff88007d403a20 ffffffff81805358 0000000000000000
  0000000000000000 0000000000000001 ffff88007bcb25a8 ffff88007bcb19d0
 Call Trace:
  <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
  [<ffffffff81805358>] print_usage_bug+0x1f0/0x205
  [<ffffffff810ff450>] mark_lock+0x610/0x6d0
  [<ffffffff810fe520>] ? print_shortest_lock_dependencies+0x1d0/0x1d0
  [<ffffffff81100614>] __lock_acquire+0xaa4/0x1ca0
  [<ffffffff8101dc4d>] ? show_trace_log_lvl+0x4d/0x60
  [<ffffffff8101c8ad>] ? show_stack_log_lvl+0xad/0x1b0
  [<ffffffff811020f4>] lock_acquire+0xa4/0x1d0
  [<ffffffff812d76a0>] ? kernfs_notify+0x90/0x150
  [<ffffffff8180cb35>] mutex_lock_nested+0x85/0x440
  [<ffffffff812d76a0>] ? kernfs_notify+0x90/0x150
  [<ffffffff810fc23f>] ? lock_release_holdtime.part.28+0xf/0x200
  [<ffffffff812d76a0>] ? kernfs_notify+0x90/0x150
  [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
  [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
  [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
  [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
  [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
  [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
  [<ffffffff813b46c4>] blk_update_request+0x94/0x420
  [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
  [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
  [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
  [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
  [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
  [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
  [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
  [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
  [<ffffffff81119436>] handle_edge_irq+0x66/0x130
  [<ffffffff8101c3e4>] handle_irq+0x84/0x150
  [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
  [<ffffffff818122f2>] common_interrupt+0x72/0x72
  <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
  [<ffffffff81025454>] default_idle+0x24/0x230
  [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
  [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
  [<ffffffff8104df1b>] start_secondary+0x25b/0x30

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virt_blk BUG: sleeping function called from invalid context
  2014-06-27 11:57 virt_blk BUG: sleeping function called from invalid context Josh Boyer
@ 2014-06-29  8:26   ` Michael S. Tsirkin
  0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2014-06-29  8:26 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Rusty Russell, virtualization, Linux-Kernel@Vger. Kernel. Org,
	Brian Lane, Jens Axboe, Christoph Hellwig

On Fri, Jun 27, 2014 at 07:57:38AM -0400, Josh Boyer wrote:
> Hi All,
> 
> We've had a report[1] of the virt_blk driver causing a lot of spew
> because it's calling a sleeping function from an invalid context.  The
> backtrace is below.  This is with kernel v3.16-rc2-69-gd91d66e88ea9.

Hi Jens, pls see below - it looks like the call to blk_mq_end_io
from IRQ context is causing the issue.
IIUC you switched virtio to this from __blk_end_request_all in

commit 1cf7e9c68fe84248174e998922b39e508375e7c1
    virtio_blk: blk-mq support

Is this always safe?
I note that at least one other driver is doing this:
drivers/block/mtip32xx/mtip32xx.c

Thanks!

> The reporter is on CC and can give you relevant details.
> 
> josh
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1113805
> 
> [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:02.0 on minor 0
>  virtio-pci 0000:00:05.0: irq 40 for MSI/MSI-X
>  virtio-pci 0000:00:05.0: irq 41 for MSI/MSI-X
>   vda: vda1 vda2
>  virtio-pci 0000:00:06.0: irq 42 for MSI/MSI-X
>  virtio-pci 0000:00:06.0: irq 43 for MSI/MSI-X
>   vdb: vdb1
>  tsc: Refined TSC clocksource calibration: 3392.169 MHz
>  md: bind<vdb1>
>  mdadm (350) used greatest stack depth: 12384 bytes left
>  md: bind<vda2>
>  md: raid1 personality registered for level 1
>  md/raid1:md127: active with 2 out of 2 mirrors
>  created bitmap (1 pages) for device md127
>  md127: bitmap initialized from disk: read 1 pages, set 0 of 153 bits
>  md127: detected capacity change from 0 to 10203693056
>   md127: unknown partition table
>  systemd-cryptsetup[358]: Set cipher aes, mode xts-plain64, key size
> 512 bits for device
> /dev/disk/by-uuid/6972a564-542d-498b-b3a0-7d71c2e966a2.
>   md127: unknown partition table
>   md127: unknown partition table
>  dracut-initqueue[296]: Scanning devices dm-0  for LVM logical volumes
> fedora-foo1/root fedora-foo1/swap fedora-foo1/root fedora-foo1/swap
>  dracut-initqueue[296]: inactive '/dev/fedora-foo1/root' [7.42 GiB] inherit
>  dracut-initqueue[296]: inactive '/dev/fedora-foo1/swap' [2.00 GiB] inherit
>  systemd-fsck[662]: /dev/mapper/fedora--foo1-root: clean, 22215/486720
> files, 222526/1944576 blocks
>  BUG: sleeping function called from invalid context at
> kernel/locking/mutex.c:586
>  in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>  2 locks held by swapper/1/0:
>   #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>]
> virtblk_done+0x42/0xe0 [virtio_blk]
>   #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at:
> [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
>  irq event stamp: 33518
>  hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
>  hardirqs last disabled at (33516): [<ffffffff818122ed>]
> common_interrupt+0x6d/0x72
>  softirqs last  enabled at (33518): [<ffffffff810a1272>]
> _local_bh_enable+0x22/0x50
>  softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
>  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>   0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
>   0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
>   0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
>  Call Trace:
>   <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
>   [<ffffffff810d4f14>] __might_sleep+0x184/0x240
>   [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
>   [<ffffffff810e1de6>] ? local_clock+0x16/0x30
>   [<ffffffff810fc23f>] ? lock_release_holdtime.part.28+0xf/0x200
>   [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
>   [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
>   [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
>   [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
>   [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
>   [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
>   [<ffffffff813b46c4>] blk_update_request+0x94/0x420
>   [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
>   [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
>   [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
>   [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
>   [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
>   [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
>   [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
>   [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
>   [<ffffffff81119436>] handle_edge_irq+0x66/0x130
>   [<ffffffff8101c3e4>] handle_irq+0x84/0x150
>   [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
>   [<ffffffff818122f2>] common_interrupt+0x72/0x72
>   <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
>   [<ffffffff81025454>] default_idle+0x24/0x230
>   [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
>   [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
>   [<ffffffff8104df1b>] start_secondary+0x25b/0x300
> 
>  =================================
>  [ INFO: inconsistent lock state ]
>  3.16.0-0.rc2.git2.1.fc21.x86_64 #1 Not tainted
>  ---------------------------------
>  inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>  swapper/1/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
>   (kernfs_mutex){?.+.+.}, at: [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
>  {HARDIRQ-ON-W} state was registered at:
>    [<ffffffff811004b2>] __lock_acquire+0x942/0x1ca0
>    [<ffffffff811020f4>] lock_acquire+0xa4/0x1d0
>    [<ffffffff8180cb35>] mutex_lock_nested+0x85/0x440
>    [<ffffffff812d6d6f>] kernfs_activate+0x1f/0xf0
>    [<ffffffff812d7140>] kernfs_create_root+0xf0/0x110
>    [<ffffffff821f1ddc>] sysfs_init+0x13/0x51
>    [<ffffffff821ef5af>] mnt_init+0x107/0x231
>    [<ffffffff821ef213>] vfs_caches_init+0x98/0x106
>    [<ffffffff821bdfb0>] start_kernel+0x44f/0x4bc
>    [<ffffffff821bd4d7>] x86_64_start_reservations+0x2a/0x2c
>    [<ffffffff821bd626>] x86_64_start_kernel+0x14d/0x170
>  irq event stamp: 33518
>  hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
>  hardirqs last disabled at (33516): [<ffffffff818122ed>]
> common_interrupt+0x6d/0x72
>  softirqs last  enabled at (33518): [<ffffffff810a1272>]
> _local_bh_enable+0x22/0x50
>  softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
> 
>                                               other info that might
> help us debug this:
>   Possible unsafe locking scenario:
>         CPU0
>         ----
>    lock(kernfs_mutex);
>    <Interrupt>
>      lock(kernfs_mutex);
> 
>                                                *** DEADLOCK ***
>  2 locks held by swapper/1/0:
>   #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>]
> virtblk_done+0x42/0xe0 [virtio_blk]
>   #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at:
> [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
> 
>                                               stack backtrace:
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
>  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>   0000000000000000 f90db13964f4ee05 ffff88007d4039d0 ffffffff81807b4c
>   ffff88007bcb19d0 ffff88007d403a20 ffffffff81805358 0000000000000000
>   0000000000000000 0000000000000001 ffff88007bcb25a8 ffff88007bcb19d0
>  Call Trace:
>   <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
>   [<ffffffff81805358>] print_usage_bug+0x1f0/0x205
>   [<ffffffff810ff450>] mark_lock+0x610/0x6d0
>   [<ffffffff810fe520>] ? print_shortest_lock_dependencies+0x1d0/0x1d0
>   [<ffffffff81100614>] __lock_acquire+0xaa4/0x1ca0
>   [<ffffffff8101dc4d>] ? show_trace_log_lvl+0x4d/0x60
>   [<ffffffff8101c8ad>] ? show_stack_log_lvl+0xad/0x1b0
>   [<ffffffff811020f4>] lock_acquire+0xa4/0x1d0
>   [<ffffffff812d76a0>] ? kernfs_notify+0x90/0x150
>   [<ffffffff8180cb35>] mutex_lock_nested+0x85/0x440
>   [<ffffffff812d76a0>] ? kernfs_notify+0x90/0x150
>   [<ffffffff810fc23f>] ? lock_release_holdtime.part.28+0xf/0x200
>   [<ffffffff812d76a0>] ? kernfs_notify+0x90/0x150
>   [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
>   [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
>   [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
>   [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
>   [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
>   [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
>   [<ffffffff813b46c4>] blk_update_request+0x94/0x420
>   [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
>   [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
>   [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
>   [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
>   [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
>   [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
>   [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
>   [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
>   [<ffffffff81119436>] handle_edge_irq+0x66/0x130
>   [<ffffffff8101c3e4>] handle_irq+0x84/0x150
>   [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
>   [<ffffffff818122f2>] common_interrupt+0x72/0x72
>   <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
>   [<ffffffff81025454>] default_idle+0x24/0x230
>   [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
>   [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
>   [<ffffffff8104df1b>] start_secondary+0x25b/0x30

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virt_blk BUG: sleeping function called from invalid context
@ 2014-06-29  8:26   ` Michael S. Tsirkin
  0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2014-06-29  8:26 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Jens Axboe, Linux-Kernel@Vger. Kernel. Org, virtualization,
	Brian Lane, Christoph Hellwig

On Fri, Jun 27, 2014 at 07:57:38AM -0400, Josh Boyer wrote:
> Hi All,
> 
> We've had a report[1] of the virt_blk driver causing a lot of spew
> because it's calling a sleeping function from an invalid context.  The
> backtrace is below.  This is with kernel v3.16-rc2-69-gd91d66e88ea9.

Hi Jens, pls see below - it looks like the call to blk_mq_end_io
from IRQ context is causing the issue.
IIUC you switched virtio to this from __blk_end_request_all in

commit 1cf7e9c68fe84248174e998922b39e508375e7c1
    virtio_blk: blk-mq support

Is this always safe?
I note that at least one other driver is doing this:
drivers/block/mtip32xx/mtip32xx.c

Thanks!

> The reporter is on CC and can give you relevant details.
> 
> josh
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1113805
> 
> [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:02.0 on minor 0
>  virtio-pci 0000:00:05.0: irq 40 for MSI/MSI-X
>  virtio-pci 0000:00:05.0: irq 41 for MSI/MSI-X
>   vda: vda1 vda2
>  virtio-pci 0000:00:06.0: irq 42 for MSI/MSI-X
>  virtio-pci 0000:00:06.0: irq 43 for MSI/MSI-X
>   vdb: vdb1
>  tsc: Refined TSC clocksource calibration: 3392.169 MHz
>  md: bind<vdb1>
>  mdadm (350) used greatest stack depth: 12384 bytes left
>  md: bind<vda2>
>  md: raid1 personality registered for level 1
>  md/raid1:md127: active with 2 out of 2 mirrors
>  created bitmap (1 pages) for device md127
>  md127: bitmap initialized from disk: read 1 pages, set 0 of 153 bits
>  md127: detected capacity change from 0 to 10203693056
>   md127: unknown partition table
>  systemd-cryptsetup[358]: Set cipher aes, mode xts-plain64, key size
> 512 bits for device
> /dev/disk/by-uuid/6972a564-542d-498b-b3a0-7d71c2e966a2.
>   md127: unknown partition table
>   md127: unknown partition table
>  dracut-initqueue[296]: Scanning devices dm-0  for LVM logical volumes
> fedora-foo1/root fedora-foo1/swap fedora-foo1/root fedora-foo1/swap
>  dracut-initqueue[296]: inactive '/dev/fedora-foo1/root' [7.42 GiB] inherit
>  dracut-initqueue[296]: inactive '/dev/fedora-foo1/swap' [2.00 GiB] inherit
>  systemd-fsck[662]: /dev/mapper/fedora--foo1-root: clean, 22215/486720
> files, 222526/1944576 blocks
>  BUG: sleeping function called from invalid context at
> kernel/locking/mutex.c:586
>  in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>  2 locks held by swapper/1/0:
>   #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>]
> virtblk_done+0x42/0xe0 [virtio_blk]
>   #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at:
> [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
>  irq event stamp: 33518
>  hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
>  hardirqs last disabled at (33516): [<ffffffff818122ed>]
> common_interrupt+0x6d/0x72
>  softirqs last  enabled at (33518): [<ffffffff810a1272>]
> _local_bh_enable+0x22/0x50
>  softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
>  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>   0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
>   0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
>   0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
>  Call Trace:
>   <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
>   [<ffffffff810d4f14>] __might_sleep+0x184/0x240
>   [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
>   [<ffffffff810e1de6>] ? local_clock+0x16/0x30
>   [<ffffffff810fc23f>] ? lock_release_holdtime.part.28+0xf/0x200
>   [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
>   [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
>   [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
>   [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
>   [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
>   [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
>   [<ffffffff813b46c4>] blk_update_request+0x94/0x420
>   [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
>   [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
>   [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
>   [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
>   [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
>   [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
>   [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
>   [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
>   [<ffffffff81119436>] handle_edge_irq+0x66/0x130
>   [<ffffffff8101c3e4>] handle_irq+0x84/0x150
>   [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
>   [<ffffffff818122f2>] common_interrupt+0x72/0x72
>   <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
>   [<ffffffff81025454>] default_idle+0x24/0x230
>   [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
>   [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
>   [<ffffffff8104df1b>] start_secondary+0x25b/0x300
> 
>  =================================
>  [ INFO: inconsistent lock state ]
>  3.16.0-0.rc2.git2.1.fc21.x86_64 #1 Not tainted
>  ---------------------------------
>  inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>  swapper/1/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
>   (kernfs_mutex){?.+.+.}, at: [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
>  {HARDIRQ-ON-W} state was registered at:
>    [<ffffffff811004b2>] __lock_acquire+0x942/0x1ca0
>    [<ffffffff811020f4>] lock_acquire+0xa4/0x1d0
>    [<ffffffff8180cb35>] mutex_lock_nested+0x85/0x440
>    [<ffffffff812d6d6f>] kernfs_activate+0x1f/0xf0
>    [<ffffffff812d7140>] kernfs_create_root+0xf0/0x110
>    [<ffffffff821f1ddc>] sysfs_init+0x13/0x51
>    [<ffffffff821ef5af>] mnt_init+0x107/0x231
>    [<ffffffff821ef213>] vfs_caches_init+0x98/0x106
>    [<ffffffff821bdfb0>] start_kernel+0x44f/0x4bc
>    [<ffffffff821bd4d7>] x86_64_start_reservations+0x2a/0x2c
>    [<ffffffff821bd626>] x86_64_start_kernel+0x14d/0x170
>  irq event stamp: 33518
>  hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
>  hardirqs last disabled at (33516): [<ffffffff818122ed>]
> common_interrupt+0x6d/0x72
>  softirqs last  enabled at (33518): [<ffffffff810a1272>]
> _local_bh_enable+0x22/0x50
>  softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
> 
>                                               other info that might
> help us debug this:
>   Possible unsafe locking scenario:
>         CPU0
>         ----
>    lock(kernfs_mutex);
>    <Interrupt>
>      lock(kernfs_mutex);
> 
>                                                *** DEADLOCK ***
>  2 locks held by swapper/1/0:
>   #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>]
> virtblk_done+0x42/0xe0 [virtio_blk]
>   #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at:
> [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
> 
>                                               stack backtrace:
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
>  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>   0000000000000000 f90db13964f4ee05 ffff88007d4039d0 ffffffff81807b4c
>   ffff88007bcb19d0 ffff88007d403a20 ffffffff81805358 0000000000000000
>   0000000000000000 0000000000000001 ffff88007bcb25a8 ffff88007bcb19d0
>  Call Trace:
>   <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
>   [<ffffffff81805358>] print_usage_bug+0x1f0/0x205
>   [<ffffffff810ff450>] mark_lock+0x610/0x6d0
>   [<ffffffff810fe520>] ? print_shortest_lock_dependencies+0x1d0/0x1d0
>   [<ffffffff81100614>] __lock_acquire+0xaa4/0x1ca0
>   [<ffffffff8101dc4d>] ? show_trace_log_lvl+0x4d/0x60
>   [<ffffffff8101c8ad>] ? show_stack_log_lvl+0xad/0x1b0
>   [<ffffffff811020f4>] lock_acquire+0xa4/0x1d0
>   [<ffffffff812d76a0>] ? kernfs_notify+0x90/0x150
>   [<ffffffff8180cb35>] mutex_lock_nested+0x85/0x440
>   [<ffffffff812d76a0>] ? kernfs_notify+0x90/0x150
>   [<ffffffff810fc23f>] ? lock_release_holdtime.part.28+0xf/0x200
>   [<ffffffff812d76a0>] ? kernfs_notify+0x90/0x150
>   [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
>   [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
>   [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
>   [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
>   [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
>   [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
>   [<ffffffff813b46c4>] blk_update_request+0x94/0x420
>   [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
>   [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
>   [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
>   [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
>   [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
>   [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
>   [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
>   [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
>   [<ffffffff81119436>] handle_edge_irq+0x66/0x130
>   [<ffffffff8101c3e4>] handle_irq+0x84/0x150
>   [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
>   [<ffffffff818122f2>] common_interrupt+0x72/0x72
>   <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
>   [<ffffffff81025454>] default_idle+0x24/0x230
>   [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
>   [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
>   [<ffffffff8104df1b>] start_secondary+0x25b/0x30

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virt_blk BUG: sleeping function called from invalid context
  2014-06-29  8:26   ` Michael S. Tsirkin
  (?)
@ 2014-06-29 19:32   ` Christoph Hellwig
  2014-06-29 20:47       ` Michael S. Tsirkin
  -1 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2014-06-29 19:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Josh Boyer, Rusty Russell, virtualization,
	Linux-Kernel@Vger. Kernel. Org, Brian Lane, Jens Axboe,
	Christoph Hellwig

On Sun, Jun 29, 2014 at 11:26:37AM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 27, 2014 at 07:57:38AM -0400, Josh Boyer wrote:
> > Hi All,
> > 
> > We've had a report[1] of the virt_blk driver causing a lot of spew
> > because it's calling a sleeping function from an invalid context.  The
> > backtrace is below.  This is with kernel v3.16-rc2-69-gd91d66e88ea9.
> 
> Hi Jens, pls see below - it looks like the call to blk_mq_end_io
> from IRQ context is causing the issue.
> IIUC you switched virtio to this from __blk_end_request_all in
> 
> commit 1cf7e9c68fe84248174e998922b39e508375e7c1
>     virtio_blk: blk-mq support
> 
> Is this always safe?
> I note that at least one other driver is doing this:
> drivers/block/mtip32xx/mtip32xx.c

Just like __blk_end_request_all blk_mq_end_io is supposed to be called
from irq context.  The problem is that the MD bio end_io handler is calling
a sleeping function.  Not sure if that's a bug in MD though given the
kernfs symbols in the all trace and the recent churn in that area.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virt_blk BUG: sleeping function called from invalid context
  2014-06-29  8:26   ` Michael S. Tsirkin
  (?)
  (?)
@ 2014-06-29 19:32   ` Christoph Hellwig
  -1 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2014-06-29 19:32 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jens Axboe, Josh Boyer, Linux-Kernel@Vger. Kernel. Org,
	virtualization, Brian Lane, Christoph Hellwig

On Sun, Jun 29, 2014 at 11:26:37AM +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 27, 2014 at 07:57:38AM -0400, Josh Boyer wrote:
> > Hi All,
> > 
> > We've had a report[1] of the virt_blk driver causing a lot of spew
> > because it's calling a sleeping function from an invalid context.  The
> > backtrace is below.  This is with kernel v3.16-rc2-69-gd91d66e88ea9.
> 
> Hi Jens, pls see below - it looks like the call to blk_mq_end_io
> from IRQ context is causing the issue.
> IIUC you switched virtio to this from __blk_end_request_all in
> 
> commit 1cf7e9c68fe84248174e998922b39e508375e7c1
>     virtio_blk: blk-mq support
> 
> Is this always safe?
> I note that at least one other driver is doing this:
> drivers/block/mtip32xx/mtip32xx.c

Just like __blk_end_request_all blk_mq_end_io is supposed to be called
from irq context.  The problem is that the MD bio end_io handler is calling
a sleeping function.  Not sure if that's a bug in MD though given the
kernfs symbols in the all trace and the recent churn in that area.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virt_blk BUG: sleeping function called from invalid context
  2014-06-29 19:32   ` Christoph Hellwig
@ 2014-06-29 20:47       ` Michael S. Tsirkin
  0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2014-06-29 20:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Josh Boyer, Rusty Russell, virtualization,
	Linux-Kernel@Vger. Kernel. Org, Brian Lane, Jens Axboe,
	Tejun Heo, John McCutchan, Robert Love, Eric Paris,
	Greg Kroah-Hartman

On Sun, Jun 29, 2014 at 09:32:22PM +0200, Christoph Hellwig wrote:
> On Sun, Jun 29, 2014 at 11:26:37AM +0300, Michael S. Tsirkin wrote:
> > On Fri, Jun 27, 2014 at 07:57:38AM -0400, Josh Boyer wrote:
> > > Hi All,
> > > 
> > > We've had a report[1] of the virt_blk driver causing a lot of spew
> > > because it's calling a sleeping function from an invalid context.  The
> > > backtrace is below.  This is with kernel v3.16-rc2-69-gd91d66e88ea9.
> > 
> > Hi Jens, pls see below - it looks like the call to blk_mq_end_io
> > from IRQ context is causing the issue.
> > IIUC you switched virtio to this from __blk_end_request_all in
> > 
> > commit 1cf7e9c68fe84248174e998922b39e508375e7c1
> >     virtio_blk: blk-mq support
> > 
> > Is this always safe?
> > I note that at least one other driver is doing this:
> > drivers/block/mtip32xx/mtip32xx.c
> 
> Just like __blk_end_request_all blk_mq_end_io is supposed to be called
> from irq context.  The problem is that the MD bio end_io handler is calling
> a sleeping function.  Not sure if that's a bug in MD though given the
> kernfs symbols in the all trace and the recent churn in that area.

My understanding is this:

bitmap_endwrite -> calls sysfs_notify_dirent_safe under spinlock
 -> calls kernfs_notify which takes a mutex.

So I am guessing it is this commit:

commit d911d98748018f7c8facc035ba39c30f5cce6f9c
Author: Tejun Heo <tj@kernel.org>
Date:   Wed Apr 9 11:07:31 2014 -0400

    kernfs: make kernfs_notify() trigger inotify events too

Tejun, what do you think?

Josh, Brian, could you try reverting that commit to see if it helps?

-- 
MST

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virt_blk BUG: sleeping function called from invalid context
@ 2014-06-29 20:47       ` Michael S. Tsirkin
  0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2014-06-29 20:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Josh Boyer, Robert Love, Greg Kroah-Hartman,
	Linux-Kernel@Vger. Kernel. Org, virtualization, Tejun Heo,
	Eric Paris, Brian Lane, John McCutchan

On Sun, Jun 29, 2014 at 09:32:22PM +0200, Christoph Hellwig wrote:
> On Sun, Jun 29, 2014 at 11:26:37AM +0300, Michael S. Tsirkin wrote:
> > On Fri, Jun 27, 2014 at 07:57:38AM -0400, Josh Boyer wrote:
> > > Hi All,
> > > 
> > > We've had a report[1] of the virt_blk driver causing a lot of spew
> > > because it's calling a sleeping function from an invalid context.  The
> > > backtrace is below.  This is with kernel v3.16-rc2-69-gd91d66e88ea9.
> > 
> > Hi Jens, pls see below - it looks like the call to blk_mq_end_io
> > from IRQ context is causing the issue.
> > IIUC you switched virtio to this from __blk_end_request_all in
> > 
> > commit 1cf7e9c68fe84248174e998922b39e508375e7c1
> >     virtio_blk: blk-mq support
> > 
> > Is this always safe?
> > I note that at least one other driver is doing this:
> > drivers/block/mtip32xx/mtip32xx.c
> 
> Just like __blk_end_request_all blk_mq_end_io is supposed to be called
> from irq context.  The problem is that the MD bio end_io handler is calling
> a sleeping function.  Not sure if that's a bug in MD though given the
> kernfs symbols in the all trace and the recent churn in that area.

My understanding is this:

bitmap_endwrite -> calls sysfs_notify_dirent_safe under spinlock
 -> calls kernfs_notify which takes a mutex.

So I am guessing it is this commit:

commit d911d98748018f7c8facc035ba39c30f5cce6f9c
Author: Tejun Heo <tj@kernel.org>
Date:   Wed Apr 9 11:07:31 2014 -0400

    kernfs: make kernfs_notify() trigger inotify events too

Tejun, what do you think?

Josh, Brian, could you try reverting that commit to see if it helps?

-- 
MST

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virt_blk BUG: sleeping function called from invalid context
  2014-06-29 20:47       ` Michael S. Tsirkin
@ 2014-06-29 20:55         ` Jens Axboe
  -1 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2014-06-29 20:55 UTC (permalink / raw)
  To: Michael S. Tsirkin, Christoph Hellwig
  Cc: Josh Boyer, Rusty Russell, virtualization,
	Linux-Kernel@Vger. Kernel. Org, Brian Lane, Tejun Heo,
	John McCutchan, Robert Love, Eric Paris, Greg Kroah-Hartman

On 06/29/2014 02:47 PM, Michael S. Tsirkin wrote:
> On Sun, Jun 29, 2014 at 09:32:22PM +0200, Christoph Hellwig wrote:
>> On Sun, Jun 29, 2014 at 11:26:37AM +0300, Michael S. Tsirkin wrote:
>>> On Fri, Jun 27, 2014 at 07:57:38AM -0400, Josh Boyer wrote:
>>>> Hi All,
>>>>
>>>> We've had a report[1] of the virt_blk driver causing a lot of spew
>>>> because it's calling a sleeping function from an invalid context.  The
>>>> backtrace is below.  This is with kernel v3.16-rc2-69-gd91d66e88ea9.
>>>
>>> Hi Jens, pls see below - it looks like the call to blk_mq_end_io
>>> from IRQ context is causing the issue.
>>> IIUC you switched virtio to this from __blk_end_request_all in
>>>
>>> commit 1cf7e9c68fe84248174e998922b39e508375e7c1
>>>     virtio_blk: blk-mq support
>>>
>>> Is this always safe?
>>> I note that at least one other driver is doing this:
>>> drivers/block/mtip32xx/mtip32xx.c
>>
>> Just like __blk_end_request_all blk_mq_end_io is supposed to be called
>> from irq context.  The problem is that the MD bio end_io handler is calling
>> a sleeping function.  Not sure if that's a bug in MD though given the
>> kernfs symbols in the all trace and the recent churn in that area.
> 
> My understanding is this:
> 
> bitmap_endwrite -> calls sysfs_notify_dirent_safe under spinlock
>  -> calls kernfs_notify which takes a mutex.
> 
> So I am guessing it is this commit:
> 
> commit d911d98748018f7c8facc035ba39c30f5cce6f9c
> Author: Tejun Heo <tj@kernel.org>
> Date:   Wed Apr 9 11:07:31 2014 -0400
> 
>     kernfs: make kernfs_notify() trigger inotify events too
> 
> Tejun, what do you think?
> 
> Josh, Brian, could you try reverting that commit to see if it helps?

That definitely be a bug. If you need to block off ->bi_end_io(), just
must punt to a worker thread.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virt_blk BUG: sleeping function called from invalid context
@ 2014-06-29 20:55         ` Jens Axboe
  0 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2014-06-29 20:55 UTC (permalink / raw)
  To: Michael S. Tsirkin, Christoph Hellwig
  Cc: Josh Boyer, Robert Love, Greg Kroah-Hartman,
	Linux-Kernel@Vger. Kernel. Org, virtualization, Tejun Heo,
	Eric Paris, Brian Lane, John McCutchan

On 06/29/2014 02:47 PM, Michael S. Tsirkin wrote:
> On Sun, Jun 29, 2014 at 09:32:22PM +0200, Christoph Hellwig wrote:
>> On Sun, Jun 29, 2014 at 11:26:37AM +0300, Michael S. Tsirkin wrote:
>>> On Fri, Jun 27, 2014 at 07:57:38AM -0400, Josh Boyer wrote:
>>>> Hi All,
>>>>
>>>> We've had a report[1] of the virt_blk driver causing a lot of spew
>>>> because it's calling a sleeping function from an invalid context.  The
>>>> backtrace is below.  This is with kernel v3.16-rc2-69-gd91d66e88ea9.
>>>
>>> Hi Jens, pls see below - it looks like the call to blk_mq_end_io
>>> from IRQ context is causing the issue.
>>> IIUC you switched virtio to this from __blk_end_request_all in
>>>
>>> commit 1cf7e9c68fe84248174e998922b39e508375e7c1
>>>     virtio_blk: blk-mq support
>>>
>>> Is this always safe?
>>> I note that at least one other driver is doing this:
>>> drivers/block/mtip32xx/mtip32xx.c
>>
>> Just like __blk_end_request_all blk_mq_end_io is supposed to be called
>> from irq context.  The problem is that the MD bio end_io handler is calling
>> a sleeping function.  Not sure if that's a bug in MD though given the
>> kernfs symbols in the all trace and the recent churn in that area.
> 
> My understanding is this:
> 
> bitmap_endwrite -> calls sysfs_notify_dirent_safe under spinlock
>  -> calls kernfs_notify which takes a mutex.
> 
> So I am guessing it is this commit:
> 
> commit d911d98748018f7c8facc035ba39c30f5cce6f9c
> Author: Tejun Heo <tj@kernel.org>
> Date:   Wed Apr 9 11:07:31 2014 -0400
> 
>     kernfs: make kernfs_notify() trigger inotify events too
> 
> Tejun, what do you think?
> 
> Josh, Brian, could you try reverting that commit to see if it helps?

That definitely be a bug. If you need to block off ->bi_end_io(), just
must punt to a worker thread.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virt_blk BUG: sleeping function called from invalid context
  2014-06-29 20:55         ` Jens Axboe
@ 2014-06-30 20:17           ` Tejun Heo
  -1 siblings, 0 replies; 21+ messages in thread
From: Tejun Heo @ 2014-06-30 20:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Christoph Hellwig, Josh Boyer, Rusty Russell,
	virtualization, Linux-Kernel@Vger. Kernel. Org, Brian Lane,
	John McCutchan, Robert Love, Eric Paris, Greg Kroah-Hartman

On Sun, Jun 29, 2014 at 02:55:36PM -0600, Jens Axboe wrote:
> > commit d911d98748018f7c8facc035ba39c30f5cce6f9c
> > Author: Tejun Heo <tj@kernel.org>
> > Date:   Wed Apr 9 11:07:31 2014 -0400
> > 
> >     kernfs: make kernfs_notify() trigger inotify events too
> > 
> > Tejun, what do you think?
> > 
> > Josh, Brian, could you try reverting that commit to see if it helps?
> 
> That definitely be a bug. If you need to block off ->bi_end_io(), just
> must punt to a worker thread.

Ugh... didn't realize we had users triggering sysfs notifications from
an atomic context.  fsnotify support requires sleepable context.
Guess I'll have to punt it to a work item. :(

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: virt_blk BUG: sleeping function called from invalid context
@ 2014-06-30 20:17           ` Tejun Heo
  0 siblings, 0 replies; 21+ messages in thread
From: Tejun Heo @ 2014-06-30 20:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Josh Boyer, Michael S. Tsirkin, John McCutchan,
	Greg Kroah-Hartman, Linux-Kernel@Vger. Kernel. Org,
	virtualization, Robert Love, Eric Paris, Brian Lane,
	Christoph Hellwig

On Sun, Jun 29, 2014 at 02:55:36PM -0600, Jens Axboe wrote:
> > commit d911d98748018f7c8facc035ba39c30f5cce6f9c
> > Author: Tejun Heo <tj@kernel.org>
> > Date:   Wed Apr 9 11:07:31 2014 -0400
> > 
> >     kernfs: make kernfs_notify() trigger inotify events too
> > 
> > Tejun, what do you think?
> > 
> > Josh, Brian, could you try reverting that commit to see if it helps?
> 
> That definitely be a bug. If you need to block off ->bi_end_io(), just
> must punt to a worker thread.

Ugh... didn't realize we had users triggering sysfs notifications from
an atomic context.  fsnotify support requires sleepable context.
Guess I'll have to punt it to a work item. :(

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH driver-core-linus] kernfs: kernfs_notify() must be useable from non-sleepable contexts
  2014-06-30 20:17           ` Tejun Heo
@ 2014-07-01 20:41             ` Tejun Heo
  -1 siblings, 0 replies; 21+ messages in thread
From: Tejun Heo @ 2014-07-01 20:41 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jens Axboe, Michael S. Tsirkin, Christoph Hellwig, Josh Boyer,
	Rusty Russell, virtualization, Linux-Kernel@Vger. Kernel. Org,
	Brian Lane, John McCutchan, Robert Love, Eric Paris

d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events
too") added fsnotify triggering to kernfs_notify() which requires a
sleepable context.  There are already existing users of
kernfs_notify() which invoke it from an atomic context and in general
it's silly to require a sleepable context for triggering a
notification.

The following is an invalid context bug triggerd by md invoking
sysfs_notify() from IO completion path.

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 2 locks held by swapper/1/0:
  #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
  #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
 irq event stamp: 33518
 hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
 hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
 softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
 softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
  0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
  0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
 Call Trace:
  <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
  [<ffffffff810d4f14>] __might_sleep+0x184/0x240
  [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
  [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
  [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
  [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
  [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
  [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
  [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
  [<ffffffff813b46c4>] blk_update_request+0x94/0x420
  [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
  [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
  [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
  [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
  [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
  [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
  [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
  [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
  [<ffffffff81119436>] handle_edge_irq+0x66/0x130
  [<ffffffff8101c3e4>] handle_irq+0x84/0x150
  [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
  [<ffffffff818122f2>] common_interrupt+0x72/0x72
  <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
  [<ffffffff81025454>] default_idle+0x24/0x230
  [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
  [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
  [<ffffffff8104df1b>] start_secondary+0x25b/0x300

This patch fixes it by punting the notification delivery through a
work item.  This ends up adding an extra pointer to kernfs_elem_attr
enlarging kernfs_node by a pointer, which is not ideal but not a very
big deal either.  If this turns out to be an actual issue, we can move
kernfs_elem_attr->size to kernfs_node->iattr later.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
---
 fs/kernfs/file.c       |   69 +++++++++++++++++++++++++++++++++++++++----------
 include/linux/kernfs.h |    1 
 2 files changed, 56 insertions(+), 14 deletions(-)

--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -39,6 +39,19 @@ struct kernfs_open_node {
 	struct list_head	files; /* goes through kernfs_open_file.list */
 };
 
+/*
+ * kernfs_notify() may be called from any context and bounces notifications
+ * through a work item.  To minimize space overhead in kernfs_node, the
+ * pending queue is implemented as a singly linked list of kernfs_nodes.
+ * The list is terminated with the self pointer so that whether a
+ * kernfs_node is on the list or not can be determined by testing the next
+ * pointer for NULL.
+ */
+#define KERNFS_NOTIFY_EOL			((void *)&kernfs_notify_list)
+
+static DEFINE_SPINLOCK(kernfs_notify_lock);
+static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
+
 static struct kernfs_open_file *kernfs_of(struct file *file)
 {
 	return ((struct seq_file *)file->private_data)->private;
@@ -783,24 +796,25 @@ static unsigned int kernfs_fop_poll(stru
 	return DEFAULT_POLLMASK|POLLERR|POLLPRI;
 }
 
-/**
- * kernfs_notify - notify a kernfs file
- * @kn: file to notify
- *
- * Notify @kn such that poll(2) on @kn wakes up.
- */
-void kernfs_notify(struct kernfs_node *kn)
+static void kernfs_notify_workfn(struct work_struct *work)
 {
-	struct kernfs_root *root = kernfs_root(kn);
+	struct kernfs_node *kn;
 	struct kernfs_open_node *on;
 	struct kernfs_super_info *info;
-	unsigned long flags;
-
-	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
+repeat:
+	/* pop one off the notify_list */
+	spin_lock_irq(&kernfs_notify_lock);
+	kn = kernfs_notify_list;
+	if (kn == KERNFS_NOTIFY_EOL) {
+		spin_unlock_irq(&kernfs_notify_lock);
 		return;
+	}
+	kernfs_notify_list = kn->attr.notify_next;
+	kn->attr.notify_next = NULL;
+	spin_unlock_irq(&kernfs_notify_lock);
 
 	/* kick poll */
-	spin_lock_irqsave(&kernfs_open_node_lock, flags);
+	spin_lock_irq(&kernfs_open_node_lock);
 
 	on = kn->attr.open;
 	if (on) {
@@ -808,12 +822,12 @@ void kernfs_notify(struct kernfs_node *k
 		wake_up_interruptible(&on->poll);
 	}
 
-	spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
+	spin_unlock_irq(&kernfs_open_node_lock);
 
 	/* kick fsnotify */
 	mutex_lock(&kernfs_mutex);
 
-	list_for_each_entry(info, &root->supers, node) {
+	list_for_each_entry(info, &kernfs_root(kn)->supers, node) {
 		struct inode *inode;
 		struct dentry *dentry;
 
@@ -833,6 +847,33 @@ void kernfs_notify(struct kernfs_node *k
 	}
 
 	mutex_unlock(&kernfs_mutex);
+	kernfs_put(kn);
+	goto repeat;
+}
+
+/**
+ * kernfs_notify - notify a kernfs file
+ * @kn: file to notify
+ *
+ * Notify @kn such that poll(2) on @kn wakes up.  Maybe be called from any
+ * context.
+ */
+void kernfs_notify(struct kernfs_node *kn)
+{
+	static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn);
+	unsigned long flags;
+
+	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
+		return;
+
+	spin_lock_irqsave(&kernfs_notify_lock, flags);
+	if (!kn->attr.notify_next) {
+		kernfs_get(kn);
+		kn->attr.notify_next = kernfs_notify_list;
+		kernfs_notify_list = kn;
+		schedule_work(&kernfs_notify_work);
+	}
+	spin_unlock_irqrestore(&kernfs_notify_lock, flags);
 }
 EXPORT_SYMBOL_GPL(kernfs_notify);
 
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -91,6 +91,7 @@ struct kernfs_elem_attr {
 	const struct kernfs_ops	*ops;
 	struct kernfs_open_node	*open;
 	loff_t			size;
+	struct kernfs_node	*notify_next;	/* for kernfs_notify() */
 };
 
 /*

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH driver-core-linus] kernfs: kernfs_notify() must be useable from non-sleepable contexts
@ 2014-07-01 20:41             ` Tejun Heo
  0 siblings, 0 replies; 21+ messages in thread
From: Tejun Heo @ 2014-07-01 20:41 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jens Axboe, Josh Boyer, Michael S. Tsirkin, John McCutchan,
	Linux-Kernel@Vger. Kernel. Org, virtualization, Robert Love,
	Eric Paris, Brian Lane, Christoph Hellwig

d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events
too") added fsnotify triggering to kernfs_notify() which requires a
sleepable context.  There are already existing users of
kernfs_notify() which invoke it from an atomic context and in general
it's silly to require a sleepable context for triggering a
notification.

The following is an invalid context bug triggerd by md invoking
sysfs_notify() from IO completion path.

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 2 locks held by swapper/1/0:
  #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
  #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
 irq event stamp: 33518
 hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
 hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
 softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
 softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
  0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
  0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
 Call Trace:
  <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
  [<ffffffff810d4f14>] __might_sleep+0x184/0x240
  [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
  [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
  [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
  [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
  [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
  [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
  [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
  [<ffffffff813b46c4>] blk_update_request+0x94/0x420
  [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
  [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
  [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
  [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
  [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
  [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
  [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
  [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
  [<ffffffff81119436>] handle_edge_irq+0x66/0x130
  [<ffffffff8101c3e4>] handle_irq+0x84/0x150
  [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
  [<ffffffff818122f2>] common_interrupt+0x72/0x72
  <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
  [<ffffffff81025454>] default_idle+0x24/0x230
  [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
  [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
  [<ffffffff8104df1b>] start_secondary+0x25b/0x300

This patch fixes it by punting the notification delivery through a
work item.  This ends up adding an extra pointer to kernfs_elem_attr
enlarging kernfs_node by a pointer, which is not ideal but not a very
big deal either.  If this turns out to be an actual issue, we can move
kernfs_elem_attr->size to kernfs_node->iattr later.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
---
 fs/kernfs/file.c       |   69 +++++++++++++++++++++++++++++++++++++++----------
 include/linux/kernfs.h |    1 
 2 files changed, 56 insertions(+), 14 deletions(-)

--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -39,6 +39,19 @@ struct kernfs_open_node {
 	struct list_head	files; /* goes through kernfs_open_file.list */
 };
 
+/*
+ * kernfs_notify() may be called from any context and bounces notifications
+ * through a work item.  To minimize space overhead in kernfs_node, the
+ * pending queue is implemented as a singly linked list of kernfs_nodes.
+ * The list is terminated with the self pointer so that whether a
+ * kernfs_node is on the list or not can be determined by testing the next
+ * pointer for NULL.
+ */
+#define KERNFS_NOTIFY_EOL			((void *)&kernfs_notify_list)
+
+static DEFINE_SPINLOCK(kernfs_notify_lock);
+static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
+
 static struct kernfs_open_file *kernfs_of(struct file *file)
 {
 	return ((struct seq_file *)file->private_data)->private;
@@ -783,24 +796,25 @@ static unsigned int kernfs_fop_poll(stru
 	return DEFAULT_POLLMASK|POLLERR|POLLPRI;
 }
 
-/**
- * kernfs_notify - notify a kernfs file
- * @kn: file to notify
- *
- * Notify @kn such that poll(2) on @kn wakes up.
- */
-void kernfs_notify(struct kernfs_node *kn)
+static void kernfs_notify_workfn(struct work_struct *work)
 {
-	struct kernfs_root *root = kernfs_root(kn);
+	struct kernfs_node *kn;
 	struct kernfs_open_node *on;
 	struct kernfs_super_info *info;
-	unsigned long flags;
-
-	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
+repeat:
+	/* pop one off the notify_list */
+	spin_lock_irq(&kernfs_notify_lock);
+	kn = kernfs_notify_list;
+	if (kn == KERNFS_NOTIFY_EOL) {
+		spin_unlock_irq(&kernfs_notify_lock);
 		return;
+	}
+	kernfs_notify_list = kn->attr.notify_next;
+	kn->attr.notify_next = NULL;
+	spin_unlock_irq(&kernfs_notify_lock);
 
 	/* kick poll */
-	spin_lock_irqsave(&kernfs_open_node_lock, flags);
+	spin_lock_irq(&kernfs_open_node_lock);
 
 	on = kn->attr.open;
 	if (on) {
@@ -808,12 +822,12 @@ void kernfs_notify(struct kernfs_node *k
 		wake_up_interruptible(&on->poll);
 	}
 
-	spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
+	spin_unlock_irq(&kernfs_open_node_lock);
 
 	/* kick fsnotify */
 	mutex_lock(&kernfs_mutex);
 
-	list_for_each_entry(info, &root->supers, node) {
+	list_for_each_entry(info, &kernfs_root(kn)->supers, node) {
 		struct inode *inode;
 		struct dentry *dentry;
 
@@ -833,6 +847,33 @@ void kernfs_notify(struct kernfs_node *k
 	}
 
 	mutex_unlock(&kernfs_mutex);
+	kernfs_put(kn);
+	goto repeat;
+}
+
+/**
+ * kernfs_notify - notify a kernfs file
+ * @kn: file to notify
+ *
+ * Notify @kn such that poll(2) on @kn wakes up.  Maybe be called from any
+ * context.
+ */
+void kernfs_notify(struct kernfs_node *kn)
+{
+	static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn);
+	unsigned long flags;
+
+	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
+		return;
+
+	spin_lock_irqsave(&kernfs_notify_lock, flags);
+	if (!kn->attr.notify_next) {
+		kernfs_get(kn);
+		kn->attr.notify_next = kernfs_notify_list;
+		kernfs_notify_list = kn;
+		schedule_work(&kernfs_notify_work);
+	}
+	spin_unlock_irqrestore(&kernfs_notify_lock, flags);
 }
 EXPORT_SYMBOL_GPL(kernfs_notify);
 
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -91,6 +91,7 @@ struct kernfs_elem_attr {
 	const struct kernfs_ops	*ops;
 	struct kernfs_open_node	*open;
 	loff_t			size;
+	struct kernfs_node	*notify_next;	/* for kernfs_notify() */
 };
 
 /*

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH driver-core-linus] kernfs: kernfs_notify() must be useable from non-sleepable contexts
  2014-07-01 20:41             ` Tejun Heo
@ 2014-07-01 20:51               ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-01 20:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Michael S. Tsirkin, Christoph Hellwig, Josh Boyer,
	Rusty Russell, virtualization, Linux-Kernel@Vger. Kernel. Org,
	Brian Lane, John McCutchan, Robert Love, Eric Paris

On Tue, Jul 01, 2014 at 04:41:03PM -0400, Tejun Heo wrote:
> d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events
> too") added fsnotify triggering to kernfs_notify() which requires a
> sleepable context.  There are already existing users of
> kernfs_notify() which invoke it from an atomic context and in general
> it's silly to require a sleepable context for triggering a
> notification.
> 
> The following is an invalid context bug triggerd by md invoking
> sysfs_notify() from IO completion path.
> 
>  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>  in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>  2 locks held by swapper/1/0:
>   #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
>   #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
>  irq event stamp: 33518
>  hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
>  hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
>  softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
>  softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
>  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>   0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
>   0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
>   0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
>  Call Trace:
>   <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
>   [<ffffffff810d4f14>] __might_sleep+0x184/0x240
>   [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
>   [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
>   [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
>   [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
>   [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
>   [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
>   [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
>   [<ffffffff813b46c4>] blk_update_request+0x94/0x420
>   [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
>   [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
>   [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
>   [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
>   [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
>   [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
>   [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
>   [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
>   [<ffffffff81119436>] handle_edge_irq+0x66/0x130
>   [<ffffffff8101c3e4>] handle_irq+0x84/0x150
>   [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
>   [<ffffffff818122f2>] common_interrupt+0x72/0x72
>   <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
>   [<ffffffff81025454>] default_idle+0x24/0x230
>   [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
>   [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
>   [<ffffffff8104df1b>] start_secondary+0x25b/0x300
> 
> This patch fixes it by punting the notification delivery through a
> work item.  This ends up adding an extra pointer to kernfs_elem_attr
> enlarging kernfs_node by a pointer, which is not ideal but not a very
> big deal either.  If this turns out to be an actual issue, we can move
> kernfs_elem_attr->size to kernfs_node->iattr later.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> ---
>  fs/kernfs/file.c       |   69 +++++++++++++++++++++++++++++++++++++++----------
>  include/linux/kernfs.h |    1 
>  2 files changed, 56 insertions(+), 14 deletions(-)

Looks good to me, do you want to take this with your other kernfs
patches for 3.16-final?  Or if you don't have that, I can take it
through my tree, it's your choice, either is fine for me.

If you want it in your tree, feel free to add:
	Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH driver-core-linus] kernfs: kernfs_notify() must be useable from non-sleepable contexts
@ 2014-07-01 20:51               ` Greg Kroah-Hartman
  0 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-01 20:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Josh Boyer, Michael S. Tsirkin, John McCutchan,
	Linux-Kernel@Vger. Kernel. Org, virtualization, Robert Love,
	Eric Paris, Brian Lane, Christoph Hellwig

On Tue, Jul 01, 2014 at 04:41:03PM -0400, Tejun Heo wrote:
> d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events
> too") added fsnotify triggering to kernfs_notify() which requires a
> sleepable context.  There are already existing users of
> kernfs_notify() which invoke it from an atomic context and in general
> it's silly to require a sleepable context for triggering a
> notification.
> 
> The following is an invalid context bug triggerd by md invoking
> sysfs_notify() from IO completion path.
> 
>  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>  in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>  2 locks held by swapper/1/0:
>   #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
>   #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
>  irq event stamp: 33518
>  hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
>  hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
>  softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
>  softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
>  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>   0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
>   0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
>   0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
>  Call Trace:
>   <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
>   [<ffffffff810d4f14>] __might_sleep+0x184/0x240
>   [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
>   [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
>   [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
>   [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
>   [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
>   [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
>   [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
>   [<ffffffff813b46c4>] blk_update_request+0x94/0x420
>   [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
>   [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
>   [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
>   [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
>   [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
>   [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
>   [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
>   [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
>   [<ffffffff81119436>] handle_edge_irq+0x66/0x130
>   [<ffffffff8101c3e4>] handle_irq+0x84/0x150
>   [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
>   [<ffffffff818122f2>] common_interrupt+0x72/0x72
>   <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
>   [<ffffffff81025454>] default_idle+0x24/0x230
>   [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
>   [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
>   [<ffffffff8104df1b>] start_secondary+0x25b/0x300
> 
> This patch fixes it by punting the notification delivery through a
> work item.  This ends up adding an extra pointer to kernfs_elem_attr
> enlarging kernfs_node by a pointer, which is not ideal but not a very
> big deal either.  If this turns out to be an actual issue, we can move
> kernfs_elem_attr->size to kernfs_node->iattr later.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> ---
>  fs/kernfs/file.c       |   69 +++++++++++++++++++++++++++++++++++++++----------
>  include/linux/kernfs.h |    1 
>  2 files changed, 56 insertions(+), 14 deletions(-)

Looks good to me, do you want to take this with your other kernfs
patches for 3.16-final?  Or if you don't have that, I can take it
through my tree, it's your choice, either is fine for me.

If you want it in your tree, feel free to add:
	Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH driver-core-linus] kernfs: kernfs_notify() must be useable from non-sleepable contexts
  2014-07-01 20:41             ` Tejun Heo
@ 2014-07-02  5:53               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2014-07-02  5:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Jens Axboe, Christoph Hellwig, Josh Boyer,
	Rusty Russell, virtualization, Linux-Kernel@Vger. Kernel. Org,
	Brian Lane, John McCutchan, Robert Love, Eric Paris

On Tue, Jul 01, 2014 at 04:41:03PM -0400, Tejun Heo wrote:
> d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events
> too") added fsnotify triggering to kernfs_notify() which requires a
> sleepable context.  There are already existing users of
> kernfs_notify() which invoke it from an atomic context and in general
> it's silly to require a sleepable context for triggering a
> notification.
> 
> The following is an invalid context bug triggerd by md invoking
> sysfs_notify() from IO completion path.
> 
>  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>  in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>  2 locks held by swapper/1/0:
>   #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
>   #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
>  irq event stamp: 33518
>  hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
>  hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
>  softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
>  softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
>  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>   0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
>   0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
>   0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
>  Call Trace:
>   <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
>   [<ffffffff810d4f14>] __might_sleep+0x184/0x240
>   [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
>   [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
>   [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
>   [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
>   [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
>   [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
>   [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
>   [<ffffffff813b46c4>] blk_update_request+0x94/0x420
>   [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
>   [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
>   [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
>   [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
>   [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
>   [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
>   [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
>   [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
>   [<ffffffff81119436>] handle_edge_irq+0x66/0x130
>   [<ffffffff8101c3e4>] handle_irq+0x84/0x150
>   [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
>   [<ffffffff818122f2>] common_interrupt+0x72/0x72
>   <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
>   [<ffffffff81025454>] default_idle+0x24/0x230
>   [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
>   [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
>   [<ffffffff8104df1b>] start_secondary+0x25b/0x300
> 
> This patch fixes it by punting the notification delivery through a
> work item.  This ends up adding an extra pointer to kernfs_elem_attr
> enlarging kernfs_node by a pointer, which is not ideal but not a very
> big deal either.  If this turns out to be an actual issue, we can move
> kernfs_elem_attr->size to kernfs_node->iattr later.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jens Axboe <axboe@kernel.dk>

FWIW

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  fs/kernfs/file.c       |   69 +++++++++++++++++++++++++++++++++++++++----------
>  include/linux/kernfs.h |    1 
>  2 files changed, 56 insertions(+), 14 deletions(-)
> 
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -39,6 +39,19 @@ struct kernfs_open_node {
>  	struct list_head	files; /* goes through kernfs_open_file.list */
>  };
>  
> +/*
> + * kernfs_notify() may be called from any context and bounces notifications
> + * through a work item.  To minimize space overhead in kernfs_node, the
> + * pending queue is implemented as a singly linked list of kernfs_nodes.
> + * The list is terminated with the self pointer so that whether a
> + * kernfs_node is on the list or not can be determined by testing the next
> + * pointer for NULL.
> + */
> +#define KERNFS_NOTIFY_EOL			((void *)&kernfs_notify_list)
> +
> +static DEFINE_SPINLOCK(kernfs_notify_lock);
> +static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
> +
>  static struct kernfs_open_file *kernfs_of(struct file *file)
>  {
>  	return ((struct seq_file *)file->private_data)->private;
> @@ -783,24 +796,25 @@ static unsigned int kernfs_fop_poll(stru
>  	return DEFAULT_POLLMASK|POLLERR|POLLPRI;
>  }
>  
> -/**
> - * kernfs_notify - notify a kernfs file
> - * @kn: file to notify
> - *
> - * Notify @kn such that poll(2) on @kn wakes up.
> - */
> -void kernfs_notify(struct kernfs_node *kn)
> +static void kernfs_notify_workfn(struct work_struct *work)
>  {
> -	struct kernfs_root *root = kernfs_root(kn);
> +	struct kernfs_node *kn;
>  	struct kernfs_open_node *on;
>  	struct kernfs_super_info *info;
> -	unsigned long flags;
> -
> -	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
> +repeat:
> +	/* pop one off the notify_list */
> +	spin_lock_irq(&kernfs_notify_lock);
> +	kn = kernfs_notify_list;
> +	if (kn == KERNFS_NOTIFY_EOL) {
> +		spin_unlock_irq(&kernfs_notify_lock);
>  		return;
> +	}
> +	kernfs_notify_list = kn->attr.notify_next;
> +	kn->attr.notify_next = NULL;
> +	spin_unlock_irq(&kernfs_notify_lock);
>  
>  	/* kick poll */
> -	spin_lock_irqsave(&kernfs_open_node_lock, flags);
> +	spin_lock_irq(&kernfs_open_node_lock);
>  
>  	on = kn->attr.open;
>  	if (on) {
> @@ -808,12 +822,12 @@ void kernfs_notify(struct kernfs_node *k
>  		wake_up_interruptible(&on->poll);
>  	}
>  
> -	spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
> +	spin_unlock_irq(&kernfs_open_node_lock);
>  
>  	/* kick fsnotify */
>  	mutex_lock(&kernfs_mutex);
>  
> -	list_for_each_entry(info, &root->supers, node) {
> +	list_for_each_entry(info, &kernfs_root(kn)->supers, node) {
>  		struct inode *inode;
>  		struct dentry *dentry;
>  
> @@ -833,6 +847,33 @@ void kernfs_notify(struct kernfs_node *k
>  	}
>  
>  	mutex_unlock(&kernfs_mutex);
> +	kernfs_put(kn);
> +	goto repeat;
> +}
> +
> +/**
> + * kernfs_notify - notify a kernfs file
> + * @kn: file to notify
> + *
> + * Notify @kn such that poll(2) on @kn wakes up.  Maybe be called from any
> + * context.
> + */
> +void kernfs_notify(struct kernfs_node *kn)
> +{
> +	static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn);
> +	unsigned long flags;
> +
> +	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
> +		return;
> +
> +	spin_lock_irqsave(&kernfs_notify_lock, flags);
> +	if (!kn->attr.notify_next) {
> +		kernfs_get(kn);
> +		kn->attr.notify_next = kernfs_notify_list;
> +		kernfs_notify_list = kn;
> +		schedule_work(&kernfs_notify_work);
> +	}
> +	spin_unlock_irqrestore(&kernfs_notify_lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(kernfs_notify);
>  
> --- a/include/linux/kernfs.h
> +++ b/include/linux/kernfs.h
> @@ -91,6 +91,7 @@ struct kernfs_elem_attr {
>  	const struct kernfs_ops	*ops;
>  	struct kernfs_open_node	*open;
>  	loff_t			size;
> +	struct kernfs_node	*notify_next;	/* for kernfs_notify() */
>  };
>  
>  /*

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH driver-core-linus] kernfs: kernfs_notify() must be useable from non-sleepable contexts
@ 2014-07-02  5:53               ` Michael S. Tsirkin
  0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2014-07-02  5:53 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Josh Boyer, Robert Love, John McCutchan,
	Greg Kroah-Hartman, Linux-Kernel@Vger. Kernel. Org,
	virtualization, Eric Paris, Brian Lane, Christoph Hellwig

On Tue, Jul 01, 2014 at 04:41:03PM -0400, Tejun Heo wrote:
> d911d9874801 ("kernfs: make kernfs_notify() trigger inotify events
> too") added fsnotify triggering to kernfs_notify() which requires a
> sleepable context.  There are already existing users of
> kernfs_notify() which invoke it from an atomic context and in general
> it's silly to require a sleepable context for triggering a
> notification.
> 
> The following is an invalid context bug triggerd by md invoking
> sysfs_notify() from IO completion path.
> 
>  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
>  in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
>  2 locks held by swapper/1/0:
>   #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
>   #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
>  irq event stamp: 33518
>  hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
>  hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
>  softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
>  softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
>  CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
>  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>   0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
>   0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
>   0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
>  Call Trace:
>   <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
>   [<ffffffff810d4f14>] __might_sleep+0x184/0x240
>   [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
>   [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
>   [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
>   [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
>   [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
>   [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
>   [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
>   [<ffffffff813b46c4>] blk_update_request+0x94/0x420
>   [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
>   [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
>   [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
>   [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
>   [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
>   [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
>   [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
>   [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
>   [<ffffffff81119436>] handle_edge_irq+0x66/0x130
>   [<ffffffff8101c3e4>] handle_irq+0x84/0x150
>   [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
>   [<ffffffff818122f2>] common_interrupt+0x72/0x72
>   <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
>   [<ffffffff81025454>] default_idle+0x24/0x230
>   [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
>   [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
>   [<ffffffff8104df1b>] start_secondary+0x25b/0x300
> 
> This patch fixes it by punting the notification delivery through a
> work item.  This ends up adding an extra pointer to kernfs_elem_attr
> enlarging kernfs_node by a pointer, which is not ideal but not a very
> big deal either.  If this turns out to be an actual issue, we can move
> kernfs_elem_attr->size to kernfs_node->iattr later.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jens Axboe <axboe@kernel.dk>

FWIW

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  fs/kernfs/file.c       |   69 +++++++++++++++++++++++++++++++++++++++----------
>  include/linux/kernfs.h |    1 
>  2 files changed, 56 insertions(+), 14 deletions(-)
> 
> --- a/fs/kernfs/file.c
> +++ b/fs/kernfs/file.c
> @@ -39,6 +39,19 @@ struct kernfs_open_node {
>  	struct list_head	files; /* goes through kernfs_open_file.list */
>  };
>  
> +/*
> + * kernfs_notify() may be called from any context and bounces notifications
> + * through a work item.  To minimize space overhead in kernfs_node, the
> + * pending queue is implemented as a singly linked list of kernfs_nodes.
> + * The list is terminated with the self pointer so that whether a
> + * kernfs_node is on the list or not can be determined by testing the next
> + * pointer for NULL.
> + */
> +#define KERNFS_NOTIFY_EOL			((void *)&kernfs_notify_list)
> +
> +static DEFINE_SPINLOCK(kernfs_notify_lock);
> +static struct kernfs_node *kernfs_notify_list = KERNFS_NOTIFY_EOL;
> +
>  static struct kernfs_open_file *kernfs_of(struct file *file)
>  {
>  	return ((struct seq_file *)file->private_data)->private;
> @@ -783,24 +796,25 @@ static unsigned int kernfs_fop_poll(stru
>  	return DEFAULT_POLLMASK|POLLERR|POLLPRI;
>  }
>  
> -/**
> - * kernfs_notify - notify a kernfs file
> - * @kn: file to notify
> - *
> - * Notify @kn such that poll(2) on @kn wakes up.
> - */
> -void kernfs_notify(struct kernfs_node *kn)
> +static void kernfs_notify_workfn(struct work_struct *work)
>  {
> -	struct kernfs_root *root = kernfs_root(kn);
> +	struct kernfs_node *kn;
>  	struct kernfs_open_node *on;
>  	struct kernfs_super_info *info;
> -	unsigned long flags;
> -
> -	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
> +repeat:
> +	/* pop one off the notify_list */
> +	spin_lock_irq(&kernfs_notify_lock);
> +	kn = kernfs_notify_list;
> +	if (kn == KERNFS_NOTIFY_EOL) {
> +		spin_unlock_irq(&kernfs_notify_lock);
>  		return;
> +	}
> +	kernfs_notify_list = kn->attr.notify_next;
> +	kn->attr.notify_next = NULL;
> +	spin_unlock_irq(&kernfs_notify_lock);
>  
>  	/* kick poll */
> -	spin_lock_irqsave(&kernfs_open_node_lock, flags);
> +	spin_lock_irq(&kernfs_open_node_lock);
>  
>  	on = kn->attr.open;
>  	if (on) {
> @@ -808,12 +822,12 @@ void kernfs_notify(struct kernfs_node *k
>  		wake_up_interruptible(&on->poll);
>  	}
>  
> -	spin_unlock_irqrestore(&kernfs_open_node_lock, flags);
> +	spin_unlock_irq(&kernfs_open_node_lock);
>  
>  	/* kick fsnotify */
>  	mutex_lock(&kernfs_mutex);
>  
> -	list_for_each_entry(info, &root->supers, node) {
> +	list_for_each_entry(info, &kernfs_root(kn)->supers, node) {
>  		struct inode *inode;
>  		struct dentry *dentry;
>  
> @@ -833,6 +847,33 @@ void kernfs_notify(struct kernfs_node *k
>  	}
>  
>  	mutex_unlock(&kernfs_mutex);
> +	kernfs_put(kn);
> +	goto repeat;
> +}
> +
> +/**
> + * kernfs_notify - notify a kernfs file
> + * @kn: file to notify
> + *
> + * Notify @kn such that poll(2) on @kn wakes up.  Maybe be called from any
> + * context.
> + */
> +void kernfs_notify(struct kernfs_node *kn)
> +{
> +	static DECLARE_WORK(kernfs_notify_work, kernfs_notify_workfn);
> +	unsigned long flags;
> +
> +	if (WARN_ON(kernfs_type(kn) != KERNFS_FILE))
> +		return;
> +
> +	spin_lock_irqsave(&kernfs_notify_lock, flags);
> +	if (!kn->attr.notify_next) {
> +		kernfs_get(kn);
> +		kn->attr.notify_next = kernfs_notify_list;
> +		kernfs_notify_list = kn;
> +		schedule_work(&kernfs_notify_work);
> +	}
> +	spin_unlock_irqrestore(&kernfs_notify_lock, flags);
>  }
>  EXPORT_SYMBOL_GPL(kernfs_notify);
>  
> --- a/include/linux/kernfs.h
> +++ b/include/linux/kernfs.h
> @@ -91,6 +91,7 @@ struct kernfs_elem_attr {
>  	const struct kernfs_ops	*ops;
>  	struct kernfs_open_node	*open;
>  	loff_t			size;
> +	struct kernfs_node	*notify_next;	/* for kernfs_notify() */
>  };
>  
>  /*

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH driver-core-linus] kernfs: kernfs_notify() must be useable from non-sleepable contexts
  2014-07-01 20:51               ` Greg Kroah-Hartman
@ 2014-07-02 14:14                 ` Tejun Heo
  -1 siblings, 0 replies; 21+ messages in thread
From: Tejun Heo @ 2014-07-02 14:14 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jens Axboe, Michael S. Tsirkin, Christoph Hellwig, Josh Boyer,
	Rusty Russell, virtualization, Linux-Kernel@Vger. Kernel. Org,
	Brian Lane, John McCutchan, Robert Love, Eric Paris

Hello,

On Tue, Jul 01, 2014 at 01:51:48PM -0700, Greg Kroah-Hartman wrote:
> Looks good to me, do you want to take this with your other kernfs
> patches for 3.16-final?  Or if you don't have that, I can take it
> through my tree, it's your choice, either is fine for me.
> 
> If you want it in your tree, feel free to add:
> 	Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

The other kernfs changes are cgroup specific and in
cgroup/for-3.16-fixes.  I think this one fits better in the driver
core tree.  Can you please route it?

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH driver-core-linus] kernfs: kernfs_notify() must be useable from non-sleepable contexts
@ 2014-07-02 14:14                 ` Tejun Heo
  0 siblings, 0 replies; 21+ messages in thread
From: Tejun Heo @ 2014-07-02 14:14 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Jens Axboe, Josh Boyer, Michael S. Tsirkin, John McCutchan,
	Linux-Kernel@Vger. Kernel. Org, virtualization, Robert Love,
	Eric Paris, Brian Lane, Christoph Hellwig

Hello,

On Tue, Jul 01, 2014 at 01:51:48PM -0700, Greg Kroah-Hartman wrote:
> Looks good to me, do you want to take this with your other kernfs
> patches for 3.16-final?  Or if you don't have that, I can take it
> through my tree, it's your choice, either is fine for me.
> 
> If you want it in your tree, feel free to add:
> 	Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

The other kernfs changes are cgroup specific and in
cgroup/for-3.16-fixes.  I think this one fits better in the driver
core tree.  Can you please route it?

Thanks!

-- 
tejun

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH driver-core-linus] kernfs: kernfs_notify() must be useable from non-sleepable contexts
  2014-07-02 14:14                 ` Tejun Heo
@ 2014-07-02 16:31                   ` Greg Kroah-Hartman
  -1 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-02 16:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Michael S. Tsirkin, Christoph Hellwig, Josh Boyer,
	Rusty Russell, virtualization, Linux-Kernel@Vger. Kernel. Org,
	Brian Lane, John McCutchan, Robert Love, Eric Paris

On Wed, Jul 02, 2014 at 10:14:16AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jul 01, 2014 at 01:51:48PM -0700, Greg Kroah-Hartman wrote:
> > Looks good to me, do you want to take this with your other kernfs
> > patches for 3.16-final?  Or if you don't have that, I can take it
> > through my tree, it's your choice, either is fine for me.
> > 
> > If you want it in your tree, feel free to add:
> > 	Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> The other kernfs changes are cgroup specific and in
> cgroup/for-3.16-fixes.  I think this one fits better in the driver
> core tree.  Can you please route it?

Sure, will queue it up now.

greg k-h

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH driver-core-linus] kernfs: kernfs_notify() must be useable from non-sleepable contexts
@ 2014-07-02 16:31                   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 21+ messages in thread
From: Greg Kroah-Hartman @ 2014-07-02 16:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Josh Boyer, Michael S. Tsirkin, John McCutchan,
	Linux-Kernel@Vger. Kernel. Org, virtualization, Robert Love,
	Eric Paris, Brian Lane, Christoph Hellwig

On Wed, Jul 02, 2014 at 10:14:16AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Jul 01, 2014 at 01:51:48PM -0700, Greg Kroah-Hartman wrote:
> > Looks good to me, do you want to take this with your other kernfs
> > patches for 3.16-final?  Or if you don't have that, I can take it
> > through my tree, it's your choice, either is fine for me.
> > 
> > If you want it in your tree, feel free to add:
> > 	Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> The other kernfs changes are cgroup specific and in
> cgroup/for-3.16-fixes.  I think this one fits better in the driver
> core tree.  Can you please route it?

Sure, will queue it up now.

greg k-h

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2014-07-02 16:32 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-27 11:57 virt_blk BUG: sleeping function called from invalid context Josh Boyer
2014-06-29  8:26 ` Michael S. Tsirkin
2014-06-29  8:26   ` Michael S. Tsirkin
2014-06-29 19:32   ` Christoph Hellwig
2014-06-29 20:47     ` Michael S. Tsirkin
2014-06-29 20:47       ` Michael S. Tsirkin
2014-06-29 20:55       ` Jens Axboe
2014-06-29 20:55         ` Jens Axboe
2014-06-30 20:17         ` Tejun Heo
2014-06-30 20:17           ` Tejun Heo
2014-07-01 20:41           ` [PATCH driver-core-linus] kernfs: kernfs_notify() must be useable from non-sleepable contexts Tejun Heo
2014-07-01 20:41             ` Tejun Heo
2014-07-01 20:51             ` Greg Kroah-Hartman
2014-07-01 20:51               ` Greg Kroah-Hartman
2014-07-02 14:14               ` Tejun Heo
2014-07-02 14:14                 ` Tejun Heo
2014-07-02 16:31                 ` Greg Kroah-Hartman
2014-07-02 16:31                   ` Greg Kroah-Hartman
2014-07-02  5:53             ` Michael S. Tsirkin
2014-07-02  5:53               ` Michael S. Tsirkin
2014-06-29 19:32   ` virt_blk BUG: sleeping function called from invalid context Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.