linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* kobject lifetime issues in blk-mq
@ 2018-11-09 20:35 Guenter Roeck
  2018-11-12  9:20 ` Ming Lei
  0 siblings, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2018-11-09 20:35 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe
  Cc: Peter Zijlstra, linux-block, linux-kernel, Hannes Reinecke,
	Paolo Bonzini, Christoph Hellwig, Martin K. Petersen,
	James E.J. Bottomley, linux-scsi

Hi,

Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
enabled report kobject lifetime issues when booting an image in qemu
using virtio-scsi-pci.

Bisect points to two different commits, depending on the kernel
configuration.

1st configuration: defconfig plus:

CONFIG_VIRTIO_BLK=y
CONFIG_SCSI_LOWLEVEL=y
CONFIG_SCSI_VIRTIO=y
CONFIG_VIRTIO_PCI=y
CONFIG_VIRTIO_MMIO=y
CONFIG_DEBUG_OBJECTS=y
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_DEBUG_KOBJECT=y
CONFIG_DEBUG_KOBJECT_RELEASE=y

Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
virtio_scsi: fix IO hang caused by automatic irq vector affinity")
as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
the bisect log is therefore a bit misleading.

2nd configuration: As above, plus:
CONFIG_SCSI_MQ_DEFAULT=y

With this configuration, bisect points to commit 7ea5fe31c12d
("blk-mq: make lifetime consitent between q/ctx and its kobject").

The qemu command line used to reproduce the problem is as follows.
The qemu version does not seem to matter (I tested with qemu versions
2.5, 2.11, and 3.0).

qemu-system-x86_64 \
	-kernel arch/x86/boot/bzImage \
	-device virtio-scsi-pci,id=scsi \
	-device scsi-hd,bus=scsi.0,drive=d0 \
	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
	-snapshot \
	-m 2G -smp 4 -enable-kvm \
	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
	-nographic -monitor none \
	-no-reboot \
	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"

Root file system:
	https://storage.googleapis.com/syzkaller/wheezy.img
or:
	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz

It seems that any root file system can be used to reproduce the problem.
Also, the problem is not limited to virtio-scsi-pci. I also tried
am53c974, lsi53c810, and lsi53c895a (after enabling the respective
drivers), with the same result.

Overall, this suggests that the problem is related to blk-mq and was
indeed introduced with commit 7ea5fe31c12d.

For reference, I attached an actual crash log as well as a set of bisect
logs below.

Unfortunately, my understanding of blk-mq is not good enough to find the
underlying problem and suggest a fix. Please let me know if there is
anything else I can do to help fixing the problem.

Note that the problem was originally reported by syzbot running a test
on chromeos-4.14. It may well be that the problem has already been
reported and is being worked on. If so, please apologize the noise.

Thanks,
Guenter

---
[    8.700038] ------------[ cut here ]------------
[    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
[    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
[    8.701032] Kernel panic - not syncing: panic_on_warn set ...
[    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
[    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[    8.701032] Call Trace:
[    8.701032]  <IRQ>
[    8.701032]  dump_stack+0x46/0x5b
[    8.701032]  panic+0xf3/0x246
[    8.701032]  ? debug_print_object+0x6a/0x80
[    8.701032]  __warn+0xeb/0xf0
[    8.701032]  ? debug_print_object+0x6a/0x80
[    8.701032]  ? debug_print_object+0x6a/0x80
[    8.701032]  report_bug+0xb1/0x120
[    8.701032]  fixup_bug.part.10+0x13/0x30
[    8.701032]  do_error_trap+0x8f/0xb0
[    8.701032]  do_invalid_op+0x31/0x40
[    8.701032]  ? debug_print_object+0x6a/0x80
[    8.701032]  invalid_op+0x14/0x20
[    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
[    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
[    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
[    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
[    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
[    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
[    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
[    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
[    8.701032]  ? debug_print_object+0x6a/0x80
[    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
[    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
[    8.701032]  kmem_cache_free+0x6e/0x1a0
[    8.701032]  ? __blk_release_queue+0x150/0x150
[    8.701032]  rcu_process_callbacks+0x2a7/0x750
[    8.701032]  __do_softirq+0xf2/0x2c7
[    8.701032]  irq_exit+0xb7/0xc0
[    8.701032]  smp_apic_timer_interrupt+0x67/0x130
[    8.701032]  apic_timer_interrupt+0xf/0x20
[    8.701032]  </IRQ>
[    8.701032] RIP: 0010:default_idle+0x12/0x140
[    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
[    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
[    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
[    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
[    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
[    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
[    8.701032]  ? __sched_text_end+0x5/0x5
[    8.701032]  do_idle+0x19c/0x230
[    8.701032]  cpu_startup_entry+0x14/0x20
[    8.701032]  start_kernel+0x491/0x4b1
[    8.701032]  secondary_startup_64+0xa4/0xb0
[    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

---
# bad: [a351e9b9fc24e982ec2f0e76379a49826036da12] Linux 4.11
# good: [c470abd4fde40ea6a0846a2beab642a578c0b8cd] Linux 4.10
git bisect start 'v4.11' 'v4.10'
# good: [1ec5c1867af085897bb9e0f67bef3713334dbe7f] Merge tag 'gpio-v4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
git bisect good 1ec5c1867af085897bb9e0f67bef3713334dbe7f
# good: [79b17ea740d9fab178d6a1aa15d848b5e6c01b82] Merge tag 'trace-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
git bisect good 79b17ea740d9fab178d6a1aa15d848b5e6c01b82
# bad: [d4ee21ef61dfcaeba7542cdc1c9242364a8ca5c0] Merge tag 'reset-fixes-for-4.11-2' of git://git.pengutronix.de/git/pza/linux into fixes
git bisect bad d4ee21ef61dfcaeba7542cdc1c9242364a8ca5c0
# good: [5eca1c10cbaa9c366c18ca79f81f21c731e3dcc7] sched/headers: Clean up <linux/sched.h>
git bisect good 5eca1c10cbaa9c366c18ca79f81f21c731e3dcc7
# good: [2d62e0768d3c28536d4cfe4c40ba1e5e8e442a93] Merge tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good 2d62e0768d3c28536d4cfe4c40ba1e5e8e442a93
# bad: [9db61d6fd65f53eaee13fbb451fd761ce926dea9] Merge tag 'xfs-4.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad 9db61d6fd65f53eaee13fbb451fd761ce926dea9
# good: [f7d6a7283aa1de430c6323a9714d1a787bc2d1ea] Merge tag 'powerpc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
git bisect good f7d6a7283aa1de430c6323a9714d1a787bc2d1ea
# good: [ec3b93ae0bf4742e9cbb40e1964129926c1464e0] Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good ec3b93ae0bf4742e9cbb40e1964129926c1464e0
# bad: [34bbce9e344b47e8871273409632f525973afad4] Merge branch 'for-linus' of git://git.kernel.dk/linux-block
git bisect bad 34bbce9e344b47e8871273409632f525973afad4
# good: [bb61ce54e8986ea18f7eb13cfa4a6fa8eb311bbd] Merge tag 'media/v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect good bb61ce54e8986ea18f7eb13cfa4a6fa8eb311bbd
# bad: [b0bfdfc2bf7fa85317824c6a389fc373dfcef5bc] block/sed: Fix opal user range check and unused variables
git bisect bad b0bfdfc2bf7fa85317824c6a389fc373dfcef5bc
# bad: [6c8b232efea1ad3d263ff8b9c16b7e8767a77488] blk-mq: make lifetime consistent between hctx and its kobject
git bisect bad 6c8b232efea1ad3d263ff8b9c16b7e8767a77488
# bad: [7ea5fe31c12dd8bcf4a9c5a4a7e8e23826a9a3b8] blk-mq: make lifetime consitent between q/ctx and its kobject
git bisect bad 7ea5fe31c12dd8bcf4a9c5a4a7e8e23826a9a3b8
# good: [737f98cfe7de8df7433a4d846850aa8efa44bd48] blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()
git bisect good 737f98cfe7de8df7433a4d846850aa8efa44bd48
# first bad commit: [7ea5fe31c12dd8bcf4a9c5a4a7e8e23826a9a3b8] blk-mq: make lifetime consitent between q/ctx and its kobject

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-09 20:35 kobject lifetime issues in blk-mq Guenter Roeck
@ 2018-11-12  9:20 ` Ming Lei
  2018-11-12  9:44   ` Ming Lei
  2018-11-12 16:16   ` Guenter Roeck
  0 siblings, 2 replies; 18+ messages in thread
From: Ming Lei @ 2018-11-12  9:20 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Ming Lei, Jens Axboe, Peter Zijlstra, linux-block, linux-kernel,
	Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi

On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> Hi,
> 
> Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> enabled report kobject lifetime issues when booting an image in qemu
> using virtio-scsi-pci.
> 
> Bisect points to two different commits, depending on the kernel
> configuration.
> 
> 1st configuration: defconfig plus:
> 
> CONFIG_VIRTIO_BLK=y
> CONFIG_SCSI_LOWLEVEL=y
> CONFIG_SCSI_VIRTIO=y
> CONFIG_VIRTIO_PCI=y
> CONFIG_VIRTIO_MMIO=y
> CONFIG_DEBUG_OBJECTS=y
> CONFIG_DEBUG_OBJECTS_FREE=y
> CONFIG_DEBUG_OBJECTS_TIMERS=y
> CONFIG_DEBUG_KOBJECT=y
> CONFIG_DEBUG_KOBJECT_RELEASE=y
> 
> Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> the bisect log is therefore a bit misleading.
> 
> 2nd configuration: As above, plus:
> CONFIG_SCSI_MQ_DEFAULT=y
> 
> With this configuration, bisect points to commit 7ea5fe31c12d
> ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> 
> The qemu command line used to reproduce the problem is as follows.
> The qemu version does not seem to matter (I tested with qemu versions
> 2.5, 2.11, and 3.0).
> 
> qemu-system-x86_64 \
> 	-kernel arch/x86/boot/bzImage \
> 	-device virtio-scsi-pci,id=scsi \
> 	-device scsi-hd,bus=scsi.0,drive=d0 \
> 	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
> 	-snapshot \
> 	-m 2G -smp 4 -enable-kvm \
> 	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> 	-nographic -monitor none \
> 	-no-reboot \
> 	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"
> 
> Root file system:
> 	https://storage.googleapis.com/syzkaller/wheezy.img
> or:
> 	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> 
> It seems that any root file system can be used to reproduce the problem.
> Also, the problem is not limited to virtio-scsi-pci. I also tried
> am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> drivers), with the same result.
> 
> Overall, this suggests that the problem is related to blk-mq and was
> indeed introduced with commit 7ea5fe31c12d.
> 
> For reference, I attached an actual crash log as well as a set of bisect
> logs below.
> 
> Unfortunately, my understanding of blk-mq is not good enough to find the
> underlying problem and suggest a fix. Please let me know if there is
> anything else I can do to help fixing the problem.
> 
> Note that the problem was originally reported by syzbot running a test
> on chromeos-4.14. It may well be that the problem has already been
> reported and is being worked on. If so, please apologize the noise.
> 
> Thanks,
> Guenter
> 
> ---
> [    8.700038] ------------[ cut here ]------------
> [    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
> [    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
> [    8.701032] Kernel panic - not syncing: panic_on_warn set ...
> [    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> [    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [    8.701032] Call Trace:
> [    8.701032]  <IRQ>
> [    8.701032]  dump_stack+0x46/0x5b
> [    8.701032]  panic+0xf3/0x246
> [    8.701032]  ? debug_print_object+0x6a/0x80
> [    8.701032]  __warn+0xeb/0xf0
> [    8.701032]  ? debug_print_object+0x6a/0x80
> [    8.701032]  ? debug_print_object+0x6a/0x80
> [    8.701032]  report_bug+0xb1/0x120
> [    8.701032]  fixup_bug.part.10+0x13/0x30
> [    8.701032]  do_error_trap+0x8f/0xb0
> [    8.701032]  do_invalid_op+0x31/0x40
> [    8.701032]  ? debug_print_object+0x6a/0x80
> [    8.701032]  invalid_op+0x14/0x20
> [    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> [    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
> 89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
> 05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
> [    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
> [    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
> [    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
> [    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
> [    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
> [    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
> [    8.701032]  ? debug_print_object+0x6a/0x80
> [    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
> [    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
> [    8.701032]  kmem_cache_free+0x6e/0x1a0
> [    8.701032]  ? __blk_release_queue+0x150/0x150
> [    8.701032]  rcu_process_callbacks+0x2a7/0x750
> [    8.701032]  __do_softirq+0xf2/0x2c7
> [    8.701032]  irq_exit+0xb7/0xc0
> [    8.701032]  smp_apic_timer_interrupt+0x67/0x130
> [    8.701032]  apic_timer_interrupt+0xf/0x20
> [    8.701032]  </IRQ>
> [    8.701032] RIP: 0010:default_idle+0x12/0x140
> [    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
> 90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
> c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
> [    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> [    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
> [    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
> [    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
> [    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
> [    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
> [    8.701032]  ? __sched_text_end+0x5/0x5
> [    8.701032]  do_idle+0x19c/0x230
> [    8.701032]  cpu_startup_entry+0x14/0x20
> [    8.701032]  start_kernel+0x491/0x4b1
> [    8.701032]  secondary_startup_64+0xa4/0xb0
> [    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> 

That is simply caused by enabling DEBUG_KOBJECT_RELEASE itself, which
tries to do kboject_release() after delaying a while.

Wrt. this issue, q->mq_kobj is embedded in 'request_queue' whose
instance is freed in release handler of q->kobj. So when one instance
of 'request_queue' is freed, DEBUG_KOBJECT_RELEASE still expects that
the q->mq_kobj is active.

The release handler of q->mq_kobj is NOP actually, so in reality it won't
be one issue at all.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-12  9:20 ` Ming Lei
@ 2018-11-12  9:44   ` Ming Lei
  2018-11-12 16:48     ` Guenter Roeck
  2018-11-12 16:16   ` Guenter Roeck
  1 sibling, 1 reply; 18+ messages in thread
From: Ming Lei @ 2018-11-12  9:44 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Ming Lei, Jens Axboe, Peter Zijlstra, linux-block, linux-kernel,
	Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi

On Mon, Nov 12, 2018 at 05:20:51PM +0800, Ming Lei wrote:
> On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > Hi,
> > 
> > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > enabled report kobject lifetime issues when booting an image in qemu
> > using virtio-scsi-pci.
> > 
> > Bisect points to two different commits, depending on the kernel
> > configuration.
> > 
> > 1st configuration: defconfig plus:
> > 
> > CONFIG_VIRTIO_BLK=y
> > CONFIG_SCSI_LOWLEVEL=y
> > CONFIG_SCSI_VIRTIO=y
> > CONFIG_VIRTIO_PCI=y
> > CONFIG_VIRTIO_MMIO=y
> > CONFIG_DEBUG_OBJECTS=y
> > CONFIG_DEBUG_OBJECTS_FREE=y
> > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > CONFIG_DEBUG_KOBJECT=y
> > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > 
> > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > the bisect log is therefore a bit misleading.
> > 
> > 2nd configuration: As above, plus:
> > CONFIG_SCSI_MQ_DEFAULT=y
> > 
> > With this configuration, bisect points to commit 7ea5fe31c12d
> > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > 
> > The qemu command line used to reproduce the problem is as follows.
> > The qemu version does not seem to matter (I tested with qemu versions
> > 2.5, 2.11, and 3.0).
> > 
> > qemu-system-x86_64 \
> > 	-kernel arch/x86/boot/bzImage \
> > 	-device virtio-scsi-pci,id=scsi \
> > 	-device scsi-hd,bus=scsi.0,drive=d0 \
> > 	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > 	-snapshot \
> > 	-m 2G -smp 4 -enable-kvm \
> > 	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > 	-nographic -monitor none \
> > 	-no-reboot \
> > 	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"
> > 
> > Root file system:
> > 	https://storage.googleapis.com/syzkaller/wheezy.img
> > or:
> > 	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > 
> > It seems that any root file system can be used to reproduce the problem.
> > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > drivers), with the same result.
> > 
> > Overall, this suggests that the problem is related to blk-mq and was
> > indeed introduced with commit 7ea5fe31c12d.
> > 
> > For reference, I attached an actual crash log as well as a set of bisect
> > logs below.
> > 
> > Unfortunately, my understanding of blk-mq is not good enough to find the
> > underlying problem and suggest a fix. Please let me know if there is
> > anything else I can do to help fixing the problem.
> > 
> > Note that the problem was originally reported by syzbot running a test
> > on chromeos-4.14. It may well be that the problem has already been
> > reported and is being worked on. If so, please apologize the noise.
> > 
> > Thanks,
> > Guenter
> > 
> > ---
> > [    8.700038] ------------[ cut here ]------------
> > [    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
> > [    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
> > [    8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > [    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > [    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > [    8.701032] Call Trace:
> > [    8.701032]  <IRQ>
> > [    8.701032]  dump_stack+0x46/0x5b
> > [    8.701032]  panic+0xf3/0x246
> > [    8.701032]  ? debug_print_object+0x6a/0x80
> > [    8.701032]  __warn+0xeb/0xf0
> > [    8.701032]  ? debug_print_object+0x6a/0x80
> > [    8.701032]  ? debug_print_object+0x6a/0x80
> > [    8.701032]  report_bug+0xb1/0x120
> > [    8.701032]  fixup_bug.part.10+0x13/0x30
> > [    8.701032]  do_error_trap+0x8f/0xb0
> > [    8.701032]  do_invalid_op+0x31/0x40
> > [    8.701032]  ? debug_print_object+0x6a/0x80
> > [    8.701032]  invalid_op+0x14/0x20
> > [    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> > [    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
> > 89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
> > 05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
> > [    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
> > [    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
> > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
> > [    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
> > [    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
> > [    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
> > [    8.701032]  ? debug_print_object+0x6a/0x80
> > [    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
> > [    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
> > [    8.701032]  kmem_cache_free+0x6e/0x1a0
> > [    8.701032]  ? __blk_release_queue+0x150/0x150
> > [    8.701032]  rcu_process_callbacks+0x2a7/0x750
> > [    8.701032]  __do_softirq+0xf2/0x2c7
> > [    8.701032]  irq_exit+0xb7/0xc0
> > [    8.701032]  smp_apic_timer_interrupt+0x67/0x130
> > [    8.701032]  apic_timer_interrupt+0xf/0x20
> > [    8.701032]  </IRQ>
> > [    8.701032] RIP: 0010:default_idle+0x12/0x140
> > [    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
> > 90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
> > c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
> > [    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > [    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
> > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
> > [    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
> > [    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
> > [    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
> > [    8.701032]  ? __sched_text_end+0x5/0x5
> > [    8.701032]  do_idle+0x19c/0x230
> > [    8.701032]  cpu_startup_entry+0x14/0x20
> > [    8.701032]  start_kernel+0x491/0x4b1
> > [    8.701032]  secondary_startup_64+0xa4/0xb0
> > [    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > 
> 
> That is simply caused by enabling DEBUG_KOBJECT_RELEASE itself, which
> tries to do kboject_release() after delaying a while.
> 
> Wrt. this issue, q->mq_kobj is embedded in 'request_queue' whose
> instance is freed in release handler of q->kobj. So when one instance
> of 'request_queue' is freed, DEBUG_KOBJECT_RELEASE still expects that
> the q->mq_kobj is active.
> 
> The release handler of q->mq_kobj is NOP actually, so in reality it won't
> be one issue at all.

Given we have removed legacy IO path completely, it should be fine to
remove q->mq_kobj and simply put everything under the kobj of queue,
especially any attributes under mq aren't marked as stable ABI.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-12  9:20 ` Ming Lei
  2018-11-12  9:44   ` Ming Lei
@ 2018-11-12 16:16   ` Guenter Roeck
  1 sibling, 0 replies; 18+ messages in thread
From: Guenter Roeck @ 2018-11-12 16:16 UTC (permalink / raw)
  To: Ming Lei
  Cc: Ming Lei, Jens Axboe, Peter Zijlstra, linux-block, linux-kernel,
	Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi

On Mon, Nov 12, 2018 at 05:20:52PM +0800, Ming Lei wrote:
> On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > Hi,
> > 
> > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > enabled report kobject lifetime issues when booting an image in qemu
> > using virtio-scsi-pci.
> > 
> > Bisect points to two different commits, depending on the kernel
> > configuration.
> > 
> > 1st configuration: defconfig plus:
> > 
> > CONFIG_VIRTIO_BLK=y
> > CONFIG_SCSI_LOWLEVEL=y
> > CONFIG_SCSI_VIRTIO=y
> > CONFIG_VIRTIO_PCI=y
> > CONFIG_VIRTIO_MMIO=y
> > CONFIG_DEBUG_OBJECTS=y
> > CONFIG_DEBUG_OBJECTS_FREE=y
> > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > CONFIG_DEBUG_KOBJECT=y
> > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > 
> > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > the bisect log is therefore a bit misleading.
> > 
> > 2nd configuration: As above, plus:
> > CONFIG_SCSI_MQ_DEFAULT=y
> > 
> > With this configuration, bisect points to commit 7ea5fe31c12d
> > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > 
> > The qemu command line used to reproduce the problem is as follows.
> > The qemu version does not seem to matter (I tested with qemu versions
> > 2.5, 2.11, and 3.0).
> > 
> > qemu-system-x86_64 \
> > 	-kernel arch/x86/boot/bzImage \
> > 	-device virtio-scsi-pci,id=scsi \
> > 	-device scsi-hd,bus=scsi.0,drive=d0 \
> > 	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > 	-snapshot \
> > 	-m 2G -smp 4 -enable-kvm \
> > 	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > 	-nographic -monitor none \
> > 	-no-reboot \
> > 	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"
> > 
> > Root file system:
> > 	https://storage.googleapis.com/syzkaller/wheezy.img
> > or:
> > 	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > 
> > It seems that any root file system can be used to reproduce the problem.
> > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > drivers), with the same result.
> > 
> > Overall, this suggests that the problem is related to blk-mq and was
> > indeed introduced with commit 7ea5fe31c12d.
> > 
> > For reference, I attached an actual crash log as well as a set of bisect
> > logs below.
> > 
> > Unfortunately, my understanding of blk-mq is not good enough to find the
> > underlying problem and suggest a fix. Please let me know if there is
> > anything else I can do to help fixing the problem.
> > 
> > Note that the problem was originally reported by syzbot running a test
> > on chromeos-4.14. It may well be that the problem has already been
> > reported and is being worked on. If so, please apologize the noise.
> > 
> > Thanks,
> > Guenter
> > 
> > ---
> > [    8.700038] ------------[ cut here ]------------
> > [    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
> > [    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
> > [    8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > [    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > [    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > [    8.701032] Call Trace:
> > [    8.701032]  <IRQ>
> > [    8.701032]  dump_stack+0x46/0x5b
> > [    8.701032]  panic+0xf3/0x246
> > [    8.701032]  ? debug_print_object+0x6a/0x80
> > [    8.701032]  __warn+0xeb/0xf0
> > [    8.701032]  ? debug_print_object+0x6a/0x80
> > [    8.701032]  ? debug_print_object+0x6a/0x80
> > [    8.701032]  report_bug+0xb1/0x120
> > [    8.701032]  fixup_bug.part.10+0x13/0x30
> > [    8.701032]  do_error_trap+0x8f/0xb0
> > [    8.701032]  do_invalid_op+0x31/0x40
> > [    8.701032]  ? debug_print_object+0x6a/0x80
> > [    8.701032]  invalid_op+0x14/0x20
> > [    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> > [    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
> > 89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
> > 05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
> > [    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
> > [    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
> > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
> > [    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
> > [    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
> > [    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
> > [    8.701032]  ? debug_print_object+0x6a/0x80
> > [    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
> > [    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
> > [    8.701032]  kmem_cache_free+0x6e/0x1a0
> > [    8.701032]  ? __blk_release_queue+0x150/0x150
> > [    8.701032]  rcu_process_callbacks+0x2a7/0x750
> > [    8.701032]  __do_softirq+0xf2/0x2c7
> > [    8.701032]  irq_exit+0xb7/0xc0
> > [    8.701032]  smp_apic_timer_interrupt+0x67/0x130
> > [    8.701032]  apic_timer_interrupt+0xf/0x20
> > [    8.701032]  </IRQ>
> > [    8.701032] RIP: 0010:default_idle+0x12/0x140
> > [    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
> > 90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
> > c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
> > [    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > [    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
> > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
> > [    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
> > [    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
> > [    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
> > [    8.701032]  ? __sched_text_end+0x5/0x5
> > [    8.701032]  do_idle+0x19c/0x230
> > [    8.701032]  cpu_startup_entry+0x14/0x20
> > [    8.701032]  start_kernel+0x491/0x4b1
> > [    8.701032]  secondary_startup_64+0xa4/0xb0
> > [    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > 
> 
> That is simply caused by enabling DEBUG_KOBJECT_RELEASE itself, which
> tries to do kboject_release() after delaying a while.
> 
> Wrt. this issue, q->mq_kobj is embedded in 'request_queue' whose
> instance is freed in release handler of q->kobj. So when one instance
> of 'request_queue' is freed, DEBUG_KOBJECT_RELEASE still expects that
> the q->mq_kobj is active.
> 
> The release handler of q->mq_kobj is NOP actually, so in reality it won't
> be one issue at all.
> 

So what is the solution ? Either this is an incorrect use of kobjects,
or DEBUG_KOBJECT_RELEASE is broken and needs to be marked as such
(or fixed, but I would have no idea how to do that).

I'll be happy to send a patch marking DEBUG_KOBJECT_RELEASE as broken,
but I suspect that it may be rejected with prejudice.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-12  9:44   ` Ming Lei
@ 2018-11-12 16:48     ` Guenter Roeck
  2018-11-12 20:02       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2018-11-12 16:48 UTC (permalink / raw)
  To: Ming Lei
  Cc: Ming Lei, Jens Axboe, Peter Zijlstra, linux-block, linux-kernel,
	Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi,
	Greg Kroah-Hartman

On Mon, Nov 12, 2018 at 05:44:07PM +0800, Ming Lei wrote:
> On Mon, Nov 12, 2018 at 05:20:51PM +0800, Ming Lei wrote:
> > On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > > Hi,
> > > 
> > > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > enabled report kobject lifetime issues when booting an image in qemu
> > > using virtio-scsi-pci.
> > > 
> > > Bisect points to two different commits, depending on the kernel
> > > configuration.
> > > 
> > > 1st configuration: defconfig plus:
> > > 
> > > CONFIG_VIRTIO_BLK=y
> > > CONFIG_SCSI_LOWLEVEL=y
> > > CONFIG_SCSI_VIRTIO=y
> > > CONFIG_VIRTIO_PCI=y
> > > CONFIG_VIRTIO_MMIO=y
> > > CONFIG_DEBUG_OBJECTS=y
> > > CONFIG_DEBUG_OBJECTS_FREE=y
> > > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > > CONFIG_DEBUG_KOBJECT=y
> > > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > 
> > > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > > the bisect log is therefore a bit misleading.
> > > 
> > > 2nd configuration: As above, plus:
> > > CONFIG_SCSI_MQ_DEFAULT=y
> > > 
> > > With this configuration, bisect points to commit 7ea5fe31c12d
> > > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > > 
> > > The qemu command line used to reproduce the problem is as follows.
> > > The qemu version does not seem to matter (I tested with qemu versions
> > > 2.5, 2.11, and 3.0).
> > > 
> > > qemu-system-x86_64 \
> > > 	-kernel arch/x86/boot/bzImage \
> > > 	-device virtio-scsi-pci,id=scsi \
> > > 	-device scsi-hd,bus=scsi.0,drive=d0 \
> > > 	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > > 	-snapshot \
> > > 	-m 2G -smp 4 -enable-kvm \
> > > 	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > > 	-nographic -monitor none \
> > > 	-no-reboot \
> > > 	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"
> > > 
> > > Root file system:
> > > 	https://storage.googleapis.com/syzkaller/wheezy.img
> > > or:
> > > 	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > > 
> > > It seems that any root file system can be used to reproduce the problem.
> > > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > > drivers), with the same result.
> > > 
> > > Overall, this suggests that the problem is related to blk-mq and was
> > > indeed introduced with commit 7ea5fe31c12d.
> > > 
> > > For reference, I attached an actual crash log as well as a set of bisect
> > > logs below.
> > > 
> > > Unfortunately, my understanding of blk-mq is not good enough to find the
> > > underlying problem and suggest a fix. Please let me know if there is
> > > anything else I can do to help fixing the problem.
> > > 
> > > Note that the problem was originally reported by syzbot running a test
> > > on chromeos-4.14. It may well be that the problem has already been
> > > reported and is being worked on. If so, please apologize the noise.
> > > 
> > > Thanks,
> > > Guenter
> > > 
> > > ---
> > > [    8.700038] ------------[ cut here ]------------
> > > [    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
> > > [    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
> > > [    8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > > [    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > > [    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > [    8.701032] Call Trace:
> > > [    8.701032]  <IRQ>
> > > [    8.701032]  dump_stack+0x46/0x5b
> > > [    8.701032]  panic+0xf3/0x246
> > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > [    8.701032]  __warn+0xeb/0xf0
> > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > [    8.701032]  report_bug+0xb1/0x120
> > > [    8.701032]  fixup_bug.part.10+0x13/0x30
> > > [    8.701032]  do_error_trap+0x8f/0xb0
> > > [    8.701032]  do_invalid_op+0x31/0x40
> > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > [    8.701032]  invalid_op+0x14/0x20
> > > [    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> > > [    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
> > > 89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
> > > 05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
> > > [    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
> > > [    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
> > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
> > > [    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
> > > [    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
> > > [    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
> > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > [    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
> > > [    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
> > > [    8.701032]  kmem_cache_free+0x6e/0x1a0
> > > [    8.701032]  ? __blk_release_queue+0x150/0x150
> > > [    8.701032]  rcu_process_callbacks+0x2a7/0x750
> > > [    8.701032]  __do_softirq+0xf2/0x2c7
> > > [    8.701032]  irq_exit+0xb7/0xc0
> > > [    8.701032]  smp_apic_timer_interrupt+0x67/0x130
> > > [    8.701032]  apic_timer_interrupt+0xf/0x20
> > > [    8.701032]  </IRQ>
> > > [    8.701032] RIP: 0010:default_idle+0x12/0x140
> > > [    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
> > > 90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
> > > c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
> > > [    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > [    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
> > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
> > > [    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
> > > [    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
> > > [    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
> > > [    8.701032]  ? __sched_text_end+0x5/0x5
> > > [    8.701032]  do_idle+0x19c/0x230
> > > [    8.701032]  cpu_startup_entry+0x14/0x20
> > > [    8.701032]  start_kernel+0x491/0x4b1
> > > [    8.701032]  secondary_startup_64+0xa4/0xb0
> > > [    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > 
> > 
> > That is simply caused by enabling DEBUG_KOBJECT_RELEASE itself, which
> > tries to do kboject_release() after delaying a while.
> > 
> > Wrt. this issue, q->mq_kobj is embedded in 'request_queue' whose
> > instance is freed in release handler of q->kobj. So when one instance
> > of 'request_queue' is freed, DEBUG_KOBJECT_RELEASE still expects that
> > the q->mq_kobj is active.
> > 
> > The release handler of q->mq_kobj is NOP actually, so in reality it won't
> > be one issue at all.
> 
> Given we have removed legacy IO path completely, it should be fine to
> remove q->mq_kobj and simply put everything under the kobj of queue,
> especially any attributes under mq aren't marked as stable ABI.

That might be possible going forward, but that does not appear to be
a feasible solution for stable releases. I am not sure though if the
"stable ABI" argument is valid. Almost all sysfs attributes are not
marked stable. If there are applications using it, removing the atttributes
would still break userspace.

Since the release function of mq_kobj is NOP (for reference, I assume
we are talking about blk_mq_sysfs_release), can it just be NULL ?
That it is an empty function seems to be just as good or bad as having
no function at all. An empty function just avoids the "broken" debug
message. That message is still better than rendering DEBUG_KOBJECT_RELEASE
useless.

Copying GregKH for input.

Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-12 16:48     ` Guenter Roeck
@ 2018-11-12 20:02       ` Greg Kroah-Hartman
  2018-11-13  0:22         ` Ming Lei
  0 siblings, 1 reply; 18+ messages in thread
From: Greg Kroah-Hartman @ 2018-11-12 20:02 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Ming Lei, Ming Lei, Jens Axboe, Peter Zijlstra, linux-block,
	linux-kernel, Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi

On Mon, Nov 12, 2018 at 08:48:48AM -0800, Guenter Roeck wrote:
> On Mon, Nov 12, 2018 at 05:44:07PM +0800, Ming Lei wrote:
> > On Mon, Nov 12, 2018 at 05:20:51PM +0800, Ming Lei wrote:
> > > On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > > > Hi,
> > > > 
> > > > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > enabled report kobject lifetime issues when booting an image in qemu
> > > > using virtio-scsi-pci.
> > > > 
> > > > Bisect points to two different commits, depending on the kernel
> > > > configuration.
> > > > 
> > > > 1st configuration: defconfig plus:
> > > > 
> > > > CONFIG_VIRTIO_BLK=y
> > > > CONFIG_SCSI_LOWLEVEL=y
> > > > CONFIG_SCSI_VIRTIO=y
> > > > CONFIG_VIRTIO_PCI=y
> > > > CONFIG_VIRTIO_MMIO=y
> > > > CONFIG_DEBUG_OBJECTS=y
> > > > CONFIG_DEBUG_OBJECTS_FREE=y
> > > > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > > > CONFIG_DEBUG_KOBJECT=y
> > > > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > 
> > > > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > > > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > > > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > > > the bisect log is therefore a bit misleading.
> > > > 
> > > > 2nd configuration: As above, plus:
> > > > CONFIG_SCSI_MQ_DEFAULT=y
> > > > 
> > > > With this configuration, bisect points to commit 7ea5fe31c12d
> > > > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > > > 
> > > > The qemu command line used to reproduce the problem is as follows.
> > > > The qemu version does not seem to matter (I tested with qemu versions
> > > > 2.5, 2.11, and 3.0).
> > > > 
> > > > qemu-system-x86_64 \
> > > > 	-kernel arch/x86/boot/bzImage \
> > > > 	-device virtio-scsi-pci,id=scsi \
> > > > 	-device scsi-hd,bus=scsi.0,drive=d0 \
> > > > 	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > > > 	-snapshot \
> > > > 	-m 2G -smp 4 -enable-kvm \
> > > > 	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > > > 	-nographic -monitor none \
> > > > 	-no-reboot \
> > > > 	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"
> > > > 
> > > > Root file system:
> > > > 	https://storage.googleapis.com/syzkaller/wheezy.img
> > > > or:
> > > > 	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > > > 
> > > > It seems that any root file system can be used to reproduce the problem.
> > > > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > > > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > > > drivers), with the same result.
> > > > 
> > > > Overall, this suggests that the problem is related to blk-mq and was
> > > > indeed introduced with commit 7ea5fe31c12d.
> > > > 
> > > > For reference, I attached an actual crash log as well as a set of bisect
> > > > logs below.
> > > > 
> > > > Unfortunately, my understanding of blk-mq is not good enough to find the
> > > > underlying problem and suggest a fix. Please let me know if there is
> > > > anything else I can do to help fixing the problem.
> > > > 
> > > > Note that the problem was originally reported by syzbot running a test
> > > > on chromeos-4.14. It may well be that the problem has already been
> > > > reported and is being worked on. If so, please apologize the noise.
> > > > 
> > > > Thanks,
> > > > Guenter
> > > > 
> > > > ---
> > > > [    8.700038] ------------[ cut here ]------------
> > > > [    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
> > > > [    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
> > > > [    8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > > > [    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > > > [    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > [    8.701032] Call Trace:
> > > > [    8.701032]  <IRQ>
> > > > [    8.701032]  dump_stack+0x46/0x5b
> > > > [    8.701032]  panic+0xf3/0x246
> > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > [    8.701032]  __warn+0xeb/0xf0
> > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > [    8.701032]  report_bug+0xb1/0x120
> > > > [    8.701032]  fixup_bug.part.10+0x13/0x30
> > > > [    8.701032]  do_error_trap+0x8f/0xb0
> > > > [    8.701032]  do_invalid_op+0x31/0x40
> > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > [    8.701032]  invalid_op+0x14/0x20
> > > > [    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> > > > [    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
> > > > 89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
> > > > 05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
> > > > [    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
> > > > [    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
> > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
> > > > [    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
> > > > [    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
> > > > [    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
> > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > [    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
> > > > [    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
> > > > [    8.701032]  kmem_cache_free+0x6e/0x1a0
> > > > [    8.701032]  ? __blk_release_queue+0x150/0x150
> > > > [    8.701032]  rcu_process_callbacks+0x2a7/0x750
> > > > [    8.701032]  __do_softirq+0xf2/0x2c7
> > > > [    8.701032]  irq_exit+0xb7/0xc0
> > > > [    8.701032]  smp_apic_timer_interrupt+0x67/0x130
> > > > [    8.701032]  apic_timer_interrupt+0xf/0x20
> > > > [    8.701032]  </IRQ>
> > > > [    8.701032] RIP: 0010:default_idle+0x12/0x140
> > > > [    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
> > > > 90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
> > > > c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
> > > > [    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > > [    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
> > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
> > > > [    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
> > > > [    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
> > > > [    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
> > > > [    8.701032]  ? __sched_text_end+0x5/0x5
> > > > [    8.701032]  do_idle+0x19c/0x230
> > > > [    8.701032]  cpu_startup_entry+0x14/0x20
> > > > [    8.701032]  start_kernel+0x491/0x4b1
> > > > [    8.701032]  secondary_startup_64+0xa4/0xb0
> > > > [    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > 
> > > 
> > > That is simply caused by enabling DEBUG_KOBJECT_RELEASE itself, which
> > > tries to do kboject_release() after delaying a while.
> > > 
> > > Wrt. this issue, q->mq_kobj is embedded in 'request_queue' whose
> > > instance is freed in release handler of q->kobj. So when one instance
> > > of 'request_queue' is freed, DEBUG_KOBJECT_RELEASE still expects that
> > > the q->mq_kobj is active.
> > > 
> > > The release handler of q->mq_kobj is NOP actually, so in reality it won't
> > > be one issue at all.
> > 
> > Given we have removed legacy IO path completely, it should be fine to
> > remove q->mq_kobj and simply put everything under the kobj of queue,
> > especially any attributes under mq aren't marked as stable ABI.
> 
> That might be possible going forward, but that does not appear to be
> a feasible solution for stable releases. I am not sure though if the
> "stable ABI" argument is valid. Almost all sysfs attributes are not
> marked stable. If there are applications using it, removing the atttributes
> would still break userspace.

Yeah, don't move attributes around for no good reason, you will make
userspace grumpy :)

> Since the release function of mq_kobj is NOP (for reference, I assume
> we are talking about blk_mq_sysfs_release), can it just be NULL ?

Someone tried to work around the internal kernel warning that comes
about when you set a release function to NULL.  As per the in-kernel
documentation, I now get to make fun of that author by saying "do you
think I added those checks and message just for no good reason?  Don't
try to be smarter than the driver core by providing an empty release
function to make the kernel be quiet.  That's NOT what the kernel is
trying to tell you here at all."

So fix this, that is the correct thing to do.  Memory is freed in a
release function, to not do so is almost always a sign of a design bug.

> That it is an empty function seems to be just as good or bad as having
> no function at all. An empty function just avoids the "broken" debug
> message. That message is still better than rendering DEBUG_KOBJECT_RELEASE
> useless.
> 
> Copying GregKH for input.

Thanks, this code is broken and needs to be fixed.

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-12 20:02       ` Greg Kroah-Hartman
@ 2018-11-13  0:22         ` Ming Lei
  2018-11-13  0:41           ` Ming Lei
  0 siblings, 1 reply; 18+ messages in thread
From: Ming Lei @ 2018-11-13  0:22 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Guenter Roeck, Ming Lei, Jens Axboe, Peter Zijlstra, linux-block,
	linux-kernel, Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi

On Mon, Nov 12, 2018 at 12:02:36PM -0800, Greg Kroah-Hartman wrote:
> On Mon, Nov 12, 2018 at 08:48:48AM -0800, Guenter Roeck wrote:
> > On Mon, Nov 12, 2018 at 05:44:07PM +0800, Ming Lei wrote:
> > > On Mon, Nov 12, 2018 at 05:20:51PM +0800, Ming Lei wrote:
> > > > On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > > > > Hi,
> > > > > 
> > > > > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > enabled report kobject lifetime issues when booting an image in qemu
> > > > > using virtio-scsi-pci.
> > > > > 
> > > > > Bisect points to two different commits, depending on the kernel
> > > > > configuration.
> > > > > 
> > > > > 1st configuration: defconfig plus:
> > > > > 
> > > > > CONFIG_VIRTIO_BLK=y
> > > > > CONFIG_SCSI_LOWLEVEL=y
> > > > > CONFIG_SCSI_VIRTIO=y
> > > > > CONFIG_VIRTIO_PCI=y
> > > > > CONFIG_VIRTIO_MMIO=y
> > > > > CONFIG_DEBUG_OBJECTS=y
> > > > > CONFIG_DEBUG_OBJECTS_FREE=y
> > > > > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > > > > CONFIG_DEBUG_KOBJECT=y
> > > > > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > 
> > > > > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > > > > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > > > > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > > > > the bisect log is therefore a bit misleading.
> > > > > 
> > > > > 2nd configuration: As above, plus:
> > > > > CONFIG_SCSI_MQ_DEFAULT=y
> > > > > 
> > > > > With this configuration, bisect points to commit 7ea5fe31c12d
> > > > > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > > > > 
> > > > > The qemu command line used to reproduce the problem is as follows.
> > > > > The qemu version does not seem to matter (I tested with qemu versions
> > > > > 2.5, 2.11, and 3.0).
> > > > > 
> > > > > qemu-system-x86_64 \
> > > > > 	-kernel arch/x86/boot/bzImage \
> > > > > 	-device virtio-scsi-pci,id=scsi \
> > > > > 	-device scsi-hd,bus=scsi.0,drive=d0 \
> > > > > 	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > > > > 	-snapshot \
> > > > > 	-m 2G -smp 4 -enable-kvm \
> > > > > 	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > > > > 	-nographic -monitor none \
> > > > > 	-no-reboot \
> > > > > 	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"
> > > > > 
> > > > > Root file system:
> > > > > 	https://storage.googleapis.com/syzkaller/wheezy.img
> > > > > or:
> > > > > 	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > > > > 
> > > > > It seems that any root file system can be used to reproduce the problem.
> > > > > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > > > > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > > > > drivers), with the same result.
> > > > > 
> > > > > Overall, this suggests that the problem is related to blk-mq and was
> > > > > indeed introduced with commit 7ea5fe31c12d.
> > > > > 
> > > > > For reference, I attached an actual crash log as well as a set of bisect
> > > > > logs below.
> > > > > 
> > > > > Unfortunately, my understanding of blk-mq is not good enough to find the
> > > > > underlying problem and suggest a fix. Please let me know if there is
> > > > > anything else I can do to help fixing the problem.
> > > > > 
> > > > > Note that the problem was originally reported by syzbot running a test
> > > > > on chromeos-4.14. It may well be that the problem has already been
> > > > > reported and is being worked on. If so, please apologize the noise.
> > > > > 
> > > > > Thanks,
> > > > > Guenter
> > > > > 
> > > > > ---
> > > > > [    8.700038] ------------[ cut here ]------------
> > > > > [    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
> > > > > [    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
> > > > > [    8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > > > > [    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > > > > [    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > > [    8.701032] Call Trace:
> > > > > [    8.701032]  <IRQ>
> > > > > [    8.701032]  dump_stack+0x46/0x5b
> > > > > [    8.701032]  panic+0xf3/0x246
> > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > [    8.701032]  __warn+0xeb/0xf0
> > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > [    8.701032]  report_bug+0xb1/0x120
> > > > > [    8.701032]  fixup_bug.part.10+0x13/0x30
> > > > > [    8.701032]  do_error_trap+0x8f/0xb0
> > > > > [    8.701032]  do_invalid_op+0x31/0x40
> > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > [    8.701032]  invalid_op+0x14/0x20
> > > > > [    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> > > > > [    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
> > > > > 89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
> > > > > 05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
> > > > > [    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
> > > > > [    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
> > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
> > > > > [    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
> > > > > [    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
> > > > > [    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
> > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > [    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
> > > > > [    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
> > > > > [    8.701032]  kmem_cache_free+0x6e/0x1a0
> > > > > [    8.701032]  ? __blk_release_queue+0x150/0x150
> > > > > [    8.701032]  rcu_process_callbacks+0x2a7/0x750
> > > > > [    8.701032]  __do_softirq+0xf2/0x2c7
> > > > > [    8.701032]  irq_exit+0xb7/0xc0
> > > > > [    8.701032]  smp_apic_timer_interrupt+0x67/0x130
> > > > > [    8.701032]  apic_timer_interrupt+0xf/0x20
> > > > > [    8.701032]  </IRQ>
> > > > > [    8.701032] RIP: 0010:default_idle+0x12/0x140
> > > > > [    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
> > > > > 90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
> > > > > c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
> > > > > [    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > > > [    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
> > > > > [    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
> > > > > [    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
> > > > > [    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
> > > > > [    8.701032]  ? __sched_text_end+0x5/0x5
> > > > > [    8.701032]  do_idle+0x19c/0x230
> > > > > [    8.701032]  cpu_startup_entry+0x14/0x20
> > > > > [    8.701032]  start_kernel+0x491/0x4b1
> > > > > [    8.701032]  secondary_startup_64+0xa4/0xb0
> > > > > [    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > > 
> > > > 
> > > > That is simply caused by enabling DEBUG_KOBJECT_RELEASE itself, which
> > > > tries to do kboject_release() after delaying a while.
> > > > 
> > > > Wrt. this issue, q->mq_kobj is embedded in 'request_queue' whose
> > > > instance is freed in release handler of q->kobj. So when one instance
> > > > of 'request_queue' is freed, DEBUG_KOBJECT_RELEASE still expects that
> > > > the q->mq_kobj is active.
> > > > 
> > > > The release handler of q->mq_kobj is NOP actually, so in reality it won't
> > > > be one issue at all.
> > > 
> > > Given we have removed legacy IO path completely, it should be fine to
> > > remove q->mq_kobj and simply put everything under the kobj of queue,
> > > especially any attributes under mq aren't marked as stable ABI.
> > 
> > That might be possible going forward, but that does not appear to be
> > a feasible solution for stable releases. I am not sure though if the
> > "stable ABI" argument is valid. Almost all sysfs attributes are not
> > marked stable. If there are applications using it, removing the atttributes
> > would still break userspace.
> 
> Yeah, don't move attributes around for no good reason, you will make
> userspace grumpy :)

There are very less attributes left under /sys/block/$DEV/mq.

> 
> > Since the release function of mq_kobj is NOP (for reference, I assume
> > we are talking about blk_mq_sysfs_release), can it just be NULL ?
> 
> Someone tried to work around the internal kernel warning that comes
> about when you set a release function to NULL.  As per the in-kernel
> documentation, I now get to make fun of that author by saying "do you
> think I added those checks and message just for no good reason?  Don't
> try to be smarter than the driver core by providing an empty release
> function to make the kernel be quiet.  That's NOT what the kernel is
> trying to tell you here at all."
> 
> So fix this, that is the correct thing to do.  Memory is freed in a
> release function, to not do so is almost always a sign of a design bug.

The root cause is that both q->kobj and q->mq_kobj are bound to lifetime
of same 'request_queue' instance.

We may allocate q->mq_kobj dynamically to fix this issue.

And now all drivers are converted to mq, and it is really necessary to keep
'q->mq_kobj'

> 
> > That it is an empty function seems to be just as good or bad as having
> > no function at all. An empty function just avoids the "broken" debug
> > message. That message is still better than rendering DEBUG_KOBJECT_RELEASE
> > useless.
> > 
> > Copying GregKH for input.
> 
> Thanks, this code is broken and needs to be fixed.

I will post a patch to address the 'issue'.

Thanks,
Ming

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-13  0:22         ` Ming Lei
@ 2018-11-13  0:41           ` Ming Lei
  2018-11-14 11:08             ` Ming Lei
  2018-11-14 15:12             ` Greg Kroah-Hartman
  0 siblings, 2 replies; 18+ messages in thread
From: Ming Lei @ 2018-11-13  0:41 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Guenter Roeck, Ming Lei, Jens Axboe, Peter Zijlstra, linux-block,
	linux-kernel, Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi

On Tue, Nov 13, 2018 at 08:22:26AM +0800, Ming Lei wrote:
> On Mon, Nov 12, 2018 at 12:02:36PM -0800, Greg Kroah-Hartman wrote:
> > On Mon, Nov 12, 2018 at 08:48:48AM -0800, Guenter Roeck wrote:
> > > On Mon, Nov 12, 2018 at 05:44:07PM +0800, Ming Lei wrote:
> > > > On Mon, Nov 12, 2018 at 05:20:51PM +0800, Ming Lei wrote:
> > > > > On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > > enabled report kobject lifetime issues when booting an image in qemu
> > > > > > using virtio-scsi-pci.
> > > > > > 
> > > > > > Bisect points to two different commits, depending on the kernel
> > > > > > configuration.
> > > > > > 
> > > > > > 1st configuration: defconfig plus:
> > > > > > 
> > > > > > CONFIG_VIRTIO_BLK=y
> > > > > > CONFIG_SCSI_LOWLEVEL=y
> > > > > > CONFIG_SCSI_VIRTIO=y
> > > > > > CONFIG_VIRTIO_PCI=y
> > > > > > CONFIG_VIRTIO_MMIO=y
> > > > > > CONFIG_DEBUG_OBJECTS=y
> > > > > > CONFIG_DEBUG_OBJECTS_FREE=y
> > > > > > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > > > > > CONFIG_DEBUG_KOBJECT=y
> > > > > > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > > 
> > > > > > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > > > > > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > > > > > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > > > > > the bisect log is therefore a bit misleading.
> > > > > > 
> > > > > > 2nd configuration: As above, plus:
> > > > > > CONFIG_SCSI_MQ_DEFAULT=y
> > > > > > 
> > > > > > With this configuration, bisect points to commit 7ea5fe31c12d
> > > > > > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > > > > > 
> > > > > > The qemu command line used to reproduce the problem is as follows.
> > > > > > The qemu version does not seem to matter (I tested with qemu versions
> > > > > > 2.5, 2.11, and 3.0).
> > > > > > 
> > > > > > qemu-system-x86_64 \
> > > > > > 	-kernel arch/x86/boot/bzImage \
> > > > > > 	-device virtio-scsi-pci,id=scsi \
> > > > > > 	-device scsi-hd,bus=scsi.0,drive=d0 \
> > > > > > 	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > > > > > 	-snapshot \
> > > > > > 	-m 2G -smp 4 -enable-kvm \
> > > > > > 	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > > > > > 	-nographic -monitor none \
> > > > > > 	-no-reboot \
> > > > > > 	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"
> > > > > > 
> > > > > > Root file system:
> > > > > > 	https://storage.googleapis.com/syzkaller/wheezy.img
> > > > > > or:
> > > > > > 	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > > > > > 
> > > > > > It seems that any root file system can be used to reproduce the problem.
> > > > > > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > > > > > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > > > > > drivers), with the same result.
> > > > > > 
> > > > > > Overall, this suggests that the problem is related to blk-mq and was
> > > > > > indeed introduced with commit 7ea5fe31c12d.
> > > > > > 
> > > > > > For reference, I attached an actual crash log as well as a set of bisect
> > > > > > logs below.
> > > > > > 
> > > > > > Unfortunately, my understanding of blk-mq is not good enough to find the
> > > > > > underlying problem and suggest a fix. Please let me know if there is
> > > > > > anything else I can do to help fixing the problem.
> > > > > > 
> > > > > > Note that the problem was originally reported by syzbot running a test
> > > > > > on chromeos-4.14. It may well be that the problem has already been
> > > > > > reported and is being worked on. If so, please apologize the noise.
> > > > > > 
> > > > > > Thanks,
> > > > > > Guenter
> > > > > > 
> > > > > > ---
> > > > > > [    8.700038] ------------[ cut here ]------------
> > > > > > [    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
> > > > > > [    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
> > > > > > [    8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > > > > > [    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > > > > > [    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > > > [    8.701032] Call Trace:
> > > > > > [    8.701032]  <IRQ>
> > > > > > [    8.701032]  dump_stack+0x46/0x5b
> > > > > > [    8.701032]  panic+0xf3/0x246
> > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > [    8.701032]  __warn+0xeb/0xf0
> > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > [    8.701032]  report_bug+0xb1/0x120
> > > > > > [    8.701032]  fixup_bug.part.10+0x13/0x30
> > > > > > [    8.701032]  do_error_trap+0x8f/0xb0
> > > > > > [    8.701032]  do_invalid_op+0x31/0x40
> > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > [    8.701032]  invalid_op+0x14/0x20
> > > > > > [    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> > > > > > [    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
> > > > > > 89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
> > > > > > 05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
> > > > > > [    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
> > > > > > [    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
> > > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
> > > > > > [    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
> > > > > > [    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
> > > > > > [    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
> > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > [    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
> > > > > > [    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
> > > > > > [    8.701032]  kmem_cache_free+0x6e/0x1a0
> > > > > > [    8.701032]  ? __blk_release_queue+0x150/0x150
> > > > > > [    8.701032]  rcu_process_callbacks+0x2a7/0x750
> > > > > > [    8.701032]  __do_softirq+0xf2/0x2c7
> > > > > > [    8.701032]  irq_exit+0xb7/0xc0
> > > > > > [    8.701032]  smp_apic_timer_interrupt+0x67/0x130
> > > > > > [    8.701032]  apic_timer_interrupt+0xf/0x20
> > > > > > [    8.701032]  </IRQ>
> > > > > > [    8.701032] RIP: 0010:default_idle+0x12/0x140
> > > > > > [    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
> > > > > > 90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
> > > > > > c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
> > > > > > [    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > > > > [    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
> > > > > > [    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
> > > > > > [    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
> > > > > > [    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
> > > > > > [    8.701032]  ? __sched_text_end+0x5/0x5
> > > > > > [    8.701032]  do_idle+0x19c/0x230
> > > > > > [    8.701032]  cpu_startup_entry+0x14/0x20
> > > > > > [    8.701032]  start_kernel+0x491/0x4b1
> > > > > > [    8.701032]  secondary_startup_64+0xa4/0xb0
> > > > > > [    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > > > 
> > > > > 
> > > > > That is simply caused by enabling DEBUG_KOBJECT_RELEASE itself, which
> > > > > tries to do kboject_release() after delaying a while.
> > > > > 
> > > > > Wrt. this issue, q->mq_kobj is embedded in 'request_queue' whose
> > > > > instance is freed in release handler of q->kobj. So when one instance
> > > > > of 'request_queue' is freed, DEBUG_KOBJECT_RELEASE still expects that
> > > > > the q->mq_kobj is active.
> > > > > 
> > > > > The release handler of q->mq_kobj is NOP actually, so in reality it won't
> > > > > be one issue at all.
> > > > 
> > > > Given we have removed legacy IO path completely, it should be fine to
> > > > remove q->mq_kobj and simply put everything under the kobj of queue,
> > > > especially any attributes under mq aren't marked as stable ABI.
> > > 
> > > That might be possible going forward, but that does not appear to be
> > > a feasible solution for stable releases. I am not sure though if the
> > > "stable ABI" argument is valid. Almost all sysfs attributes are not
> > > marked stable. If there are applications using it, removing the atttributes
> > > would still break userspace.
> > 
> > Yeah, don't move attributes around for no good reason, you will make
> > userspace grumpy :)
> 
> There are very less attributes left under /sys/block/$DEV/mq.
> 
> > 
> > > Since the release function of mq_kobj is NOP (for reference, I assume
> > > we are talking about blk_mq_sysfs_release), can it just be NULL ?
> > 
> > Someone tried to work around the internal kernel warning that comes
> > about when you set a release function to NULL.  As per the in-kernel
> > documentation, I now get to make fun of that author by saying "do you
> > think I added those checks and message just for no good reason?  Don't
> > try to be smarter than the driver core by providing an empty release
> > function to make the kernel be quiet.  That's NOT what the kernel is
> > trying to tell you here at all."
> > 
> > So fix this, that is the correct thing to do.  Memory is freed in a
> > release function, to not do so is almost always a sign of a design bug.
> 
> The root cause is that both q->kobj and q->mq_kobj are bound to lifetime
> of same 'request_queue' instance.
> 
> We may allocate q->mq_kobj dynamically to fix this issue.
> 
> And now all drivers are converted to mq, and it is really necessary to keep
> 'q->mq_kobj'
> 
> > 
> > > That it is an empty function seems to be just as good or bad as having
> > > no function at all. An empty function just avoids the "broken" debug
> > > message. That message is still better than rendering DEBUG_KOBJECT_RELEASE
> > > useless.
> > > 
> > > Copying GregKH for input.
> > 
> > Thanks, this code is broken and needs to be fixed.
> 
> I will post a patch to address the 'issue'.

Not only q->mq_kobj, there is also each 'struct blk_mq_ctx' which is
referenced via percpu variable of q->queue_ctx, we still may introduce
one indirect table and allocate each blk_mq_ctx dynamically just for
making DEBUG_KOBJECT_RELEASE happy. But code can become not readable
as before.

Just wondering if it is possible to deal with this kind of shared
lifetime issue among kobjects.

In this case, q->mq_kobj, every 'struct blk_mq_ctx' referenced via
q->queue_ctx and q->kobj share same lifetime, it is fine to just
release them in one release handler.

I will think further and see if it can be done in clean/simple way.


Thanks,
Ming

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-13  0:41           ` Ming Lei
@ 2018-11-14 11:08             ` Ming Lei
  2018-11-14 15:14               ` Greg Kroah-Hartman
  2018-11-14 15:12             ` Greg Kroah-Hartman
  1 sibling, 1 reply; 18+ messages in thread
From: Ming Lei @ 2018-11-14 11:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Guenter Roeck, Ming Lei, Jens Axboe, Peter Zijlstra, linux-block,
	linux-kernel, Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi

On Tue, Nov 13, 2018 at 08:41:24AM +0800, Ming Lei wrote:
> On Tue, Nov 13, 2018 at 08:22:26AM +0800, Ming Lei wrote:
> > On Mon, Nov 12, 2018 at 12:02:36PM -0800, Greg Kroah-Hartman wrote:
> > > On Mon, Nov 12, 2018 at 08:48:48AM -0800, Guenter Roeck wrote:
> > > > On Mon, Nov 12, 2018 at 05:44:07PM +0800, Ming Lei wrote:
> > > > > On Mon, Nov 12, 2018 at 05:20:51PM +0800, Ming Lei wrote:
> > > > > > On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > > > enabled report kobject lifetime issues when booting an image in qemu
> > > > > > > using virtio-scsi-pci.
> > > > > > > 
> > > > > > > Bisect points to two different commits, depending on the kernel
> > > > > > > configuration.
> > > > > > > 
> > > > > > > 1st configuration: defconfig plus:
> > > > > > > 
> > > > > > > CONFIG_VIRTIO_BLK=y
> > > > > > > CONFIG_SCSI_LOWLEVEL=y
> > > > > > > CONFIG_SCSI_VIRTIO=y
> > > > > > > CONFIG_VIRTIO_PCI=y
> > > > > > > CONFIG_VIRTIO_MMIO=y
> > > > > > > CONFIG_DEBUG_OBJECTS=y
> > > > > > > CONFIG_DEBUG_OBJECTS_FREE=y
> > > > > > > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > > > > > > CONFIG_DEBUG_KOBJECT=y
> > > > > > > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > > > 
> > > > > > > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > > > > > > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > > > > > > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > > > > > > the bisect log is therefore a bit misleading.
> > > > > > > 
> > > > > > > 2nd configuration: As above, plus:
> > > > > > > CONFIG_SCSI_MQ_DEFAULT=y
> > > > > > > 
> > > > > > > With this configuration, bisect points to commit 7ea5fe31c12d
> > > > > > > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > > > > > > 
> > > > > > > The qemu command line used to reproduce the problem is as follows.
> > > > > > > The qemu version does not seem to matter (I tested with qemu versions
> > > > > > > 2.5, 2.11, and 3.0).
> > > > > > > 
> > > > > > > qemu-system-x86_64 \
> > > > > > > 	-kernel arch/x86/boot/bzImage \
> > > > > > > 	-device virtio-scsi-pci,id=scsi \
> > > > > > > 	-device scsi-hd,bus=scsi.0,drive=d0 \
> > > > > > > 	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > > > > > > 	-snapshot \
> > > > > > > 	-m 2G -smp 4 -enable-kvm \
> > > > > > > 	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > > > > > > 	-nographic -monitor none \
> > > > > > > 	-no-reboot \
> > > > > > > 	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"
> > > > > > > 
> > > > > > > Root file system:
> > > > > > > 	https://storage.googleapis.com/syzkaller/wheezy.img
> > > > > > > or:
> > > > > > > 	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > > > > > > 
> > > > > > > It seems that any root file system can be used to reproduce the problem.
> > > > > > > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > > > > > > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > > > > > > drivers), with the same result.
> > > > > > > 
> > > > > > > Overall, this suggests that the problem is related to blk-mq and was
> > > > > > > indeed introduced with commit 7ea5fe31c12d.
> > > > > > > 
> > > > > > > For reference, I attached an actual crash log as well as a set of bisect
> > > > > > > logs below.
> > > > > > > 
> > > > > > > Unfortunately, my understanding of blk-mq is not good enough to find the
> > > > > > > underlying problem and suggest a fix. Please let me know if there is
> > > > > > > anything else I can do to help fixing the problem.
> > > > > > > 
> > > > > > > Note that the problem was originally reported by syzbot running a test
> > > > > > > on chromeos-4.14. It may well be that the problem has already been
> > > > > > > reported and is being worked on. If so, please apologize the noise.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Guenter
> > > > > > > 
> > > > > > > ---
> > > > > > > [    8.700038] ------------[ cut here ]------------
> > > > > > > [    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
> > > > > > > [    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
> > > > > > > [    8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > > > > > > [    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > > > > > > [    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > > > > [    8.701032] Call Trace:
> > > > > > > [    8.701032]  <IRQ>
> > > > > > > [    8.701032]  dump_stack+0x46/0x5b
> > > > > > > [    8.701032]  panic+0xf3/0x246
> > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > [    8.701032]  __warn+0xeb/0xf0
> > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > [    8.701032]  report_bug+0xb1/0x120
> > > > > > > [    8.701032]  fixup_bug.part.10+0x13/0x30
> > > > > > > [    8.701032]  do_error_trap+0x8f/0xb0
> > > > > > > [    8.701032]  do_invalid_op+0x31/0x40
> > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > [    8.701032]  invalid_op+0x14/0x20
> > > > > > > [    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> > > > > > > [    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
> > > > > > > 89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
> > > > > > > 05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
> > > > > > > [    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
> > > > > > > [    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
> > > > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
> > > > > > > [    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
> > > > > > > [    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
> > > > > > > [    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
> > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > [    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
> > > > > > > [    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
> > > > > > > [    8.701032]  kmem_cache_free+0x6e/0x1a0
> > > > > > > [    8.701032]  ? __blk_release_queue+0x150/0x150
> > > > > > > [    8.701032]  rcu_process_callbacks+0x2a7/0x750
> > > > > > > [    8.701032]  __do_softirq+0xf2/0x2c7
> > > > > > > [    8.701032]  irq_exit+0xb7/0xc0
> > > > > > > [    8.701032]  smp_apic_timer_interrupt+0x67/0x130
> > > > > > > [    8.701032]  apic_timer_interrupt+0xf/0x20
> > > > > > > [    8.701032]  </IRQ>
> > > > > > > [    8.701032] RIP: 0010:default_idle+0x12/0x140
> > > > > > > [    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
> > > > > > > 90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
> > > > > > > c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
> > > > > > > [    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > > > > > [    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
> > > > > > > [    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
> > > > > > > [    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
> > > > > > > [    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
> > > > > > > [    8.701032]  ? __sched_text_end+0x5/0x5
> > > > > > > [    8.701032]  do_idle+0x19c/0x230
> > > > > > > [    8.701032]  cpu_startup_entry+0x14/0x20
> > > > > > > [    8.701032]  start_kernel+0x491/0x4b1
> > > > > > > [    8.701032]  secondary_startup_64+0xa4/0xb0
> > > > > > > [    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > > > > 
> > > > > > 
> > > > > > That is simply caused by enabling DEBUG_KOBJECT_RELEASE itself, which
> > > > > > tries to do kboject_release() after delaying a while.
> > > > > > 
> > > > > > Wrt. this issue, q->mq_kobj is embedded in 'request_queue' whose
> > > > > > instance is freed in release handler of q->kobj. So when one instance
> > > > > > of 'request_queue' is freed, DEBUG_KOBJECT_RELEASE still expects that
> > > > > > the q->mq_kobj is active.
> > > > > > 
> > > > > > The release handler of q->mq_kobj is NOP actually, so in reality it won't
> > > > > > be one issue at all.
> > > > > 
> > > > > Given we have removed legacy IO path completely, it should be fine to
> > > > > remove q->mq_kobj and simply put everything under the kobj of queue,
> > > > > especially any attributes under mq aren't marked as stable ABI.
> > > > 
> > > > That might be possible going forward, but that does not appear to be
> > > > a feasible solution for stable releases. I am not sure though if the
> > > > "stable ABI" argument is valid. Almost all sysfs attributes are not
> > > > marked stable. If there are applications using it, removing the atttributes
> > > > would still break userspace.
> > > 
> > > Yeah, don't move attributes around for no good reason, you will make
> > > userspace grumpy :)
> > 
> > There are very less attributes left under /sys/block/$DEV/mq.
> > 
> > > 
> > > > Since the release function of mq_kobj is NOP (for reference, I assume
> > > > we are talking about blk_mq_sysfs_release), can it just be NULL ?
> > > 
> > > Someone tried to work around the internal kernel warning that comes
> > > about when you set a release function to NULL.  As per the in-kernel
> > > documentation, I now get to make fun of that author by saying "do you
> > > think I added those checks and message just for no good reason?  Don't
> > > try to be smarter than the driver core by providing an empty release
> > > function to make the kernel be quiet.  That's NOT what the kernel is
> > > trying to tell you here at all."
> > > 
> > > So fix this, that is the correct thing to do.  Memory is freed in a
> > > release function, to not do so is almost always a sign of a design bug.
> > 
> > The root cause is that both q->kobj and q->mq_kobj are bound to lifetime
> > of same 'request_queue' instance.
> > 
> > We may allocate q->mq_kobj dynamically to fix this issue.
> > 
> > And now all drivers are converted to mq, and it is really necessary to keep
> > 'q->mq_kobj'
> > 
> > > 
> > > > That it is an empty function seems to be just as good or bad as having
> > > > no function at all. An empty function just avoids the "broken" debug
> > > > message. That message is still better than rendering DEBUG_KOBJECT_RELEASE
> > > > useless.
> > > > 
> > > > Copying GregKH for input.
> > > 
> > > Thanks, this code is broken and needs to be fixed.
> > 
> > I will post a patch to address the 'issue'.
> 
> Not only q->mq_kobj, there is also each 'struct blk_mq_ctx' which is
> referenced via percpu variable of q->queue_ctx, we still may introduce
> one indirect table and allocate each blk_mq_ctx dynamically just for
> making DEBUG_KOBJECT_RELEASE happy. But code can become not readable
> as before.
> 
> Just wondering if it is possible to deal with this kind of shared
> lifetime issue among kobjects.
> 
> In this case, q->mq_kobj, every 'struct blk_mq_ctx' referenced via
> q->queue_ctx and q->kobj share same lifetime, it is fine to just
> release them in one release handler.
> 
> I will think further and see if it can be done in clean/simple way.

Hi Guenter, Greg and Guys,

What do you think about the following approach?

---
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 2d737f9e7ba7..e15f5760ea1f 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -298,10 +298,12 @@ void blk_mq_sysfs_init(struct request_queue *q)
 	int cpu;
 
 	kobject_init(&q->mq_kobj, &blk_mq_ktype);
+	kobject_mark_no_delay_release(&q->mq_kobj);
 
 	for_each_possible_cpu(cpu) {
 		ctx = per_cpu_ptr(q->queue_ctx, cpu);
 		kobject_init(&ctx->kobj, &blk_mq_ctx_ktype);
+		kobject_mark_no_delay_release(&ctx->kobj);
 	}
 }
 
diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index 1ab0d624fb36..a8857366fb76 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -78,6 +78,7 @@ struct kobject {
 	unsigned int state_add_uevent_sent:1;
 	unsigned int state_remove_uevent_sent:1;
 	unsigned int uevent_suppress:1;
+	unsigned int no_delay_release:1;
 };
 
 extern __printf(2, 3)
@@ -136,6 +137,16 @@ static inline bool kobject_has_children(struct kobject *kobj)
 	return kobj->sd && kobj->sd->dir.subdirs;
 }
 
+/*
+ * DEBUG_KOBJECT_RELEASE may break case in which more than one kboject
+ * share one same external object, and have same lifetime, so these
+ * users may tell kobject core to not do delay release via this helper.
+ */
+static inline void kobject_mark_no_delay_release(struct kobject *kobj)
+{
+	kobj->no_delay_release = 1;
+}
+
 struct kobj_type {
 	void (*release)(struct kobject *kobj);
 	const struct sysfs_ops *sysfs_ops;
diff --git a/lib/kobject.c b/lib/kobject.c
index 97d86dc17c42..f6200c4d7aa1 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -683,10 +683,14 @@ static void kobject_release(struct kref *kref)
 #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
 	unsigned long delay = HZ + HZ * (get_random_int() & 0x3);
 	pr_info("kobject: '%s' (%p): %s, parent %p (delayed %ld)\n",
-		 kobject_name(kobj), kobj, __func__, kobj->parent, delay);
+		 kobject_name(kobj), kobj, __func__, kobj->parent,
+		 delay * !kobj->no_delay_release);
 	INIT_DELAYED_WORK(&kobj->release, kobject_delayed_cleanup);
 
-	schedule_delayed_work(&kobj->release, delay);
+	if (kobj->no_delay_release)
+		kobject_cleanup(kobj);
+	else
+		schedule_delayed_work(&kobj->release, delay);
 #else
 	kobject_cleanup(kobj);
 #endif

-- 
Ming

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-13  0:41           ` Ming Lei
  2018-11-14 11:08             ` Ming Lei
@ 2018-11-14 15:12             ` Greg Kroah-Hartman
  1 sibling, 0 replies; 18+ messages in thread
From: Greg Kroah-Hartman @ 2018-11-14 15:12 UTC (permalink / raw)
  To: Ming Lei
  Cc: Guenter Roeck, Ming Lei, Jens Axboe, Peter Zijlstra, linux-block,
	linux-kernel, Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi

On Tue, Nov 13, 2018 at 08:41:25AM +0800, Ming Lei wrote:
> On Tue, Nov 13, 2018 at 08:22:26AM +0800, Ming Lei wrote:
> > On Mon, Nov 12, 2018 at 12:02:36PM -0800, Greg Kroah-Hartman wrote:
> > > On Mon, Nov 12, 2018 at 08:48:48AM -0800, Guenter Roeck wrote:
> > > > On Mon, Nov 12, 2018 at 05:44:07PM +0800, Ming Lei wrote:
> > > > > On Mon, Nov 12, 2018 at 05:20:51PM +0800, Ming Lei wrote:
> > > > > > On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > > > enabled report kobject lifetime issues when booting an image in qemu
> > > > > > > using virtio-scsi-pci.
> > > > > > > 
> > > > > > > Bisect points to two different commits, depending on the kernel
> > > > > > > configuration.
> > > > > > > 
> > > > > > > 1st configuration: defconfig plus:
> > > > > > > 
> > > > > > > CONFIG_VIRTIO_BLK=y
> > > > > > > CONFIG_SCSI_LOWLEVEL=y
> > > > > > > CONFIG_SCSI_VIRTIO=y
> > > > > > > CONFIG_VIRTIO_PCI=y
> > > > > > > CONFIG_VIRTIO_MMIO=y
> > > > > > > CONFIG_DEBUG_OBJECTS=y
> > > > > > > CONFIG_DEBUG_OBJECTS_FREE=y
> > > > > > > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > > > > > > CONFIG_DEBUG_KOBJECT=y
> > > > > > > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > > > 
> > > > > > > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > > > > > > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > > > > > > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > > > > > > the bisect log is therefore a bit misleading.
> > > > > > > 
> > > > > > > 2nd configuration: As above, plus:
> > > > > > > CONFIG_SCSI_MQ_DEFAULT=y
> > > > > > > 
> > > > > > > With this configuration, bisect points to commit 7ea5fe31c12d
> > > > > > > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > > > > > > 
> > > > > > > The qemu command line used to reproduce the problem is as follows.
> > > > > > > The qemu version does not seem to matter (I tested with qemu versions
> > > > > > > 2.5, 2.11, and 3.0).
> > > > > > > 
> > > > > > > qemu-system-x86_64 \
> > > > > > > 	-kernel arch/x86/boot/bzImage \
> > > > > > > 	-device virtio-scsi-pci,id=scsi \
> > > > > > > 	-device scsi-hd,bus=scsi.0,drive=d0 \
> > > > > > > 	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > > > > > > 	-snapshot \
> > > > > > > 	-m 2G -smp 4 -enable-kvm \
> > > > > > > 	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > > > > > > 	-nographic -monitor none \
> > > > > > > 	-no-reboot \
> > > > > > > 	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"
> > > > > > > 
> > > > > > > Root file system:
> > > > > > > 	https://storage.googleapis.com/syzkaller/wheezy.img
> > > > > > > or:
> > > > > > > 	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > > > > > > 
> > > > > > > It seems that any root file system can be used to reproduce the problem.
> > > > > > > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > > > > > > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > > > > > > drivers), with the same result.
> > > > > > > 
> > > > > > > Overall, this suggests that the problem is related to blk-mq and was
> > > > > > > indeed introduced with commit 7ea5fe31c12d.
> > > > > > > 
> > > > > > > For reference, I attached an actual crash log as well as a set of bisect
> > > > > > > logs below.
> > > > > > > 
> > > > > > > Unfortunately, my understanding of blk-mq is not good enough to find the
> > > > > > > underlying problem and suggest a fix. Please let me know if there is
> > > > > > > anything else I can do to help fixing the problem.
> > > > > > > 
> > > > > > > Note that the problem was originally reported by syzbot running a test
> > > > > > > on chromeos-4.14. It may well be that the problem has already been
> > > > > > > reported and is being worked on. If so, please apologize the noise.
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > Guenter
> > > > > > > 
> > > > > > > ---
> > > > > > > [    8.700038] ------------[ cut here ]------------
> > > > > > > [    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
> > > > > > > [    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
> > > > > > > [    8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > > > > > > [    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > > > > > > [    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > > > > [    8.701032] Call Trace:
> > > > > > > [    8.701032]  <IRQ>
> > > > > > > [    8.701032]  dump_stack+0x46/0x5b
> > > > > > > [    8.701032]  panic+0xf3/0x246
> > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > [    8.701032]  __warn+0xeb/0xf0
> > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > [    8.701032]  report_bug+0xb1/0x120
> > > > > > > [    8.701032]  fixup_bug.part.10+0x13/0x30
> > > > > > > [    8.701032]  do_error_trap+0x8f/0xb0
> > > > > > > [    8.701032]  do_invalid_op+0x31/0x40
> > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > [    8.701032]  invalid_op+0x14/0x20
> > > > > > > [    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> > > > > > > [    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
> > > > > > > 89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
> > > > > > > 05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
> > > > > > > [    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
> > > > > > > [    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
> > > > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
> > > > > > > [    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
> > > > > > > [    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
> > > > > > > [    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
> > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > [    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
> > > > > > > [    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
> > > > > > > [    8.701032]  kmem_cache_free+0x6e/0x1a0
> > > > > > > [    8.701032]  ? __blk_release_queue+0x150/0x150
> > > > > > > [    8.701032]  rcu_process_callbacks+0x2a7/0x750
> > > > > > > [    8.701032]  __do_softirq+0xf2/0x2c7
> > > > > > > [    8.701032]  irq_exit+0xb7/0xc0
> > > > > > > [    8.701032]  smp_apic_timer_interrupt+0x67/0x130
> > > > > > > [    8.701032]  apic_timer_interrupt+0xf/0x20
> > > > > > > [    8.701032]  </IRQ>
> > > > > > > [    8.701032] RIP: 0010:default_idle+0x12/0x140
> > > > > > > [    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
> > > > > > > 90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
> > > > > > > c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
> > > > > > > [    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > > > > > [    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
> > > > > > > [    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
> > > > > > > [    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
> > > > > > > [    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
> > > > > > > [    8.701032]  ? __sched_text_end+0x5/0x5
> > > > > > > [    8.701032]  do_idle+0x19c/0x230
> > > > > > > [    8.701032]  cpu_startup_entry+0x14/0x20
> > > > > > > [    8.701032]  start_kernel+0x491/0x4b1
> > > > > > > [    8.701032]  secondary_startup_64+0xa4/0xb0
> > > > > > > [    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > > > > 
> > > > > > 
> > > > > > That is simply caused by enabling DEBUG_KOBJECT_RELEASE itself, which
> > > > > > tries to do kboject_release() after delaying a while.
> > > > > > 
> > > > > > Wrt. this issue, q->mq_kobj is embedded in 'request_queue' whose
> > > > > > instance is freed in release handler of q->kobj. So when one instance
> > > > > > of 'request_queue' is freed, DEBUG_KOBJECT_RELEASE still expects that
> > > > > > the q->mq_kobj is active.
> > > > > > 
> > > > > > The release handler of q->mq_kobj is NOP actually, so in reality it won't
> > > > > > be one issue at all.
> > > > > 
> > > > > Given we have removed legacy IO path completely, it should be fine to
> > > > > remove q->mq_kobj and simply put everything under the kobj of queue,
> > > > > especially any attributes under mq aren't marked as stable ABI.
> > > > 
> > > > That might be possible going forward, but that does not appear to be
> > > > a feasible solution for stable releases. I am not sure though if the
> > > > "stable ABI" argument is valid. Almost all sysfs attributes are not
> > > > marked stable. If there are applications using it, removing the atttributes
> > > > would still break userspace.
> > > 
> > > Yeah, don't move attributes around for no good reason, you will make
> > > userspace grumpy :)
> > 
> > There are very less attributes left under /sys/block/$DEV/mq.
> > 
> > > 
> > > > Since the release function of mq_kobj is NOP (for reference, I assume
> > > > we are talking about blk_mq_sysfs_release), can it just be NULL ?
> > > 
> > > Someone tried to work around the internal kernel warning that comes
> > > about when you set a release function to NULL.  As per the in-kernel
> > > documentation, I now get to make fun of that author by saying "do you
> > > think I added those checks and message just for no good reason?  Don't
> > > try to be smarter than the driver core by providing an empty release
> > > function to make the kernel be quiet.  That's NOT what the kernel is
> > > trying to tell you here at all."
> > > 
> > > So fix this, that is the correct thing to do.  Memory is freed in a
> > > release function, to not do so is almost always a sign of a design bug.
> > 
> > The root cause is that both q->kobj and q->mq_kobj are bound to lifetime
> > of same 'request_queue' instance.

Then that is the problem right there.  You can not have two reference
counted objects that somehow "share" the refrence counted logic like
this.  You can not "bind" them together except if you properly handle
the references on all objects.

Hopefully you do not actually have more than one kobject in the same
structure?  You can have pointers, but not a real structure.

If you have a pointer, great, treat it as an individual object like you
should and handle things correctly that way by properly incrementing and
decrementing it as needed.  But do not think that somehow you are
controlling them at the same time as other parts of the kernel (and
userspace) can grab references and hold things for longer than you are
expecting.

That is why you can not have an empty release function, to do so means
you have a design bug with how you have used these structures.

So please go fix this, if you just have pointers to kobjects, there
should not be any issues.  If you do not, then you have problems.  Do
you have any specific code I should look at here to try to figure out
what you all are doing (note I'm at a conference all week so my response
time is slow...)

> > We may allocate q->mq_kobj dynamically to fix this issue.
> > 
> > And now all drivers are converted to mq, and it is really necessary to keep
> > 'q->mq_kobj'
> > 
> > > 
> > > > That it is an empty function seems to be just as good or bad as having
> > > > no function at all. An empty function just avoids the "broken" debug
> > > > message. That message is still better than rendering DEBUG_KOBJECT_RELEASE
> > > > useless.
> > > > 
> > > > Copying GregKH for input.
> > > 
> > > Thanks, this code is broken and needs to be fixed.
> > 
> > I will post a patch to address the 'issue'.
> 
> Not only q->mq_kobj, there is also each 'struct blk_mq_ctx' which is
> referenced via percpu variable of q->queue_ctx, we still may introduce
> one indirect table and allocate each blk_mq_ctx dynamically just for
> making DEBUG_KOBJECT_RELEASE happy. But code can become not readable
> as before.
> 
> Just wondering if it is possible to deal with this kind of shared
> lifetime issue among kobjects.

No, and you should not think about it as a "shared lifetime issue", all
objects have their own lifetime and if it just so happens that some of
them seem to share the same increment/decrement logic at the same time,
that's great.  But again, you can not rely on that as this test proves.

> In this case, q->mq_kobj, every 'struct blk_mq_ctx' referenced via
> q->queue_ctx and q->kobj share same lifetime, it is fine to just
> release them in one release handler.

No it is not as you can see, and as I wrote above.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-14 11:08             ` Ming Lei
@ 2018-11-14 15:14               ` Greg Kroah-Hartman
  2018-11-15  0:36                 ` Ming Lei
  0 siblings, 1 reply; 18+ messages in thread
From: Greg Kroah-Hartman @ 2018-11-14 15:14 UTC (permalink / raw)
  To: Ming Lei
  Cc: Guenter Roeck, Ming Lei, Jens Axboe, Peter Zijlstra, linux-block,
	linux-kernel, Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi

On Wed, Nov 14, 2018 at 07:08:28PM +0800, Ming Lei wrote:
> On Tue, Nov 13, 2018 at 08:41:24AM +0800, Ming Lei wrote:
> > On Tue, Nov 13, 2018 at 08:22:26AM +0800, Ming Lei wrote:
> > > On Mon, Nov 12, 2018 at 12:02:36PM -0800, Greg Kroah-Hartman wrote:
> > > > On Mon, Nov 12, 2018 at 08:48:48AM -0800, Guenter Roeck wrote:
> > > > > On Mon, Nov 12, 2018 at 05:44:07PM +0800, Ming Lei wrote:
> > > > > > On Mon, Nov 12, 2018 at 05:20:51PM +0800, Ming Lei wrote:
> > > > > > > On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > > > > enabled report kobject lifetime issues when booting an image in qemu
> > > > > > > > using virtio-scsi-pci.
> > > > > > > > 
> > > > > > > > Bisect points to two different commits, depending on the kernel
> > > > > > > > configuration.
> > > > > > > > 
> > > > > > > > 1st configuration: defconfig plus:
> > > > > > > > 
> > > > > > > > CONFIG_VIRTIO_BLK=y
> > > > > > > > CONFIG_SCSI_LOWLEVEL=y
> > > > > > > > CONFIG_SCSI_VIRTIO=y
> > > > > > > > CONFIG_VIRTIO_PCI=y
> > > > > > > > CONFIG_VIRTIO_MMIO=y
> > > > > > > > CONFIG_DEBUG_OBJECTS=y
> > > > > > > > CONFIG_DEBUG_OBJECTS_FREE=y
> > > > > > > > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > > > > > > > CONFIG_DEBUG_KOBJECT=y
> > > > > > > > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > > > > 
> > > > > > > > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > > > > > > > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > > > > > > > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > > > > > > > the bisect log is therefore a bit misleading.
> > > > > > > > 
> > > > > > > > 2nd configuration: As above, plus:
> > > > > > > > CONFIG_SCSI_MQ_DEFAULT=y
> > > > > > > > 
> > > > > > > > With this configuration, bisect points to commit 7ea5fe31c12d
> > > > > > > > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > > > > > > > 
> > > > > > > > The qemu command line used to reproduce the problem is as follows.
> > > > > > > > The qemu version does not seem to matter (I tested with qemu versions
> > > > > > > > 2.5, 2.11, and 3.0).
> > > > > > > > 
> > > > > > > > qemu-system-x86_64 \
> > > > > > > > 	-kernel arch/x86/boot/bzImage \
> > > > > > > > 	-device virtio-scsi-pci,id=scsi \
> > > > > > > > 	-device scsi-hd,bus=scsi.0,drive=d0 \
> > > > > > > > 	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > > > > > > > 	-snapshot \
> > > > > > > > 	-m 2G -smp 4 -enable-kvm \
> > > > > > > > 	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > > > > > > > 	-nographic -monitor none \
> > > > > > > > 	-no-reboot \
> > > > > > > > 	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"
> > > > > > > > 
> > > > > > > > Root file system:
> > > > > > > > 	https://storage.googleapis.com/syzkaller/wheezy.img
> > > > > > > > or:
> > > > > > > > 	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > > > > > > > 
> > > > > > > > It seems that any root file system can be used to reproduce the problem.
> > > > > > > > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > > > > > > > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > > > > > > > drivers), with the same result.
> > > > > > > > 
> > > > > > > > Overall, this suggests that the problem is related to blk-mq and was
> > > > > > > > indeed introduced with commit 7ea5fe31c12d.
> > > > > > > > 
> > > > > > > > For reference, I attached an actual crash log as well as a set of bisect
> > > > > > > > logs below.
> > > > > > > > 
> > > > > > > > Unfortunately, my understanding of blk-mq is not good enough to find the
> > > > > > > > underlying problem and suggest a fix. Please let me know if there is
> > > > > > > > anything else I can do to help fixing the problem.
> > > > > > > > 
> > > > > > > > Note that the problem was originally reported by syzbot running a test
> > > > > > > > on chromeos-4.14. It may well be that the problem has already been
> > > > > > > > reported and is being worked on. If so, please apologize the noise.
> > > > > > > > 
> > > > > > > > Thanks,
> > > > > > > > Guenter
> > > > > > > > 
> > > > > > > > ---
> > > > > > > > [    8.700038] ------------[ cut here ]------------
> > > > > > > > [    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
> > > > > > > > [    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
> > > > > > > > [    8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > > > > > > > [    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > > > > > > > [    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > > > > > [    8.701032] Call Trace:
> > > > > > > > [    8.701032]  <IRQ>
> > > > > > > > [    8.701032]  dump_stack+0x46/0x5b
> > > > > > > > [    8.701032]  panic+0xf3/0x246
> > > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > > [    8.701032]  __warn+0xeb/0xf0
> > > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > > [    8.701032]  report_bug+0xb1/0x120
> > > > > > > > [    8.701032]  fixup_bug.part.10+0x13/0x30
> > > > > > > > [    8.701032]  do_error_trap+0x8f/0xb0
> > > > > > > > [    8.701032]  do_invalid_op+0x31/0x40
> > > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > > [    8.701032]  invalid_op+0x14/0x20
> > > > > > > > [    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> > > > > > > > [    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
> > > > > > > > 89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
> > > > > > > > 05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
> > > > > > > > [    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
> > > > > > > > [    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
> > > > > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
> > > > > > > > [    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
> > > > > > > > [    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
> > > > > > > > [    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
> > > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > > [    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
> > > > > > > > [    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
> > > > > > > > [    8.701032]  kmem_cache_free+0x6e/0x1a0
> > > > > > > > [    8.701032]  ? __blk_release_queue+0x150/0x150
> > > > > > > > [    8.701032]  rcu_process_callbacks+0x2a7/0x750
> > > > > > > > [    8.701032]  __do_softirq+0xf2/0x2c7
> > > > > > > > [    8.701032]  irq_exit+0xb7/0xc0
> > > > > > > > [    8.701032]  smp_apic_timer_interrupt+0x67/0x130
> > > > > > > > [    8.701032]  apic_timer_interrupt+0xf/0x20
> > > > > > > > [    8.701032]  </IRQ>
> > > > > > > > [    8.701032] RIP: 0010:default_idle+0x12/0x140
> > > > > > > > [    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
> > > > > > > > 90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
> > > > > > > > c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
> > > > > > > > [    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > > > > > > [    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
> > > > > > > > [    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
> > > > > > > > [    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
> > > > > > > > [    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
> > > > > > > > [    8.701032]  ? __sched_text_end+0x5/0x5
> > > > > > > > [    8.701032]  do_idle+0x19c/0x230
> > > > > > > > [    8.701032]  cpu_startup_entry+0x14/0x20
> > > > > > > > [    8.701032]  start_kernel+0x491/0x4b1
> > > > > > > > [    8.701032]  secondary_startup_64+0xa4/0xb0
> > > > > > > > [    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > > > > > 
> > > > > > > 
> > > > > > > That is simply caused by enabling DEBUG_KOBJECT_RELEASE itself, which
> > > > > > > tries to do kboject_release() after delaying a while.
> > > > > > > 
> > > > > > > Wrt. this issue, q->mq_kobj is embedded in 'request_queue' whose
> > > > > > > instance is freed in release handler of q->kobj. So when one instance
> > > > > > > of 'request_queue' is freed, DEBUG_KOBJECT_RELEASE still expects that
> > > > > > > the q->mq_kobj is active.
> > > > > > > 
> > > > > > > The release handler of q->mq_kobj is NOP actually, so in reality it won't
> > > > > > > be one issue at all.
> > > > > > 
> > > > > > Given we have removed legacy IO path completely, it should be fine to
> > > > > > remove q->mq_kobj and simply put everything under the kobj of queue,
> > > > > > especially any attributes under mq aren't marked as stable ABI.
> > > > > 
> > > > > That might be possible going forward, but that does not appear to be
> > > > > a feasible solution for stable releases. I am not sure though if the
> > > > > "stable ABI" argument is valid. Almost all sysfs attributes are not
> > > > > marked stable. If there are applications using it, removing the atttributes
> > > > > would still break userspace.
> > > > 
> > > > Yeah, don't move attributes around for no good reason, you will make
> > > > userspace grumpy :)
> > > 
> > > There are very less attributes left under /sys/block/$DEV/mq.
> > > 
> > > > 
> > > > > Since the release function of mq_kobj is NOP (for reference, I assume
> > > > > we are talking about blk_mq_sysfs_release), can it just be NULL ?
> > > > 
> > > > Someone tried to work around the internal kernel warning that comes
> > > > about when you set a release function to NULL.  As per the in-kernel
> > > > documentation, I now get to make fun of that author by saying "do you
> > > > think I added those checks and message just for no good reason?  Don't
> > > > try to be smarter than the driver core by providing an empty release
> > > > function to make the kernel be quiet.  That's NOT what the kernel is
> > > > trying to tell you here at all."
> > > > 
> > > > So fix this, that is the correct thing to do.  Memory is freed in a
> > > > release function, to not do so is almost always a sign of a design bug.
> > > 
> > > The root cause is that both q->kobj and q->mq_kobj are bound to lifetime
> > > of same 'request_queue' instance.
> > > 
> > > We may allocate q->mq_kobj dynamically to fix this issue.
> > > 
> > > And now all drivers are converted to mq, and it is really necessary to keep
> > > 'q->mq_kobj'
> > > 
> > > > 
> > > > > That it is an empty function seems to be just as good or bad as having
> > > > > no function at all. An empty function just avoids the "broken" debug
> > > > > message. That message is still better than rendering DEBUG_KOBJECT_RELEASE
> > > > > useless.
> > > > > 
> > > > > Copying GregKH for input.
> > > > 
> > > > Thanks, this code is broken and needs to be fixed.
> > > 
> > > I will post a patch to address the 'issue'.
> > 
> > Not only q->mq_kobj, there is also each 'struct blk_mq_ctx' which is
> > referenced via percpu variable of q->queue_ctx, we still may introduce
> > one indirect table and allocate each blk_mq_ctx dynamically just for
> > making DEBUG_KOBJECT_RELEASE happy. But code can become not readable
> > as before.
> > 
> > Just wondering if it is possible to deal with this kind of shared
> > lifetime issue among kobjects.
> > 
> > In this case, q->mq_kobj, every 'struct blk_mq_ctx' referenced via
> > q->queue_ctx and q->kobj share same lifetime, it is fine to just
> > release them in one release handler.
> > 
> > I will think further and see if it can be done in clean/simple way.
> 
> Hi Guenter, Greg and Guys,
> 
> What do you think about the following approach?
> 
> ---
> diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
> index 2d737f9e7ba7..e15f5760ea1f 100644
> --- a/block/blk-mq-sysfs.c
> +++ b/block/blk-mq-sysfs.c
> @@ -298,10 +298,12 @@ void blk_mq_sysfs_init(struct request_queue *q)
>  	int cpu;
>  
>  	kobject_init(&q->mq_kobj, &blk_mq_ktype);
> +	kobject_mark_no_delay_release(&q->mq_kobj);
>  
>  	for_each_possible_cpu(cpu) {
>  		ctx = per_cpu_ptr(q->queue_ctx, cpu);
>  		kobject_init(&ctx->kobj, &blk_mq_ctx_ktype);
> +		kobject_mark_no_delay_release(&ctx->kobj);
>  	}
>  }
>  
> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> index 1ab0d624fb36..a8857366fb76 100644
> --- a/include/linux/kobject.h
> +++ b/include/linux/kobject.h
> @@ -78,6 +78,7 @@ struct kobject {
>  	unsigned int state_add_uevent_sent:1;
>  	unsigned int state_remove_uevent_sent:1;
>  	unsigned int uevent_suppress:1;
> +	unsigned int no_delay_release:1;
>  };
>  
>  extern __printf(2, 3)
> @@ -136,6 +137,16 @@ static inline bool kobject_has_children(struct kobject *kobj)
>  	return kobj->sd && kobj->sd->dir.subdirs;
>  }
>  
> +/*
> + * DEBUG_KOBJECT_RELEASE may break case in which more than one kboject
> + * share one same external object, and have same lifetime, so these
> + * users may tell kobject core to not do delay release via this helper.
> + */
> +static inline void kobject_mark_no_delay_release(struct kobject *kobj)
> +{
> +	kobj->no_delay_release = 1;
> +}

Ick, no, that's not ok.  "delay release" is a debugging tool, there is
nothing keeping userspace from grabbing a reference to your kobject and
holding on to it to emulate this "delay" test.

So even if you think the kernel is not going to do this, remember, you
have no control over it.  Reference counted objects are done this way
for a reason, you really do not know who has a reference and you really
do not care.

You are just papering over the real issue here, see my previous email
for how to start working on resolving it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-14 15:14               ` Greg Kroah-Hartman
@ 2018-11-15  0:36                 ` Ming Lei
  2018-11-15  0:56                   ` Greg Kroah-Hartman
  0 siblings, 1 reply; 18+ messages in thread
From: Ming Lei @ 2018-11-15  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Guenter Roeck, Ming Lei, Jens Axboe, Peter Zijlstra, linux-block,
	linux-kernel, Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi

On Wed, Nov 14, 2018 at 07:14:10AM -0800, Greg Kroah-Hartman wrote:
> On Wed, Nov 14, 2018 at 07:08:28PM +0800, Ming Lei wrote:
> > On Tue, Nov 13, 2018 at 08:41:24AM +0800, Ming Lei wrote:
> > > On Tue, Nov 13, 2018 at 08:22:26AM +0800, Ming Lei wrote:
> > > > On Mon, Nov 12, 2018 at 12:02:36PM -0800, Greg Kroah-Hartman wrote:
> > > > > On Mon, Nov 12, 2018 at 08:48:48AM -0800, Guenter Roeck wrote:
> > > > > > On Mon, Nov 12, 2018 at 05:44:07PM +0800, Ming Lei wrote:
> > > > > > > On Mon, Nov 12, 2018 at 05:20:51PM +0800, Ming Lei wrote:
> > > > > > > > On Fri, Nov 09, 2018 at 12:35:18PM -0800, Guenter Roeck wrote:
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > Images with CONFIG_DEBUG_KOBJECT=y, CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > > > > > enabled report kobject lifetime issues when booting an image in qemu
> > > > > > > > > using virtio-scsi-pci.
> > > > > > > > > 
> > > > > > > > > Bisect points to two different commits, depending on the kernel
> > > > > > > > > configuration.
> > > > > > > > > 
> > > > > > > > > 1st configuration: defconfig plus:
> > > > > > > > > 
> > > > > > > > > CONFIG_VIRTIO_BLK=y
> > > > > > > > > CONFIG_SCSI_LOWLEVEL=y
> > > > > > > > > CONFIG_SCSI_VIRTIO=y
> > > > > > > > > CONFIG_VIRTIO_PCI=y
> > > > > > > > > CONFIG_VIRTIO_MMIO=y
> > > > > > > > > CONFIG_DEBUG_OBJECTS=y
> > > > > > > > > CONFIG_DEBUG_OBJECTS_FREE=y
> > > > > > > > > CONFIG_DEBUG_OBJECTS_TIMERS=y
> > > > > > > > > CONFIG_DEBUG_KOBJECT=y
> > > > > > > > > CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > > > > > > 
> > > > > > > > > Bisecting this configuration points to commit b5b6e8c8d3b4 ("scsi:
> > > > > > > > > virtio_scsi: fix IO hang caused by automatic irq vector affinity")
> > > > > > > > > as the culprit. This commit enforces the use of blk-mq for virtio_scsi;
> > > > > > > > > the bisect log is therefore a bit misleading.
> > > > > > > > > 
> > > > > > > > > 2nd configuration: As above, plus:
> > > > > > > > > CONFIG_SCSI_MQ_DEFAULT=y
> > > > > > > > > 
> > > > > > > > > With this configuration, bisect points to commit 7ea5fe31c12d
> > > > > > > > > ("blk-mq: make lifetime consitent between q/ctx and its kobject").
> > > > > > > > > 
> > > > > > > > > The qemu command line used to reproduce the problem is as follows.
> > > > > > > > > The qemu version does not seem to matter (I tested with qemu versions
> > > > > > > > > 2.5, 2.11, and 3.0).
> > > > > > > > > 
> > > > > > > > > qemu-system-x86_64 \
> > > > > > > > > 	-kernel arch/x86/boot/bzImage \
> > > > > > > > > 	-device virtio-scsi-pci,id=scsi \
> > > > > > > > > 	-device scsi-hd,bus=scsi.0,drive=d0 \
> > > > > > > > > 	-drive file=./wheezy.img,format=raw,if=none,id=d0 \
> > > > > > > > > 	-snapshot \
> > > > > > > > > 	-m 2G -smp 4 -enable-kvm \
> > > > > > > > > 	-net user,host=10.0.2.10,hostfwd=tcp::10022-:22 -net nic \
> > > > > > > > > 	-nographic -monitor none \
> > > > > > > > > 	-no-reboot \
> > > > > > > > > 	-append "console=ttyS0 root=/dev/sda earlyprintk=serial panic_on_warn=1 panic=-1"
> > > > > > > > > 
> > > > > > > > > Root file system:
> > > > > > > > > 	https://storage.googleapis.com/syzkaller/wheezy.img
> > > > > > > > > or:
> > > > > > > > > 	https://github.com/groeck/linux-build-test/blob/master/rootfs/x86_64/rootfs.ext2.gz
> > > > > > > > > 
> > > > > > > > > It seems that any root file system can be used to reproduce the problem.
> > > > > > > > > Also, the problem is not limited to virtio-scsi-pci. I also tried
> > > > > > > > > am53c974, lsi53c810, and lsi53c895a (after enabling the respective
> > > > > > > > > drivers), with the same result.
> > > > > > > > > 
> > > > > > > > > Overall, this suggests that the problem is related to blk-mq and was
> > > > > > > > > indeed introduced with commit 7ea5fe31c12d.
> > > > > > > > > 
> > > > > > > > > For reference, I attached an actual crash log as well as a set of bisect
> > > > > > > > > logs below.
> > > > > > > > > 
> > > > > > > > > Unfortunately, my understanding of blk-mq is not good enough to find the
> > > > > > > > > underlying problem and suggest a fix. Please let me know if there is
> > > > > > > > > anything else I can do to help fixing the problem.
> > > > > > > > > 
> > > > > > > > > Note that the problem was originally reported by syzbot running a test
> > > > > > > > > on chromeos-4.14. It may well be that the problem has already been
> > > > > > > > > reported and is being worked on. If so, please apologize the noise.
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > Guenter
> > > > > > > > > 
> > > > > > > > > ---
> > > > > > > > > [    8.700038] ------------[ cut here ]------------
> > > > > > > > > [    8.700376] ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
> > > > > > > > > [    8.701032] WARNING: CPU: 0 PID: 0 at lib/debugobjects.c:329 debug_print_object+0x6a/0x80
> > > > > > > > > [    8.701032] Kernel panic - not syncing: panic_on_warn set ...
> > > > > > > > > [    8.701032] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.20.0-rc1 #30
> > > > > > > > > [    8.701032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > > > > > > > > [    8.701032] Call Trace:
> > > > > > > > > [    8.701032]  <IRQ>
> > > > > > > > > [    8.701032]  dump_stack+0x46/0x5b
> > > > > > > > > [    8.701032]  panic+0xf3/0x246
> > > > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > > > [    8.701032]  __warn+0xeb/0xf0
> > > > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > > > [    8.701032]  report_bug+0xb1/0x120
> > > > > > > > > [    8.701032]  fixup_bug.part.10+0x13/0x30
> > > > > > > > > [    8.701032]  do_error_trap+0x8f/0xb0
> > > > > > > > > [    8.701032]  do_invalid_op+0x31/0x40
> > > > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > > > [    8.701032]  invalid_op+0x14/0x20
> > > > > > > > > [    8.701032] RIP: 0010:debug_print_object+0x6a/0x80
> > > > > > > > > [    8.701032] Code: 8b 43 10 83 c2 01 8b 4b 14 89 15 c1 e2 78 01 4c 8b 45 00 4c
> > > > > > > > > 89 e6 48 c7 c7 f0 ca 41 a6 48 8b 14 c5 a0 f7 24 a6 e8 06 8e cc ff <0f> 0b 5b 83
> > > > > > > > > 05 70 47 1a 01 01 5d 41 5c c3 83 05 65 47 1a 01 01 c3
> > > > > > > > > [    8.701032] RSP: 0018:ffff89b9fda03e48 EFLAGS: 00010082
> > > > > > > > > [    8.701032] RAX: 0000000000000000 RBX: ffff89b9fd49d0c8 RCX: ffffffffa6646e18
> > > > > > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000082 RDI: ffffffffa6cc596c
> > > > > > > > > [    8.701032] RBP: ffffffffa664a5c0 R08: 79616c6564203a74 R09: 5f6b726f775f6465
> > > > > > > > > [    8.701032] R10: ffff89b9fda03f10 R11: 3178302f3078302b R12: ffffffffa6403e22
> > > > > > > > > [    8.701032] R13: 0000000000000001 R14: ffffffffa6d4d458 R15: ffff89b9fc4c8780
> > > > > > > > > [    8.701032]  ? debug_print_object+0x6a/0x80
> > > > > > > > > [    8.701032]  debug_check_no_obj_freed+0x1b8/0x1ea
> > > > > > > > > [    8.701032]  ? rcu_process_callbacks+0x2a7/0x750
> > > > > > > > > [    8.701032]  kmem_cache_free+0x6e/0x1a0
> > > > > > > > > [    8.701032]  ? __blk_release_queue+0x150/0x150
> > > > > > > > > [    8.701032]  rcu_process_callbacks+0x2a7/0x750
> > > > > > > > > [    8.701032]  __do_softirq+0xf2/0x2c7
> > > > > > > > > [    8.701032]  irq_exit+0xb7/0xc0
> > > > > > > > > [    8.701032]  smp_apic_timer_interrupt+0x67/0x130
> > > > > > > > > [    8.701032]  apic_timer_interrupt+0xf/0x20
> > > > > > > > > [    8.701032]  </IRQ>
> > > > > > > > > [    8.701032] RIP: 0010:default_idle+0x12/0x140
> > > > > > > > > [    8.701032] Code: fe ff ff c6 43 08 00 fb eb b0 31 c0 eb b4 e8 65 c8 60 ff 90
> > > > > > > > > 90 90 90 90 41 54 55 53 65 8b 2d 15 c4 3b 5a 66 66 66 66 90 fb f4 <65> 8b 2d 07
> > > > > > > > > c4 3b 5a 66 66 66 66 90 5b 5d 41 5c c3 65 8b 05 f6 c3
> > > > > > > > > [    8.701032] RSP: 0018:ffffffffa6603e98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > > > > > > > > [    8.701032] RAX: ffffffffa5c52d10 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > > > > > [    8.701032] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000020605092a
> > > > > > > > > [    8.701032] RBP: 0000000000000000 R08: ffffffffcff7766a R09: ffff89b9fda20b00
> > > > > > > > > [    8.701032] R10: 0000000000000000 R11: 0000000000002400 R12: ffffffffa6611740
> > > > > > > > > [    8.701032] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa6611740
> > > > > > > > > [    8.701032]  ? __sched_text_end+0x5/0x5
> > > > > > > > > [    8.701032]  do_idle+0x19c/0x230
> > > > > > > > > [    8.701032]  cpu_startup_entry+0x14/0x20
> > > > > > > > > [    8.701032]  start_kernel+0x491/0x4b1
> > > > > > > > > [    8.701032]  secondary_startup_64+0xa4/0xb0
> > > > > > > > > [    8.701032] Kernel Offset: 0x24200000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > That is simply caused by enabling DEBUG_KOBJECT_RELEASE itself, which
> > > > > > > > tries to do kboject_release() after delaying a while.
> > > > > > > > 
> > > > > > > > Wrt. this issue, q->mq_kobj is embedded in 'request_queue' whose
> > > > > > > > instance is freed in release handler of q->kobj. So when one instance
> > > > > > > > of 'request_queue' is freed, DEBUG_KOBJECT_RELEASE still expects that
> > > > > > > > the q->mq_kobj is active.
> > > > > > > > 
> > > > > > > > The release handler of q->mq_kobj is NOP actually, so in reality it won't
> > > > > > > > be one issue at all.
> > > > > > > 
> > > > > > > Given we have removed legacy IO path completely, it should be fine to
> > > > > > > remove q->mq_kobj and simply put everything under the kobj of queue,
> > > > > > > especially any attributes under mq aren't marked as stable ABI.
> > > > > > 
> > > > > > That might be possible going forward, but that does not appear to be
> > > > > > a feasible solution for stable releases. I am not sure though if the
> > > > > > "stable ABI" argument is valid. Almost all sysfs attributes are not
> > > > > > marked stable. If there are applications using it, removing the atttributes
> > > > > > would still break userspace.
> > > > > 
> > > > > Yeah, don't move attributes around for no good reason, you will make
> > > > > userspace grumpy :)
> > > > 
> > > > There are very less attributes left under /sys/block/$DEV/mq.
> > > > 
> > > > > 
> > > > > > Since the release function of mq_kobj is NOP (for reference, I assume
> > > > > > we are talking about blk_mq_sysfs_release), can it just be NULL ?
> > > > > 
> > > > > Someone tried to work around the internal kernel warning that comes
> > > > > about when you set a release function to NULL.  As per the in-kernel
> > > > > documentation, I now get to make fun of that author by saying "do you
> > > > > think I added those checks and message just for no good reason?  Don't
> > > > > try to be smarter than the driver core by providing an empty release
> > > > > function to make the kernel be quiet.  That's NOT what the kernel is
> > > > > trying to tell you here at all."
> > > > > 
> > > > > So fix this, that is the correct thing to do.  Memory is freed in a
> > > > > release function, to not do so is almost always a sign of a design bug.
> > > > 
> > > > The root cause is that both q->kobj and q->mq_kobj are bound to lifetime
> > > > of same 'request_queue' instance.
> > > > 
> > > > We may allocate q->mq_kobj dynamically to fix this issue.
> > > > 
> > > > And now all drivers are converted to mq, and it is really necessary to keep
> > > > 'q->mq_kobj'
> > > > 
> > > > > 
> > > > > > That it is an empty function seems to be just as good or bad as having
> > > > > > no function at all. An empty function just avoids the "broken" debug
> > > > > > message. That message is still better than rendering DEBUG_KOBJECT_RELEASE
> > > > > > useless.
> > > > > > 
> > > > > > Copying GregKH for input.
> > > > > 
> > > > > Thanks, this code is broken and needs to be fixed.
> > > > 
> > > > I will post a patch to address the 'issue'.
> > > 
> > > Not only q->mq_kobj, there is also each 'struct blk_mq_ctx' which is
> > > referenced via percpu variable of q->queue_ctx, we still may introduce
> > > one indirect table and allocate each blk_mq_ctx dynamically just for
> > > making DEBUG_KOBJECT_RELEASE happy. But code can become not readable
> > > as before.
> > > 
> > > Just wondering if it is possible to deal with this kind of shared
> > > lifetime issue among kobjects.
> > > 
> > > In this case, q->mq_kobj, every 'struct blk_mq_ctx' referenced via
> > > q->queue_ctx and q->kobj share same lifetime, it is fine to just
> > > release them in one release handler.
> > > 
> > > I will think further and see if it can be done in clean/simple way.
> > 
> > Hi Guenter, Greg and Guys,
> > 
> > What do you think about the following approach?
> > 
> > ---
> > diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
> > index 2d737f9e7ba7..e15f5760ea1f 100644
> > --- a/block/blk-mq-sysfs.c
> > +++ b/block/blk-mq-sysfs.c
> > @@ -298,10 +298,12 @@ void blk_mq_sysfs_init(struct request_queue *q)
> >  	int cpu;
> >  
> >  	kobject_init(&q->mq_kobj, &blk_mq_ktype);
> > +	kobject_mark_no_delay_release(&q->mq_kobj);
> >  
> >  	for_each_possible_cpu(cpu) {
> >  		ctx = per_cpu_ptr(q->queue_ctx, cpu);
> >  		kobject_init(&ctx->kobj, &blk_mq_ctx_ktype);
> > +		kobject_mark_no_delay_release(&ctx->kobj);
> >  	}
> >  }
> >  
> > diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> > index 1ab0d624fb36..a8857366fb76 100644
> > --- a/include/linux/kobject.h
> > +++ b/include/linux/kobject.h
> > @@ -78,6 +78,7 @@ struct kobject {
> >  	unsigned int state_add_uevent_sent:1;
> >  	unsigned int state_remove_uevent_sent:1;
> >  	unsigned int uevent_suppress:1;
> > +	unsigned int no_delay_release:1;
> >  };
> >  
> >  extern __printf(2, 3)
> > @@ -136,6 +137,16 @@ static inline bool kobject_has_children(struct kobject *kobj)
> >  	return kobj->sd && kobj->sd->dir.subdirs;
> >  }
> >  
> > +/*
> > + * DEBUG_KOBJECT_RELEASE may break case in which more than one kboject
> > + * share one same external object, and have same lifetime, so these
> > + * users may tell kobject core to not do delay release via this helper.
> > + */
> > +static inline void kobject_mark_no_delay_release(struct kobject *kobj)
> > +{
> > +	kobj->no_delay_release = 1;
> > +}
> 
> Ick, no, that's not ok.  "delay release" is a debugging tool, there is
> nothing keeping userspace from grabbing a reference to your kobject and
> holding on to it to emulate this "delay" test.
> 
> So even if you think the kernel is not going to do this, remember, you
> have no control over it.  Reference counted objects are done this way
> for a reason, you really do not know who has a reference and you really
> do not care.
> 
> You are just papering over the real issue here, see my previous email
> for how to start working on resolving it.

IMO, there isn't real issue, and the issue is actually in 'delay release'.

Please look at the code in block/blk-mq-sysfs.c, both q->mq_kobj and all
ctx->kobj share same lifetime with q->kobj, we even don't call get/put
on q->mq_kobj & all ctx->kobj, and all are simply released in q->kobj's
release handler.

So in short, if DEBUG_KOBJECT_RELEASE won't play the delay release
trick, there shouldn't be any issue.


thanks,
Ming

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-15  0:36                 ` Ming Lei
@ 2018-11-15  0:56                   ` Greg Kroah-Hartman
  2018-11-20 11:34                     ` Dmitry Vyukov
  0 siblings, 1 reply; 18+ messages in thread
From: Greg Kroah-Hartman @ 2018-11-15  0:56 UTC (permalink / raw)
  To: Ming Lei
  Cc: Guenter Roeck, Ming Lei, Jens Axboe, Peter Zijlstra, linux-block,
	linux-kernel, Hannes Reinecke, Paolo Bonzini, Christoph Hellwig,
	Martin K. Petersen, James E.J. Bottomley, linux-scsi

On Thu, Nov 15, 2018 at 08:36:17AM +0800, Ming Lei wrote:
> > So even if you think the kernel is not going to do this, remember, you
> > have no control over it.  Reference counted objects are done this way
> > for a reason, you really do not know who has a reference and you really
> > do not care.
> > 
> > You are just papering over the real issue here, see my previous email
> > for how to start working on resolving it.
> 
> IMO, there isn't real issue, and the issue is actually in 'delay release'.

Nope, sorry, that is not true.

> Please look at the code in block/blk-mq-sysfs.c, both q->mq_kobj and all
> ctx->kobj share same lifetime with q->kobj, we even don't call get/put
> on q->mq_kobj & all ctx->kobj, and all are simply released in q->kobj's
> release handler.

How do you "know" you are keeping those lifetimes in sync?  The joy of a
kobject is that _ANYTHING_ can grab a reference to your object without
you knowing about it.  That includes userspace programs.  Yes, sysfs is
now much better and it trys to release that reference "quickly" when it
determines you are trying to delete a kobject, but it's not perfict,
there are still races there.

And that is what the delay release code is showing you.  It is showing
you that you "think" your reference counting is wrong, but it is not.
It is showing you that if someone else grabs a reference, you are not
correctly cleaning up for yourself.

Never think that you really know the lifetime of a kobject, once you
realize that your code gets simpler and you can then just "trust" that
the kernel will do the right thing no matter what.

Because really, you are using a kobject because you want that correct
reference counting logic.  By ignoring that logic, you are ignoring the
reason to be using that object at all.  If you don't need reference
counting, then don't use it at all.

And if you need sysfs files, then you need to use the kobject and then
you need to handle it properly, because again, you do NOT have full
control over the lifetime of your object.  That's the basis for
reference counting in the firstplace.

So this code is broken without me evening having to look at it, please
fix it to handle release properly.  Again, the kernel tried to tell you
this, but you hacked around the kernel core to remove that warning
incorrectly.  Please go read the kobject documentation again for even
more details about this than what I said here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-15  0:56                   ` Greg Kroah-Hartman
@ 2018-11-20 11:34                     ` Dmitry Vyukov
  2018-11-20 12:05                       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 18+ messages in thread
From: Dmitry Vyukov @ 2018-11-20 11:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Ming Lei, Guenter Roeck, Ming Lei, Jens Axboe, Peter Zijlstra,
	linux-block, LKML, Hannes Reinecke, Paolo Bonzini,
	Christoph Hellwig, Martin K. Petersen, James E.J. Bottomley,
	linux-scsi

On Thu, Nov 15, 2018 at 1:56 AM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Thu, Nov 15, 2018 at 08:36:17AM +0800, Ming Lei wrote:
>> > So even if you think the kernel is not going to do this, remember, you
>> > have no control over it.  Reference counted objects are done this way
>> > for a reason, you really do not know who has a reference and you really
>> > do not care.
>> >
>> > You are just papering over the real issue here, see my previous email
>> > for how to start working on resolving it.
>>
>> IMO, there isn't real issue, and the issue is actually in 'delay release'.
>
> Nope, sorry, that is not true.
>
>> Please look at the code in block/blk-mq-sysfs.c, both q->mq_kobj and all
>> ctx->kobj share same lifetime with q->kobj, we even don't call get/put
>> on q->mq_kobj & all ctx->kobj, and all are simply released in q->kobj's
>> release handler.
>
> How do you "know" you are keeping those lifetimes in sync?  The joy of a
> kobject is that _ANYTHING_ can grab a reference to your object without
> you knowing about it.  That includes userspace programs.  Yes, sysfs is
> now much better and it trys to release that reference "quickly" when it
> determines you are trying to delete a kobject, but it's not perfict,
> there are still races there.
>
> And that is what the delay release code is showing you.  It is showing
> you that you "think" your reference counting is wrong, but it is not.
> It is showing you that if someone else grabs a reference, you are not
> correctly cleaning up for yourself.
>
> Never think that you really know the lifetime of a kobject, once you
> realize that your code gets simpler and you can then just "trust" that
> the kernel will do the right thing no matter what.
>
> Because really, you are using a kobject because you want that correct
> reference counting logic.  By ignoring that logic, you are ignoring the
> reason to be using that object at all.  If you don't need reference
> counting, then don't use it at all.
>
> And if you need sysfs files, then you need to use the kobject and then
> you need to handle it properly, because again, you do NOT have full
> control over the lifetime of your object.  That's the basis for
> reference counting in the firstplace.
>
> So this code is broken without me evening having to look at it, please
> fix it to handle release properly.  Again, the kernel tried to tell you
> this, but you hacked around the kernel core to remove that warning
> incorrectly.  Please go read the kobject documentation again for even
> more details about this than what I said here.
>
> thanks,
>
> greg k-h

Whoever is the right person to fix this, please prioritize this to the
degree possible.
This issue does not allow to use DEBUG_KOBJECT_RELEASE in any
automated testing (in particular syzbot) on both upstream and stable
trees. We have to disable it for now, so other bugs won't be noticed
and will pile up.

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-20 11:34                     ` Dmitry Vyukov
@ 2018-11-20 12:05                       ` Greg Kroah-Hartman
  2018-11-20 12:53                         ` Dmitry Vyukov
  0 siblings, 1 reply; 18+ messages in thread
From: Greg Kroah-Hartman @ 2018-11-20 12:05 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ming Lei, Guenter Roeck, Ming Lei, Jens Axboe, Peter Zijlstra,
	linux-block, LKML, Hannes Reinecke, Paolo Bonzini,
	Christoph Hellwig, Martin K. Petersen, James E.J. Bottomley,
	linux-scsi

On Tue, Nov 20, 2018 at 12:34:40PM +0100, Dmitry Vyukov wrote:
> On Thu, Nov 15, 2018 at 1:56 AM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Thu, Nov 15, 2018 at 08:36:17AM +0800, Ming Lei wrote:
> >> > So even if you think the kernel is not going to do this, remember, you
> >> > have no control over it.  Reference counted objects are done this way
> >> > for a reason, you really do not know who has a reference and you really
> >> > do not care.
> >> >
> >> > You are just papering over the real issue here, see my previous email
> >> > for how to start working on resolving it.
> >>
> >> IMO, there isn't real issue, and the issue is actually in 'delay release'.
> >
> > Nope, sorry, that is not true.
> >
> >> Please look at the code in block/blk-mq-sysfs.c, both q->mq_kobj and all
> >> ctx->kobj share same lifetime with q->kobj, we even don't call get/put
> >> on q->mq_kobj & all ctx->kobj, and all are simply released in q->kobj's
> >> release handler.
> >
> > How do you "know" you are keeping those lifetimes in sync?  The joy of a
> > kobject is that _ANYTHING_ can grab a reference to your object without
> > you knowing about it.  That includes userspace programs.  Yes, sysfs is
> > now much better and it trys to release that reference "quickly" when it
> > determines you are trying to delete a kobject, but it's not perfict,
> > there are still races there.
> >
> > And that is what the delay release code is showing you.  It is showing
> > you that you "think" your reference counting is wrong, but it is not.
> > It is showing you that if someone else grabs a reference, you are not
> > correctly cleaning up for yourself.
> >
> > Never think that you really know the lifetime of a kobject, once you
> > realize that your code gets simpler and you can then just "trust" that
> > the kernel will do the right thing no matter what.
> >
> > Because really, you are using a kobject because you want that correct
> > reference counting logic.  By ignoring that logic, you are ignoring the
> > reason to be using that object at all.  If you don't need reference
> > counting, then don't use it at all.
> >
> > And if you need sysfs files, then you need to use the kobject and then
> > you need to handle it properly, because again, you do NOT have full
> > control over the lifetime of your object.  That's the basis for
> > reference counting in the firstplace.
> >
> > So this code is broken without me evening having to look at it, please
> > fix it to handle release properly.  Again, the kernel tried to tell you
> > this, but you hacked around the kernel core to remove that warning
> > incorrectly.  Please go read the kobject documentation again for even
> > more details about this than what I said here.
> >
> > thanks,
> >
> > greg k-h
> 
> Whoever is the right person to fix this, please prioritize this to the
> degree possible.
> This issue does not allow to use DEBUG_KOBJECT_RELEASE in any
> automated testing (in particular syzbot) on both upstream and stable
> trees. We have to disable it for now, so other bugs won't be noticed
> and will pile up.

Patches for this have already been posted :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-20 12:05                       ` Greg Kroah-Hartman
@ 2018-11-20 12:53                         ` Dmitry Vyukov
  2018-11-20 13:27                           ` Greg Kroah-Hartman
  2018-11-20 13:28                           ` Ming Lei
  0 siblings, 2 replies; 18+ messages in thread
From: Dmitry Vyukov @ 2018-11-20 12:53 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Ming Lei, Guenter Roeck, Ming Lei, Jens Axboe, Peter Zijlstra,
	linux-block, LKML, Hannes Reinecke, Paolo Bonzini,
	Christoph Hellwig, Martin K. Petersen, James E.J. Bottomley,
	linux-scsi

On Tue, Nov 20, 2018 at 1:05 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Tue, Nov 20, 2018 at 12:34:40PM +0100, Dmitry Vyukov wrote:
>> On Thu, Nov 15, 2018 at 1:56 AM, Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>> > On Thu, Nov 15, 2018 at 08:36:17AM +0800, Ming Lei wrote:
>> >> > So even if you think the kernel is not going to do this, remember, you
>> >> > have no control over it.  Reference counted objects are done this way
>> >> > for a reason, you really do not know who has a reference and you really
>> >> > do not care.
>> >> >
>> >> > You are just papering over the real issue here, see my previous email
>> >> > for how to start working on resolving it.
>> >>
>> >> IMO, there isn't real issue, and the issue is actually in 'delay release'.
>> >
>> > Nope, sorry, that is not true.
>> >
>> >> Please look at the code in block/blk-mq-sysfs.c, both q->mq_kobj and all
>> >> ctx->kobj share same lifetime with q->kobj, we even don't call get/put
>> >> on q->mq_kobj & all ctx->kobj, and all are simply released in q->kobj's
>> >> release handler.
>> >
>> > How do you "know" you are keeping those lifetimes in sync?  The joy of a
>> > kobject is that _ANYTHING_ can grab a reference to your object without
>> > you knowing about it.  That includes userspace programs.  Yes, sysfs is
>> > now much better and it trys to release that reference "quickly" when it
>> > determines you are trying to delete a kobject, but it's not perfict,
>> > there are still races there.
>> >
>> > And that is what the delay release code is showing you.  It is showing
>> > you that you "think" your reference counting is wrong, but it is not.
>> > It is showing you that if someone else grabs a reference, you are not
>> > correctly cleaning up for yourself.
>> >
>> > Never think that you really know the lifetime of a kobject, once you
>> > realize that your code gets simpler and you can then just "trust" that
>> > the kernel will do the right thing no matter what.
>> >
>> > Because really, you are using a kobject because you want that correct
>> > reference counting logic.  By ignoring that logic, you are ignoring the
>> > reason to be using that object at all.  If you don't need reference
>> > counting, then don't use it at all.
>> >
>> > And if you need sysfs files, then you need to use the kobject and then
>> > you need to handle it properly, because again, you do NOT have full
>> > control over the lifetime of your object.  That's the basis for
>> > reference counting in the firstplace.
>> >
>> > So this code is broken without me evening having to look at it, please
>> > fix it to handle release properly.  Again, the kernel tried to tell you
>> > this, but you hacked around the kernel core to remove that warning
>> > incorrectly.  Please go read the kobject documentation again for even
>> > more details about this than what I said here.
>> >
>> > thanks,
>> >
>> > greg k-h
>>
>> Whoever is the right person to fix this, please prioritize this to the
>> degree possible.
>> This issue does not allow to use DEBUG_KOBJECT_RELEASE in any
>> automated testing (in particular syzbot) on both upstream and stable
>> trees. We have to disable it for now, so other bugs won't be noticed
>> and will pile up.
>
> Patches for this have already been posted :)


This is great.
What is the patch name? I can't find anything that looks relevant on
LKML searching by kobject.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-20 12:53                         ` Dmitry Vyukov
@ 2018-11-20 13:27                           ` Greg Kroah-Hartman
  2018-11-20 13:28                           ` Ming Lei
  1 sibling, 0 replies; 18+ messages in thread
From: Greg Kroah-Hartman @ 2018-11-20 13:27 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Ming Lei, Guenter Roeck, Ming Lei, Jens Axboe, Peter Zijlstra,
	linux-block, LKML, Hannes Reinecke, Paolo Bonzini,
	Christoph Hellwig, Martin K. Petersen, James E.J. Bottomley,
	linux-scsi

On Tue, Nov 20, 2018 at 01:53:47PM +0100, Dmitry Vyukov wrote:
> On Tue, Nov 20, 2018 at 1:05 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Tue, Nov 20, 2018 at 12:34:40PM +0100, Dmitry Vyukov wrote:
> >> On Thu, Nov 15, 2018 at 1:56 AM, Greg Kroah-Hartman
> >> <gregkh@linuxfoundation.org> wrote:
> >> > On Thu, Nov 15, 2018 at 08:36:17AM +0800, Ming Lei wrote:
> >> >> > So even if you think the kernel is not going to do this, remember, you
> >> >> > have no control over it.  Reference counted objects are done this way
> >> >> > for a reason, you really do not know who has a reference and you really
> >> >> > do not care.
> >> >> >
> >> >> > You are just papering over the real issue here, see my previous email
> >> >> > for how to start working on resolving it.
> >> >>
> >> >> IMO, there isn't real issue, and the issue is actually in 'delay release'.
> >> >
> >> > Nope, sorry, that is not true.
> >> >
> >> >> Please look at the code in block/blk-mq-sysfs.c, both q->mq_kobj and all
> >> >> ctx->kobj share same lifetime with q->kobj, we even don't call get/put
> >> >> on q->mq_kobj & all ctx->kobj, and all are simply released in q->kobj's
> >> >> release handler.
> >> >
> >> > How do you "know" you are keeping those lifetimes in sync?  The joy of a
> >> > kobject is that _ANYTHING_ can grab a reference to your object without
> >> > you knowing about it.  That includes userspace programs.  Yes, sysfs is
> >> > now much better and it trys to release that reference "quickly" when it
> >> > determines you are trying to delete a kobject, but it's not perfict,
> >> > there are still races there.
> >> >
> >> > And that is what the delay release code is showing you.  It is showing
> >> > you that you "think" your reference counting is wrong, but it is not.
> >> > It is showing you that if someone else grabs a reference, you are not
> >> > correctly cleaning up for yourself.
> >> >
> >> > Never think that you really know the lifetime of a kobject, once you
> >> > realize that your code gets simpler and you can then just "trust" that
> >> > the kernel will do the right thing no matter what.
> >> >
> >> > Because really, you are using a kobject because you want that correct
> >> > reference counting logic.  By ignoring that logic, you are ignoring the
> >> > reason to be using that object at all.  If you don't need reference
> >> > counting, then don't use it at all.
> >> >
> >> > And if you need sysfs files, then you need to use the kobject and then
> >> > you need to handle it properly, because again, you do NOT have full
> >> > control over the lifetime of your object.  That's the basis for
> >> > reference counting in the firstplace.
> >> > > >> > So this code is broken without me evening having to look at it, please
> >> > fix it to handle release properly.  Again, the kernel tried to tell you
> >> > this, but you hacked around the kernel core to remove that warning
> >> > incorrectly.  Please go read the kobject documentation again for even
> >> > more details about this than what I said here.
> >> >
> >> > thanks,
> >> >
> >> > greg k-h
> >>
> >> Whoever is the right person to fix this, please prioritize this to the
> >> degree possible.
> >> This issue does not allow to use DEBUG_KOBJECT_RELEASE in any
> >> automated testing (in particular syzbot) on both upstream and stable
> >> trees. We have to disable it for now, so other bugs won't be noticed
> >> and will pile up.
> >
> > Patches for this have already been posted :)
> 
> 
> This is great.
> What is the patch name? I can't find anything that looks relevant on
> LKML searching by kobject.

Subject: [PATCH 4.21 V3] blk-mq: not embed .mq_kobj and ctx->kobj into queue instance

Hope this helps,

greg k-h

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: kobject lifetime issues in blk-mq
  2018-11-20 12:53                         ` Dmitry Vyukov
  2018-11-20 13:27                           ` Greg Kroah-Hartman
@ 2018-11-20 13:28                           ` Ming Lei
  1 sibling, 0 replies; 18+ messages in thread
From: Ming Lei @ 2018-11-20 13:28 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: Greg Kroah-Hartman, Guenter Roeck, Ming Lei, Jens Axboe,
	Peter Zijlstra, linux-block, LKML, Hannes Reinecke,
	Paolo Bonzini, Christoph Hellwig, Martin K. Petersen,
	James E.J. Bottomley, linux-scsi

On Tue, Nov 20, 2018 at 01:53:47PM +0100, Dmitry Vyukov wrote:
> On Tue, Nov 20, 2018 at 1:05 PM, Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> > On Tue, Nov 20, 2018 at 12:34:40PM +0100, Dmitry Vyukov wrote:
> >> On Thu, Nov 15, 2018 at 1:56 AM, Greg Kroah-Hartman
> >> <gregkh@linuxfoundation.org> wrote:
> >> > On Thu, Nov 15, 2018 at 08:36:17AM +0800, Ming Lei wrote:
> >> >> > So even if you think the kernel is not going to do this, remember, you
> >> >> > have no control over it.  Reference counted objects are done this way
> >> >> > for a reason, you really do not know who has a reference and you really
> >> >> > do not care.
> >> >> >
> >> >> > You are just papering over the real issue here, see my previous email
> >> >> > for how to start working on resolving it.
> >> >>
> >> >> IMO, there isn't real issue, and the issue is actually in 'delay release'.
> >> >
> >> > Nope, sorry, that is not true.
> >> >
> >> >> Please look at the code in block/blk-mq-sysfs.c, both q->mq_kobj and all
> >> >> ctx->kobj share same lifetime with q->kobj, we even don't call get/put
> >> >> on q->mq_kobj & all ctx->kobj, and all are simply released in q->kobj's
> >> >> release handler.
> >> >
> >> > How do you "know" you are keeping those lifetimes in sync?  The joy of a
> >> > kobject is that _ANYTHING_ can grab a reference to your object without
> >> > you knowing about it.  That includes userspace programs.  Yes, sysfs is
> >> > now much better and it trys to release that reference "quickly" when it
> >> > determines you are trying to delete a kobject, but it's not perfict,
> >> > there are still races there.
> >> >
> >> > And that is what the delay release code is showing you.  It is showing
> >> > you that you "think" your reference counting is wrong, but it is not.
> >> > It is showing you that if someone else grabs a reference, you are not
> >> > correctly cleaning up for yourself.
> >> >
> >> > Never think that you really know the lifetime of a kobject, once you
> >> > realize that your code gets simpler and you can then just "trust" that
> >> > the kernel will do the right thing no matter what.
> >> >
> >> > Because really, you are using a kobject because you want that correct
> >> > reference counting logic.  By ignoring that logic, you are ignoring the
> >> > reason to be using that object at all.  If you don't need reference
> >> > counting, then don't use it at all.
> >> >
> >> > And if you need sysfs files, then you need to use the kobject and then
> >> > you need to handle it properly, because again, you do NOT have full
> >> > control over the lifetime of your object.  That's the basis for
> >> > reference counting in the firstplace.
> >> >
> >> > So this code is broken without me evening having to look at it, please
> >> > fix it to handle release properly.  Again, the kernel tried to tell you
> >> > this, but you hacked around the kernel core to remove that warning
> >> > incorrectly.  Please go read the kobject documentation again for even
> >> > more details about this than what I said here.
> >> >
> >> > thanks,
> >> >
> >> > greg k-h
> >>
> >> Whoever is the right person to fix this, please prioritize this to the
> >> degree possible.
> >> This issue does not allow to use DEBUG_KOBJECT_RELEASE in any
> >> automated testing (in particular syzbot) on both upstream and stable
> >> trees. We have to disable it for now, so other bugs won't be noticed
> >> and will pile up.
> >
> > Patches for this have already been posted :)
> 
> 
> This is great.
> What is the patch name? I can't find anything that looks relevant on
> LKML searching by kobject.

https://marc.info/?l=linux-block&m=154270006101625&w=2

Thanks,
Ming

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2018-11-20 13:29 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-09 20:35 kobject lifetime issues in blk-mq Guenter Roeck
2018-11-12  9:20 ` Ming Lei
2018-11-12  9:44   ` Ming Lei
2018-11-12 16:48     ` Guenter Roeck
2018-11-12 20:02       ` Greg Kroah-Hartman
2018-11-13  0:22         ` Ming Lei
2018-11-13  0:41           ` Ming Lei
2018-11-14 11:08             ` Ming Lei
2018-11-14 15:14               ` Greg Kroah-Hartman
2018-11-15  0:36                 ` Ming Lei
2018-11-15  0:56                   ` Greg Kroah-Hartman
2018-11-20 11:34                     ` Dmitry Vyukov
2018-11-20 12:05                       ` Greg Kroah-Hartman
2018-11-20 12:53                         ` Dmitry Vyukov
2018-11-20 13:27                           ` Greg Kroah-Hartman
2018-11-20 13:28                           ` Ming Lei
2018-11-14 15:12             ` Greg Kroah-Hartman
2018-11-12 16:16   ` Guenter Roeck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).