linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
@ 2012-01-20 18:56 Jeff Layton
  2012-01-22  8:46 ` Stephen Boyd
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2012-01-20 18:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: bfields, sgruszka

Help! I'm working on some patches that overhaul the client state
tracking on stable storage for knfsd. With this patchset, when I start
nfsd it does an upcall to userspace to get some info. That upcall is
done via rpc_pipefs.

What I'm finding though is that if I stop and start nfsd multiple
times, then I eventually get a warning like this on start:

[ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
[ 5157.128742] Hardware name: Bochs
[ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
[ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
[ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G        W  O 3.3.0-rc1+ #1
[ 5157.128742] Call Trace:
[ 5157.128742]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
[ 5157.128742]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
[ 5157.128742]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
[ 5157.128742]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
[ 5157.128742]  [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
[ 5157.128742]  [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
[ 5157.128742]  [<ffffffff81074676>] mod_timer+0xf6/0x450
[ 5157.128742]  [<ffffffff810749e8>] add_timer+0x18/0x20
[ 5157.128742]  [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
[ 5157.128742]  [<ffffffff81084441>] queue_delayed_work+0x21/0x40
[ 5157.128742]  [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
[ 5157.128742]  [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
[ 5157.128742]  [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
[ 5157.128742]  [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
[ 5157.128742]  [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
[ 5157.128742]  [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
[ 5157.128742]  [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
[ 5157.128742]  [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
[ 5157.128742]  [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
[ 5157.128742]  [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
[ 5157.128742]  [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
[ 5157.128742]  [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
[ 5157.128742]  [<ffffffff811b4fdd>] sys_write+0x4d/0x90
[ 5157.128742]  [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b

...or this on stop:

[ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
[ 5200.808205] Hardware name: Bochs
[ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
[ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
[ 5200.821596] Pid: 1394, comm: nfsd Tainted: G        W  O 3.3.0-rc1+ #1
[ 5200.822409] Call Trace:
[ 5200.822726]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
[ 5200.823509]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
[ 5200.824243]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
[ 5200.824935]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
[ 5200.825665]  [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
[ 5200.826474]  [<ffffffff810737b4>] del_timer+0x24/0x1b0
[ 5200.827129]  [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
[ 5200.827829]  [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
[ 5200.832905]  [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
[ 5200.833716]  [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
[ 5200.834558]  [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
[ 5200.835434]  [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
[ 5200.836271]  [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
[ 5200.837045]  [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
[ 5200.837778]  [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
[ 5200.838545]  [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
[ 5200.839221]  [<ffffffffa0102000>] ? 0xffffffffa0101fff
[ 5200.839820]  [<ffffffff8108a747>] kthread+0xb7/0xc0
[ 5200.840431]  [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
[ 5200.850883]  [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
[ 5200.851654]  [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
[ 5200.852418]  [<ffffffff816a48b0>] ? gs_change+0x13/0x13

The problem is that even after staring at this code for a while, I'm
still clueless as to what this is telling me. It doesn't happen every
time either, so maybe there's a race of some sort involved.

It's possible that my patch is broken and doing something wrong, but
I'm starting not to think so. I'm not really using timers directly --
I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
that is manipulating the timer. The delayed_work (and hence the timer)
are part of an rpc_inode.

Can anyone shed a little light on what this is complaining about?

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-20 18:56 WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0() Jeff Layton
@ 2012-01-22  8:46 ` Stephen Boyd
  2012-01-22 12:14   ` Jeff Layton
  2012-01-23 15:23   ` Jeff Layton
  0 siblings, 2 replies; 13+ messages in thread
From: Stephen Boyd @ 2012-01-22  8:46 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-kernel, bfields, sgruszka

On 1/20/2012 10:56 AM, Jeff Layton wrote:
> What I'm finding though is that if I stop and start nfsd multiple
> times, then I eventually get a warning like this on start:
>
> [ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> [ 5157.128742] Hardware name: Bochs
> [ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> [ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> [ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G        W  O 3.3.0-rc1+ #1
> [ 5157.128742] Call Trace:
> [ 5157.128742]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> [ 5157.128742]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> [ 5157.128742]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> [ 5157.128742]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> [ 5157.128742]  [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
> [ 5157.128742]  [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
> [ 5157.128742]  [<ffffffff81074676>] mod_timer+0xf6/0x450
> [ 5157.128742]  [<ffffffff810749e8>] add_timer+0x18/0x20
> [ 5157.128742]  [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
> [ 5157.128742]  [<ffffffff81084441>] queue_delayed_work+0x21/0x40
> [ 5157.128742]  [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
> [ 5157.128742]  [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
> [ 5157.128742]  [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
> [ 5157.128742]  [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
> [ 5157.128742]  [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
> [ 5157.128742]  [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
> [ 5157.128742]  [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
> [ 5157.128742]  [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
> [ 5157.128742]  [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
> [ 5157.128742]  [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
> [ 5157.128742]  [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
> [ 5157.128742]  [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
> [ 5157.128742]  [<ffffffff811b4fdd>] sys_write+0x4d/0x90
> [ 5157.128742]  [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b

This one is telling you that the timer you're about to run hasn't been 
registered with debug_objects. Most likely the work item hasn't been 
initialized properly and so it hasn't had INIT_DELAYED_WORK() called on it.

>
> ...or this on stop:
>
> [ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> [ 5200.808205] Hardware name: Bochs
> [ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> [ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> [ 5200.821596] Pid: 1394, comm: nfsd Tainted: G        W  O 3.3.0-rc1+ #1
> [ 5200.822409] Call Trace:
> [ 5200.822726]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> [ 5200.823509]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> [ 5200.824243]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> [ 5200.824935]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> [ 5200.825665]  [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
> [ 5200.826474]  [<ffffffff810737b4>] del_timer+0x24/0x1b0
> [ 5200.827129]  [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
> [ 5200.827829]  [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
> [ 5200.832905]  [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
> [ 5200.833716]  [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
> [ 5200.834558]  [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
> [ 5200.835434]  [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
> [ 5200.836271]  [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
> [ 5200.837045]  [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
> [ 5200.837778]  [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> [ 5200.838545]  [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
> [ 5200.839221]  [<ffffffffa0102000>] ? 0xffffffffa0101fff
> [ 5200.839820]  [<ffffffff8108a747>] kthread+0xb7/0xc0
> [ 5200.840431]  [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
> [ 5200.850883]  [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
> [ 5200.851654]  [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
> [ 5200.852418]  [<ffffffff816a48b0>] ? gs_change+0x13/0x13

This is similar. Now you're deleting a timer that debug_objects doesn't 
know about.

>
> The problem is that even after staring at this code for a while, I'm
> still clueless as to what this is telling me. It doesn't happen every
> time either, so maybe there's a race of some sort involved.
>
> It's possible that my patch is broken and doing something wrong, but
> I'm starting not to think so. I'm not really using timers directly --
> I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
> that is manipulating the timer. The delayed_work (and hence the timer)
> are part of an rpc_inode.
>
> Can anyone shed a little light on what this is complaining about?
>

Do you have workqueue debugging enabled too (DEBUG_OBJECTS_WORK)? I 
would hope that work item debugging would say the same things and then 
we would know that the work item itself wasn't initialized properly.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-22  8:46 ` Stephen Boyd
@ 2012-01-22 12:14   ` Jeff Layton
  2012-01-23 15:23   ` Jeff Layton
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2012-01-22 12:14 UTC (permalink / raw)
  To: Stephen Boyd; +Cc: linux-kernel, bfields, sgruszka

On Sun, 22 Jan 2012 00:46:14 -0800
Stephen Boyd <sboyd@codeaurora.org> wrote:

> On 1/20/2012 10:56 AM, Jeff Layton wrote:
> > What I'm finding though is that if I stop and start nfsd multiple
> > times, then I eventually get a warning like this on start:
> >
> > [ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> > [ 5157.128742] Hardware name: Bochs
> > [ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> > [ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> > [ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G        W  O 3.3.0-rc1+ #1
> > [ 5157.128742] Call Trace:
> > [ 5157.128742]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> > [ 5157.128742]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> > [ 5157.128742]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> > [ 5157.128742]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> > [ 5157.128742]  [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
> > [ 5157.128742]  [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
> > [ 5157.128742]  [<ffffffff81074676>] mod_timer+0xf6/0x450
> > [ 5157.128742]  [<ffffffff810749e8>] add_timer+0x18/0x20
> > [ 5157.128742]  [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
> > [ 5157.128742]  [<ffffffff81084441>] queue_delayed_work+0x21/0x40
> > [ 5157.128742]  [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
> > [ 5157.128742]  [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
> > [ 5157.128742]  [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
> > [ 5157.128742]  [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
> > [ 5157.128742]  [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
> > [ 5157.128742]  [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
> > [ 5157.128742]  [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
> > [ 5157.128742]  [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
> > [ 5157.128742]  [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
> > [ 5157.128742]  [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
> > [ 5157.128742]  [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
> > [ 5157.128742]  [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
> > [ 5157.128742]  [<ffffffff811b4fdd>] sys_write+0x4d/0x90
> > [ 5157.128742]  [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b
> 
> This one is telling you that the timer you're about to run hasn't been 
> registered with debug_objects. Most likely the work item hasn't been 
> initialized properly and so it hasn't had INIT_DELAYED_WORK() called on it.
> 
> >
> > ...or this on stop:
> >
> > [ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> > [ 5200.808205] Hardware name: Bochs
> > [ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> > [ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> > [ 5200.821596] Pid: 1394, comm: nfsd Tainted: G        W  O 3.3.0-rc1+ #1
> > [ 5200.822409] Call Trace:
> > [ 5200.822726]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> > [ 5200.823509]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> > [ 5200.824243]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> > [ 5200.824935]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> > [ 5200.825665]  [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
> > [ 5200.826474]  [<ffffffff810737b4>] del_timer+0x24/0x1b0
> > [ 5200.827129]  [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
> > [ 5200.827829]  [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
> > [ 5200.832905]  [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
> > [ 5200.833716]  [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
> > [ 5200.834558]  [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
> > [ 5200.835434]  [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
> > [ 5200.836271]  [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
> > [ 5200.837045]  [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
> > [ 5200.837778]  [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> > [ 5200.838545]  [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
> > [ 5200.839221]  [<ffffffffa0102000>] ? 0xffffffffa0101fff
> > [ 5200.839820]  [<ffffffff8108a747>] kthread+0xb7/0xc0
> > [ 5200.840431]  [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
> > [ 5200.850883]  [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
> > [ 5200.851654]  [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
> > [ 5200.852418]  [<ffffffff816a48b0>] ? gs_change+0x13/0x13
> 
> This is similar. Now you're deleting a timer that debug_objects doesn't 
> know about.
> 
> >
> > The problem is that even after staring at this code for a while, I'm
> > still clueless as to what this is telling me. It doesn't happen every
> > time either, so maybe there's a race of some sort involved.
> >
> > It's possible that my patch is broken and doing something wrong, but
> > I'm starting not to think so. I'm not really using timers directly --
> > I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
> > that is manipulating the timer. The delayed_work (and hence the timer)
> > are part of an rpc_inode.
> >
> > Can anyone shed a little light on what this is complaining about?
> >
> 
> Do you have workqueue debugging enabled too (DEBUG_OBJECTS_WORK)? I 
> would hope that work item debugging would say the same things and then 
> we would know that the work item itself wasn't initialized properly.

Ahh, that info helps -- thanks. The delayed work is being initialized
in init_once() in rpc_pipe.c. That's a slab initializer routine, so
maybe when these objects are being recycled they aren't getting
reinitialized in a way that the debugobjects code likes.

I'll play around with that theory and see what I come up with...

Thanks for the help!
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-22  8:46 ` Stephen Boyd
  2012-01-22 12:14   ` Jeff Layton
@ 2012-01-23 15:23   ` Jeff Layton
  2012-01-24  7:45     ` Stanislaw Gruszka
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2012-01-23 15:23 UTC (permalink / raw)
  To: Stephen Boyd; +Cc: linux-kernel, bfields, linux-nfs, sgruszka

On Sun, 22 Jan 2012 00:46:14 -0800
Stephen Boyd <sboyd@codeaurora.org> wrote:

> On 1/20/2012 10:56 AM, Jeff Layton wrote:
> > What I'm finding though is that if I stop and start nfsd multiple
> > times, then I eventually get a warning like this on start:
> >
> > [ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> > [ 5157.128742] Hardware name: Bochs
> > [ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> > [ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> > [ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G        W  O 3.3.0-rc1+ #1
> > [ 5157.128742] Call Trace:
> > [ 5157.128742]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> > [ 5157.128742]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> > [ 5157.128742]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> > [ 5157.128742]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> > [ 5157.128742]  [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
> > [ 5157.128742]  [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
> > [ 5157.128742]  [<ffffffff81074676>] mod_timer+0xf6/0x450
> > [ 5157.128742]  [<ffffffff810749e8>] add_timer+0x18/0x20
> > [ 5157.128742]  [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
> > [ 5157.128742]  [<ffffffff81084441>] queue_delayed_work+0x21/0x40
> > [ 5157.128742]  [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
> > [ 5157.128742]  [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
> > [ 5157.128742]  [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
> > [ 5157.128742]  [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
> > [ 5157.128742]  [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
> > [ 5157.128742]  [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
> > [ 5157.128742]  [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
> > [ 5157.128742]  [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
> > [ 5157.128742]  [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
> > [ 5157.128742]  [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
> > [ 5157.128742]  [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
> > [ 5157.128742]  [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
> > [ 5157.128742]  [<ffffffff811b4fdd>] sys_write+0x4d/0x90
> > [ 5157.128742]  [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b
> 
> This one is telling you that the timer you're about to run hasn't been 
> registered with debug_objects. Most likely the work item hasn't been 
> initialized properly and so it hasn't had INIT_DELAYED_WORK() called on it.
> 
> >
> > ...or this on stop:
> >
> > [ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> > [ 5200.808205] Hardware name: Bochs
> > [ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> > [ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> > [ 5200.821596] Pid: 1394, comm: nfsd Tainted: G        W  O 3.3.0-rc1+ #1
> > [ 5200.822409] Call Trace:
> > [ 5200.822726]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> > [ 5200.823509]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> > [ 5200.824243]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> > [ 5200.824935]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> > [ 5200.825665]  [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
> > [ 5200.826474]  [<ffffffff810737b4>] del_timer+0x24/0x1b0
> > [ 5200.827129]  [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
> > [ 5200.827829]  [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
> > [ 5200.832905]  [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
> > [ 5200.833716]  [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
> > [ 5200.834558]  [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
> > [ 5200.835434]  [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
> > [ 5200.836271]  [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
> > [ 5200.837045]  [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
> > [ 5200.837778]  [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> > [ 5200.838545]  [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
> > [ 5200.839221]  [<ffffffffa0102000>] ? 0xffffffffa0101fff
> > [ 5200.839820]  [<ffffffff8108a747>] kthread+0xb7/0xc0
> > [ 5200.840431]  [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
> > [ 5200.850883]  [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
> > [ 5200.851654]  [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
> > [ 5200.852418]  [<ffffffff816a48b0>] ? gs_change+0x13/0x13
> 
> This is similar. Now you're deleting a timer that debug_objects doesn't 
> know about.
> 
> >
> > The problem is that even after staring at this code for a while, I'm
> > still clueless as to what this is telling me. It doesn't happen every
> > time either, so maybe there's a race of some sort involved.
> >
> > It's possible that my patch is broken and doing something wrong, but
> > I'm starting not to think so. I'm not really using timers directly --
> > I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
> > that is manipulating the timer. The delayed_work (and hence the timer)
> > are part of an rpc_inode.
> >
> > Can anyone shed a little light on what this is complaining about?
> >
> 
> Do you have workqueue debugging enabled too (DEBUG_OBJECTS_WORK)? I 
> would hope that work item debugging would say the same things and then 
> we would know that the work item itself wasn't initialized properly.

(adding linux-nfs to cc list...)

Ok, I think I sort of see what's happening, but I'm not sure if it's a
bug in the debug objects code or something else. To answer the question
that I didn't before, the kernels I've been testing have this set:

    CONFIG_DEBUG_OBJECTS_WORK=y

...so workqueue debugging is on but I haven't seen any messages that
come from it AFAICT. The rpc_inode objects have their delayed_work
fields initialized via a slab "constructor". Here's the rpc_inode_cache
creation call:

        rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
                                sizeof(struct rpc_inode),
                                0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
                                                SLAB_MEM_SPREAD),
                                init_once);

...and the init_once() routine does this:

        INIT_DELAYED_WORK(&rpci->queue_timeout,
                            rpc_timeout_upcall_queue);

On the first use of an rpc_inode object, everything works fine. I think
the problem comes in when rpc_inode objects get recycled without the
INIT_DELAYED_WORK() getting called on it again.

Before the object is freed it gets cancel_delayed_work_sync() called on
it, but that's apparently not enough to convince the debugobjects code
that it's already correctly initialized. As a Q&D check, the following
patch seems to stop the warnings. Is there some better way to do this
that doesn't require the reinitialization of the delayed work on each
inode allocation?

--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -178,6 +178,8 @@ rpc_alloc_inode(struct super_block *sb)
        rpci = (struct rpc_inode *)kmem_cache_alloc(rpc_inode_cachep, GFP_KERNEL);
        if (!rpci)
                return NULL;
+
+       INIT_DELAYED_WORK(&rpci->queue_timeout, rpc_timeout_upcall_queue);
        return &rpci->vfs_inode;
 }

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-23 15:23   ` Jeff Layton
@ 2012-01-24  7:45     ` Stanislaw Gruszka
  2012-01-24  9:51       ` Boaz Harrosh
  0 siblings, 1 reply; 13+ messages in thread
From: Stanislaw Gruszka @ 2012-01-24  7:45 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Stephen Boyd, linux-kernel, bfields, linux-nfs, Thomas Gleixner

On Mon, Jan 23, 2012 at 10:23:11AM -0500, Jeff Layton wrote:
> On Sun, 22 Jan 2012 00:46:14 -0800
> Stephen Boyd <sboyd@codeaurora.org> wrote:
> 
> > On 1/20/2012 10:56 AM, Jeff Layton wrote:
> > > What I'm finding though is that if I stop and start nfsd multiple
> > > times, then I eventually get a warning like this on start:
> > >
> > > [ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> > > [ 5157.128742] Hardware name: Bochs
> > > [ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> > > [ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> > > [ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G        W  O 3.3.0-rc1+ #1
> > > [ 5157.128742] Call Trace:
> > > [ 5157.128742]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> > > [ 5157.128742]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> > > [ 5157.128742]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> > > [ 5157.128742]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> > > [ 5157.128742]  [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
> > > [ 5157.128742]  [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
> > > [ 5157.128742]  [<ffffffff81074676>] mod_timer+0xf6/0x450
> > > [ 5157.128742]  [<ffffffff810749e8>] add_timer+0x18/0x20
> > > [ 5157.128742]  [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
> > > [ 5157.128742]  [<ffffffff81084441>] queue_delayed_work+0x21/0x40
> > > [ 5157.128742]  [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
> > > [ 5157.128742]  [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
> > > [ 5157.128742]  [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
> > > [ 5157.128742]  [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
> > > [ 5157.128742]  [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
> > > [ 5157.128742]  [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
> > > [ 5157.128742]  [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
> > > [ 5157.128742]  [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
> > > [ 5157.128742]  [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
> > > [ 5157.128742]  [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
> > > [ 5157.128742]  [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
> > > [ 5157.128742]  [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
> > > [ 5157.128742]  [<ffffffff811b4fdd>] sys_write+0x4d/0x90
> > > [ 5157.128742]  [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b
> > 
> > This one is telling you that the timer you're about to run hasn't been 
> > registered with debug_objects. Most likely the work item hasn't been 
> > initialized properly and so it hasn't had INIT_DELAYED_WORK() called on it.
> > 
> > >
> > > ...or this on stop:
> > >
> > > [ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> > > [ 5200.808205] Hardware name: Bochs
> > > [ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> > > [ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> > > [ 5200.821596] Pid: 1394, comm: nfsd Tainted: G        W  O 3.3.0-rc1+ #1
> > > [ 5200.822409] Call Trace:
> > > [ 5200.822726]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> > > [ 5200.823509]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> > > [ 5200.824243]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> > > [ 5200.824935]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> > > [ 5200.825665]  [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
> > > [ 5200.826474]  [<ffffffff810737b4>] del_timer+0x24/0x1b0
> > > [ 5200.827129]  [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
> > > [ 5200.827829]  [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
> > > [ 5200.832905]  [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
> > > [ 5200.833716]  [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
> > > [ 5200.834558]  [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
> > > [ 5200.835434]  [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
> > > [ 5200.836271]  [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
> > > [ 5200.837045]  [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
> > > [ 5200.837778]  [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> > > [ 5200.838545]  [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
> > > [ 5200.839221]  [<ffffffffa0102000>] ? 0xffffffffa0101fff
> > > [ 5200.839820]  [<ffffffff8108a747>] kthread+0xb7/0xc0
> > > [ 5200.840431]  [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
> > > [ 5200.850883]  [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
> > > [ 5200.851654]  [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
> > > [ 5200.852418]  [<ffffffff816a48b0>] ? gs_change+0x13/0x13
> > 
> > This is similar. Now you're deleting a timer that debug_objects doesn't 
> > know about.
> > 
> > >
> > > The problem is that even after staring at this code for a while, I'm
> > > still clueless as to what this is telling me. It doesn't happen every
> > > time either, so maybe there's a race of some sort involved.
> > >
> > > It's possible that my patch is broken and doing something wrong, but
> > > I'm starting not to think so. I'm not really using timers directly --
> > > I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
> > > that is manipulating the timer. The delayed_work (and hence the timer)
> > > are part of an rpc_inode.
> > >
> > > Can anyone shed a little light on what this is complaining about?
> > >
> > 
> > Do you have workqueue debugging enabled too (DEBUG_OBJECTS_WORK)? I 
> > would hope that work item debugging would say the same things and then 
> > we would know that the work item itself wasn't initialized properly.
> 
> (adding linux-nfs to cc list...)
> 
> Ok, I think I sort of see what's happening, but I'm not sure if it's a
> bug in the debug objects code or something else. To answer the question
> that I didn't before, the kernels I've been testing have this set:
> 
>     CONFIG_DEBUG_OBJECTS_WORK=y
> 
> ...so workqueue debugging is on but I haven't seen any messages that
> come from it AFAICT. The rpc_inode objects have their delayed_work
> fields initialized via a slab "constructor". Here's the rpc_inode_cache
> creation call:
> 
>         rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
>                                 sizeof(struct rpc_inode),
>                                 0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
>                                                 SLAB_MEM_SPREAD),
>                                 init_once);
> 
> ...and the init_once() routine does this:
> 
>         INIT_DELAYED_WORK(&rpci->queue_timeout,
>                             rpc_timeout_upcall_queue);
> 
> On the first use of an rpc_inode object, everything works fine. I think
> the problem comes in when rpc_inode objects get recycled without the
> INIT_DELAYED_WORK() getting called on it again.
> 
> Before the object is freed it gets cancel_delayed_work_sync() called on
> it, but that's apparently not enough to convince the debugobjects code
> that it's already correctly initialized. As a Q&D check, the following
> patch seems to stop the warnings. Is there some better way to do this
> that doesn't require the reinitialization of the delayed work on each
> inode allocation?

If this is debug objects false positive, it should be rather fixed in
debug object code itself, not workaround in nfs code, but I do not have
any idea how to do this.

Stanislaw
 
> --- a/net/sunrpc/rpc_pipe.c
> +++ b/net/sunrpc/rpc_pipe.c
> @@ -178,6 +178,8 @@ rpc_alloc_inode(struct super_block *sb)
>         rpci = (struct rpc_inode *)kmem_cache_alloc(rpc_inode_cachep, GFP_KERNEL);
>         if (!rpci)
>                 return NULL;
> +
> +       INIT_DELAYED_WORK(&rpci->queue_timeout, rpc_timeout_upcall_queue);
>         return &rpci->vfs_inode;
>  }
> 
> -- 
> Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-24  7:45     ` Stanislaw Gruszka
@ 2012-01-24  9:51       ` Boaz Harrosh
  2012-01-24 12:36         ` Jeff Layton
  0 siblings, 1 reply; 13+ messages in thread
From: Boaz Harrosh @ 2012-01-24  9:51 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Jeff Layton, Stephen Boyd, linux-kernel, bfields, linux-nfs,
	Thomas Gleixner, Tejun Heo

On 01/24/2012 09:45 AM, Stanislaw Gruszka wrote:
> On Mon, Jan 23, 2012 at 10:23:11AM -0500, Jeff Layton wrote:
>> On Sun, 22 Jan 2012 00:46:14 -0800
>> Stephen Boyd <sboyd@codeaurora.org> wrote:
>>
>>> On 1/20/2012 10:56 AM, Jeff Layton wrote:
>>>> What I'm finding though is that if I stop and start nfsd multiple
>>>> times, then I eventually get a warning like this on start:
>>>>
>>>> [ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
>>>> [ 5157.128742] Hardware name: Bochs
>>>> [ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
>>>> [ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
>>>> [ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G        W  O 3.3.0-rc1+ #1
>>>> [ 5157.128742] Call Trace:
>>>> [ 5157.128742]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
>>>> [ 5157.128742]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
>>>> [ 5157.128742]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
>>>> [ 5157.128742]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
>>>> [ 5157.128742]  [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
>>>> [ 5157.128742]  [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
>>>> [ 5157.128742]  [<ffffffff81074676>] mod_timer+0xf6/0x450
>>>> [ 5157.128742]  [<ffffffff810749e8>] add_timer+0x18/0x20
>>>> [ 5157.128742]  [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
>>>> [ 5157.128742]  [<ffffffff81084441>] queue_delayed_work+0x21/0x40
>>>> [ 5157.128742]  [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
>>>> [ 5157.128742]  [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
>>>> [ 5157.128742]  [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
>>>> [ 5157.128742]  [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
>>>> [ 5157.128742]  [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
>>>> [ 5157.128742]  [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
>>>> [ 5157.128742]  [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
>>>> [ 5157.128742]  [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
>>>> [ 5157.128742]  [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
>>>> [ 5157.128742]  [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
>>>> [ 5157.128742]  [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
>>>> [ 5157.128742]  [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
>>>> [ 5157.128742]  [<ffffffff811b4fdd>] sys_write+0x4d/0x90
>>>> [ 5157.128742]  [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b
>>>
>>> This one is telling you that the timer you're about to run hasn't been 
>>> registered with debug_objects. Most likely the work item hasn't been 
>>> initialized properly and so it hasn't had INIT_DELAYED_WORK() called on it.
>>>
>>>>
>>>> ...or this on stop:
>>>>
>>>> [ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
>>>> [ 5200.808205] Hardware name: Bochs
>>>> [ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
>>>> [ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
>>>> [ 5200.821596] Pid: 1394, comm: nfsd Tainted: G        W  O 3.3.0-rc1+ #1
>>>> [ 5200.822409] Call Trace:
>>>> [ 5200.822726]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
>>>> [ 5200.823509]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
>>>> [ 5200.824243]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
>>>> [ 5200.824935]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
>>>> [ 5200.825665]  [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
>>>> [ 5200.826474]  [<ffffffff810737b4>] del_timer+0x24/0x1b0
>>>> [ 5200.827129]  [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
>>>> [ 5200.827829]  [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
>>>> [ 5200.832905]  [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
>>>> [ 5200.833716]  [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
>>>> [ 5200.834558]  [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
>>>> [ 5200.835434]  [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
>>>> [ 5200.836271]  [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
>>>> [ 5200.837045]  [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
>>>> [ 5200.837778]  [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
>>>> [ 5200.838545]  [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
>>>> [ 5200.839221]  [<ffffffffa0102000>] ? 0xffffffffa0101fff
>>>> [ 5200.839820]  [<ffffffff8108a747>] kthread+0xb7/0xc0
>>>> [ 5200.840431]  [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
>>>> [ 5200.850883]  [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
>>>> [ 5200.851654]  [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
>>>> [ 5200.852418]  [<ffffffff816a48b0>] ? gs_change+0x13/0x13
>>>
>>> This is similar. Now you're deleting a timer that debug_objects doesn't 
>>> know about.
>>>
>>>>
>>>> The problem is that even after staring at this code for a while, I'm
>>>> still clueless as to what this is telling me. It doesn't happen every
>>>> time either, so maybe there's a race of some sort involved.
>>>>
>>>> It's possible that my patch is broken and doing something wrong, but
>>>> I'm starting not to think so. I'm not really using timers directly --
>>>> I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
>>>> that is manipulating the timer. The delayed_work (and hence the timer)
>>>> are part of an rpc_inode.
>>>>
>>>> Can anyone shed a little light on what this is complaining about?
>>>>
>>>
>>> Do you have workqueue debugging enabled too (DEBUG_OBJECTS_WORK)? I 
>>> would hope that work item debugging would say the same things and then 
>>> we would know that the work item itself wasn't initialized properly.
>>
>> (adding linux-nfs to cc list...)
>>
>> Ok, I think I sort of see what's happening, but I'm not sure if it's a
>> bug in the debug objects code or something else. To answer the question
>> that I didn't before, the kernels I've been testing have this set:
>>
>>     CONFIG_DEBUG_OBJECTS_WORK=y
>>
>> ...so workqueue debugging is on but I haven't seen any messages that
>> come from it AFAICT. The rpc_inode objects have their delayed_work
>> fields initialized via a slab "constructor". Here's the rpc_inode_cache
>> creation call:
>>
>>         rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
>>                                 sizeof(struct rpc_inode),
>>                                 0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
>>                                                 SLAB_MEM_SPREAD),
>>                                 init_once);
>>
>> ...and the init_once() routine does this:
>>
>>         INIT_DELAYED_WORK(&rpci->queue_timeout,
>>                             rpc_timeout_upcall_queue);
>>
>> On the first use of an rpc_inode object, everything works fine. I think
>> the problem comes in when rpc_inode objects get recycled without the
>> INIT_DELAYED_WORK() getting called on it again.
>>
>> Before the object is freed it gets cancel_delayed_work_sync() called on
>> it, but that's apparently not enough to convince the debugobjects code
>> that it's already correctly initialized. As a Q&D check, the following
>> patch seems to stop the warnings. Is there some better way to do this
>> that doesn't require the reinitialization of the delayed work on each
>> inode allocation?
> 
> If this is debug objects false positive, it should be rather fixed in
> debug object code itself, not workaround in nfs code, but I do not have
> any idea how to do this.
> 
> Stanislaw
>  

I don't think so. Look at what INIT_DELAYED_WORK does.
   INIT_WORK - Set some start values, register with the objects-debugger if on.
   init_timer - 

I'd imagine that cancel_delayed_work_sync() would: 1st cancel some of those
registrations, specifically with the objects-debugger. 2nd will leave some
state undefined as left from the last round.

I think this patch is cardinal and you might have found a potential bug.

Also the cost of INIT_DELAYED_WORK is marginal compare to the extra safety.

Recomended-by: Boaz Harrosh <bharrosh@panasas.com>

>> --- a/net/sunrpc/rpc_pipe.c
>> +++ b/net/sunrpc/rpc_pipe.c
>> @@ -178,6 +178,8 @@ rpc_alloc_inode(struct super_block *sb)
>>         rpci = (struct rpc_inode *)kmem_cache_alloc(rpc_inode_cachep, GFP_KERNEL);
>>         if (!rpci)
>>                 return NULL;
>> +
>> +       INIT_DELAYED_WORK(&rpci->queue_timeout, rpc_timeout_upcall_queue);
>>         return &rpci->vfs_inode;
>>  }
>>
>> -- 
>> Jeff Layton <jlayton@redhat.com>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-24  9:51       ` Boaz Harrosh
@ 2012-01-24 12:36         ` Jeff Layton
  2012-01-24 15:01           ` Boaz Harrosh
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2012-01-24 12:36 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Stanislaw Gruszka, Stephen Boyd, linux-kernel, bfields,
	linux-nfs, Thomas Gleixner, Tejun Heo

On Tue, 24 Jan 2012 11:51:59 +0200
Boaz Harrosh <bharrosh@panasas.com> wrote:

> On 01/24/2012 09:45 AM, Stanislaw Gruszka wrote:
> > On Mon, Jan 23, 2012 at 10:23:11AM -0500, Jeff Layton wrote:
> >> On Sun, 22 Jan 2012 00:46:14 -0800
> >> Stephen Boyd <sboyd@codeaurora.org> wrote:
> >>
> >>> On 1/20/2012 10:56 AM, Jeff Layton wrote:
> >>>> What I'm finding though is that if I stop and start nfsd multiple
> >>>> times, then I eventually get a warning like this on start:
> >>>>
> >>>> [ 5157.128514] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> >>>> [ 5157.128742] Hardware name: Bochs
> >>>> [ 5157.128742] ODEBUG: activate not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> >>>> [ 5157.128742] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> >>>> [ 5157.128742] Pid: 1312, comm: rpc.nfsd Tainted: G        W  O 3.3.0-rc1+ #1
> >>>> [ 5157.128742] Call Trace:
> >>>> [ 5157.128742]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> >>>> [ 5157.128742]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> >>>> [ 5157.128742]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> >>>> [ 5157.128742]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> >>>> [ 5157.128742]  [<ffffffff8132c02b>] debug_object_activate+0xfb/0x190
> >>>> [ 5157.128742]  [<ffffffff81072728>] ? lock_timer_base.isra.24+0x38/0x70
> >>>> [ 5157.128742]  [<ffffffff81074676>] mod_timer+0xf6/0x450
> >>>> [ 5157.128742]  [<ffffffff810749e8>] add_timer+0x18/0x20
> >>>> [ 5157.128742]  [<ffffffff8108168e>] queue_delayed_work_on+0xbe/0x140
> >>>> [ 5157.128742]  [<ffffffff81084441>] queue_delayed_work+0x21/0x40
> >>>> [ 5157.128742]  [<ffffffffa006e568>] rpc_queue_upcall+0xe8/0x100 [sunrpc]
> >>>> [ 5157.128742]  [<ffffffffa012a121>] __cld_pipe_upcall+0x61/0xc0 [nfsd]
> >>>> [ 5157.128742]  [<ffffffffa012ad98>] nfsd4_cld_init+0x48/0x140 [nfsd]
> >>>> [ 5157.128742]  [<ffffffffa012b22a>] nfsd4_client_tracking_init+0x2a/0xc0 [nfsd]
> >>>> [ 5157.128742]  [<ffffffff8169797e>] ? mutex_unlock+0xe/0x10
> >>>> [ 5157.128742]  [<ffffffffa01266fa>] nfs4_state_start+0x1a/0x100 [nfsd]
> >>>> [ 5157.128742]  [<ffffffffa01028c5>] nfsd_svc+0x135/0x200 [nfsd]
> >>>> [ 5157.128742]  [<ffffffffa0103df0>] ? write_maxblksize+0x130/0x130 [nfsd]
> >>>> [ 5157.128742]  [<ffffffffa0103e6d>] write_threads+0x7d/0xd0 [nfsd]
> >>>> [ 5157.128742]  [<ffffffff811dd16a>] ? simple_transaction_get+0xca/0xe0
> >>>> [ 5157.128742]  [<ffffffffa0102ee7>] nfsctl_transaction_write+0x57/0x90 [nfsd]
> >>>> [ 5157.128742]  [<ffffffff811b4c9f>] vfs_write+0xaf/0x190
> >>>> [ 5157.128742]  [<ffffffff811b4fdd>] sys_write+0x4d/0x90
> >>>> [ 5157.128742]  [<ffffffff816a3469>] system_call_fastpath+0x16/0x1b
> >>>
> >>> This one is telling you that the timer you're about to run hasn't been 
> >>> registered with debug_objects. Most likely the work item hasn't been 
> >>> initialized properly and so it hasn't had INIT_DELAYED_WORK() called on it.
> >>>
> >>>>
> >>>> ...or this on stop:
> >>>>
> >>>> [ 5200.804410] WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
> >>>> [ 5200.808205] Hardware name: Bochs
> >>>> [ 5200.810121] ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
> >>>> [ 5200.812734] Modules linked in: nfsd(O) nfs_acl auth_rpcgss lockd sunrpc floppy virtio_net i2c_piix4 i2c_core virtio_balloon joydev pcspkr virtio_blk [last unloaded: nfsd]
> >>>> [ 5200.821596] Pid: 1394, comm: nfsd Tainted: G        W  O 3.3.0-rc1+ #1
> >>>> [ 5200.822409] Call Trace:
> >>>> [ 5200.822726]  [<ffffffff8106135f>] warn_slowpath_common+0x7f/0xc0
> >>>> [ 5200.823509]  [<ffffffff81061456>] warn_slowpath_fmt+0x46/0x50
> >>>> [ 5200.824243]  [<ffffffff8132ba2c>] debug_print_object+0x8c/0xb0
> >>>> [ 5200.824935]  [<ffffffff81070db0>] ? timer_debug_hint+0x10/0x10
> >>>> [ 5200.825665]  [<ffffffff8132c543>] debug_object_assert_init+0xe3/0x120
> >>>> [ 5200.826474]  [<ffffffff810737b4>] del_timer+0x24/0x1b0
> >>>> [ 5200.827129]  [<ffffffff81085314>] __cancel_work_timer+0x34/0x140
> >>>> [ 5200.827829]  [<ffffffff81085432>] cancel_delayed_work_sync+0x12/0x20
> >>>> [ 5200.832905]  [<ffffffffa006e430>] rpc_unlink+0x1e0/0x230 [sunrpc]
> >>>> [ 5200.833716]  [<ffffffffa012a195>] nfsd4_remove_cld_pipe+0x15/0x40 [nfsd]
> >>>> [ 5200.834558]  [<ffffffffa012b2e0>] nfsd4_client_tracking_exit+0x20/0x30 [nfsd]
> >>>> [ 5200.835434]  [<ffffffffa0126984>] nfs4_state_shutdown+0x1a4/0x1c0 [nfsd]
> >>>> [ 5200.836271]  [<ffffffffa010219a>] nfsd_last_thread+0x2a/0x60 [nfsd]
> >>>> [ 5200.837045]  [<ffffffffa005f66c>] svc_destroy+0x5c/0x140 [sunrpc]
> >>>> [ 5200.837778]  [<ffffffffa005f9f6>] svc_exit_thread+0xa6/0xb0 [sunrpc]
> >>>> [ 5200.838545]  [<ffffffffa0102123>] nfsd+0x123/0x170 [nfsd]
> >>>> [ 5200.839221]  [<ffffffffa0102000>] ? 0xffffffffa0101fff
> >>>> [ 5200.839820]  [<ffffffff8108a747>] kthread+0xb7/0xc0
> >>>> [ 5200.840431]  [<ffffffff816a48b4>] kernel_thread_helper+0x4/0x10
> >>>> [ 5200.850883]  [<ffffffff8169acf4>] ? retint_restore_args+0x13/0x13
> >>>> [ 5200.851654]  [<ffffffff8108a690>] ? kthread_worker_fn+0x1a0/0x1a0
> >>>> [ 5200.852418]  [<ffffffff816a48b0>] ? gs_change+0x13/0x13
> >>>
> >>> This is similar. Now you're deleting a timer that debug_objects doesn't 
> >>> know about.
> >>>
> >>>>
> >>>> The problem is that even after staring at this code for a while, I'm
> >>>> still clueless as to what this is telling me. It doesn't happen every
> >>>> time either, so maybe there's a race of some sort involved.
> >>>>
> >>>> It's possible that my patch is broken and doing something wrong, but
> >>>> I'm starting not to think so. I'm not really using timers directly --
> >>>> I'm using rpc_pipefs, which queues a delayed job to a workqueue, and
> >>>> that is manipulating the timer. The delayed_work (and hence the timer)
> >>>> are part of an rpc_inode.
> >>>>
> >>>> Can anyone shed a little light on what this is complaining about?
> >>>>
> >>>
> >>> Do you have workqueue debugging enabled too (DEBUG_OBJECTS_WORK)? I 
> >>> would hope that work item debugging would say the same things and then 
> >>> we would know that the work item itself wasn't initialized properly.
> >>
> >> (adding linux-nfs to cc list...)
> >>
> >> Ok, I think I sort of see what's happening, but I'm not sure if it's a
> >> bug in the debug objects code or something else. To answer the question
> >> that I didn't before, the kernels I've been testing have this set:
> >>
> >>     CONFIG_DEBUG_OBJECTS_WORK=y
> >>
> >> ...so workqueue debugging is on but I haven't seen any messages that
> >> come from it AFAICT. The rpc_inode objects have their delayed_work
> >> fields initialized via a slab "constructor". Here's the rpc_inode_cache
> >> creation call:
> >>
> >>         rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
> >>                                 sizeof(struct rpc_inode),
> >>                                 0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
> >>                                                 SLAB_MEM_SPREAD),
> >>                                 init_once);
> >>
> >> ...and the init_once() routine does this:
> >>
> >>         INIT_DELAYED_WORK(&rpci->queue_timeout,
> >>                             rpc_timeout_upcall_queue);
> >>
> >> On the first use of an rpc_inode object, everything works fine. I think
> >> the problem comes in when rpc_inode objects get recycled without the
> >> INIT_DELAYED_WORK() getting called on it again.
> >>
> >> Before the object is freed it gets cancel_delayed_work_sync() called on
> >> it, but that's apparently not enough to convince the debugobjects code
> >> that it's already correctly initialized. As a Q&D check, the following
> >> patch seems to stop the warnings. Is there some better way to do this
> >> that doesn't require the reinitialization of the delayed work on each
> >> inode allocation?
> > 
> > If this is debug objects false positive, it should be rather fixed in
> > debug object code itself, not workaround in nfs code, but I do not have
> > any idea how to do this.
> > 
> > Stanislaw
> >  
> 
> I don't think so. Look at what INIT_DELAYED_WORK does.
>    INIT_WORK - Set some start values, register with the objects-debugger if on.
>    init_timer - 
> 
> I'd imagine that cancel_delayed_work_sync() would: 1st cancel some of those
> registrations, specifically with the objects-debugger. 2nd will leave some
> state undefined as left from the last round.
> 
> I think this patch is cardinal and you might have found a potential bug.
> 
> Also the cost of INIT_DELAYED_WORK is marginal compare to the extra safety.
> 
> Recomended-by: Boaz Harrosh <bharrosh@panasas.com>
> 

No, I don't think the state would be undefined after
cancel_delayed_work_sync. In principle you could requeue that work
again if you like without needing to reinitialize it.

I think this is a problem in the debugobjects code. It doesn't have
any way to know that when the object is recycled out of the slab that
the work is already initialized.

Certainly it's simple enough to reinitialize the work every time we
allocate an inode here, but I don't think this is really a rpc_pipefs
bug per-se. I can send a patch that works around this problem, but
if there are plans to fix this in the debugobjects code, I won't
bother...

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-24 12:36         ` Jeff Layton
@ 2012-01-24 15:01           ` Boaz Harrosh
  2012-01-24 16:32             ` Jeff Layton
  0 siblings, 1 reply; 13+ messages in thread
From: Boaz Harrosh @ 2012-01-24 15:01 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Stanislaw Gruszka, Stephen Boyd, linux-kernel, bfields,
	linux-nfs, Thomas Gleixner, Tejun Heo

On 01/24/2012 02:36 PM, Jeff Layton wrote:
> 
> No, I don't think the state would be undefined after
> cancel_delayed_work_sync. In principle you could requeue that work
> again if you like without needing to reinitialize it.
> 
> I think this is a problem in the debugobjects code. It doesn't have
> any way to know that when the object is recycled out of the slab that
> the work is already initialized.
> 

The only difference between your above example of requeue after
cancel_delayed_work_sync, and this here is the visit back to the
slab. Does the slab (Maybe in debug mode) stumps over some of the
record memory?

If the memory is constant what is then the difference between the two
cases?

> Certainly it's simple enough to reinitialize the work every time we
> allocate an inode here, but I don't think this is really a rpc_pipefs
> bug per-se. 

That depends on the API intention. If an init is intended after
SLAB free then yes if not then not. We should ask for the intention
of this API.

> I can send a patch that works around this problem, but
> if there are plans to fix this in the debugobjects code, I won't
> bother...
> 

You mean other fix then calling INIT_DELAYED_WORK? is that so
bad that we need more code to avoid it?

Thanks
Boaz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-24 15:01           ` Boaz Harrosh
@ 2012-01-24 16:32             ` Jeff Layton
  2012-01-24 17:43               ` Jeff Layton
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2012-01-24 16:32 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Stanislaw Gruszka, Stephen Boyd, linux-kernel, bfields,
	linux-nfs, Thomas Gleixner, Tejun Heo

On Tue, 24 Jan 2012 17:01:29 +0200
Boaz Harrosh <bharrosh@panasas.com> wrote:

> On 01/24/2012 02:36 PM, Jeff Layton wrote:
> > 
> > No, I don't think the state would be undefined after
> > cancel_delayed_work_sync. In principle you could requeue that work
> > again if you like without needing to reinitialize it.
> > 
> > I think this is a problem in the debugobjects code. It doesn't have
> > any way to know that when the object is recycled out of the slab that
> > the work is already initialized.
> > 
> 
> The only difference between your above example of requeue after
> cancel_delayed_work_sync, and this here is the visit back to the
> slab. Does the slab (Maybe in debug mode) stumps over some of the
> record memory?
> 
> If the memory is constant what is then the difference between the two
> cases?
> 
> > Certainly it's simple enough to reinitialize the work every time we
> > allocate an inode here, but I don't think this is really a rpc_pipefs
> > bug per-se. 
> 
> That depends on the API intention. If an init is intended after
> SLAB free then yes if not then not. We should ask for the intention
> of this API.
> 
> > I can send a patch that works around this problem, but
> > if there are plans to fix this in the debugobjects code, I won't
> > bother...
> > 
> 
> You mean other fix then calling INIT_DELAYED_WORK? is that so
> bad that we need more code to avoid it?
> 

I'm not opposed to a patch that sidesteps this problem, but I want to
make sure we understand it so that we don't get bitten by it in other
places. That's a good point. I hadn't considered whether memory
poisoning is a factor. In the kernel I was testing:

CONFIG_SLUB=y
CONFIG_SLUB_DEBUG_ON=y

...just to be sure:

# cat /sys/kernel/slab/rpc_inode_cache/poison 
1

Looking at the code...

It looks like SLAB will call the ctor on every object when it's
allocated, even if it was recycled from an existing slab. SLUB doesn't
do that however -- as best I can tell it avoids poisoning objects when
there is a ctor function, so they don't get reinitialized like they
would with SLAB.

Probably the best solution here is to eliminate the ctor function and
just initialize the objects whenever they're allocated. Since these
objects aren't frequently recycled then there's little benefit to
keeping that around, IMO. I'll spin up a patch for that soon.

Still, I wonder if there are other problems like this around. The slab
allocators seem to call debug_check_no_obj_freed() on kmem_cache_free,
but parts of the objects themselves (like the timer in the work object
here) get initialized in other places and aren't necessarily
reinitialized when they're recycled out of the slab...

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-24 16:32             ` Jeff Layton
@ 2012-01-24 17:43               ` Jeff Layton
  2012-01-25 14:05                 ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Layton @ 2012-01-24 17:43 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Stanislaw Gruszka, Stephen Boyd, linux-kernel, bfields,
	linux-nfs, Thomas Gleixner, Tejun Heo

On Tue, 24 Jan 2012 11:32:34 -0500
Jeff Layton <jlayton@redhat.com> wrote:

> On Tue, 24 Jan 2012 17:01:29 +0200
> Boaz Harrosh <bharrosh@panasas.com> wrote:
> 
> > On 01/24/2012 02:36 PM, Jeff Layton wrote:
> > > 
> > > No, I don't think the state would be undefined after
> > > cancel_delayed_work_sync. In principle you could requeue that work
> > > again if you like without needing to reinitialize it.
> > > 
> > > I think this is a problem in the debugobjects code. It doesn't have
> > > any way to know that when the object is recycled out of the slab that
> > > the work is already initialized.
> > > 
> > 
> > The only difference between your above example of requeue after
> > cancel_delayed_work_sync, and this here is the visit back to the
> > slab. Does the slab (Maybe in debug mode) stumps over some of the
> > record memory?
> > 
> > If the memory is constant what is then the difference between the two
> > cases?
> > 
> > > Certainly it's simple enough to reinitialize the work every time we
> > > allocate an inode here, but I don't think this is really a rpc_pipefs
> > > bug per-se. 
> > 
> > That depends on the API intention. If an init is intended after
> > SLAB free then yes if not then not. We should ask for the intention
> > of this API.
> > 
> > > I can send a patch that works around this problem, but
> > > if there are plans to fix this in the debugobjects code, I won't
> > > bother...
> > > 
> > 
> > You mean other fix then calling INIT_DELAYED_WORK? is that so
> > bad that we need more code to avoid it?
> > 
> 
> I'm not opposed to a patch that sidesteps this problem, but I want to
> make sure we understand it so that we don't get bitten by it in other
> places. That's a good point. I hadn't considered whether memory
> poisoning is a factor. In the kernel I was testing:
> 
> CONFIG_SLUB=y
> CONFIG_SLUB_DEBUG_ON=y
> 
> ...just to be sure:
> 
> # cat /sys/kernel/slab/rpc_inode_cache/poison 
> 1
> 
> Looking at the code...
> 
> It looks like SLAB will call the ctor on every object when it's
> allocated, even if it was recycled from an existing slab. SLUB doesn't
> do that however -- as best I can tell it avoids poisoning objects when
> there is a ctor function, so they don't get reinitialized like they
> would with SLAB.
> 
> Probably the best solution here is to eliminate the ctor function and
> just initialize the objects whenever they're allocated. Since these
> objects aren't frequently recycled then there's little benefit to
> keeping that around, IMO. I'll spin up a patch for that soon.
> 
> Still, I wonder if there are other problems like this around. The slab
> allocators seem to call debug_check_no_obj_freed() on kmem_cache_free,
> but parts of the objects themselves (like the timer in the work object
> here) get initialized in other places and aren't necessarily
> reinitialized when they're recycled out of the slab...
> 

On second thought...getting rid of the ctor function here might be
problematic. We have to call inode_init_once, etc...

Almost all of the inode slabs have one, so I've settled for just moving
the INIT_DELAYED_WORK call out of init_once and into rpc_alloc_inode. I
sent a patch to Trond and linux-nfs to do that. That will fix this
case, but I do wonder if there are other places in the kernel that have
similar problems with debugobject initialization.

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-24 17:43               ` Jeff Layton
@ 2012-01-25 14:05                 ` Thomas Gleixner
  2012-01-25 14:46                   ` Jeff Layton
  2012-01-25 14:47                   ` Boaz Harrosh
  0 siblings, 2 replies; 13+ messages in thread
From: Thomas Gleixner @ 2012-01-25 14:05 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Boaz Harrosh, Stanislaw Gruszka, Stephen Boyd, linux-kernel,
	bfields, linux-nfs, Tejun Heo

On Tue, 24 Jan 2012, Jeff Layton wrote:
> > Still, I wonder if there are other problems like this around. The slab
> > allocators seem to call debug_check_no_obj_freed() on kmem_cache_free,
> > but parts of the objects themselves (like the timer in the work object
> > here) get initialized in other places and aren't necessarily
> > reinitialized when they're recycled out of the slab...
> > 
> 
> On second thought...getting rid of the ctor function here might be
> problematic. We have to call inode_init_once, etc...
> 
> Almost all of the inode slabs have one, so I've settled for just moving
> the INIT_DELAYED_WORK call out of init_once and into rpc_alloc_inode. I
> sent a patch to Trond and linux-nfs to do that. That will fix this
> case, but I do wonder if there are other places in the kernel that have
> similar problems with debugobject initialization.

The problem is that debugobject requires that a newly allocated object
is reinitialized and made available to the debugobjects code again
simply because we remove it from the debugobjects core on
kmem_cache_free(). 

The real question is why the heck kmem_cache_alloc() does not call the
ctor on each allocation and just expects the previously used and freed
object to be in a consistent initialiazed state.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-25 14:05                 ` Thomas Gleixner
@ 2012-01-25 14:46                   ` Jeff Layton
  2012-01-25 14:47                   ` Boaz Harrosh
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff Layton @ 2012-01-25 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Boaz Harrosh, Stanislaw Gruszka, Stephen Boyd, linux-kernel,
	bfields, linux-nfs, Tejun Heo

On Wed, 25 Jan 2012 15:05:03 +0100 (CET)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Tue, 24 Jan 2012, Jeff Layton wrote:
> > > Still, I wonder if there are other problems like this around. The slab
> > > allocators seem to call debug_check_no_obj_freed() on kmem_cache_free,
> > > but parts of the objects themselves (like the timer in the work object
> > > here) get initialized in other places and aren't necessarily
> > > reinitialized when they're recycled out of the slab...
> > > 
> > 
> > On second thought...getting rid of the ctor function here might be
> > problematic. We have to call inode_init_once, etc...
> > 
> > Almost all of the inode slabs have one, so I've settled for just moving
> > the INIT_DELAYED_WORK call out of init_once and into rpc_alloc_inode. I
> > sent a patch to Trond and linux-nfs to do that. That will fix this
> > case, but I do wonder if there are other places in the kernel that have
> > similar problems with debugobject initialization.
> 
> The problem is that debugobject requires that a newly allocated object
> is reinitialized and made available to the debugobjects code again
> simply because we remove it from the debugobjects core on
> kmem_cache_free(). 
> 
> The real question is why the heck kmem_cache_alloc() does not call the
> ctor on each allocation and just expects the previously used and freed
> object to be in a consistent initialiazed state.
> 
> Thanks,
> 
> 	tglx


I believe that's by design. The comments at the top of slab.c say:

 * This means, that your constructor is used only for newly allocated
 * slabs and you must pass objects with the same initializations to
 * kmem_cache_free.

I assume that it was done that way for efficiency, but not passing
"clean" objects to kmem_cache_free has been the source of bugs in the
past.

Rerunning the ctor should be safe. SLAB already does that when memory
poisoning is enabled. Perhaps we could make sure all the allocators do
that when debug objects are enabled?

OTOH, doing that might paper over bugs in some cases...

-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0()
  2012-01-25 14:05                 ` Thomas Gleixner
  2012-01-25 14:46                   ` Jeff Layton
@ 2012-01-25 14:47                   ` Boaz Harrosh
  1 sibling, 0 replies; 13+ messages in thread
From: Boaz Harrosh @ 2012-01-25 14:47 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Jeff Layton, Stanislaw Gruszka, Stephen Boyd, linux-kernel,
	bfields, linux-nfs, Tejun Heo

On 01/25/2012 04:05 PM, Thomas Gleixner wrote:
> On Tue, 24 Jan 2012, Jeff Layton wrote:
>>> Still, I wonder if there are other problems like this around. The slab
>>> allocators seem to call debug_check_no_obj_freed() on kmem_cache_free,
>>> but parts of the objects themselves (like the timer in the work object
>>> here) get initialized in other places and aren't necessarily
>>> reinitialized when they're recycled out of the slab...
>>>
>>
>> On second thought...getting rid of the ctor function here might be
>> problematic. We have to call inode_init_once, etc...
>>
>> Almost all of the inode slabs have one, so I've settled for just moving
>> the INIT_DELAYED_WORK call out of init_once and into rpc_alloc_inode. I
>> sent a patch to Trond and linux-nfs to do that. That will fix this
>> case, but I do wonder if there are other places in the kernel that have
>> similar problems with debugobject initialization.
> 
> The problem is that debugobject requires that a newly allocated object
> is reinitialized and made available to the debugobjects code again
> simply because we remove it from the debugobjects core on
> kmem_cache_free(). 
> 
> The real question is why the heck kmem_cache_alloc() does not call the
> ctor on each allocation and just expects the previously used and freed
> object to be in a consistent initialiazed state.
> 

Perhaps it's some flags that is to do with the RCU delete thing.
You know for the lockless walks and stuff

> Thanks,
> 
> 	tglx


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-01-25 14:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-20 18:56 WARNING: at lib/debugobjects.c:262 debug_print_object+0x8c/0xb0() Jeff Layton
2012-01-22  8:46 ` Stephen Boyd
2012-01-22 12:14   ` Jeff Layton
2012-01-23 15:23   ` Jeff Layton
2012-01-24  7:45     ` Stanislaw Gruszka
2012-01-24  9:51       ` Boaz Harrosh
2012-01-24 12:36         ` Jeff Layton
2012-01-24 15:01           ` Boaz Harrosh
2012-01-24 16:32             ` Jeff Layton
2012-01-24 17:43               ` Jeff Layton
2012-01-25 14:05                 ` Thomas Gleixner
2012-01-25 14:46                   ` Jeff Layton
2012-01-25 14:47                   ` Boaz Harrosh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).