Subject: NFS invalid refcount warnings
From: Marcin Nowakowski @ 2017-03-22 14:37 UTC
  To: linux-nfs

Hi,

I'm trying to debug an issue on my test machine that occurs quite
reliably, although unfortunately I can't describe any specific steps
to reproduce it.

The system is running kernel 4.10.4.
The rootfs is on an NFS share mounted with the following options:

<***> on / type nfs 
(rw,relatime,vers=3,rsize=4096,wsize=4096,namlen=255,hard,nolock,
proto=udp,timeo=10,retrans=3,sec=sys,mountaddr=<***>,
mountvers=3,mountproto=udp,local_lock=all,addr=<***>)

The machine runs Linux on an FPGA, so it is relatively slow. It runs
various stability tests that launch a lot of applications in parallel,
and the heavy load makes it slower still ;)

It usually takes 30 to 60 minutes for the following error to occur:

warning in nfs_scan_commit_list::kref_get()
[ 3671.685359] [<80453ae4>] nfs_scan_commit_list+0x228/0x248
[ 3671.685359] [<80453ba0>] nfs_scan_commit+0x9c/0x118
[ 3671.685359] [<80453ef8>] nfs_commit_inode+0xf8/0x17c
[ 3671.752838] [<80454300>] nfs_wb_all+0x140/0x278
[ 3671.752838] [<80443390>] nfs_setattr+0x364/0x47c
[ 3671.752838] [<8032ae58>] notify_change+0x1c0/0x4c4
[ 3671.752838] [<80349ab0>] utimes_common+0xc8/0x194
[ 3671.752838] [<80349cd8>] do_utimes+0x15c/0x188
[ 3671.752838] [<80349e9c>] SyS_utimensat+0xa8/0xf8
[ 3671.752838] [<8011a5d8>] syscall_common+0x34/0x58
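
As far as I can tell from include/linux/kref.h in 4.10, the warning
comes from the sanity check in kref_get(), which fires when the count
was zero before the increment, i.e. when the object was already being
released. Below is a minimal userspace sketch of that check, only to
illustrate what the warning means. model_kref and its helpers are
made-up names for illustration, not kernel code, and C11 atomics stand
in for the kernel's atomic_t:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct kref (not the kernel type). */
struct model_kref {
	atomic_int refcount;
};

/* A new object starts with one reference, as kref_init() does. */
static void model_kref_init(struct model_kref *k)
{
	atomic_init(&k->refcount, 1);
}

/*
 * Take a reference. Returns true when a 4.10-style kref_get() would have
 * warned: the count after the increment is below 2, so it was 0 (or less)
 * before, meaning the object was already on its way to being freed.
 */
static bool model_kref_get(struct model_kref *k)
{
	int new_count = atomic_fetch_add(&k->refcount, 1) + 1;

	return new_count < 2;
}

/* Drop a reference. Returns true when this was the last reference. */
static bool model_kref_put(struct model_kref *k)
{
	return atomic_fetch_sub(&k->refcount, 1) == 1;
}
```

In this model, a get that follows the final put returns true, which
would correspond to nfs_scan_commit_list finding a request on the
commit list after its last reference had already been dropped.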

After the first error, more usually follow, sometimes with the same
call stack and sometimes with a different one, e.g.:
[ 3674.001118] [<80453ae4>] nfs_scan_commit_list+0x228/0x248
[ 3674.001118] [<80453ba0>] nfs_scan_commit+0x9c/0x118
[ 3674.001118] [<80453ef8>] nfs_commit_inode+0xf8/0x17c
[ 3674.001118] [<80454198>] nfs_write_inode+0xa4/0xcc
[ 3674.001118] [<80342da4>] __writeback_single_inode+0x360/0x6e0
[ 3674.001118] [<80343934>] writeback_sb_inodes+0x2b8/0x514
[ 3674.001118] [<80343c50>] __writeback_inodes_wb+0xc0/0x114
[ 3674.001118] [<80343fd4>] wb_writeback+0x330/0x494
[ 3674.001118] [<80344eb0>] wb_workfn+0x2cc/0x77c
[ 3674.001118] [<80179154>] process_one_work+0x20c/0x69c
[ 3674.001118] [<80179760>] worker_thread+0x17c/0x530
[ 3674.001118] [<8018077c>] kthread+0x164/0x194
[ 3674.001118] [<80105dd4>] ret_from_kernel_thread+0x14/0x1c

A few of those warnings are usually followed by linked-list debug
warnings or NULL pointer dereferences in nfs_inode_remove_request
(req->wb_context is NULL).

I'd appreciate any help with debugging this issue, as I'm struggling to
understand what may be happening. It looks like it might be caused by
incorrect locking somewhere, but I'm not familiar with the NFS code and
it's not easy to follow how it works, especially given its asynchronous
structure.


thanks,
Marcin

