linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Any known soft lockup issue with vfs_write()->fsnotify()?
@ 2018-03-02 22:28 Dexuan Cui
  2018-03-05 20:48 ` Jan Kara
  0 siblings, 1 reply; 3+ messages in thread
From: Dexuan Cui @ 2018-03-02 22:28 UTC (permalink / raw)
  To: linux-fsdevel, Jan Kara, Amir Goldstein, Miklos Szeredi
  Cc: Haiyang Zhang, 'linux-kernel@vger.kernel.org', Jork Loeser

Hi,
Recently people are getting a soft lock issue with vfs_write()->fsnotify(). 
The detailed calltrace is available at:
https://github.com/coreos/bugs/issues/2356
https://github.com/coreos/bugs/issues/2364

The kernel versions showing up the issue are:
4.14.11-coreos 
4.14.19-coreos
4.13.0-1009 -- this is the kernel with which I'm personally seeing the lockup.

I have not got a chance to try the latest mainline kernel yet.

Before the lockup error message suddenly appears, Linux has been running fine for many hours.
I have NOT found a consistent way to reproduce the lockup yet.

Looks the kernel is stuck in fsnotify(), when it tries to get the fsnotify_mark_srcu lock.

"git log fs/notify/fsnotify.c" on the latest mainline shows that some recent patches might help.

I'd like to check if this is a known issue.

Looking forward to your insights!

Thanks,
-- Dexuan

For your convenience, this is a calltrace from the first link:

18h 30m 8.626s(   4ms): ip-172-45-43-199 login: [67361.641359] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [java:87260]
18h 42m 40.116s(751490ms): [67361.644600] Modules linked in: xfs xt_statistic xt_nat xt_recent ipt_REJECT nf_reject_ipv4 xt_comment xt_mark veth nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter xt_conntrack br_netfilter bridge stp llc ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c crc32c_generic vxlan ip6_udp_tunnel udp_tunnel overlay mousedev psmouse sb_edac i2c_piix4 i2c_core evdev edac_core button xenfs xen_privcmd sch_fq_codel nls_ascii nls_cp437 vfat fat dm_verity dm_bufio ext4 crc16 mbcache jbd2 fscrypto crc32c_intel ata_piix aesni_intel xen_blkfront libata aes_x86_64 crypto_simd cryptd glue_helper scsi_mod ixgbevf dm_mirror dm_region_hash dm_log dm_mod dax
18h 42m 40.142s(  26ms): [67361.668103] CPU: 10 PID: 87260 Comm: java Not tainted 4.14.11-coreos #1
18h 42m 40.142s(   0ms): [67361.670391] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
18h 42m 40.144s(   2ms): [67361.672581] task: ffff90d6009dbc80 task.stack: ffffb2388f704000
18h 42m 40.149s(   5ms): [67361.674604] RIP: 0010:fsnotify+0x166/0x520
18h 42m 40.149s(   0ms): [67361.675971] RSP: 0018:ffffb2388f707e10 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff0c
18h 42m 40.150s(   1ms): [67361.678462] RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
18h 42m 40.152s(   2ms): [67361.680986] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffffffff907294c0
18h 42m 40.157s(   5ms): [67361.683340] RBP: ffff90d2eddffed8 R08: 0000000000000000 R09: 0000000000000000
18h 42m 40.157s(   0ms): [67361.685709] R10: ffffdd941da7a100 R11: 0000000000000000 R12: ffff90d2eddfff00
18h 42m 40.159s(   2ms): [67361.688199] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
18h 42m 40.165s(   6ms): [67361.690579] FS:  00007f491c3f4700(0000) GS:ffff90d6ef880000(0000) knlGS:0000000000000000
18h 42m 40.165s(   0ms): [67361.693227] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
18h 42m 40.166s(   1ms): [67361.695206] CR2: 000000c421288300 CR3: 00000005ba5fc002 CR4: 00000000001606e0
18h 42m 40.175s(   9ms): [67361.697655] Call Trace:
18h 42m 40.175s(   0ms): [67361.698499]  vfs_write+0x14f/0x1a0
18h 42m 40.175s(   0ms): [67361.699656]  SyS_write+0x52/0xc0
18h 42m 40.175s(   0ms): [67361.700745]  do_syscall_64+0x59/0x1c0
18h 42m 40.175s(   0ms): [67361.701996]  entry_SYSCALL64_slow_path+0x25/0x25
18h 42m 40.175s(   0ms): [67361.703536] RIP: 0033:0x7f4b5566643d
18h 42m 40.176s(   1ms): [67361.704751] RSP: 002b:00007f491c3f0ef0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
18h 42m 40.179s(   3ms): [67361.707375] RAX: ffffffffffffffda RBX: 0000000000000032 RCX: 00007f4b5566643d
18h 42m 40.188s(   9ms): [67361.709849] RDX: 00000000000000f0 RSI: 00007f491c3f0f50 RDI: 0000000000000f40
18h 42m 40.188s(   0ms): [67361.712205] RBP: 00007f491c3f0f20 R08: 00007f491c3f1030 R09: 00000005f4ff70d8
18h 42m 40.188s(   0ms): [67361.714573] R10: 0000000000052f06 R11: 0000000000000293 R12: 00000000000000f0
18h 42m 40.188s(   0ms): [67361.716924] R13: 00007f491c3f0f50 R14: 0000000000000f40 R15: 0000000000000000
18h 42m 40.191s(   3ms): [67361.719331] Code: 40 4c 89 7c 24 48 4c 89 7c 24 08 8b 44 24 18 25 00 00 03 00 89 44 24 34 4d 85 e4 0f 95 c2 48 83 7c 24 08 00 0f 95 c1 89 c8 08 d0 <0f> 84 96 03 00 00 84 d2 0f 84 e8 02 00 00 48 8b 44 24 40 84 c9

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Any known soft lockup issue with vfs_write()->fsnotify()?
  2018-03-02 22:28 Any known soft lockup issue with vfs_write()->fsnotify()? Dexuan Cui
@ 2018-03-05 20:48 ` Jan Kara
  2018-03-08 15:07   ` Haiyang Zhang
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Kara @ 2018-03-05 20:48 UTC (permalink / raw)
  To: Dexuan Cui
  Cc: linux-fsdevel, Jan Kara, Amir Goldstein, Miklos Szeredi,
	Haiyang Zhang, 'linux-kernel@vger.kernel.org',
	Jork Loeser

Hi!

On Fri 02-03-18 22:28:50, Dexuan Cui wrote:
> Recently people are getting a soft lock issue with vfs_write()->fsnotify(). 
> The detailed calltrace is available at:
> https://github.com/coreos/bugs/issues/2356
> https://github.com/coreos/bugs/issues/2364

I didn't see them yet.

> The kernel versions showing up the issue are:
> 4.14.11-coreos 
> 4.14.19-coreos
> 4.13.0-1009 -- this is the kernel with which I'm personally seeing the lockup.
> 
> I have not got a chance to try the latest mainline kernel yet.

It would be good to try 4.15 kernel to see whether recent fixes from Miklos
didn't fix your problem. They should be present in 4.14.11/19 kernels as
well but one never knows...

> Before the lockup error message suddenly appears, Linux has been running
> fine for many hours.  I have NOT found a consistent way to reproduce the
> lockup yet.
> 
> Looks the kernel is stuck in fsnotify(), when it tries to get the
> fsnotify_mark_srcu lock.

It is not possible that we would 'hang' in srcu_read_lock() - that is
just a read of one variable and increment of another. We'd have to be
looping somewhere and watchdog would have to happen to hit us always at
that place. Weird. Are you sure RIP points to srcu_read_lock?

> "git log fs/notify/fsnotify.c" on the latest mainline shows that some
> recent patches might help.
> 
> I'd like to check if this is a known issue.

As I've mentioned above, so far I didn't see reports like this...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

* RE: Any known soft lockup issue with vfs_write()->fsnotify()?
  2018-03-05 20:48 ` Jan Kara
@ 2018-03-08 15:07   ` Haiyang Zhang
  0 siblings, 0 replies; 3+ messages in thread
From: Haiyang Zhang @ 2018-03-08 15:07 UTC (permalink / raw)
  To: Jan Kara, Dexuan Cui
  Cc: linux-fsdevel, Amir Goldstein, Miklos Szeredi,
	'linux-kernel@vger.kernel.org',
	Jork Loeser, Stephen Hemminger

There was another report of the same issue on CoreOS, 4.14.11-coreos. The host/guest is AWS G4. So the problem is not limited to Azure VMs. It doesn't happen on older kernel like 4.4. Maybe the problem is related to some (recent) changes on fsnotify or other fs code?

Soft lockup kernel panic reboot on AWS instance on fsnotify and vfs_write  #2356
	https://github.com/coreos/bugs/issues/2356

Thanks,
- Haiyang

> -----Original Message-----
> From: Jan Kara <jack@suse.cz>
> Sent: Monday, March 5, 2018 3:49 PM
> To: Dexuan Cui <decui@microsoft.com>
> Cc: linux-fsdevel@vger.kernel.org; Jan Kara <jack@suse.cz>; Amir Goldstein
> <amir73il@gmail.com>; Miklos Szeredi <mszeredi@redhat.com>; Haiyang
> Zhang <haiyangz@microsoft.com>; 'linux-kernel@vger.kernel.org' <linux-
> kernel@vger.kernel.org>; Jork Loeser <Jork.Loeser@microsoft.com>
> Subject: Re: Any known soft lockup issue with vfs_write()->fsnotify()?
> 
> Hi!
> 
> On Fri 02-03-18 22:28:50, Dexuan Cui wrote:
> > Recently people are getting a soft lock issue with vfs_write()->fsnotify().
> > The detailed calltrace is available at:
> > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> >
> b.com%2Fcoreos%2Fbugs%2Fissues%2F2356&data=04%7C01%7Chaiyangz%40
> micros
> >
> oft.com%7Ca1b1bc6822c9442195ad08d582da7942%7C72f988bf86f141af91ab2
> d7cd
> >
> 011db47%7C1%7C0%7C636558797237925702%7CUnknown%7CTWFpbGZsb3d8
> eyJWIjoiM
> > C4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwifQ%3D%3D%7C-
> 2&sdata=pdwtsbU
> > 0%2FW3y7Zy%2BX%2Ffkbx%2FPktoKVBgimfxMyVk6Lyw%3D&reserved=0
> > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithu
> >
> b.com%2Fcoreos%2Fbugs%2Fissues%2F2364&data=04%7C01%7Chaiyangz%40
> micros
> >
> oft.com%7Ca1b1bc6822c9442195ad08d582da7942%7C72f988bf86f141af91ab2
> d7cd
> >
> 011db47%7C1%7C0%7C636558797237925702%7CUnknown%7CTWFpbGZsb3d8
> eyJWIjoiM
> > C4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwifQ%3D%3D%7C-
> 2&sdata=w%2Bjed
> > u0yIYlpRut5sHa2%2Bhs5cdcdxp1dd3sHkyvRCPw%3D&reserved=0
> 
> I didn't see them yet.
> 
> > The kernel versions showing up the issue are:
> > 4.14.11-coreos
> > 4.14.19-coreos
> > 4.13.0-1009 -- this is the kernel with which I'm personally seeing the lockup.
> >
> > I have not got a chance to try the latest mainline kernel yet.
> 
> It would be good to try 4.15 kernel to see whether recent fixes from Miklos
> didn't fix your problem. They should be present in 4.14.11/19 kernels as well
> but one never knows...
> 
> > Before the lockup error message suddenly appears, Linux has been
> > running fine for many hours.  I have NOT found a consistent way to
> > reproduce the lockup yet.
> >
> > Looks the kernel is stuck in fsnotify(), when it tries to get the
> > fsnotify_mark_srcu lock.
> 
> It is not possible that we would 'hang' in srcu_read_lock() - that is just a read of
> one variable and increment of another. We'd have to be looping somewhere
> and watchdog would have to happen to hit us always at that place. Weird. Are
> you sure RIP points to srcu_read_lock?
> 
> > "git log fs/notify/fsnotify.c" on the latest mainline shows that some
> > recent patches might help.
> >
> > I'd like to check if this is a known issue.
> 
> As I've mentioned above, so far I didn't see reports like this...
> 
> 								Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-03-08 15:08 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-02 22:28 Any known soft lockup issue with vfs_write()->fsnotify()? Dexuan Cui
2018-03-05 20:48 ` Jan Kara
2018-03-08 15:07   ` Haiyang Zhang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).