From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx2.suse.de ([195.135.220.15]:41694 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1727516AbeI0WnR (ORCPT ); Thu, 27 Sep 2018 18:43:17 -0400
Date: Thu, 27 Sep 2018 18:24:12 +0200
From: Jan Kara
To: Nigel Banks
Cc: jack@suse.cz, linux-fsdevel@vger.kernel.org, Amir Goldstein
Subject: Re: Deadlock in fsnotify for
Message-ID: <20180927162412.GA12883@quack2.suse.cz>
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Hello,

[added to CC other relevant mails]

On Thu 27-09-18 16:44:53, Nigel Banks wrote:
> Sorry to trouble you, but from looking through the git history of
> linux/fs/notify you seem to be the best person to contact.
>
> I've encounter a hard to reproduce situation that happens on our CI
> servers, in which it becomes impossible to release any inotify file
> descriptors. We're currently running Ubuntu 18.04 (Kernel 4.15) using
> ext4 fs, and our code is running in docker containers (overlay2) if that
> makes a difference.
>
> Essentially we're running a number of concurrent tests which internally
> use inotify to monitor some directories this all works fine and they
> clean up after themselves, but after several days there will be a
> deadlock in the kernel code (sys stack below):
>
> [<0>] flush_work+0x126/0x1e0
> [<0>] flush_delayed_work+0x3f/0x50
> [<0>] fsnotify_wait_marks_destroyed+0x15/0x20
> [<0>] fsnotify_destroy_group+0x48/0xd0
> [<0>] inotify_release+0x1e/0x50
> [<0>] __fput+0xea/0x220
> [<0>] ____fput+0xe/0x10
> [<0>] task_work_run+0x9d/0xc0
> [<0>] exit_to_usermode_loop+0xc0/0xd0
> [<0>] do_syscall_64+0x115/0x130
> [<0>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [<0>] 0xffffffffffffffff

Hum, I don't remember seeing any deadlock like this.
When a system hangs like this, can you please do:

  echo w >/proc/sysrq-trigger

and send me the output of the 'dmesg' command after that. In that output we
should see all hung tasks (including kernel threads) and their traces and
hopefully it will tell us more.

> Once a processes gets stuck in this uninterruptable sleep it will never wake.
> At this point the system is still usable, we're able to create more inotify
> instances and receive messages for them, but we are not able to close any of
> them. So eventually we run out of handles and the system becomes unstable, not
> to mention we can't run any more tests on the machine at this point, and a
> reboot is required.

Yes, this is expected. It looks like some deadlock in the fsnotify subsystem.

> From my research, it looks like lxc project has also encountered this issue:
> https://github.com/lxc/lxc/issues/2456, like them we also didn't experience
> this behaviour with our previous set-up Ubuntu 16.04 (Kernel 14.04).
>
> I had a look through the bug lists and through the commit history for
> linux/fs/notify and could not find this issue listed anywhere.
>
> I've attempted to write a small C program using pthreads and the inotify
> sys-calls, but was unable to create a program that could reproduce this issue.

Thanks for the report.

								Honza
-- 
Jan Kara
SUSE Labs, CR