Date: Thu, 28 Mar 2019 10:52:07 +0100
From: Jan Kara
To: Olivier Chapelliere
Cc: jack@suse.cz, linux-fsdevel@vger.kernel.org
Subject: Re: stuck in inotify_release
Message-ID: <20190328095207.GD22915@quack2.suse.cz>

Hello,

On Thu 28-03-19 09:26:45, Olivier Chapelliere wrote:
> According to what I read on internet you seem to be the right person to get
> in touch with when one has problems with inotify.

Yes, there's also the linux-fsdevel@vger.kernel.org mailing list, which we
use (added to CC).

> We are monitoring several directories in python processes through inotify.
> But after few days all processes are stuck in a call to inotify_release.
> Once I detected the problem, I dumped info to dmesg with sysrq-trigger
> (dmesg content attached):
> echo w > /proc/sysrq-trigger

Looking through the stack traces, all of them wait in:

  fput() -> inotify_release() -> ... -> fsnotify_wait_marks_destroyed() ->
    flush_delayed_work(&reaper_work)

So they wait for the worker process to destroy all marks for the group.
However, that worker (kworker/u8:4) is stuck in:

  fsnotify_mark_destroy_workfn() -> synchronize_srcu(&fsnotify_mark_srcu)

So the question is who is holding fsnotify_mark_srcu so that SRCU cannot
declare a new grace period. I don't see any such process among the processes
you've shown in the dump (but it should be there), so it's a bit of a mystery.

> Our production env is ubuntu 18.04 kernel 4.15 fs ext4
> This problem appears on a weekly basis so I will be able to run additional
> commands to track down the issue if needed.

So when this happens again, try grabbing the output of sysrq-l and sysrq-t so
we can see whether we can find the task holding fsnotify_mark_srcu.

								Honza
-- 
Jan Kara
SUSE Labs, CR
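
For the follow-up capture requested above, a minimal shell sketch may help. It mirrors the quoted "echo w > /proc/sysrq-trigger" command but issues the 'l' (backtrace of all active CPUs) and 't' (dump of all tasks) keys instead. The helper name trigger_sysrq and its argument order are assumptions made for illustration; it is not an existing tool.

```shell
#!/bin/sh
# Sketch only: trigger_sysrq is a hypothetical helper, not an existing tool.
# Usage: trigger_sysrq TRIGGER_FILE KEY...
# TRIGGER_FILE is passed explicitly so the function can be dry-run
# against an ordinary file before being pointed at /proc/sysrq-trigger.
trigger_sysrq() {
    trigger="$1"
    shift
    for key in "$@"; do
        # Each write asks the kernel to run one sysrq command;
        # stop at the first write that fails (e.g. not running as root).
        printf '%s\n' "$key" > "$trigger" || return 1
    done
}
```

Run as root while the processes are stuck, this would be called as "trigger_sysrq /proc/sysrq-trigger l t", followed by "dmesg > sysrq-dump.txt" to save the result (the output file name is arbitrary). The task dump can be long, so part of it may be lost if the kernel log buffer is small.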