Subject: Re: deadlock in synchronize_srcu() in debugfs?
From: Johannes Berg
To: Nicolai Stange
Cc: linux-kernel, "Paul E. McKenney", gregkh
Date: Mon, 27 Mar 2017 13:36:57 +0200
Message-ID: <1490614617.3393.4.camel@sipsolutions.net>
In-Reply-To: <87o9ws6m4s.fsf@gmail.com>

Hi,

> > Before I go hunting - has anyone seen a deadlock in
> > synchronize_srcu() in debugfs_remove() before?
>
> Not yet. How reproducible is this?

So... this turned out to be a livelock of sorts.

We have a debugfs file (not upstream, at least not yet) whose read basically blocks until data is available. When the system hung, one process was reading from that file, and no data was being generated.

A second process was trying to remove a completely unrelated debugfs file (*) while holding the RTNL.

A third process, and many others, were waiting to acquire the RTNL.

Obviously, in light of things like nfp_net_debugfs_tx_q_read(), wil_write_file_reset(), lowpan_short_addr_get() and quite a few more, which take the RTNL inside their debugfs file operations, nobody in the whole system can now remove debugfs files while holding the RTNL. I'm not sure how many people that affects, but IMHO it's a pretty major new restriction, and one that isn't flagged at all.
Similarly, nobody should be blocking in debugfs file operations, as we did in ours; but smsdvb_stats_read() and crtc_crc_open(), for example, also look like they could block for quite a while. Again, there's no warning anywhere that blocking in a debugfs file can now indefinitely delay completely unrelated debugfs_remove() calls across the entire system.

Overall, while I can solve this problem for our driver, possibly by making the debugfs file periodically return some dummy data when no real data exists (which may not be easy for all such files), I'm not convinced that all of this is really the right restriction to impose. Perhaps it could be per directory, or per subsystem of some kind?

johannes

(*) Before removing the first (blocking) file, we'd obviously wake up, and thereby more or less terminate, its readers.