All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolai Stange <nicstange@gmail.com>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: Nicolai Stange <nicstange@gmail.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"Paul E.McKenney" <paulmck@linux.vnet.ibm.com>,
	gregkh <gregkh@linuxfoundation.org>
Subject: Re: deadlock in synchronize_srcu() in debugfs?
Date: Thu, 30 Mar 2017 09:32:20 +0200	[thread overview]
Message-ID: <87lgrnmd8b.fsf@gmail.com> (raw)
In-Reply-To: <1490614617.3393.4.camel@sipsolutions.net> (Johannes Berg's message of "Mon, 27 Mar 2017 13:36:57 +0200")

Hi Johannes,

On Mon, Mar 27 2017, Johannes Berg wrote:

>> > Before I go hunting - has anyone seen a deadlock in
>> > synchronize_srcu() in debugfs_remove() before?
>> 
>> Not yet. How reproducible is this?
>
> So ... this turned out to be a livelock of sorts.
>
> We have a debugfs file (not upstream (yet?), it seems) that basically
> blocks reading data.
>
> At the point of system hanging, there was a process reading from that
> file, with no data being generated.

>
> A second process was trying to remove a completely unrelated debugfs
> file (*), with the RTNL held.

I wonder if holding the RTNL during the debugfs file removal is really
needed. I'll try to have a look in the next couple of days.

>
> A third and many other processes were waiting to acquire the RTNL.
>
>
> Obviously, in light of things like nfp_net_debugfs_tx_q_read(),
> wil_write_file_reset(), lowpan_short_addr_get() and quite a few more,
> nobody in the whole system can now remove debugfs files while holding
> the RTNL. Not sure how many people that affects, but it's IMHO a pretty
> major new restriction, and one that isn't even flagged at all.

To be honest, I didn't have this scenario, i.e. removing a debugfs file
under a lock, in mind when writing this removal protection series.

Thank you very much for your debugging work and for pointing me to this
sort of problem!

Summarizing, the problem is the call to the indefinitely blocking
srcu_synchronize() while having a lock held? I'll see whether I can ask
lockdep if any lock is held and spit out a warning then.


> Similarly, nobody should be blocking in debugfs files, like we did in
> ours, but also smsdvb_stats_read(), crtc_crc_open() look like they
> could block for quite a while.

Blocking in the debugfs files' fops shall be fine by itself, that's why
SRCU is used for the removal stuff.


> Again, there's no warning here that blocking in debugfs files can now
> indefinitely defer completely unrelated debugfs_remove() calls in the
> entire system.

Yes, there's only one global srcu_struct for debugfs. So far this hasn't
been a problem and if I understand things correctly, it's also not the
problem at hand? If it really becomes an issue, we can very well
introduce per directory srcu_structs as you suggested.


> Overall, while I can solve this problem for our driver, possibly by
> making the debugfs file return some dummy data periodically if no real
> data exists, which may not easily be possible for all such files, I'm
> not convinced that all of this really is the right thing to actually
> impose.

No, I agree: imposing dummy data reads certainly isn't.

Let me have a look
- whether holding the RTNL lock while removing the debugfs files is
  actually needed and
- whether there is an easy way to spot similar scenarios and emit
  a warning for them.

If this doesn't solve the problem, I'll have to think of a different way
to fix this...

> (*) before removing first first we'd obviously wake up and thereby more
> or less terminate the readers first

With the current implementation, I can't see an easy way to identify the
tasks blocking on a particular debugfs file. But maybe this is
resolvable and the way to go here...


Thanks,

Nicolai

  reply	other threads:[~2017-03-30  7:32 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-23 14:54 deadlock in synchronize_srcu() in debugfs? Johannes Berg
2017-03-23 15:29 ` Johannes Berg
2017-03-24  8:56   ` Johannes Berg
2017-03-24  9:24     ` Johannes Berg
2017-03-24 17:45       ` Paul E. McKenney
2017-03-24 18:51         ` Johannes Berg
2017-03-24 19:33           ` Paul E. McKenney
2017-03-24 20:20             ` Paul E. McKenney
2017-03-27 11:18               ` Johannes Berg
2017-03-23 15:36 ` Nicolai Stange
2017-03-23 15:47   ` Johannes Berg
2017-03-27 11:36   ` Johannes Berg
2017-03-30  7:32     ` Nicolai Stange [this message]
2017-03-30  7:55       ` Johannes Berg
2017-03-30 10:27         ` Nicolai Stange
2017-03-30 11:11           ` Johannes Berg
2017-03-31  9:03             ` Nicolai Stange
2017-03-31  9:44               ` Johannes Berg
2017-04-16  9:51               ` [RFC PATCH 0/9] debugfs: per-file removal protection Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 1/9] debugfs: add support for more elaborate ->d_fsdata Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 2/9] debugfs: implement per-file removal protection Nicolai Stange
2017-04-18  2:23                   ` [lkp-robot] [debugfs] f3e7155d08: BUG:unable_to_handle_kernel kernel test robot
2017-04-18  2:23                     ` kernel test robot
2017-04-23 18:37                     ` Nicolai Stange
2017-04-23 18:37                       ` Nicolai Stange
2017-04-24  6:36                       ` Ye Xiaolong
2017-04-24  6:36                         ` Ye Xiaolong
2017-04-16  9:51                 ` [RFC PATCH 3/9] debugfs: debugfs_real_fops(): drop __must_hold sparse annotation Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 4/9] debugfs: convert to debugfs_file_get() and -put() Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 5/9] IB/hfi1: " Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 6/9] debugfs: purge obsolete SRCU based removal protection Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 7/9] debugfs: call debugfs_real_fops() only after debugfs_file_get() Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 8/9] debugfs: defer debugfs_fsdata allocation to first usage Nicolai Stange
2017-04-18  9:36                   ` Johannes Berg
2017-05-02 20:05                     ` Nicolai Stange
2017-05-03  5:43                       ` Johannes Berg
2017-04-16  9:51                 ` [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances Nicolai Stange
2017-04-17 16:01                   ` Paul E. McKenney
2017-04-18  9:39                     ` Johannes Berg
2017-04-18 13:31                       ` Paul E. McKenney
2017-04-18 13:40                         ` Johannes Berg
2017-04-18 15:17                           ` Paul E. McKenney
2017-04-18 15:20                             ` Johannes Berg
2017-04-18 17:19                               ` Paul E. McKenney
2017-03-23 15:37 ` deadlock in synchronize_srcu() in debugfs? Paul E. McKenney
2017-03-23 15:46   ` Johannes Berg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87lgrnmd8b.fsf@gmail.com \
    --to=nicstange@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=johannes@sipsolutions.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.