All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolai Stange <nicstange@gmail.com>
To: Johannes Berg <johannes@sipsolutions.net>
Cc: Nicolai Stange <nicstange@gmail.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	"Paul E.McKenney" <paulmck@linux.vnet.ibm.com>,
	gregkh <gregkh@linuxfoundation.org>
Subject: Re: deadlock in synchronize_srcu() in debugfs?
Date: Fri, 31 Mar 2017 11:03:29 +0200	[thread overview]
Message-ID: <871stdyg0u.fsf@gmail.com> (raw)
In-Reply-To: <1490872283.32041.4.camel@sipsolutions.net> (Johannes Berg's message of "Thu, 30 Mar 2017 13:11:23 +0200")

On Thu, Mar 30 2017, Johannes Berg wrote:

> On Thu, 2017-03-30 at 12:27 +0200, Nicolai Stange wrote:
>> So, please correct me if I'm wrong, there are two problems with
>> indefinitely blocking debugfs files' fops:
>> 
>> 1. The one which actually hung your system:
>>    An indefinitely blocking debugfs_remove() while holding a lock.
>>    Other tasks attempting to grab that same lock get stuck as well.
>> 
>> 2. The other one you've found, namely that the locking granularity is
>>    too coarse: a debugfs_remove() would get blocked by unrelated
>> files'
>>    pending fops.
>
> No, this isn't really an accurate description of the two problems.
>
>> AFAICS, the first one can't get resolved by simply refining the
>> blocking granularity: a debugfs_remove() on the indefinitely blocking
>> file would still block as well.
>
> Correct.
>
> The first problem - the one I ran into - is the following:
>
> 1)
> A given debugfs file's .read() was waiting for some event to happen
> (being a blocking file), and I was trying to debugfs_remove() some
> completely unrelated file, this got stuck.

I got it now. I was missing the "completely unrelated file" part.
(Admittedly, a related file would have made no sense at all -- the
remover would have been responsible to cancel any indefinite blocking in
there, as you said).

> Due to me holding a lock while doing this debugfs_remove(), other tasks
> *also* got stuck, but that's just a sub-problem - having the
> debugfs_remove() of an unrelated file get stuck would already have been
> a problem - the fact that other tasks also got stuck was just an
> additional wrinkle.
>
> Mind - this is a livelock of sorts - if the debugfs file will ever make
> progress, the system can recover.
>


> 2)
> There's a complete deadlock situation if this happens:
>
> CPU1					CPU2
>
> debugfs_file_read(file="foo")		mutex_lock(&M);
> srcu_read_lock(&debugfs_srcu);		debugfs_remove(file="bar")
> mutex_lock(&M);				synchronize_srcu(&debugfs_srcu)
>
> This is intrinsically unrecoverable.

Let's address this in a second step.


> This is the core of the problem really - that you're tying completely
> unrelated processes together.
>
> Therefore, to continue using SRCU in this way means that you have to
> disallow blocking debugfs files. There may not be many of those, but
> any single one of them would be a problem.
>
> If we stop using SRCU this way we can discuss how we can fix it - but
> anything more coarse grained than per-file (which really makes SRCU
> unsuitable) would still have the same problem one way or another. And
> we haven't even addressed the deadlock situation (2 above) either.
>
>> When I did this, per-file reader/writer locks actuallt came to my
>> mind first. The problem here is that debugfs_use_file_start() must
>> grab the lock first and check whether the file has been deleted in
>> the meanwhile. But as it stands, there's nothing that would guarantee
>> the existence of the lock at the time it's to be taken.
>
> That seems like a strange argument to me - something has to exist for a
> process to be able to look up the file, and currently the proxy also
> has to exist?

No, the proxies are created at file _open_ time and installed at the
struct file.

Rationale: there are potentially many debugfs files with only few of them
opened at a time and a proxy, i.e. a struct file_operations, is quite
large.


> So when a file is created you can allocate the proxy for it, and if you
> can look up the proxy object - perhaps even using plain RCU - then you
> also have the lock? IOW, instead of storing just the real_fops in
> d_fsdata, you can store a small object that holds a lock and the
> real_fops. You can always access that object, and lock it, but the
> real_fops inside it might eventually end up NULL which you handle
> through proxying. No?

As said, there isn't always a proxy object around.

Of course, attaching some sort of lock on a per-file basis should be
doable. I just refrained from doing it so far (and resorted to SRCU
instead) because I wasn't aware of those indefinite blockers and wanted
to avoid the additional complexity (namely avoiding use-after-frees on
that lock).

I'll work out a solution this weekend and send some RFC patches then.

Thanks for your clarifications!

Nicolai

  reply	other threads:[~2017-03-31  9:03 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-23 14:54 deadlock in synchronize_srcu() in debugfs? Johannes Berg
2017-03-23 15:29 ` Johannes Berg
2017-03-24  8:56   ` Johannes Berg
2017-03-24  9:24     ` Johannes Berg
2017-03-24 17:45       ` Paul E. McKenney
2017-03-24 18:51         ` Johannes Berg
2017-03-24 19:33           ` Paul E. McKenney
2017-03-24 20:20             ` Paul E. McKenney
2017-03-27 11:18               ` Johannes Berg
2017-03-23 15:36 ` Nicolai Stange
2017-03-23 15:47   ` Johannes Berg
2017-03-27 11:36   ` Johannes Berg
2017-03-30  7:32     ` Nicolai Stange
2017-03-30  7:55       ` Johannes Berg
2017-03-30 10:27         ` Nicolai Stange
2017-03-30 11:11           ` Johannes Berg
2017-03-31  9:03             ` Nicolai Stange [this message]
2017-03-31  9:44               ` Johannes Berg
2017-04-16  9:51               ` [RFC PATCH 0/9] debugfs: per-file removal protection Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 1/9] debugfs: add support for more elaborate ->d_fsdata Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 2/9] debugfs: implement per-file removal protection Nicolai Stange
2017-04-18  2:23                   ` [lkp-robot] [debugfs] f3e7155d08: BUG:unable_to_handle_kernel kernel test robot
2017-04-18  2:23                     ` kernel test robot
2017-04-23 18:37                     ` Nicolai Stange
2017-04-23 18:37                       ` Nicolai Stange
2017-04-24  6:36                       ` Ye Xiaolong
2017-04-24  6:36                         ` Ye Xiaolong
2017-04-16  9:51                 ` [RFC PATCH 3/9] debugfs: debugfs_real_fops(): drop __must_hold sparse annotation Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 4/9] debugfs: convert to debugfs_file_get() and -put() Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 5/9] IB/hfi1: " Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 6/9] debugfs: purge obsolete SRCU based removal protection Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 7/9] debugfs: call debugfs_real_fops() only after debugfs_file_get() Nicolai Stange
2017-04-16  9:51                 ` [RFC PATCH 8/9] debugfs: defer debugfs_fsdata allocation to first usage Nicolai Stange
2017-04-18  9:36                   ` Johannes Berg
2017-05-02 20:05                     ` Nicolai Stange
2017-05-03  5:43                       ` Johannes Berg
2017-04-16  9:51                 ` [RFC PATCH 9/9] debugfs: free debugfs_fsdata instances Nicolai Stange
2017-04-17 16:01                   ` Paul E. McKenney
2017-04-18  9:39                     ` Johannes Berg
2017-04-18 13:31                       ` Paul E. McKenney
2017-04-18 13:40                         ` Johannes Berg
2017-04-18 15:17                           ` Paul E. McKenney
2017-04-18 15:20                             ` Johannes Berg
2017-04-18 17:19                               ` Paul E. McKenney
2017-03-23 15:37 ` deadlock in synchronize_srcu() in debugfs? Paul E. McKenney
2017-03-23 15:46   ` Johannes Berg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871stdyg0u.fsf@gmail.com \
    --to=nicstange@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=johannes@sipsolutions.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.