From: Dmitry Vyukov
Date: Wed, 25 Jul 2018 11:12:00 +0200
Subject: Re: INFO: task hung in fuse_reverse_inval_entry
To: Miklos Szeredi
Cc: linux-fsdevel, LKML, syzkaller-bugs, syzbot

On Tue, Jul 24, 2018 at 5:17 PM, Miklos Szeredi wrote:
>>>>> Biggest conceptual problem: your definition of fuse-server is weak.
>>>>> Take the following example: process A is holding the fuse device fd
>>>>> and is forwarding requests and replies to/from process B via a pipe.
>>>>> So basically A is just a proxy that does nothing interesting; the
>>>>> "real" server is B. But according to your definition B is not a
>>>>> server, only A is.
>>>>
>>>> I proposed to abort the fuse conn when all fuse device fds are "killed"
>>>> (all processes having the fd opened are killed). So if _only_ process
>>>> B is killed, then, yes, it will still hang. However, if A is killed, or
>>>> both A and B (say, a process group, everything inside a pid namespace,
>>>> etc.), then the deadlock will be resolved automatically without human
>>>> intervention.
>>>
>>> Okay, so you're saying:
>>>
>>> 1) when a process gets SIGKILL and is in uninterruptible sleep, mark
>>> the process as doomed
>>> 2) for a particular fuse instance, find the set of fuse device fd
>>> references held by non-doomed tasks; if there are none, abort the
>>> fuse instance
>>>
>>> Right?
>>
>> Yes, something like this.
>> Perhaps checking for "uninterruptible sleep" is excessive. If a task
>> has SIGKILL pending, it's pretty much doomed already. This info should
>> already be available for tasks.
>> Not saying that it's better, but what I described was the other way
>> around: when a task is killed it drops its references to all opened
>> fuse fds; when the last fd is dropped, the connection can be aborted.
>
> struct task_struct {
>         [...]
>         struct files_struct *files;
>         [...]
> };
>
> struct files_struct {
>         [...]
>         struct fdtable __rcu *fdt;
>         [...]
> };
>
> struct fdtable {
>         [...]
>         struct file __rcu **fd; /* current fd array */
>         [...]
> };
>
> So there we have an array of pointers to struct file. Suppose we were
> magically able to find the files that point to fuse devices upon
> receiving SIGKILL, what would we do with them? We can't close them:
> other tasks might still be pointing to the same files_struct.
>
> We could do a global search for non-doomed tasks referencing the same
> fuse device, but I have no clue how we'd go about doing that without
> racing with forks, fd sending, etc...

Good questions for which I don't have answers.
Maybe more waits in fuse need to be interruptible? E.g. request_wait_answer?
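
To make step (2) concrete, the kind of scan I had in mind would look roughly
like the untested sketch below. It is purely illustrative:
fuse_conn_has_live_holder() is a made-up name, and how to do this safely
against forks and fd passing is exactly the open question. fuse_dev_operations,
fuse_abort_conn() and the task->files->fdt walk are the existing pieces;
everything else is an assumption.

#include <linux/sched/signal.h>
#include <linux/sched/task.h>
#include <linux/fdtable.h>
#include <linux/fs.h>

#include "fuse_i.h"

/*
 * Illustrative, untested sketch: return true if at least one task that
 * is not doomed (no SIGKILL pending) still holds an fd for a fuse
 * device belonging to this connection. The races with fork()/fd
 * passing mentioned above are simply ignored here.
 */
static bool fuse_conn_has_live_holder(struct fuse_conn *fc)
{
	struct task_struct *p;
	bool live = false;

	rcu_read_lock();
	for_each_process(p) {
		struct files_struct *files;
		struct fdtable *fdt;
		unsigned int i;

		/* SIGKILL pending -> treat the task as doomed. */
		if (fatal_signal_pending(p))
			continue;

		task_lock(p);
		files = p->files;
		if (!files) {
			task_unlock(p);
			continue;
		}
		fdt = files_fdtable(files);
		for (i = 0; !live && i < fdt->max_fds; i++) {
			struct file *file = rcu_dereference_raw(fdt->fd[i]);

			/* /dev/fuse files keep their struct fuse_dev in private_data. */
			if (file && file->f_op == &fuse_dev_operations) {
				struct fuse_dev *fud = READ_ONCE(file->private_data);

				if (fud && fud->fc == fc)
					live = true;
			}
		}
		task_unlock(p);
		if (live)
			break;
	}
	rcu_read_unlock();
	return live;
}

/*
 * The caller (somewhere in the SIGKILL path of a task holding a fuse
 * device fd) would then do something like the following; the exact
 * fuse_abort_conn() signature differs between kernel versions:
 *
 *	if (!fuse_conn_has_live_holder(fc))
 *		fuse_abort_conn(fc);
 */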