linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: harshad shirwadkar <harshadshirwadkar@gmail.com>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Jan Kara <jack@suse.cz>,
	jack@suse.com,  Ext4 Developers List <linux-ext4@vger.kernel.org>,
	linux-kernel@vger.kernel.org,  syzkaller-bugs@googlegroups.com,
	"Theodore Y. Ts'o" <tytso@mit.edu>,
	mhocko@suse.cz,  linux-mm@kvack.org,
	 syzbot <syzbot+bfdded10ab7dcd7507ae@syzkaller.appspotmail.com>
Subject: Re: possible deadlock in start_this_handle (2)
Date: Fri, 19 Feb 2021 09:22:40 -0800	[thread overview]
Message-ID: <CAD+ocbynNULCAPuYyVaBThdvQxUy_Zg8phDNn_wnD4bqPBW1uw@mail.gmail.com> (raw)
In-Reply-To: <34341830-f74f-57fa-2d21-c141f239b017@i-love.sakura.ne.jp>

On Fri, Feb 19, 2021 at 2:20 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2021/02/15 23:29, Jan Kara wrote:
> > On Mon 15-02-21 23:06:15, Tetsuo Handa wrote:
> >> On 2021/02/15 21:45, Jan Kara wrote:
> >>> On Sat 13-02-21 23:26:37, Tetsuo Handa wrote:
> >>>> Excuse me, but it seems to me that nothing prevents
> >>>> ext4_xattr_set_handle() from reaching ext4_xattr_inode_lookup_create()
> >>>> without memalloc_nofs_save() when hitting ext4_get_nojournal() path.
> >>>> Will you explain when ext4_get_nojournal() path is executed?
> >>>
> >>> That's a good question but sadly I don't think that's it.
> >>> ext4_get_nojournal() is called when the filesystem is created without a
> >>> journal. In that case we also don't acquire jbd2_handle lockdep map. In the
> >>> syzbot report we can see:
> >>
> >> Since syzbot can test filesystem images, syzbot might have tested a filesystem
> >> image created both with and without journal within this boot.
> >
> > a) I think that syzbot reboots the VM between executing different tests to
> > get reproducible conditions. But in theory I agree the test may have
> > contained one image with and one image without a journal.
>
> syzkaller reboots the VM upon a crash.
> syzkaller can test multiple filesystem images within one boot.
>
> https://storage.googleapis.com/syzkaller/cover/ci-qemu-upstream-386.html (this
> statistic snapshot is volatile) reports that ext4_get_nojournal() is partially covered
> ( https://github.com/google/syzkaller/blob/master/docs/coverage.md ) by syzkaller.
>
>       /* Just increment the non-pointer handle value */
>       static handle_t *ext4_get_nojournal(void)
>       {
>    86         handle_t *handle = current->journal_info;
>               unsigned long ref_cnt = (unsigned long)handle;
>
>               BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT);
>
>    86         ref_cnt++;
>               handle = (handle_t *)ref_cnt;
>
>               current->journal_info = handle;
>  2006         return handle;
>       }
>
>
>       /* Decrement the non-pointer handle value */
>       static void ext4_put_nojournal(handle_t *handle)
>       {
>               unsigned long ref_cnt = (unsigned long)handle;
>
>    85         BUG_ON(ref_cnt == 0);
>
>    85         ref_cnt--;
>               handle = (handle_t *)ref_cnt;
>
>               current->journal_info = handle;
>       }
>
>
>       handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
>                                         int type, int blocks, int rsv_blocks,
>                                         int revoke_creds)
>       {
>               journal_t *journal;
>               int err;
>
>  2006         trace_ext4_journal_start(sb, blocks, rsv_blocks, revoke_creds,
>  2006                                  _RET_IP_);
>  2006         err = ext4_journal_check_start(sb);
>               if (err < 0)
>                       return ERR_PTR(err);
>
>  2005         journal = EXT4_SB(sb)->s_journal;
>  1969         if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))
>  2006                 return ext4_get_nojournal();
>  1969         return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds,
>                                          GFP_NOFS, type, line);
>       }
>
> >
> > *but*
> >
> > b) as I wrote in the email you are replying to, the jbd2_handle key is
> > private per filesystem. Thus for lockdep to complain about
> > jbd2_handle->fs_reclaim->jbd2_handle deadlock, those jbd2_handle lockdep
> > maps must come from the same filesystem.
> >
> > *and*
> >
> > c) filesystem without journal doesn't use jbd2_handle lockdep map at all so
> > for such filesystems lockdep creates no dependency for jbd2_handle map.
> >
>
> What about "EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)" case?
> Does this case happen on filesystem with journal, and can this case be executed
> by fuzzing a crafted (a sort of erroneous) filesystem with journal, and are
> the jbd2_handle for calling ext4_get_nojournal() case and the jbd2_handle for
> calling jbd2__journal_start() case the same?
EXT4_FC_REPLAY is a mount state that is only set during jbd2 journal
recovery. The only way for jbd2 journal recovery to set EXT4_FC_REPLAY
option is if after a journal crash there are special fast_commit
blocks present in the journal. For these fast_commit blocks to be
present in the journal, the Ext4 file system prior to crash should
have had "fast_commit" feature enabled.

If we have a way to look at the Ext4 partition that syzbot used for
reporting this bug, it is very easy to see if this FC_REPLAY will ever
be set or not. Just running "debugfs <image>" and inside debugfs,
running logdump will show us if there are any fast commit blocks
present in the journal.

Having said that, I have following reason to believe that this option
wasn't set during the syzbot failure:

EXT4_FC_REPLAY will only be set during journal recovery and is cleared
immediately after. Which means EXT4_FC_REPLAY will only be set during
mount and as soon as mount returns the option will be cleared. Looking
at the stack trace, it shows no evidence that we are in the journal
recovery phase. It seems like most of the traces are resulting from
system calls made by the user. I checked if we are accidentally
setting this flag even after journal recovery, but that doesn't seem
to be the case. On a successfully mounted file system, we
unconditionally clear this flag.

- Harshad

>
> Also, I worry that jbd2__journal_restart() is problematic, for it calls
> stop_this_handle() (which calls memalloc_nofs_restore()) and then calls
> start_this_handle() (which fails to call memalloc_nofs_save() if an error
> occurs). An error from start_this_handle() from journal restart operation
> might need special handling (i.e. either remount-ro or panic) ?
>


  reply	other threads:[~2021-02-19 17:22 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <000000000000563a0205bafb7970@google.com>
2021-02-11 10:49 ` possible deadlock in start_this_handle (2) Jan Kara
2021-02-11 10:55   ` Michal Hocko
2021-02-11 11:22   ` Dmitry Vyukov
2021-02-11 11:28     ` Dmitry Vyukov
2021-02-11 12:10       ` Jan Kara
2021-02-11 12:34         ` Michal Hocko
2021-02-11 12:57           ` Matthew Wilcox
2021-02-11 13:07             ` Michal Hocko
2021-02-11 13:25               ` Matthew Wilcox
2021-02-11 14:20                 ` Michal Hocko
2021-02-11 14:26                   ` Matthew Wilcox
2021-02-11 16:41                     ` Michal Hocko
2021-02-12 11:18                       ` Tetsuo Handa
2021-02-12 12:22                         ` Matthew Wilcox
2021-02-12 12:30                           ` Michal Hocko
2021-02-12 12:58                             ` Tetsuo Handa
2021-02-12 13:12                               ` Michal Hocko
2021-02-12 13:34                                 ` Tetsuo Handa
2021-02-12 15:43                               ` Michal Hocko
2021-02-13 10:58                                 ` Dmitry Vyukov
2021-02-11 13:18             ` Dmitry Vyukov
2021-02-11 11:46     ` Jan Kara
2021-02-13 14:26   ` Tetsuo Handa
2021-02-15 12:45     ` Jan Kara
2021-02-15 14:06       ` Tetsuo Handa
2021-02-15 14:29         ` Jan Kara
2021-02-19 10:15           ` Tetsuo Handa
2021-02-19 17:22             ` harshad shirwadkar [this message]
2021-03-20 10:02           ` Tetsuo Handa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAD+ocbynNULCAPuYyVaBThdvQxUy_Zg8phDNn_wnD4bqPBW1uw@mail.gmail.com \
    --to=harshadshirwadkar@gmail.com \
    --cc=jack@suse.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=syzbot+bfdded10ab7dcd7507ae@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).