Subject: Re: gfs2 iomap deadlock, IOMAP_F_UNBALANCED
From: Andreas Gruenbacher
Date: Fri, 29 Mar 2019 23:13:00 +0100
To: Christoph Hellwig
Cc: cluster-devel, Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török, linux-fsdevel
In-Reply-To: <20190328165104.GA21552@lst.de>
References: <20190321131304.21618-1-agruenba@redhat.com> <20190328165104.GA21552@lst.de>
List-ID: linux-fsdevel@vger.kernel.org

On Thu, 28 Mar 2019 at 17:51, Christoph Hellwig wrote:
> On Thu, Mar 21, 2019 at 02:13:04PM +0100, Andreas Gruenbacher wrote:
> > Hi Christoph,
> >
> > we need your help fixing a gfs2 deadlock involving iomap. What's going
> > on is the following:
> >
> > * During iomap_file_buffered_write, gfs2_iomap_begin grabs the log
> >   flush lock and keeps it until gfs2_iomap_end. It currently always
> >   does that even though there is no point other than for journaled
> >   data writes.
> >
> > * iomap_file_buffered_write then calls balance_dirty_pages_ratelimited.
> >   If that ends up calling gfs2_write_inode, gfs2 will try to grab the
> >   log flush lock again and deadlock.
>
> What is the exact call chain?

It's laid out here:
https://www.redhat.com/archives/cluster-devel/2019-March/msg00000.html

> balance_dirty_pages_ratelimited these days doesn't start I/O, but just
> wakes up the flusher threads. Or do we have an issue where it is
> blocking on those threads?

Yes, the writer is holding sd_log_flush_lock at the point where it ends
up kicking the flusher thread and waiting for writeback to happen. The
flusher thread calls gfs2_write_inode, and that tries to grab
sd_log_flush_lock again.

> Also why do you need to flush the log for background writeback in
> ->write_inode?
If we stop doing that in the (wbc->sync_mode == WB_SYNC_NONE) case,
then inodes will remain dirty until the journal is flushed for some
other reason (or by a write_inode with WB_SYNC_ALL). That doesn't seem
right. We could perhaps trigger a background journal flush in the
WB_SYNC_NONE case, but that would remove the back pressure on
balance_dirty_pages. Not sure that is a good idea, either.

> balance_dirty_pages_ratelimited is per definition not a data integrity
> writeback, so there shouldn't be a good reason to flush the log
> (which I assume the log flush lock is for).
>
> If we look at gfs2_write_inode, this seems to be the code:
>
>	bool flush_all = (wbc->sync_mode == WB_SYNC_ALL || gfs2_is_jdata(ip));
>
>	if (flush_all)
>		gfs2_log_flush(GFS2_SB(inode), ip->i_gl,
>			       GFS2_LOG_HEAD_FLUSH_NORMAL |
>			       GFS2_LFC_WRITE_INODE);
>
> But what is the requirement to do this in writeback context? Can't
> we move it out into another context instead?

Indeed, this isn't for data integrity in this case, but because the
dirty limit is exceeded. What other context would you suggest moving
this to?

(The iomap flag I've proposed would save us from getting into this
situation in the first place.)

Thanks,
Andreas