Subject: Re: gfs2 iomap deadlock, IOMAP_F_UNBALANCED
From: Andreas Gruenbacher
Date: Fri, 29 Mar 2019 23:13:00 +0100
To: Christoph Hellwig
Cc: cluster-devel, Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török, linux-fsdevel
In-Reply-To: <20190328165104.GA21552@lst.de>
References: <20190321131304.21618-1-agruenba@redhat.com> <20190328165104.GA21552@lst.de>
List-ID: linux-fsdevel@vger.kernel.org

On Thu, 28 Mar 2019 at 17:51, Christoph Hellwig wrote:
> On Thu, Mar 21, 2019 at 02:13:04PM +0100, Andreas Gruenbacher wrote:
> > Hi Christoph,
> >
> > we need your help fixing a gfs2 deadlock involving iomap. What's going
> > on is the following:
> >
> > * During iomap_file_buffered_write, gfs2_iomap_begin grabs the log
> >   flush lock and keeps it until gfs2_iomap_end. It currently always
> >   does that even though there is no point other than for journaled
> >   data writes.
> >
> > * iomap_file_buffered_write then calls balance_dirty_pages_ratelimited.
> >   If that ends up calling gfs2_write_inode, gfs2 will try to grab the
> >   log flush lock again and deadlock.
>
> What is the exact call chain?

It's laid out here:
https://www.redhat.com/archives/cluster-devel/2019-March/msg00000.html

> balance_dirty_pages_ratelimited these days doesn't start I/O, but just
> wakes up the flusher threads. Or do we have an issue where it is
> blocking on those threads?

Yes, the writer is holding sd_log_flush_lock at the point where it ends
up kicking the flusher thread and waiting for writeback to happen. The
flusher thread calls gfs2_write_inode, and that tries to grab
sd_log_flush_lock again.

> Also why do you need to flush the log for background writeback in
> ->write_inode?
If we stop doing that in the (wbc->sync_mode == WB_SYNC_NONE) case,
then inodes will remain dirty until the journal is flushed for some
other reason (or by a write_inode with WB_SYNC_ALL). That doesn't seem
right. We could perhaps trigger a background journal flush in the
WB_SYNC_NONE case, but that would remove the back pressure on
balance_dirty_pages. Not sure that is a good idea, either.

> balance_dirty_pages_ratelimited is per definition not a data integrity
> writeback, so there shouldn't be a good reason to flush the log
> (which I assume the log flush lock is for).
>
> If we look at gfs2_write_inode, this seems to be the code:
>
>	bool flush_all = (wbc->sync_mode == WB_SYNC_ALL || gfs2_is_jdata(ip));
>
>	if (flush_all)
>		gfs2_log_flush(GFS2_SB(inode), ip->i_gl,
>			       GFS2_LOG_HEAD_FLUSH_NORMAL |
>			       GFS2_LFC_WRITE_INODE);
>
> But what is the requirement to do this in writeback context? Can't
> we move it out into another context instead?

Indeed, this isn't for data integrity in this case, but because the
dirty limit is exceeded. What other context would you suggest moving
this to?

(The iomap flag I've proposed would save us from getting into this
situation in the first place.)

Thanks,
Andreas