From: Andreas Gruenbacher
Date: Mon, 8 Apr 2019 10:53:34 +0200
Subject: Re: gfs2 iomap deadlock, IOMAP_F_UNBALANCED
To: Christoph Hellwig
Cc: cluster-devel, Dave Chinner, Ross Lagerwall, Mark Syms,
    Edwin Török, linux-fsdevel, Jan Kara, linux-mm@kvack.org
In-Reply-To: <20190407073213.GA9509@lst.de>
References: <20190321131304.21618-1-agruenba@redhat.com>
    <20190328165104.GA21552@lst.de> <20190407073213.GA9509@lst.de>

On Sun, 7 Apr 2019 at 09:32, Christoph Hellwig wrote:
>
> [adding Jan and linux-mm]
>
> On Fri, Mar 29, 2019 at 11:13:00PM +0100, Andreas Gruenbacher wrote:
> > > But what is the requirement to do this in writeback context? Can't
> > > we move it out into another context instead?
> >
> > Indeed, this isn't for data integrity in this case but because the
> > dirty limit is exceeded. What other context would you suggest to
> > move this to?
> >
> > (The iomap flag I've proposed would save us from getting into this
> > situation in the first place.)
>
> Your patch does two things:
>
> - it only calls balance_dirty_pages_ratelimited once per write
>   operation instead of once per page. In the past btrfs did
>   hacks like that, but IIRC they caused VM balancing issues.
>   That is why everyone now calls balance_dirty_pages_ratelimited
>   once per page. If calling it at a coarse granularity would
>   be fine we should do it everywhere instead of just in gfs2
>   in journaled mode.
> - it artificially reduces the size of writes to a low value,
>   which I suspect is going to break real-life applications.

Not quite: balance_dirty_pages_ratelimited is called from iomap_end,
so once per iomap mapping returned, not once per write. (The first
version of this patch got that wrong by accident, but not the
second.) We can limit the size of the mappings returned just in that
case.
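To make the trade-off concrete, here is a rough sketch of that
approach (not the actual patch; the cap constant GFS2_JDATA_MAP_MAX
and the details are made up for illustration). In ->iomap_begin, gfs2
would shorten the mappings it returns in journaled mode and set the
proposed IOMAP_F_UNBALANCED flag so the iomap core skips its per-page
balancing; ->iomap_end then balances once per mapping, after the
transaction has ended:

/* Hypothetical cap on mapping size in journaled mode. */
#define GFS2_JDATA_MAP_MAX (32 * PAGE_SIZE)

static int gfs2_iomap_begin(struct inode *inode, loff_t pos,
			    loff_t length, unsigned flags,
			    struct iomap *iomap)
{
	struct gfs2_inode *ip = GFS2_I(inode);

	/* ... normal block mapping and transaction setup elided ... */

	if ((flags & IOMAP_WRITE) && gfs2_is_jdata(ip)) {
		/* Keep mappings short so balancing stays frequent. */
		iomap->length = min_t(u64, iomap->length,
				      GFS2_JDATA_MAP_MAX);
		/* Proposed flag: fs balances itself, in ->iomap_end. */
		iomap->flags |= IOMAP_F_UNBALANCED;
	}
	return 0;
}

static int gfs2_iomap_end(struct inode *inode, loff_t pos,
			  loff_t length, ssize_t written,
			  unsigned flags, struct iomap *iomap)
{
	/* ... gfs2_trans_end() etc. elided ... */

	/* One balancing call per mapping, after the transaction. */
	if (iomap->flags & IOMAP_F_UNBALANCED)
		balance_dirty_pages_ratelimited(inode->i_mapping);
	return 0;
}

The smaller the cap, the closer this gets to once-per-page balancing,
at the cost of more ->iomap_begin calls per write.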
I'm aware that there is a risk of balancing problems, I just don't
have any better ideas. This is a problem all filesystems with data
journaling will have with iomap; it's not that gfs2 is doing anything
particularly stupid.

> So I really think we need to fix this properly. And if that means
> that you can't make use of the iomap batching for gfs2 in journaled
> mode that is still a better option.

That would mean using the old-style, page-sized allocations, and a
completely separate write path in that case. That would be quite a
nightmare.

> But I really think you need to look into the scope of your flush_log
> and figure out a good way to reduce that and solve the root cause.

We won't be able to do a log flush while another transaction is
active, but that's what's needed to clean dirty pages. iomap doesn't
allow us to put the block allocation into a separate transaction from
the page writes; for that, the opposite of the page_done hook would
probably be needed.

Thanks,
Andreas
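P.S.: For illustration, a rough sketch of what such a counterpart
hook could look like. The name gfs2_iomap_page_prepare is
hypothetical; no such hook exists in iomap at this point. The iomap
core would call it before writing to each page, so the filesystem can
open a small transaction covering just that page and close it again
in page_done; between pages no transaction is held, and a log flush
can make progress:

/*
 * Hypothetical counterpart to the existing ->page_done hook, called
 * before each page is written to.  Per-page transactions would
 * replace one big transaction around the whole mapping.
 */
static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
				   unsigned len)
{
	unsigned int blockmask = i_blocksize(inode) - 1;
	unsigned int blocks;

	/* Blocks touched by this page-sized write, rounded up. */
	blocks = ((pos & blockmask) + len + blockmask) >> inode->i_blkbits;
	return gfs2_trans_begin(GFS2_SB(inode), RES_DINODE + blocks, 0);
}

static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
				 unsigned copied, struct page *page,
				 struct iomap *iomap)
{
	/* End the per-page transaction; log flushes can proceed. */
	gfs2_trans_end(GFS2_SB(inode));
}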