Date: Mon, 8 Apr 2019 15:44:05 +0200
From: Jan Kara
To: Andreas Gruenbacher
Cc: Christoph Hellwig, cluster-devel, Dave Chinner, Ross Lagerwall, Mark Syms, Edwin Török, linux-fsdevel, Jan Kara, linux-mm@kvack.org
Subject: Re: gfs2 iomap dealock, IOMAP_F_UNBALANCED
Message-ID: <20190408134405.GA15023@quack2.suse.cz>
References: <20190321131304.21618-1-agruenba@redhat.com> <20190328165104.GA21552@lst.de> <20190407073213.GA9509@lst.de>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Mon 08-04-19 10:53:34, Andreas Gruenbacher wrote:
> On Sun, 7 Apr 2019 at 09:32, Christoph Hellwig wrote:
> >
> > [adding Jan and linux-mm]
> >
> > On Fri, Mar 29, 2019 at 11:13:00PM +0100, Andreas Gruenbacher wrote:
> > > > But what is the requirement to do this in writeback context? Can't
> > > > we move it out into another context instead?
> > >
> > > Indeed, this isn't for data integrity in this case but because the
> > > dirty limit is exceeded. What other context would you suggest to move
> > > this to?
> > >
> > > (The iomap flag I've proposed would save us from getting into this
> > > situation in the first place.)
> >
> > Your patch does two things:
> >
> >  - it only calls balance_dirty_pages_ratelimited once per write
> >    operation instead of once per page. In the past btrfs did
> >    hacks like that, but IIRC they caused VM balancing issues.
> >    That is why everyone now calls balance_dirty_pages_ratelimited
> >    once per page. If calling it at a coarse granularity would
> >    be fine, we should do it everywhere instead of just in gfs2
> >    in journaled mode.
> >  - it artificially reduces the size of writes to a low value,
> >    which I suspect is going to break real-life applications.
>
> Not quite: balance_dirty_pages_ratelimited is called from iomap_end,
> so once per iomap mapping returned, not per write. (The first version
> of this patch got that wrong by accident, but not the second.)
>
> We can limit the size of the mappings returned just in that case. I'm
> aware that there is a risk of balancing problems; I just don't have
> any better ideas.
>
> This is a problem all filesystems with data journaling will have with
> iomap; it's not that gfs2 is doing anything particularly stupid.

I agree that if ext4 were using iomap, it would have similar issues.

> > So I really think we need to fix this properly. And if that means
> > that you can't make use of the iomap batching for gfs2 in journaled
> > mode, that is still a better option.
>
> That would mean using the old-style, page-size allocations, and a
> completely separate write path in that case. That would be quite a
> nightmare.
>
> > But I really think you need
> > to look into the scope of your flush_log and figure out a good way
> > to reduce that and solve the root cause.
>
> We won't be able to do a log flush while another transaction is
> active, but that's what's needed to clean dirty pages. iomap doesn't
> allow us to put the block allocation into a separate transaction from
> the page writes; for that, the opposite of the page_done hook would
> probably be needed.

I agree that a ->page_prepare() hook would probably be the cleanest
solution for this.

Honza
-- 
Jan Kara
SUSE Labs, CR
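To make the transaction-ordering argument concrete, here is a rough userspace model (plain C, not kernel code) of what a ->page_prepare() counterpart to page_done would allow. It assumes, as described in the thread, that a log flush cannot run while a transaction is open and that dirty-page throttling may need one. Every identifier in it (struct model_fs, model_write(), throttle(), the hook functions) is hypothetical. It is not the iomap or gfs2 API, only a sketch of allocating blocks in their own transaction and then wrapping each page write in a short transaction opened by a page_prepare-style hook and closed by a page_done-style hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace model only: all names are hypothetical and are not the
 * kernel iomap or gfs2 API.
 */
#define MODEL_PAGE_SIZE 4096u

struct model_fs {
	bool trans_active;        /* is a journal transaction open? */
	unsigned log_flushes;     /* throttle calls that could flush the log */
	unsigned blocked_flushes; /* throttle calls made inside a transaction */
};

/* Transactions do not nest in this model. */
static void trans_begin(struct model_fs *fs)
{
	assert(!fs->trans_active);
	fs->trans_active = true;
}

static void trans_end(struct model_fs *fs)
{
	assert(fs->trans_active);
	fs->trans_active = false;
}

/*
 * Stand-in for a per-page balance_dirty_pages_ratelimited()-style call.
 * Cleaning journaled dirty pages needs a log flush, which cannot run
 * while a transaction is open; record which case we hit.
 */
static void throttle(struct model_fs *fs)
{
	if (fs->trans_active)
		fs->blocked_flushes++;
	else
		fs->log_flushes++;
}

/* Hypothetical per-page hooks in the spirit of ->page_prepare()/page_done. */
struct model_page_ops {
	void (*page_prepare)(struct model_fs *fs);
	void (*page_done)(struct model_fs *fs);
};

static void prepare_hook(struct model_fs *fs) { trans_begin(fs); }
static void done_hook(struct model_fs *fs) { trans_end(fs); }

/*
 * Write len bytes through one mapping.  With ops == NULL a single
 * transaction spans the allocation and every page write.
 */
static void model_write(struct model_fs *fs, size_t len,
			const struct model_page_ops *ops)
{
	size_t pages = (len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

	/* iomap_begin-like step: allocate blocks for the mapping. */
	trans_begin(fs);
	if (ops)
		trans_end(fs); /* allocation gets its own transaction */

	for (size_t i = 0; i < pages; i++) {
		if (ops)
			ops->page_prepare(fs);
		/* ... copy user data into page i ... */
		if (ops)
			ops->page_done(fs);
		throttle(fs);
	}

	if (!ops)
		trans_end(fs); /* the single transaction closes at iomap_end */
}
```

Called with ops == NULL, the model keeps one transaction open across the allocation and all page writes, so every per-page throttle call lands inside it. With the hooks installed, each throttle call finds no transaction open, which is the property a page_prepare/page_done pair would be meant to provide.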