From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86A7DC433E0 for ; Fri, 29 May 2020 12:45:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7180E206A4 for ; Fri, 29 May 2020 12:45:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726816AbgE2MpR (ORCPT ); Fri, 29 May 2020 08:45:17 -0400 Received: from mx2.suse.de ([195.135.220.15]:57280 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726549AbgE2MpQ (ORCPT ); Fri, 29 May 2020 08:45:16 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 4AD7FAE16; Fri, 29 May 2020 12:45:14 +0000 (UTC) Date: Fri, 29 May 2020 07:45:10 -0500 From: Goldwyn Rodrigues To: Filipe Manana Cc: Matthew Wilcox , "Darrick J. Wong" , linux-fsdevel , linux-btrfs , Johannes Thumshirn , Christoph Hellwig , dsterba@suse.cz Subject: Re: [PATCH] iomap: Return zero in case of unsuccessful pagecache invalidation before DIO Message-ID: <20200529124510.rqpd5nfivafiswiw@fiona> References: <20200528192103.xm45qoxqmkw7i5yl@fiona> <20200529002319.GQ252930@magnolia> <20200529113116.GU17206@bombadil.infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20180716 Sender: linux-btrfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org On 12:50 29/05, Filipe Manana wrote: > On Fri, May 29, 2020 at 12:31 PM Matthew Wilcox wrote: > > > > On Fri, May 29, 2020 at 11:55:33AM +0100, Filipe Manana wrote: > > > On Fri, May 29, 2020 at 1:23 AM Darrick J. Wong wrote: > > > > > > > > On Thu, May 28, 2020 at 02:21:03PM -0500, Goldwyn Rodrigues wrote: > > > > > > > > > > Filesystems such as btrfs are unable to guarantee page invalidation > > > > > because pages could be locked as a part of the extent. Return zero > > > > > > > > Locked for what? filemap_write_and_wait_range should have just cleaned > > > > them off. > > > > > > Yes, it will be confusing even for someone more familiar with btrfs. > > > The changelog could be more detailed to make it clear what's happening and why. > > > > > > So what happens: > > > > > > 1) iomap_dio_rw() calls filemap_write_and_wait_range(). > > > That starts delalloc for all dirty pages in the range and then > > > waits for writeback to complete. > > > This is enough for most filesystems at least (if not all except btrfs). > > > > > > 2) However, in btrfs once writeback finishes, a job is queued to run > > > on a dedicated workqueue, to execute the function > > > btrfs_finish_ordered_io(). > > > So that job will be run after filemap_write_and_wait_range() returns. > > > That function locks the file range (using a btrfs specific data > > > structure), does a bunch of things (updating several btrees), and then > > > unlocks the file range. > > > > > > 3) While iomap calls invalidate_inode_pages2_range(), which ends up > > > calling the btrfs callback btfs_releasepage(), > > > btrfs_finish_ordered_io() is running and has the file range locked > > > (this is what Goldwyn means by "pages could be locked", which is > > > confusing because it's not about any locked struct page). > > > > > > 4) Because the file range is locked, btrfs_releasepage() returns 0 > > > (page can't be released), this happens in the helper function > > > try_release_extent_state(). > > > Any page in that range is not dirty nor under writeback anymore > > > and, in fact, btrfs_finished_ordered_io() doesn't do anything with the > > > pages, it's only updating metadata. > > > > > > btrfs_releasepage() in this case could release the pages, but > > > there are other contextes where the file range is locked, the pages > > > are still not dirty and not under writeback, where this would not be > > > safe to do. > > > > Isn't this the bug, though? Rather than returning "page can't be > > released", shouldn't ->releasepage sleep on the extent state, at least > > if the GFP mask indicates you can sleep? > > Goldwyn mentioned in another thread that he had tried that with the > following patch: > > https://patchwork.kernel.org/patch/11275063/ > > But he mentioned it didn't work though, caused some locking problems. > I don't know the details and I haven't tried the patchset yet. > Goldwyn? > Yes, direct I/O would wait for extent bits to be unlocked forever and hang. I think it was against an fsync call, but I don't remember. In the light of new developments, I would pursue this further. This should be valid even in the current (before iomap patches) source. -- Goldwyn