From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-5.3 required=3.0 tests=BAYES_00,
    HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,
    USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
    by smtp.lore.kernel.org (Postfix) with ESMTP id 74811C433ED
    for ; Fri, 14 May 2021 11:07:05 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
    by mail.kernel.org (Postfix) with ESMTP id 4C7E5613B6
    for ; Fri, 14 May 2021 11:07:05 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S231394AbhENLIP (ORCPT ); Fri, 14 May 2021 07:08:15 -0400
Received: from mx2.suse.de ([195.135.220.15]:40048 "EHLO mx2.suse.de"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S229445AbhENLIP (ORCPT ); Fri, 14 May 2021 07:08:15 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay2.suse.de (unknown [195.135.221.27])
    by mx2.suse.de (Postfix) with ESMTP id 724A5AF11;
    Fri, 14 May 2021 11:07:02 +0000 (UTC)
Received: by quack2.suse.cz (Postfix, from userid 1000)
    id AB2D71F2B4A; Fri, 14 May 2021 13:07:00 +0200 (CEST)
Date: Fri, 14 May 2021 13:07:00 +0200
From: Jan Kara
To: Matthew Wilcox
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, Christoph Hellwig,
    Dave Chinner, ceph-devel@vger.kernel.org, Chao Yu, Damien Le Moal,
    "Darrick J.
    Wong", Jaegeuk Kim, Jeff Layton, Johannes Thumshirn,
    linux-cifs@vger.kernel.org, linux-ext4@vger.kernel.org,
    linux-f2fs-devel@lists.sourceforge.net, linux-mm@kvack.org,
    linux-xfs@vger.kernel.org, Miklos Szeredi, Steve French, Ted Tso
Subject: Re: [PATCH 03/11] mm: Protect operations adding pages to page cache
    with invalidate_lock
Message-ID: <20210514110700.GA27655@quack2.suse.cz>
References: <20210512101639.22278-1-jack@suse.cz>
    <20210512134631.4053-3-jack@suse.cz>
    <20210513190114.GJ2734@quack2.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
Precedence: bulk
List-ID:
X-Mailing-List: linux-cifs@vger.kernel.org

On Thu 13-05-21 20:38:47, Matthew Wilcox wrote:
> On Thu, May 13, 2021 at 09:01:14PM +0200, Jan Kara wrote:
> > On Wed 12-05-21 15:40:21, Matthew Wilcox wrote:
> > > Remind me (or, rather, add to the documentation) why we have to hold the
> > > invalidate_lock during the call to readpage / readahead, and we don't just
> > > hold it around the call to add_to_page_cache / add_to_page_cache_locked
> > > / add_to_page_cache_lru ?  I appreciate that ->readpages is still going
> > > to suck, but we're down to just three implementations of ->readpages now
> > > (9p, cifs & nfs).
> >
> > There's a comment in filemap_create_page() trying to explain this. We need
> > to protect against cases like: Filesystem with 1k blocksize, file F has
> > page at index 0 with uptodate buffer at 0-1k, rest not uptodate. All blocks
> > underlying page are allocated. Now let read at offset 1k race with hole
> > punch at offset 1k, length 1k.
> >
> > read()                                  hole punch
> > ...
> >   filemap_read()
> >     filemap_get_pages()
> >       - page found in the page cache but !Uptodate
> >       filemap_update_page()
> >                                           locks everything
> >                                           truncate_inode_pages_range()
> >                                             lock_page(page)
> >                                             do_invalidatepage()
> >                                             unlock_page(page)
> >         locks page
> >           filemap_read_page()
>
> Ah, this is the partial_start case, which means that page->mapping
> is still valid.  But that means that do_invalidatepage() was called
> with (offset 1024, length 1024), immediately after we called
> zero_user_segment().  So isn't this a bug in the fs do_invalidatepage()?
> The range from 1k-2k _is_ uptodate.  It's been zeroed in memory,
> and if we were to run after the "free block" below, we'd get that
> memory zeroed again.

Well, yes, do_invalidatepage() could mark zeroed region as uptodate. But I
don't think we want to rely on 'uptodate' not getting spuriously cleared
(which would reopen the problem). Generally the assumption is that there's
no problem clearing (or not setting) uptodate flag of a clean buffer
because the fs can always provide the data again. Similarly, fs is free to
refetch data into clean & uptodate page, if it thinks it's worth it. Now
all these would become correctness issues. So IMHO the fragility is not
worth the shorter lock hold times. That's why I went for the rule that
read-IO submission is still protected by invalidate_lock to make things
simple.

                                                                Honza
-- 
Jan Kara
SUSE Labs, CR
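
The rule stated at the end of the reply (read-IO submission stays under
invalidate_lock, not just the page cache insertion) can be illustrated with a
small userspace model. This is only a sketch under stated assumptions: every
toy_* name below is hypothetical and none of it is kernel code. It assumes the
read side takes the lock shared, the hole punch takes it exclusive, and the
punch frees the underlying block once it has invalidated the page range. The
model is single-threaded, so a "try" operation fails where a real lock would
sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy reader/writer lock standing in for invalidate_lock. */
struct toy_rwlock {
	int readers;	/* number of shared holders */
	bool writer;	/* true while held exclusively */
};

static void toy_lock_shared(struct toy_rwlock *l)
{
	assert(!l->writer);	/* in this model no punch is ever mid-flight here */
	l->readers++;
}

static void toy_unlock_shared(struct toy_rwlock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/* Fails (instead of sleeping) if anyone holds the lock. */
static bool toy_trylock_exclusive(struct toy_rwlock *l)
{
	if (l->readers > 0 || l->writer)
		return false;
	l->writer = true;
	return true;
}

static void toy_unlock_exclusive(struct toy_rwlock *l)
{
	l->writer = false;
}

/* Hypothetical file state: one block backing the 1k-2k range. */
struct toy_file {
	struct toy_rwlock invalidate_lock;
	bool block_allocated;
	bool read_saw_freed_block;
};

/* Hole punch: invalidate and free the block under the exclusive lock. */
static bool toy_try_punch_hole(struct toy_file *f)
{
	if (!toy_trylock_exclusive(&f->invalidate_lock))
		return false;	/* a real punch would wait here */
	f->block_allocated = false;
	toy_unlock_exclusive(&f->invalidate_lock);
	return true;
}

/* Read-IO submission: records whether it targeted an already freed block. */
static void toy_submit_read(struct toy_file *f)
{
	if (!f->block_allocated)
		f->read_saw_freed_block = true;
}

/*
 * Replays the interleaving from the quoted diagram. Returns true if the
 * read was submitted against a freed block.
 */
static bool toy_race(bool hold_across_io)
{
	struct toy_file f = { .block_allocated = true };
	bool punched;

	toy_lock_shared(&f.invalidate_lock);
	/* page found in (or added to) the page cache, not yet uptodate */
	if (!hold_across_io)
		toy_unlock_shared(&f.invalidate_lock);

	/* hole punch attempts to run in the window before IO submission */
	punched = toy_try_punch_hole(&f);

	toy_submit_read(&f);
	if (hold_across_io)
		toy_unlock_shared(&f.invalidate_lock);

	/* a punch that had to wait runs only after the read was submitted */
	if (!punched)
		punched = toy_try_punch_hole(&f);
	assert(punched);
	return f.read_saw_freed_block;
}
```

In this model toy_race(false) returns true (the punch slips in between page
lookup and IO submission), while toy_race(true) returns false because the
punch cannot take the lock until the read has been submitted. Nothing here
depends on the uptodate flag, which is the simplification the reply argues
for.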