From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 13 Jul 2021 13:11:39 +0200
From: Jan Kara
To: "Darrick J. Wong"
Cc: Jan Kara, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	Christoph Hellwig, Ted Tso, Dave Chinner, Matthew Wilcox,
	linux-mm@kvack.org, linux-xfs@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net, linux-cifs@vger.kernel.org,
	ceph-devel@vger.kernel.org, Christoph Hellwig
Subject: Re: [PATCH 03/14] mm: Protect operations adding pages to page cache with invalidate_lock
Message-ID: <20210713111139.GG12142@quack2.suse.cz>
References: <20210712163901.29514-1-jack@suse.cz>
 <20210712165609.13215-3-jack@suse.cz>
 <20210713012514.GB22402@magnolia>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20210713012514.GB22402@magnolia>
User-Agent: Mutt/1.10.1 (2018-07-13)
X-Mailing-List: linux-cifs@vger.kernel.org

On Mon 12-07-21 18:25:14, Darrick J. Wong wrote:
> On Mon, Jul 12, 2021 at 06:55:54PM +0200, Jan Kara wrote:
> > @@ -2967,6 +2992,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  	pgoff_t max_off;
> >  	struct page *page;
> >  	vm_fault_t ret = 0;
> > +	bool mapping_locked = false;
> >  
> >  	max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> >  	if (unlikely(offset >= max_off))
> > @@ -2988,15 +3014,30 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
> >  		ret = VM_FAULT_MAJOR;
> >  		fpin = do_sync_mmap_readahead(vmf);
> > +	}
> > +
> > +	if (!page) {
>
> Is it still necessary to re-evaluate !page here?

No, you are right it is not necessary. I'll remove it.

> >  retry_find:
> > +		/*
> > +		 * See comment in filemap_create_page() why we need
> > +		 * invalidate_lock
> > +		 */
> > +		if (!mapping_locked) {
> > +			filemap_invalidate_lock_shared(mapping);
> > +			mapping_locked = true;
> > +		}
> >  		page = pagecache_get_page(mapping, offset,
> >  					  FGP_CREAT|FGP_FOR_MMAP,
> >  					  vmf->gfp_mask);
> >  		if (!page) {
> >  			if (fpin)
> >  				goto out_retry;
> > +			filemap_invalidate_unlock_shared(mapping);
> >  			return VM_FAULT_OOM;
> >  		}
> > +	} else if (unlikely(!PageUptodate(page))) {
> > +		filemap_invalidate_lock_shared(mapping);
> > +		mapping_locked = true;
> >  	}
> >  
> >  	if (!lock_page_maybe_drop_mmap(vmf, page, &fpin))
> > @@ -3014,8 +3055,20 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  	 * We have a locked page in the page cache, now we need to check
> >  	 * that it's up-to-date. If not, it is going to be due to an error.
> >  	 */
> > -	if (unlikely(!PageUptodate(page)))
> > +	if (unlikely(!PageUptodate(page))) {
> > +		/*
> > +		 * The page was in cache and uptodate and now it is not.
> > +		 * Strange but possible since we didn't hold the page lock all
> > +		 * the time. Let's drop everything get the invalidate lock and
> > +		 * try again.
> > +		 */
> > +		if (!mapping_locked) {
> > +			unlock_page(page);
> > +			put_page(page);
> > +			goto retry_find;
> > +		}
> >  		goto page_not_uptodate;
> > +	}
> >  
> >  	/*
> >  	 * We've made it this far and we had to drop our mmap_lock, now is the
> > @@ -3026,6 +3079,8 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  		unlock_page(page);
> >  		goto out_retry;
> >  	}
> > +	if (mapping_locked)
> > +		filemap_invalidate_unlock_shared(mapping);
> >  
> >  	/*
> >  	 * Found the page and have a reference on it.
> > @@ -3056,6 +3111,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  
> >  	if (!error || error == AOP_TRUNCATED_PAGE)
> >  		goto retry_find;
> > +	filemap_invalidate_unlock_shared(mapping);
>
> Hm. I /think/ it's the case that mapping_locked==true always holds here
> because the new "The page was in cache and uptodate and now it is not."
> block above will take the invalidate_lock and retry pagecache_get_page,
> right?

Yes. page_not_uptodate block can only be entered with mapping_locked == true
- the only place that can enter this block is:

	if (unlikely(!PageUptodate(page))) {
		/*
		 * The page was in cache and uptodate and now it is not.
		 * Strange but possible since we didn't hold the page lock all
		 * the time. Let's drop everything get the invalidate lock and
		 * try again.
		 */
		if (!mapping_locked) {
			unlock_page(page);
			put_page(page);
			goto retry_find;
		}
		goto page_not_uptodate;
	}

> >  
> >  	return VM_FAULT_SIGBUS;
> >  
> > @@ -3067,6 +3123,8 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> >  	 */
> >  	if (page)
> >  		put_page(page);
> > +	if (mapping_locked)
> > +		filemap_invalidate_unlock_shared(mapping);
>
> Hm. I think this looks ok, even though this patch now contains the
> subtlety that we've both hoisted the xfs mmaplock to page cache /and/
> reduced the scope of the invalidate_lock.
>
> As for fancy things like remap_range, I think they're still safe with
> this latest iteration because those functions grab the invalidate_lock
> in exclusive mode and invalidate the mappings before proceeding, which
> means that other programs will never find the lockless path (i.e. page
> locked, uptodate, and attached to the mapping) and will instead block on
> the invalidate lock until the remap operation completes. Is that
> right?

Correct. For operations such as hole punch or destination of remap_range,
we lock invalidate_lock exclusively and invalidate pagecache in the
involved range. No new pages can be created in that range until you drop
invalidate_lock (places creating pages without holding i_rwsem are read,
readahead, fault and all those take invalidate_lock when they should create
the page).
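
To make the exclusion rule above concrete, here is a minimal single-threaded
sketch in userspace C. It is only a toy model and not kernel code: the
toy_* names, the NPAGES constant and the try-lock style are all assumptions
made for illustration, standing in for invalidate_lock taken shared by page
creators and exclusive by hole punch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8	/* hypothetical size of the toy page cache */

/* Toy stand-in for a mapping: rw-lock state plus one flag per page. */
struct toy_mapping {
	int readers;		/* holders of the lock in shared mode */
	bool writer;		/* true while held in exclusive mode */
	bool present[NPAGES];	/* which pages are in the "page cache" */
};

/* Shared mode succeeds only if there is no exclusive holder. */
static bool toy_trylock_shared(struct toy_mapping *m)
{
	if (m->writer)
		return false;
	m->readers++;
	return true;
}

static void toy_unlock_shared(struct toy_mapping *m)
{
	assert(m->readers > 0);
	m->readers--;
}

/* Exclusive mode succeeds only if nobody holds the lock at all. */
static bool toy_trylock_exclusive(struct toy_mapping *m)
{
	if (m->writer || m->readers > 0)
		return false;
	m->writer = true;
	return true;
}

static void toy_unlock_exclusive(struct toy_mapping *m)
{
	assert(m->writer);
	m->writer = false;
}

/* Page creation (read, readahead, fault) needs the lock in shared mode. */
static bool toy_create_page(struct toy_mapping *m, size_t idx)
{
	if (idx >= NPAGES || !toy_trylock_shared(m))
		return false;
	m->present[idx] = true;
	toy_unlock_shared(m);
	return true;
}

/* Hole punch: lock exclusively, then drop the pages in [start, end). */
static bool toy_punch_begin(struct toy_mapping *m, size_t start, size_t end)
{
	if (!toy_trylock_exclusive(m))
		return false;
	for (size_t i = start; i < end && i < NPAGES; i++)
		m->present[i] = false;
	return true;
}

/* End of hole punch: page creation becomes possible again. */
static void toy_punch_end(struct toy_mapping *m)
{
	toy_unlock_exclusive(m);
}
```

In this model, a call to toy_create_page() made between toy_punch_begin()
and toy_punch_end() fails and leaves the range empty, where the real code
would block on the lock instead. The same call succeeds once the exclusive
holder has dropped the lock.
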

There's also the case someone pointed out that *source* of remap_range
needs to be protected (but only from modifications through mmap). This is
achieved by having invalidate_lock taken in .page_mkwrite handlers and thus
not impacted by these changes to filemap_fault().

								Honza
-- 
Jan Kara
SUSE Labs, CR
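
The claim earlier in this reply, that the page_not_uptodate block is only
entered with mapping_locked == true, can be checked with a small model of
the control flow. The sketch below is a simplification written for
illustration, not the kernel function: toy_filemap_fault(), struct
toy_result and the three boolean inputs are hypothetical, page allocation
is assumed to succeed, and the read at page_not_uptodate is assumed to fail
so that the unconditional unlock from the patch is exercised.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_result {
	bool hit_not_uptodate;	/* did we reach the page_not_uptodate label? */
	bool locked_there;	/* value of mapping_locked at that point */
	int lock_balance;	/* lock calls minus unlock calls at return */
};

/*
 * in_cache:             the page was found by the initial lookup
 * uptodate_before_lock: PageUptodate() as seen before taking the page lock
 * uptodate_after_lock:  PageUptodate() as seen under the page lock
 */
static struct toy_result toy_filemap_fault(bool in_cache,
					   bool uptodate_before_lock,
					   bool uptodate_after_lock)
{
	struct toy_result r = { false, false, 0 };
	bool mapping_locked = false;

	if (!in_cache) {
retry_find:
		if (!mapping_locked) {
			r.lock_balance++;	/* invalidate lock, shared */
			mapping_locked = true;
		}
		/* page creation is assumed to succeed in this model */
	} else if (!uptodate_before_lock) {
		r.lock_balance++;		/* invalidate lock, shared */
		mapping_locked = true;
	}

	/* the page lock would be taken here */
	if (!uptodate_after_lock) {
		if (!mapping_locked)
			goto retry_find;	/* drop the page, take the lock */
		goto page_not_uptodate;
	}
	if (mapping_locked)
		r.lock_balance--;		/* unlock on the success path */
	return r;

page_not_uptodate:
	r.hit_not_uptodate = true;
	r.locked_there = mapping_locked;
	r.lock_balance--;			/* unconditional unlock, read failed */
	return r;
}
```

Running toy_filemap_fault() over all eight input combinations gives a lock
balance of zero in every case, and locked_there is true whenever the label
is reached. That includes the in_cache, uptodate-then-not case, which goes
through retry_find first.
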