Date: Wed, 23 Jun 2021 19:51:18 +0100
From: Matthew Wilcox
To: linux-fsdevel@vger.kernel.org
Subject: page split failures in truncate_inode_pages_range

When we have large pages in the page cache, we can end up in
truncate_inode_pages_range() with an 'lstart' that is in the middle of
a tail page.  My approach has generally been to split the large page,
and that works except when split_huge_page() fails, which it can do at
random due to a racing access having the page refcount elevated.

I've been simulating split_huge_page() failures, and found a problem I
don't know how to solve.  truncate_inode_pages_range() is called by
COLLAPSE_RANGE in order to evict the part of the page cache after the
start of the range being collapsed (any part of the page cache
remaining would now have data for the wrong part of the file in it).
xfs_flush_unmap_range() (and I presume the other filesystems which
support COLLAPSE_RANGE) calls filemap_write_and_wait_range() first, so
we can just drop the partial large page if split doesn't succeed.

But truncate_inode_pages_range() is also called by, for example,
truncate().  In that case, nobody calls filemap_write_and_wait_range(),
so we can't discard the page because it might still be dirty.
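To make the contrast between the two call sites concrete, here's a
rough sketch (not actual kernel code; collapse_range_prep() is a
made-up name standing in for what xfs_flush_unmap_range() does before
the collapse):

#include <linux/fs.h>
#include <linux/mm.h>

/*
 * COLLAPSE_RANGE path: writeback happens before the truncate, so a
 * partial large page we then fail to split is guaranteed to be clean
 * and could simply be dropped.
 */
static int collapse_range_prep(struct address_space *mapping,
			       loff_t start, loff_t end)
{
	int error;

	error = filemap_write_and_wait_range(mapping, start, end);
	if (error)
		return error;

	truncate_inode_pages_range(mapping, start, end);
	return 0;
}

The plain truncate() path reaches truncate_inode_pages_range() with no
such writeback step in front of it.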
Is that an acceptable way to choose behaviour -- if the split fails, discard the page if it's clean and keep it if it's dirty? I'll put a great big comment on it, because it's not entirely obvious.
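For clarity, the fallback I have in mind looks roughly like this (a
sketch only, not a tested patch; truncate_partial_thp() is a made-up
helper, and the page is assumed to be locked, as it is at this point
in truncate_inode_pages_range()):

#include <linux/huge_mm.h>
#include <linux/mm.h>
#include <linux/pagemap.h>

/*
 * Called with the page locked, for a large page which straddles the
 * start of the range being truncated.
 */
static void truncate_partial_thp(struct page *page)
{
	if (!split_huge_page(page))
		return;		/* split succeeded; partial truncate proceeds */

	/*
	 * The split failed because a transient reference has the
	 * refcount elevated.  If the page is clean, any caller that
	 * needs the data on disk (e.g. COLLAPSE_RANGE) has already
	 * called filemap_write_and_wait_range(), so dropping the whole
	 * page loses nothing -- it can be re-read from storage.  If the
	 * page is dirty, we may be under a plain truncate() with no
	 * prior writeback, so keep it rather than throw away data.
	 */
	if (!PageDirty(page))
		delete_from_page_cache(page);
}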