Date: Mon, 20 Jan 2020 12:47:57 +0100
From: Jan Kara
To: Amir Goldstein
Cc: Jan Kara, linux-xfs, Linux MM, "Darrick J. Wong", Boaz Harrosh, linux-fsdevel, Matthew Wilcox, Jens Axboe, Dave Chinner
Subject: Re: [PATCH 0/3 v2] xfs: Fix races between readahead and hole punching
Message-ID: <20200120114757.GF19861@quack2.suse.cz>
References: <20190829131034.10563-1-jack@suse.cz>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Sun 19-01-20 10:35:08, Amir Goldstein wrote:
> On Fri, Jan 17, 2020 at 12:50 PM Amir Goldstein wrote:
> >
> > On Thu, Aug 29, 2019 at 4:10 PM Jan Kara wrote:
> > >
> > > Hello,
> > >
> > > this is a patch series that addresses a possible race between readahead and
> > > hole punching Amir has discovered [1]. The first patch makes madvise(2)
> > > handle readahead requests through fadvise infrastructure, the third patch
> > > then adds necessary locking to XFS to protect against the race. Note that
> > > other filesystems need similar protections but e.g. in case of ext4 it isn't
> > > so simple without seriously regressing mixed rw workload performance so
> > > I'm pushing just xfs fix at this moment which is simple.
> > >
> >
> > Jan,
> >
> > Could you give a quick status update about the state of this issue for
> > ext4 and other fs. I remember some solutions were discussed.
> > Perhaps this could be a good topic for a cross track session in LSF/MM?
> > Aren't the challenges posed by this race also relevant for RWF_UNCACHED?
> >
>
> Maybe a silly question:
>
> Can someone please explain to me why we even bother truncating pages on
> punch hole?
> Wouldn't it solve the race if instead we zeroed those pages and marked them
> readonly?

Not unless we also kept them locked. Page reclaim can evict clean unlocked
pages any time it wants... Plus the CPU overhead of zeroing potentially large
ranges of pages would be significant.
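To make the interleaving concrete, here is a toy, single-threaded Python model. It is a sketch under stated assumptions, not kernel or XFS code: `ToyFile`, `lock_held`, `replay()` and the step methods are hypothetical names. Hole punching is split into two steps (drop the cached page, then free the on-disk block), and a readahead is replayed between them. The boolean `lock_held` loosely stands in for the locking the XFS patch adds; the real lock and its semantics are not modeled.

```python
# Toy model only: hypothetical names, not the kernel or XFS implementation.
HOLE = b"\x00" * 3  # what a read of a punched hole should return


class ToyFile:
    def __init__(self):
        self.disk = {0: b"OLD"}   # disk block number -> contents
        self.block_map = {0: 0}   # file page index -> disk block (absent = hole)
        self.page_cache = {}      # file page index -> cached contents
        self.lock_held = False    # hypothetical lock excluding readahead during a punch

    # Hole punch step 1: drop the cached page for the range.
    def punch_truncate_cache(self, idx):
        self.page_cache.pop(idx, None)

    # Hole punch step 2: release the on-disk block backing the range.
    def punch_free_blocks(self, idx):
        self.block_map.pop(idx, None)

    # Readahead: look up the block and populate the page cache from disk.
    # Returns False instead of waiting if the lock is held.
    def readahead(self, idx):
        if self.lock_held:
            return False
        blk = self.block_map.get(idx)
        self.page_cache[idx] = self.disk[blk] if blk is not None else HOLE
        return True

    # A read is served from the page cache if the page is present.
    def read(self, idx):
        if idx in self.page_cache:
            return self.page_cache[idx]
        blk = self.block_map.get(idx)
        return self.disk[blk] if blk is not None else HOLE


def replay(use_lock):
    """Truncate cache, attempt readahead, free block, then read page 0."""
    f = ToyFile()
    f.lock_held = use_lock        # the punch holds the lock across both steps
    f.punch_truncate_cache(0)
    ra_done = f.readahead(0)      # readahead lands between the two punch steps
    f.punch_free_blocks(0)
    f.lock_held = False           # punch finished, lock dropped
    if not ra_done:
        f.readahead(0)            # the deferred readahead now sees the hole
    return f.read(0)


print(replay(use_lock=False))  # b'OLD': stale data cached over a hole
print(replay(use_lock=True))   # b'\x00\x00\x00'
```

Without the lock, the page cache ends up holding the old contents for a range that is now a hole on disk; with it, readahead can only run before the truncation or after the block is gone. The model deliberately ignores reclaim, writeback and I/O completion.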

> The comment above truncate_pagecache_range() says:
> * This function should typically be called before the filesystem
> * releases resources associated with the freed range (eg. deallocates
> * blocks). This way, pagecache will always stay logically coherent
> * with on-disk format, and the filesystem would not have to deal with
> * situations such as writepage being called for a page that has already
> * had its underlying blocks deallocated.
>
> So in order to prevent writepage from being called on a punched hole,
> we need to make sure that page write fault will be called, which is the same
> state as if an existing hole has been read into page cache but not written yet.
> Right? Wrong?

Also, the writepage case in the comment you quote above is just an example.
As the race you've found shows, there is a problem with reading as well.

								Honza
-- 
Jan Kara
SUSE Labs, CR