Subject: Re: [PATCH v2 2/2] [PATCH] xfs: Close race between direct IO and xfs_break_layouts()
References: <153374942137.42241.10539674028265137668.stgit@djiang5-desk3.ch.intel.com>
From: Eric Sandeen
Message-ID: <7930740d-7097-90b7-a4c2-f81d520f411f@redhat.com>
Date: Fri, 10 Aug 2018 14:26:42 -0500
To: Ross Zwisler , dave.jiang@intel.com
Cc: Jan Kara , linux-nvdimm@lists.01.org, darrick.wong@oracle.com, Dave Chinner , linux-xfs , lczerner@redhat.com, linux-fsdevel , Theodore Ts'o , linux-ext4 , Christoph Hellwig

On 8/10/18 2:24 PM, Ross Zwisler wrote:
> On Fri, Aug 10, 2018 at 9:23 AM Dave Jiang wrote:
>> On 08/10/2018 11:31 AM, Eric Sandeen wrote:
>>> On 8/8/18 12:31 PM, Dave Jiang wrote:
>>>> This patch is the duplicate of ross's fix for ext4 for xfs.
>>>>
>>>> If the refcount of a page is lowered between the time that it is returned
>>>> by dax_busy_page() and when the refcount is again checked in
>>>> xfs_break_layouts() => ___wait_var_event(), the waiting function
>>>> xfs_wait_dax_page() will never be called. This means that
>>>> xfs_break_layouts() will still have 'retry' set to false, so we'll stop
>>>> looping and never check the refcount of other pages in this inode.
>>>>
>>>> Instead, always continue looping as long as dax_layout_busy_page() gives us
>>>> a page which it found with an elevated refcount.
>>>
>>> Hi Dave, does this have a testcase? Have you seen the issue using Ross's
>>> xfstest generic/503 or is there some other test? Apologies if I missed
>>> prior discussion on a testcase or race frequency...
>>
>> I do not have a testcase. I know Ross replicated it on ext4. And Jan
>> asked to create the same fix with XFS when he reviewed Ross's fix for ext4.
>
> In my testing I couldn't get this race to hit with XFS. I couldn't
> even get a failure with generic/503 when testing XFS before Dan's
> initial patches went in which added xfs_break_layouts() et al. I
> think that Dan had to manually insert timing delays to get the warning
> to hit for XFS when testing his patches.
>
> The race we're fixing happens consistently with ext4 and through code
> inspection we can see that the race exists in XFS.

Ok, thanks for the info Dave & Ross!

-Eric
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm