From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mga05.intel.com (mga05.intel.com [192.55.52.43])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by ml01.01.org (Postfix) with ESMTPS id 9416021A0BAA1
	for ; Mon, 8 May 2017 10:25:30 -0700 (PDT)
Date: Mon, 8 May 2017 11:25:27 -0600
From: Ross Zwisler
Subject: Re: [PATCH 4/4] dax: Fix data corruption when fault races with write
Message-ID: <20170508172527.GA18408@linux.intel.com>
References: <20170505072500.25692-1-jack@suse.cz>
	<20170505072500.25692-5-jack@suse.cz>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20170505072500.25692-5-jack@suse.cz>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: linux-nvdimm-bounces@lists.01.org
Sender: "Linux-nvdimm"
To: Jan Kara
Cc: Andrew Morton, linux-nvdimm@lists.01.org, stable@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org

On Fri, May 05, 2017 at 09:25:00AM +0200, Jan Kara wrote:
> Currently DAX read fault can race with write(2) in the following way:
>
> CPU1 - write(2)                       CPU2 - read fault
>                                       dax_iomap_pte_fault()
>                                         ->iomap_begin() - sees hole
> dax_iomap_rw()
>   iomap_apply()
>     ->iomap_begin - allocates blocks
>     dax_iomap_actor()
>       invalidate_inode_pages2_range()
>         - there's nothing to invalidate
>                                         grab_mapping_entry()
>                                         - we add zero page in the radix tree
>                                           and map it to page tables
>
> The result is that hole page is mapped into page tables (and thus zeros
> are seen in mmap) while file has data written in that place.
>
> Fix the problem by locking exception entry before mapping blocks for the
> fault. That way we are sure invalidate_inode_pages2_range() call for
> racing write will either block on entry lock waiting for the fault to
> finish (and unmap stale page tables after that) or read fault will see
> already allocated blocks by write(2).
>
> Fixes: 9f141d6ef6258a3a37a045842d9ba7e68f368956
> CC: stable@vger.kernel.org
> Signed-off-by: Jan Kara

Yep, this looks correct to me.  Thanks!

Reviewed-by: Ross Zwisler
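
For readers following the thread, the ordering argument in the quoted commit
message can be replayed with a small userspace model. This is only a sketch
under stated assumptions: it is not kernel code, it is single-threaded, and
every name in it (struct slot, fault_sees_hole(), write_invalidate(), and so
on) is hypothetical and chosen for illustration. The "entry lock" is modeled
as a flag, and "blocking on the lock" is modeled by running the invalidation
only after the flag is cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one file offset. None of these names are kernel APIs. */
struct slot {
	bool blocks_allocated;	/* the filesystem has real blocks here */
	bool zero_page_mapped;	/* a hole (zero) page is in the page tables */
	bool entry_locked;	/* stand-in for the locked mapping entry */
};

/* Stale state: mmap shows zeros although the file has data here. */
static bool is_stale(const struct slot *s)
{
	return s->blocks_allocated && s->zero_page_mapped;
}

/* Fault side: block lookup, then mapping of the zero page for a hole. */
static bool fault_sees_hole(const struct slot *s)
{
	return !s->blocks_allocated;
}

static void fault_map(struct slot *s, bool saw_hole)
{
	if (saw_hole)
		s->zero_page_mapped = true;
}

/* Write side: block allocation, then invalidation of stale mappings. */
static void write_allocate(struct slot *s)
{
	s->blocks_allocated = true;
}

static void write_invalidate(struct slot *s)
{
	/* Model assumption: invalidation cannot run while the entry is locked. */
	assert(!s->entry_locked);
	s->zero_page_mapped = false;
}

/* Old ordering: the fault looks up blocks before taking any entry lock. */
static bool race_without_fix(void)
{
	struct slot s = {0};
	bool hole = fault_sees_hole(&s);	/* CPU2: sees hole */

	write_allocate(&s);			/* CPU1: allocates blocks */
	write_invalidate(&s);			/* CPU1: nothing to invalidate */
	fault_map(&s, hole);			/* CPU2: maps the zero page */
	return is_stale(&s);
}

/* Fixed ordering, fault first: the entry is locked before the lookup. */
static bool fault_first_with_fix(void)
{
	struct slot s = {0};
	bool hole;

	s.entry_locked = true;			/* CPU2: lock the entry */
	hole = fault_sees_hole(&s);		/* CPU2: sees hole */
	write_allocate(&s);			/* CPU1: allocates, then must wait */
	fault_map(&s, hole);			/* CPU2: maps the zero page */
	s.entry_locked = false;			/* CPU2: unlock */
	write_invalidate(&s);			/* CPU1: unmaps the stale page */
	return is_stale(&s);
}

/* Fixed ordering, write first: the fault sees the allocated blocks. */
static bool write_first_with_fix(void)
{
	struct slot s = {0};
	bool hole;

	write_allocate(&s);			/* CPU1: allocates blocks */
	s.entry_locked = true;			/* CPU2: lock the entry */
	hole = fault_sees_hole(&s);		/* CPU2: no hole any more */
	fault_map(&s, hole);			/* CPU2: nothing to map */
	s.entry_locked = false;			/* CPU2: unlock */
	write_invalidate(&s);			/* CPU1: nothing to invalidate */
	return is_stale(&s);
}
```

In this model race_without_fix() ends in the stale state, while both fixed
orderings do not. That matches the two outcomes the commit message names:
either the invalidation waits for the fault and then unmaps the zero page, or
the fault already sees the blocks allocated by write(2).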