From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ot0-f193.google.com ([74.125.82.193]:42513 "EHLO
        mail-ot0-f193.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1750866AbeC2TCO (ORCPT );
        Thu, 29 Mar 2018 15:02:14 -0400
Received: by mail-ot0-f193.google.com with SMTP id h55-v6so5956297ote.9
        for ; Thu, 29 Mar 2018 12:02:14 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <20180329160215.glnrmyunujcc4vwg@quack2.suse.cz>
References: <152167302988.5268.4370226749268662682.stgit@dwillia2-desk3.amr.corp.intel.com>
 <152167306807.5268.8483232024444414342.stgit@dwillia2-desk3.amr.corp.intel.com>
 <20180329160215.glnrmyunujcc4vwg@quack2.suse.cz>
From: Dan Williams
Date: Thu, 29 Mar 2018 12:02:13 -0700
Message-ID:
Subject: Re: [PATCH v7 07/14] fs, dax: use page->mapping to warn if truncate collides with a busy page
To: Jan Kara
Cc: linux-nvdimm, Jeff Moyer, Matthew Wilcox, Ross Zwisler,
 Christoph Hellwig, david, linux-fsdevel, linux-xfs,
 Linux Kernel Mailing List
Content-Type: text/plain; charset="UTF-8"
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On Thu, Mar 29, 2018 at 9:02 AM, Jan Kara wrote:
> On Wed 21-03-18 15:57:48, Dan Williams wrote:
>> Catch cases where extent unmap operations encounter pages that are
>> pinned / busy. Typically this is pinned pages that are under active dma.
>> This warning is a canary for potential data corruption as truncated
>> blocks could be allocated to a new file while the device is still
>> performing i/o.
>>
>> Here is an example of a collision that this implementation catches:
>>
>>  WARNING: CPU: 2 PID: 1286 at fs/dax.c:343 dax_disassociate_entry+0x55/0x80
>>  [..]
>>  Call Trace:
>>   __dax_invalidate_mapping_entry+0x6c/0xf0
>>   dax_delete_mapping_entry+0xf/0x20
>>   truncate_exceptional_pvec_entries.part.12+0x1af/0x200
>>   truncate_inode_pages_range+0x268/0x970
>>   ? tlb_gather_mmu+0x10/0x20
>>   ? up_write+0x1c/0x40
>>   ? unmap_mapping_range+0x73/0x140
>>   xfs_free_file_space+0x1b6/0x5b0 [xfs]
>>   ? xfs_file_fallocate+0x7f/0x320 [xfs]
>>   ? down_write_nested+0x40/0x70
>>   ? xfs_ilock+0x21d/0x2f0 [xfs]
>>   xfs_file_fallocate+0x162/0x320 [xfs]
>>   ? rcu_read_lock_sched_held+0x3f/0x70
>>   ? rcu_sync_lockdep_assert+0x2a/0x50
>>   ? __sb_start_write+0xd0/0x1b0
>>   ? vfs_fallocate+0x20c/0x270
>>   vfs_fallocate+0x154/0x270
>>   SyS_fallocate+0x43/0x80
>>   entry_SYSCALL_64_fastpath+0x1f/0x96
>>
>> Cc: Jeff Moyer
>> Cc: Matthew Wilcox
>> Cc: Ross Zwisler
>> Reviewed-by: Jan Kara
>> Reviewed-by: Christoph Hellwig
>> Signed-off-by: Dan Williams
>
> Two comments when looking at this now:
>
>> +#define for_each_entry_pfn(entry, pfn, end_pfn) \
>> +       for (pfn = dax_radix_pfn(entry), \
>> +                       end_pfn = pfn + dax_entry_size(entry) / PAGE_SIZE; \
>> +                       pfn < end_pfn; \
>> +                       pfn++)
>
> Why don't you declare 'end_pfn' inside the for() block? That way you don't
> have to pass the variable as an argument to for_each_entry_pfn(). It's not
> like you need end_pfn anywhere in the loop body, you just use it to cache
> loop termination index.

Agreed, good catch.

>
>> @@ -547,6 +599,10 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
>>
>>         spin_lock_irq(&mapping->tree_lock);
>>         new_entry = dax_radix_locked_entry(pfn, flags);
>> +       if (dax_entry_size(entry) != dax_entry_size(new_entry)) {
>> +               dax_disassociate_entry(entry, mapping, false);
>> +               dax_associate_entry(new_entry, mapping);
>> +       }
>
> I find it quite tricky that in case we pass zero page / empty entry into
> dax_[dis]associate_entry(), it will not do anything because
> dax_entry_size() will return 0. Can we add an explicit check into
> dax_[dis]associate_entry() or at least a comment there?

Ok, will do.
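To make both review points concrete, here is a rough user-space sketch, not the
actual follow-up patch. A declaration cannot be mixed with the 'pfn' assignment
in the same for() initializer, so this version drops the 'end_pfn' argument by
moving the bound into a small helper instead. It also shows an explicit
zero-size check rather than relying on the loop running zero times. Everything
prefixed with demo_ (the struct, the accessors, demo_count_pages()) and the
local PAGE_SIZE define are stand-ins invented for illustration; the real code
decodes the pfn and size from a radix tree entry.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096UL	/* assumed page size for this sketch */

/* Hypothetical stand-in for a DAX radix tree entry. */
struct demo_entry {
	unsigned long pfn;	/* first page frame number covered */
	unsigned long size;	/* bytes covered; 0 for zero page / empty entry */
};

static unsigned long demo_entry_pfn(const struct demo_entry *e)
{
	return e->pfn;
}

static unsigned long demo_entry_size(const struct demo_entry *e)
{
	return e->size;
}

/* One past the last pfn covered by the entry. */
static unsigned long demo_entry_end_pfn(const struct demo_entry *e)
{
	return demo_entry_pfn(e) + demo_entry_size(e) / PAGE_SIZE;
}

/*
 * No end_pfn argument: the bound comes from the helper. The trade-off is
 * that the bound is recomputed on every iteration instead of being cached.
 */
#define for_each_entry_pfn(entry, pfn) \
	for (pfn = demo_entry_pfn(entry); \
	     pfn < demo_entry_end_pfn(entry); \
	     pfn++)

/* Counts the pages an associate/disassociate style walk would visit. */
static unsigned long demo_count_pages(const struct demo_entry *e)
{
	unsigned long pfn, n = 0;

	assert(e != NULL);

	/* Explicit check: zero page / empty entries have nothing to walk. */
	if (demo_entry_size(e) == 0)
		return 0;

	for_each_entry_pfn(e, pfn)
		n++;

	return n;
}
```

With a 2MiB entry this walks 512 pfns, a single-page entry walks one, and an
entry of size 0 returns before the loop is reached.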