linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Remove nrexceptional tracking
@ 2020-08-04 16:17 Matthew Wilcox (Oracle)
  2020-08-04 16:17 ` [PATCH 1/4] mm: Introduce and use page_cache_empty Matthew Wilcox (Oracle)
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-04 16:17 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-nvdimm

We actually use nrexceptional for very little these days.  It's a constant
source of pain with the THP patches because we don't know how large a
shadow entry is, so either we have to ask the xarray how many indices
it covers, or store that information in the shadow entry (and reduce
the amount of other information in the shadow entry proportionally).
While tracking down the most recent case of "evict tells me I've got
the accounting wrong again", I wondered if it might not be simpler to
just remove it.  So here's a patch set to do just that.  I think each
of these patches is an improvement in isolation, but the combination of
all four is larger than the sum of its parts.

I'm running xfstests on this patchset right now.  If one of the DAX
people could try it out, that'd be fantastic.

Matthew Wilcox (Oracle) (4):
  mm: Introduce and use page_cache_empty
  mm: Stop accounting shadow entries
  dax: Account DAX entries as nrpages
  mm: Remove nrexceptional from inode

 fs/block_dev.c          |  2 +-
 fs/dax.c                |  8 ++++----
 fs/inode.c              |  2 +-
 include/linux/fs.h      |  2 --
 include/linux/pagemap.h |  5 +++++
 mm/filemap.c            | 15 ---------------
 mm/truncate.c           | 19 +++----------------
 mm/workingset.c         |  1 -
 8 files changed, 14 insertions(+), 40 deletions(-)

-- 
2.27.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] mm: Introduce and use page_cache_empty
  2020-08-04 16:17 [PATCH 0/4] Remove nrexceptional tracking Matthew Wilcox (Oracle)
@ 2020-08-04 16:17 ` Matthew Wilcox (Oracle)
  2020-08-06 23:24   ` Kirill A. Shutemov
  2020-08-04 16:17 ` [PATCH 2/4] mm: Stop accounting shadow entries Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-04 16:17 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-nvdimm

Instead of checking the two counters (nrpages and nrexceptional), we
can just check whether i_pages is empty.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/block_dev.c          |  2 +-
 fs/dax.c                |  2 +-
 include/linux/pagemap.h |  5 +++++
 mm/truncate.c           | 18 +++---------------
 4 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 0ae656e022fd..2a77bd2c6144 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -79,7 +79,7 @@ static void kill_bdev(struct block_device *bdev)
 {
 	struct address_space *mapping = bdev->bd_inode->i_mapping;
 
-	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
+	if (page_cache_empty(mapping))
 		return;
 
 	invalidate_bh_lrus();
diff --git a/fs/dax.c b/fs/dax.c
index 11b16729b86f..2f75ee2cd41f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -949,7 +949,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 	if (WARN_ON_ONCE(inode->i_blkbits != PAGE_SHIFT))
 		return -EIO;
 
-	if (!mapping->nrexceptional || wbc->sync_mode != WB_SYNC_ALL)
+	if (page_cache_empty(mapping) || wbc->sync_mode != WB_SYNC_ALL)
 		return 0;
 
 	trace_dax_writeback_range(inode, xas.xa_index, end_index);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 484a36185bb5..a474a92a2a72 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -18,6 +18,11 @@
 
 struct pagevec;
 
+static inline bool page_cache_empty(struct address_space *mapping)
+{
+	return xa_empty(&mapping->i_pages);
+}
+
 /*
  * Bits in mapping->flags.
  */
diff --git a/mm/truncate.c b/mm/truncate.c
index dd9ebc1da356..7c4c8ac140be 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -300,7 +300,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	pgoff_t		index;
 	int		i;
 
-	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
+	if (page_cache_empty(mapping))
 		goto out;
 
 	/* Offsets within partial pages */
@@ -488,9 +488,6 @@ EXPORT_SYMBOL(truncate_inode_pages);
  */
 void truncate_inode_pages_final(struct address_space *mapping)
 {
-	unsigned long nrexceptional;
-	unsigned long nrpages;
-
 	/*
 	 * Page reclaim can not participate in regular inode lifetime
 	 * management (can't call iput()) and thus can race with the
@@ -500,16 +497,7 @@ void truncate_inode_pages_final(struct address_space *mapping)
 	 */
 	mapping_set_exiting(mapping);
 
-	/*
-	 * When reclaim installs eviction entries, it increases
-	 * nrexceptional first, then decreases nrpages.  Make sure we see
-	 * this in the right order or we might miss an entry.
-	 */
-	nrpages = mapping->nrpages;
-	smp_rmb();
-	nrexceptional = mapping->nrexceptional;
-
-	if (nrpages || nrexceptional) {
+	if (!page_cache_empty(mapping)) {
 		/*
 		 * As truncation uses a lockless tree lookup, cycle
 		 * the tree lock to make sure any ongoing tree
@@ -692,7 +680,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 	int ret2 = 0;
 	int did_range_unmap = 0;
 
-	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
+	if (page_cache_empty(mapping))
 		goto out;
 
 	pagevec_init(&pvec);
-- 
2.27.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 2/4] mm: Stop accounting shadow entries
  2020-08-04 16:17 [PATCH 0/4] Remove nrexceptional tracking Matthew Wilcox (Oracle)
  2020-08-04 16:17 ` [PATCH 1/4] mm: Introduce and use page_cache_empty Matthew Wilcox (Oracle)
@ 2020-08-04 16:17 ` Matthew Wilcox (Oracle)
  2020-08-04 16:17 ` [PATCH 3/4] dax: Account DAX entries as nrpages Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-04 16:17 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-nvdimm

We no longer need to keep track of how many shadow entries are
present in a mapping.  This saves a few writes to the inode and
memory barriers.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/filemap.c    | 12 ------------
 mm/truncate.c   |  1 -
 mm/workingset.c |  1 -
 3 files changed, 14 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 76383d558b7c..7c3f97bd6dcd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -139,17 +139,6 @@ static void page_cache_delete(struct address_space *mapping,
 
 	page->mapping = NULL;
 	/* Leave page->index set: truncation lookup relies upon it */
-
-	if (shadow) {
-		mapping->nrexceptional += nr;
-		/*
-		 * Make sure the nrexceptional update is committed before
-		 * the nrpages update so that final truncate racing
-		 * with reclaim does not see both counters 0 at the
-		 * same time and miss a shadow entry.
-		 */
-		smp_wmb();
-	}
 	mapping->nrpages -= nr;
 }
 
@@ -860,7 +849,6 @@ static int __add_to_page_cache_locked(struct page *page,
 			goto unlock;
 
 		if (xa_is_value(old)) {
-			mapping->nrexceptional--;
 			if (shadowp)
 				*shadowp = old;
 		}
diff --git a/mm/truncate.c b/mm/truncate.c
index 7c4c8ac140be..a59184793607 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -40,7 +40,6 @@ static inline void __clear_shadow_entry(struct address_space *mapping,
 	if (xas_load(&xas) != entry)
 		return;
 	xas_store(&xas, NULL);
-	mapping->nrexceptional--;
 }
 
 static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
diff --git a/mm/workingset.c b/mm/workingset.c
index fdeabea54e77..0649bfb1ca33 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -547,7 +547,6 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 		goto out_invalid;
 	if (WARN_ON_ONCE(node->count != node->nr_values))
 		goto out_invalid;
-	mapping->nrexceptional -= node->nr_values;
 	xas.xa_node = xa_parent_locked(&mapping->i_pages, node);
 	xas.xa_offset = node->offset;
 	xas.xa_shift = node->shift + XA_CHUNK_SHIFT;
-- 
2.27.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 3/4] dax: Account DAX entries as nrpages
  2020-08-04 16:17 [PATCH 0/4] Remove nrexceptional tracking Matthew Wilcox (Oracle)
  2020-08-04 16:17 ` [PATCH 1/4] mm: Introduce and use page_cache_empty Matthew Wilcox (Oracle)
  2020-08-04 16:17 ` [PATCH 2/4] mm: Stop accounting shadow entries Matthew Wilcox (Oracle)
@ 2020-08-04 16:17 ` Matthew Wilcox (Oracle)
  2020-08-04 16:17 ` [PATCH 4/4] mm: Remove nrexceptional from inode Matthew Wilcox (Oracle)
  2020-08-06 19:44 ` [PATCH 0/4] Remove nrexceptional tracking Verma, Vishal L
  4 siblings, 0 replies; 11+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-04 16:17 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-nvdimm

Simplify mapping_needs_writeback() by accounting DAX entries as
pages instead of exceptional entries.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/dax.c     | 6 +++---
 mm/filemap.c | 3 ---
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 2f75ee2cd41f..71ddab04d91d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -525,7 +525,7 @@ static void *grab_mapping_entry(struct xa_state *xas,
 		dax_disassociate_entry(entry, mapping, false);
 		xas_store(xas, NULL);	/* undo the PMD join */
 		dax_wake_entry(xas, entry, true);
-		mapping->nrexceptional--;
+		mapping->nrpages -= PG_PMD_NR;
 		entry = NULL;
 		xas_set(xas, index);
 	}
@@ -541,7 +541,7 @@ static void *grab_mapping_entry(struct xa_state *xas,
 		dax_lock_entry(xas, entry);
 		if (xas_error(xas))
 			goto out_unlock;
-		mapping->nrexceptional++;
+		mapping->nrpages += 1UL << order;
 	}
 
 out_unlock:
@@ -644,7 +644,7 @@ static int __dax_invalidate_entry(struct address_space *mapping,
 		goto out;
 	dax_disassociate_entry(entry, mapping, trunc);
 	xas_store(&xas, NULL);
-	mapping->nrexceptional--;
+	mapping->nrpages -= dax_entry_order(entry);
 	ret = 1;
 out:
 	put_unlocked_entry(&xas, entry);
diff --git a/mm/filemap.c b/mm/filemap.c
index 7c3f97bd6dcd..ef8ecfba7f9b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -615,9 +615,6 @@ EXPORT_SYMBOL(filemap_fdatawait_keep_errors);
 /* Returns true if writeback might be needed or already in progress. */
 static bool mapping_needs_writeback(struct address_space *mapping)
 {
-	if (dax_mapping(mapping))
-		return mapping->nrexceptional;
-
 	return mapping->nrpages;
 }
 
-- 
2.27.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 4/4] mm: Remove nrexceptional from inode
  2020-08-04 16:17 [PATCH 0/4] Remove nrexceptional tracking Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2020-08-04 16:17 ` [PATCH 3/4] dax: Account DAX entries as nrpages Matthew Wilcox (Oracle)
@ 2020-08-04 16:17 ` Matthew Wilcox (Oracle)
  2020-08-06 19:44 ` [PATCH 0/4] Remove nrexceptional tracking Verma, Vishal L
  4 siblings, 0 replies; 11+ messages in thread
From: Matthew Wilcox (Oracle) @ 2020-08-04 16:17 UTC (permalink / raw)
  To: linux-mm, linux-fsdevel; +Cc: Matthew Wilcox (Oracle), linux-nvdimm

We no longer track anything in nrexceptional, so remove it, saving 8
bytes per inode.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/inode.c         | 2 +-
 include/linux/fs.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index 72c4c347afb7..b5c4aff077b7 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -528,7 +528,7 @@ void clear_inode(struct inode *inode)
 	 */
 	xa_lock_irq(&inode->i_data.i_pages);
 	BUG_ON(inode->i_data.nrpages);
-	BUG_ON(inode->i_data.nrexceptional);
+	BUG_ON(!page_cache_empty(&inode->i_data));
 	xa_unlock_irq(&inode->i_data.i_pages);
 	BUG_ON(!list_empty(&inode->i_data.private_list));
 	BUG_ON(!(inode->i_state & I_FREEING));
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f5abba86107d..4fd8923cba43 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -436,7 +436,6 @@ int pagecache_write_end(struct file *, struct address_space *mapping,
  * @i_mmap: Tree of private and shared mappings.
  * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
  * @nrpages: Number of page entries, protected by the i_pages lock.
- * @nrexceptional: Shadow or DAX entries, protected by the i_pages lock.
  * @writeback_index: Writeback starts here.
  * @a_ops: Methods.
  * @flags: Error bits and flags (AS_*).
@@ -457,7 +456,6 @@ struct address_space {
 	struct rb_root_cached	i_mmap;
 	struct rw_semaphore	i_mmap_rwsem;
 	unsigned long		nrpages;
-	unsigned long		nrexceptional;
 	pgoff_t			writeback_index;
 	const struct address_space_operations *a_ops;
 	unsigned long		flags;
-- 
2.27.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] Remove nrexceptional tracking
  2020-08-04 16:17 [PATCH 0/4] Remove nrexceptional tracking Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2020-08-04 16:17 ` [PATCH 4/4] mm: Remove nrexceptional from inode Matthew Wilcox (Oracle)
@ 2020-08-06 19:44 ` Verma, Vishal L
  2020-08-06 20:16   ` Verma, Vishal L
  4 siblings, 1 reply; 11+ messages in thread
From: Verma, Vishal L @ 2020-08-06 19:44 UTC (permalink / raw)
  To: linux-mm, willy, linux-fsdevel; +Cc: linux-nvdimm

On Tue, 2020-08-04 at 17:17 +0100, Matthew Wilcox (Oracle) wrote:
> We actually use nrexceptional for very little these days.  It's a
> constant
> source of pain with the THP patches because we don't know how large a
> shadow entry is, so either we have to ask the xarray how many indices
> it covers, or store that information in the shadow entry (and reduce
> the amount of other information in the shadow entry proportionally).
> While tracking down the most recent case of "evict tells me I've got
> the accounting wrong again", I wondered if it might not be simpler to
> just remove it.  So here's a patch set to do just that.  I think each
> of these patches is an improvement in isolation, but the combination
> of
> all four is larger than the sum of its parts.
> 
> I'm running xfstests on this patchset right now.  If one of the DAX
> people could try it out, that'd be fantastic.
> 
> Matthew Wilcox (Oracle) (4):
>   mm: Introduce and use page_cache_empty
>   mm: Stop accounting shadow entries
>   dax: Account DAX entries as nrpages
>   mm: Remove nrexceptional from inode

Hi Matthew,

I applied these on top of 5.8 and ran them through the nvdimm unit test
suite, and saw some test failures. The first failing test signature is:

  + umount test_dax_mnt
  ./dax-ext4.sh: line 62: 15749 Segmentation fault      umount $MNT
  FAIL dax-ext4.sh (exit status: 139)

The line is: https://github.com/pmem/ndctl/blob/master/test/dax.sh#L79
And the failing umount happens right after 'run_test', which calls this:
https://github.com/pmem/ndctl/blob/master/test/dax-pmd.c


> 
>  fs/block_dev.c          |  2 +-
>  fs/dax.c                |  8 ++++----
>  fs/inode.c              |  2 +-
>  include/linux/fs.h      |  2 --
>  include/linux/pagemap.h |  5 +++++
>  mm/filemap.c            | 15 ---------------
>  mm/truncate.c           | 19 +++----------------
>  mm/workingset.c         |  1 -
>  8 files changed, 14 insertions(+), 40 deletions(-)
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] Remove nrexceptional tracking
  2020-08-06 19:44 ` [PATCH 0/4] Remove nrexceptional tracking Verma, Vishal L
@ 2020-08-06 20:16   ` Verma, Vishal L
  2020-10-08 19:33     ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: Verma, Vishal L @ 2020-08-06 20:16 UTC (permalink / raw)
  To: linux-mm, willy, linux-fsdevel; +Cc: linux-nvdimm

On Thu, 2020-08-06 at 19:44 +0000, Verma, Vishal L wrote:
> > 
> > I'm running xfstests on this patchset right now.  If one of the DAX
> > people could try it out, that'd be fantastic.
> > 
> > Matthew Wilcox (Oracle) (4):
> >   mm: Introduce and use page_cache_empty
> >   mm: Stop accounting shadow entries
> >   dax: Account DAX entries as nrpages
> >   mm: Remove nrexceptional from inode
> 
> Hi Matthew,
> 
> I applied these on top of 5.8 and ran them through the nvdimm unit test
> suite, and saw some test failures. The first failing test signature is:
> 
>   + umount test_dax_mnt
>   ./dax-ext4.sh: line 62: 15749 Segmentation fault      umount $MNT
>   FAIL dax-ext4.sh (exit status: 139)
> 
> The line is: https://github.com/pmem/ndctl/blob/master/test/dax.sh#L79
> And the failing umount happens right after 'run_test', which calls this:
> https://github.com/pmem/ndctl/blob/master/test/dax-pmd.c
> 
> 
And here is the associated kernel splat:

[   89.570142] EXT4-fs (pmem4): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[   89.573407] EXT4-fs (pmem4): mounted filesystem with ordered data mode. Opts: dax
[   90.450644] Injecting memory failure for pfn 0x148900 at process virtual address 0x7f2ec9b00000
[   90.452421] Memory failure: 0x148900: Sending SIGBUS to dax-pmd:16886 due to hardware memory corruption
[   90.454067] Memory failure: 0x148900: recovery action for dax page: Recovered
[   91.656624] ------------[ cut here ]------------
[   91.657822] kernel BUG at fs/inode.c:530!
[   91.658850] invalid opcode: 0000 [#1] SMP PTI
[   91.659883] CPU: 4 PID: 16891 Comm: umount Tainted: G           O      5.8.0-00004-g6161e2d6ac48 #38
[   91.661861] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[   91.664920] RIP: 0010:clear_inode+0x78/0x90
[   91.665934] Code: 00 a8 20 74 2b a8 40 75 29 48 8b 93 f0 01 00 00 48 8d 83 f0 01 00 00 48 39 c2 75 18 48 c7 83 e0 00 00 00 60 00 00 00 5b 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00
[   91.669979] RSP: 0018:ffffc9000212fd98 EFLAGS: 00010002
[   91.671151] RAX: 0000000000000004 RBX: ffff88810e33d470 RCX: 8c6318c6318c6319
[   91.672689] RDX: ffff888110b48040 RSI: 0000000000000004 RDI: 0000000000000046
[   91.674208] RBP: ffff88810e33d6b8 R08: 0000000000000000 R09: 0000000000000001
[   91.675760] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88810e33d6b0
[   91.677332] R13: ffff88811014c828 R14: ffff88811014c9c0 R15: ffff88810e3398c0
[   91.678940] FS:  00007f86a7e74c80(0000) GS:ffff888117c00000(0000) knlGS:0000000000000000
[   91.680745] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   91.682001] CR2: 000055bcfb4c3008 CR3: 0000000110b7c000 CR4: 00000000000006e0
[   91.683581] Call Trace:
[   91.684268]  ext4_clear_inode+0x16/0x80
[   91.685298]  ext4_evict_inode+0x5f/0x610
[   91.686245]  evict+0xcf/0x1f0
[   91.687017]  dispose_list+0x48/0x70
[   91.687937]  evict_inodes+0x152/0x190
[   91.688918]  generic_shutdown_super+0x37/0x100
[   91.689880]  kill_block_super+0x21/0x50
[   91.690784]  deactivate_locked_super+0x36/0xa0
[   91.691790]  cleanup_mnt+0x12d/0x190
[   91.692658]  task_work_run+0x5c/0xa0
[   91.694739]  __prepare_exit_to_usermode+0x1b6/0x1c0
[   91.695806]  do_syscall_64+0x5e/0xb0
[   91.696656]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   91.697798] RIP: 0033:0x7f86a80c694b
[   91.698650] Code: 15 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1d 15 0c 00 f7 d8 64 89 01 48
[   91.702545] RSP: 002b:00007ffd49a8cb98 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[   91.704204] RAX: 0000000000000000 RBX: 00007f86a81ee204 RCX: 00007f86a80c694b
[   91.705732] RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000055bcfb4bf2a0
[   91.707207] RBP: 000055bcfb4bafa0 R08: 0000000000000000 R09: 00007f86a8188a40
[   91.708809] R10: 000055bcfb4c1550 R11: 0000000000000246 R12: 0000000000000000
[   91.710040] R13: 000055bcfb4bf2a0 R14: 000055bcfb4bb0b0 R15: 000055bcfb4bb1d0
[   91.711193] Modules linked in: nd_e820(O) nd_blk(O) nfit(O) kmem nd_pmem(O) nd_btt(O) dax_pmem(O) dax_pmem_core(O) libnvdimm(O) device_dax(O) nfit_test_iomap(O) [last unloaded: nfit_test_iomap]
[   91.714154] ---[ end trace 016cb116e8654993 ]---
[   91.715011] RIP: 0010:clear_inode+0x78/0x90
[   91.715803] Code: 00 a8 20 74 2b a8 40 75 29 48 8b 93 f0 01 00 00 48 8d 83 f0 01 00 00 48 39 c2 75 18 48 c7 83 e0 00 00 00 60 00 00 00 5b 5d c3 <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00
[   91.718950] RSP: 0018:ffffc9000212fd98 EFLAGS: 00010002
[   91.719976] RAX: 0000000000000004 RBX: ffff88810e33d470 RCX: 8c6318c6318c6319
[   91.721187] RDX: ffff888110b48040 RSI: 0000000000000004 RDI: 0000000000000046
[   91.722473] RBP: ffff88810e33d6b8 R08: 0000000000000000 R09: 0000000000000001
[   91.723633] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88810e33d6b0
[   91.724838] R13: ffff88811014c828 R14: ffff88811014c9c0 R15: ffff88810e3398c0
[   91.726109] FS:  00007f86a7e74c80(0000) GS:ffff888117c00000(0000) knlGS:0000000000000000
[   91.727575] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   91.728596] CR2: 000055bcfb4c3008 CR3: 0000000110b7c000 CR4: 00000000000006e0
[   91.729794] note: umount[16891] exited with preempt_count 1
[   91.730746] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[   91.732304] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 16891, name: umount
[   91.733764] INFO: lockdep is turned off.
[   91.734476] irq event stamp: 6192
[   91.735067] hardirqs last  enabled at (6191): [<ffffffff81c7a974>] _raw_spin_unlock_irq+0x24/0x40
[   91.736641] hardirqs last disabled at (6192): [<ffffffff81c7ab27>] _raw_spin_lock_irq+0x17/0x80
[   91.738141] softirqs last  enabled at (6120): [<ffffffff8134b3b8>] bdi_split_work_to_wbs+0x248/0x580
[   91.739752] softirqs last disabled at (6116): [<ffffffff81347a5d>] wb_queue_work+0x4d/0x180
[   91.741297] CPU: 4 PID: 16891 Comm: umount Tainted: G      D    O      5.8.0-00004-g6161e2d6ac48 #38
[   91.742940] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[   91.744887] Call Trace:
[   91.745401]  dump_stack+0x92/0xc8
[   91.745987]  ___might_sleep.cold+0xb6/0xc6
[   91.746716]  exit_signals+0x1c/0x2d0
[   91.747355]  do_exit+0xc0/0xba0
[   91.747942]  ? __prepare_exit_to_usermode+0x1b6/0x1c0
[   91.748821]  rewind_stack_do_exit+0x17/0x20
[   91.749619] RIP: 0033:0x7f86a80c694b
[   91.750330] Code: 15 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1d 15 0c 00 f7 d8 64 89 01 48
[   91.753544] RSP: 002b:00007ffd49a8cb98 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[   91.754902] RAX: 0000000000000000 RBX: 00007f86a81ee204 RCX: 00007f86a80c694b
[   91.756176] RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 000055bcfb4bf2a0
[   91.757746] RBP: 000055bcfb4bafa0 R08: 0000000000000000 R09: 00007f86a8188a40
[   91.759034] R10: 000055bcfb4c1550 R11: 0000000000000246 R12: 0000000000000000
[   91.760166] R13: 000055bcfb4bf2a0 R14: 000055bcfb4bb0b0 R15: 000055bcfb4bb1d0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] mm: Introduce and use page_cache_empty
  2020-08-04 16:17 ` [PATCH 1/4] mm: Introduce and use page_cache_empty Matthew Wilcox (Oracle)
@ 2020-08-06 23:24   ` Kirill A. Shutemov
  2020-08-16  1:48     ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: Kirill A. Shutemov @ 2020-08-06 23:24 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: linux-mm, linux-fsdevel, linux-nvdimm

On Tue, Aug 04, 2020 at 05:17:52PM +0100, Matthew Wilcox (Oracle) wrote:
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 484a36185bb5..a474a92a2a72 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -18,6 +18,11 @@
>  
>  struct pagevec;
>  
> +static inline bool page_cache_empty(struct address_space *mapping)
> +{
> +	return xa_empty(&mapping->i_pages);

What about something like

	bool empty = xa_empty(&mapping->i_pages);
	VM_BUG_ON(empty && mapping->nrpages);
	return empty;

?

> +}
> +
>  /*
>   * Bits in mapping->flags.
>   */

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/4] mm: Introduce and use page_cache_empty
  2020-08-06 23:24   ` Kirill A. Shutemov
@ 2020-08-16  1:48     ` Matthew Wilcox
  0 siblings, 0 replies; 11+ messages in thread
From: Matthew Wilcox @ 2020-08-16  1:48 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: linux-mm, linux-fsdevel, linux-nvdimm

On Fri, Aug 07, 2020 at 02:24:00AM +0300, Kirill A. Shutemov wrote:
> On Tue, Aug 04, 2020 at 05:17:52PM +0100, Matthew Wilcox (Oracle) wrote:
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index 484a36185bb5..a474a92a2a72 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -18,6 +18,11 @@
> >  
> >  struct pagevec;
> >  
> > +static inline bool page_cache_empty(struct address_space *mapping)
> > +{
> > +	return xa_empty(&mapping->i_pages);
> 
> What about something like
> 
> 	bool empty = xa_empty(&mapping->i_pages);
> 	VM_BUG_ON(empty && mapping->nrpages);
> 	return empty;

I tried this and it's triggered by generic/418.  The problem
is that it's called when the pagecache lock isn't held (by
invalidate_inode_pages2_range), so it's possible for xa_empty() to
return true, then a page be added to the page cache, and mapping->pages
be incremented to 1.  That seems to be what's happened here:

(gdb) p/x *(struct address_space *)0xffff88804b21b360
$2 = {host = 0xffff88804b21b200, i_pages = {xa_lock = {{rlock = {raw_lock = {{
              val = {counter = 0x0}, {locked = 0x0, pending = 0x0}, {
                locked_pending = 0x0, tail = 0x0}}}}}}, xa_flags = 0x21, 
*  xa_head = 0xffffea0001e187c0}, gfp_mask = 0x100c4a, i_mmap_writable = {
    counter = 0x0}, nr_thps = {counter = 0x0}, i_mmap = {rb_root = {
      rb_node = 0x0}, rb_leftmost = 0x0}, i_mmap_rwsem = {count = {
      counter = 0x0}, owner = {counter = 0x0}, osq = {tail = {counter = 0x0}}, 
    wait_lock = {raw_lock = {{val = {counter = 0x0}, {locked = 0x0, 
            pending = 0x0}, {locked_pending = 0x0, tail = 0x0}}}}, 
    wait_list = {next = 0xffff88804b21b3b0, prev = 0xffff88804b21b3b0}}, 
* nrpages = 0x1, writeback_index = 0x0, a_ops = 0xffffffff81c2ed60, 
  flags = 0x40, wb_err = 0x0, private_lock = {{rlock = {raw_lock = {{val = {
              counter = 0x0}, {locked = 0x0, pending = 0x0}, {
              locked_pending = 0x0, tail = 0x0}}}}}}, private_list = {
    next = 0xffff88804b21b3e8, prev = 0xffff88804b21b3e8}, private_data = 0x0}

(marked the critical lines with *)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] Remove nrexceptional tracking
  2020-08-06 20:16   ` Verma, Vishal L
@ 2020-10-08 19:33     ` Matthew Wilcox
  2020-10-09 23:04       ` Verma, Vishal L
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2020-10-08 19:33 UTC (permalink / raw)
  To: Verma, Vishal L; +Cc: linux-mm, linux-fsdevel, linux-nvdimm

On Thu, Aug 06, 2020 at 08:16:02PM +0000, Verma, Vishal L wrote:
> On Thu, 2020-08-06 at 19:44 +0000, Verma, Vishal L wrote:
> > > 
> > > I'm running xfstests on this patchset right now.  If one of the DAX
> > > people could try it out, that'd be fantastic.
> > > 
> > > Matthew Wilcox (Oracle) (4):
> > >   mm: Introduce and use page_cache_empty
> > >   mm: Stop accounting shadow entries
> > >   dax: Account DAX entries as nrpages
> > >   mm: Remove nrexceptional from inode
> > 
> > Hi Matthew,
> > 
> > I applied these on top of 5.8 and ran them through the nvdimm unit test
> > suite, and saw some test failures. The first failing test signature is:
> > 
> >   + umount test_dax_mnt
> >   ./dax-ext4.sh: line 62: 15749 Segmentation fault      umount $MNT
> >   FAIL dax-ext4.sh (exit status: 139)

Thanks.  Fixed:

+++ b/fs/dax.c
@@ -644,7 +644,7 @@ static int __dax_invalidate_entry(struct address_space *mapping,
                goto out;
        dax_disassociate_entry(entry, mapping, trunc);
        xas_store(&xas, NULL);
-       mapping->nrpages -= dax_entry_order(entry);
+       mapping->nrpages -= 1UL << dax_entry_order(entry);
        ret = 1;
 out:
        put_unlocked_entry(&xas, entry);

Updated git tree at
https://git.infradead.org/users/willy/pagecache.git/

It survives an xfstests run on an fsdax namespace which supports 2MB pages.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] Remove nrexceptional tracking
  2020-10-08 19:33     ` Matthew Wilcox
@ 2020-10-09 23:04       ` Verma, Vishal L
  0 siblings, 0 replies; 11+ messages in thread
From: Verma, Vishal L @ 2020-10-09 23:04 UTC (permalink / raw)
  To: willy; +Cc: linux-mm, linux-nvdimm, linux-fsdevel

On Thu, 2020-10-08 at 20:33 +0100, Matthew Wilcox wrote:
> On Thu, Aug 06, 2020 at 08:16:02PM +0000, Verma, Vishal L wrote:
> > On Thu, 2020-08-06 at 19:44 +0000, Verma, Vishal L wrote:
> > > > I'm running xfstests on this patchset right now.  If one of the DAX
> > > > people could try it out, that'd be fantastic.
> > > > 
> > > > Matthew Wilcox (Oracle) (4):
> > > >   mm: Introduce and use page_cache_empty
> > > >   mm: Stop accounting shadow entries
> > > >   dax: Account DAX entries as nrpages
> > > >   mm: Remove nrexceptional from inode
> > > 
> > > Hi Matthew,
> > > 
> > > I applied these on top of 5.8 and ran them through the nvdimm unit test
> > > suite, and saw some test failures. The first failing test signature is:
> > > 
> > >   + umount test_dax_mnt
> > >   ./dax-ext4.sh: line 62: 15749 Segmentation fault      umount $MNT
> > >   FAIL dax-ext4.sh (exit status: 139)
> 
> Thanks.  Fixed:
> 
> +++ b/fs/dax.c
> @@ -644,7 +644,7 @@ static int __dax_invalidate_entry(struct address_space *mapping,
>                 goto out;
>         dax_disassociate_entry(entry, mapping, trunc);
>         xas_store(&xas, NULL);
> -       mapping->nrpages -= dax_entry_order(entry);
> +       mapping->nrpages -= 1UL << dax_entry_order(entry);
>         ret = 1;
>  out:
>         put_unlocked_entry(&xas, entry);
> 
> Updated git tree at
> https://git.infradead.org/users/willy/pagecache.git/

I ran this tree through the unit tests, and everything passes.
(Well, while the tests passed, this tree as-is did have an RCU warning
splat. I rebased to v5.9-rc8 and that was fine).

Feel free to add:

Tested-by: Vishal Verma <vishal.l.verma@intel.com>


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-10-09 23:04 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-04 16:17 [PATCH 0/4] Remove nrexceptional tracking Matthew Wilcox (Oracle)
2020-08-04 16:17 ` [PATCH 1/4] mm: Introduce and use page_cache_empty Matthew Wilcox (Oracle)
2020-08-06 23:24   ` Kirill A. Shutemov
2020-08-16  1:48     ` Matthew Wilcox
2020-08-04 16:17 ` [PATCH 2/4] mm: Stop accounting shadow entries Matthew Wilcox (Oracle)
2020-08-04 16:17 ` [PATCH 3/4] dax: Account DAX entries as nrpages Matthew Wilcox (Oracle)
2020-08-04 16:17 ` [PATCH 4/4] mm: Remove nrexceptional from inode Matthew Wilcox (Oracle)
2020-08-06 19:44 ` [PATCH 0/4] Remove nrexceptional tracking Verma, Vishal L
2020-08-06 20:16   ` Verma, Vishal L
2020-10-08 19:33     ` Matthew Wilcox
2020-10-09 23:04       ` Verma, Vishal L

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).