* scheduling while atomic in z3fold
From: Oleksandr Natalenko @ 2020-11-28 14:05 UTC
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Sebastian Andrzej Siewior,
	Steven Rostedt, Mike Galbraith, Thomas Gleixner

Hi.

While running v5.10-rc5-rt11 I bumped into the following:

```
BUG: scheduling while atomic: git/18695/0x00000002
Preemption disabled at:
[<ffffffffbb93fcb3>] z3fold_zpool_malloc+0x463/0x6e0
…
Call Trace:
 dump_stack+0x6d/0x88
 __schedule_bug.cold+0x88/0x96
 __schedule+0x69e/0x8c0
 preempt_schedule_lock+0x51/0x150
 rt_spin_lock_slowlock_locked+0x117/0x2c0
 rt_spin_lock_slowlock+0x58/0x80
 rt_spin_lock+0x2a/0x40
 z3fold_zpool_malloc+0x4c1/0x6e0
 zswap_frontswap_store+0x39c/0x980
 __frontswap_store+0x6e/0xf0
 swap_writepage+0x39/0x70
 shmem_writepage+0x31b/0x490
 pageout+0xf4/0x350
 shrink_page_list+0xa28/0xcc0
 shrink_inactive_list+0x300/0x690
 shrink_lruvec+0x59a/0x770
 shrink_node+0x2d6/0x8d0
 do_try_to_free_pages+0xda/0x530
 try_to_free_pages+0xff/0x260
 __alloc_pages_slowpath.constprop.0+0x3d5/0x1230
 __alloc_pages_nodemask+0x2f6/0x350
 allocate_slab+0x3da/0x660
 ___slab_alloc+0x4ff/0x760
 __slab_alloc.constprop.0+0x7a/0x100
 kmem_cache_alloc+0x27b/0x2c0
 __d_alloc+0x22/0x230
 d_alloc_parallel+0x67/0x5e0
 __lookup_slow+0x5c/0x150
 path_lookupat+0x2ea/0x4d0
 filename_lookup+0xbf/0x210
 vfs_statx.constprop.0+0x4d/0x110
 __do_sys_newlstat+0x3d/0x80
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
```

The preemption seems to be disabled here:

```
$ scripts/faddr2line mm/z3fold.o z3fold_zpool_malloc+0x463
z3fold_zpool_malloc+0x463/0x6e0:
add_to_unbuddied at mm/z3fold.c:645
(inlined by) z3fold_alloc at mm/z3fold.c:1195
(inlined by) z3fold_zpool_malloc at mm/z3fold.c:1737
```

The call to the rt_spin_lock() seems to be here:

```
$ scripts/faddr2line mm/z3fold.o z3fold_zpool_malloc+0x4c1
z3fold_zpool_malloc+0x4c1/0x6e0:
add_to_unbuddied at mm/z3fold.c:649
(inlined by) z3fold_alloc at mm/z3fold.c:1195
(inlined by) z3fold_zpool_malloc at mm/z3fold.c:1737
```

Or, in source code:

```
 639 /* Add to the appropriate unbuddied list */
 640 static inline void add_to_unbuddied(struct z3fold_pool *pool,
 641                 struct z3fold_header *zhdr)
 642 {
 643     if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
 644             zhdr->middle_chunks == 0) {
 645         struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
 646
 647         int freechunks = num_free_chunks(zhdr);
 648         spin_lock(&pool->lock);
 649         list_add(&zhdr->buddy, &unbuddied[freechunks]);
 650         spin_unlock(&pool->lock);
 651         zhdr->cpu = smp_processor_id();
 652         put_cpu_ptr(pool->unbuddied);
 653     }
 654 }
```

As far as I can tell, get_cpu_ptr() disables preemption, while on
PREEMPT_RT the spinlock taken under it is a sleeping lock, hence the
splat. Shouldn't the list manipulation be protected with
local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
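
For illustration, roughly the pattern I have in mind: an untested
sketch, with "llock" being a hypothetical new local_lock_t member of
struct z3fold_pool:

```
static inline void add_to_unbuddied(struct z3fold_pool *pool,
		struct z3fold_header *zhdr)
{
	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
			zhdr->middle_chunks == 0) {
		struct list_head *unbuddied;
		int freechunks = num_free_chunks(zhdr);

		/*
		 * get_cpu_ptr() is preempt_disable() + this_cpu_ptr(),
		 * so today pool->lock is taken with preemption off.
		 * local_lock() keeps the per-CPU pointer stable while
		 * remaining preemptible on PREEMPT_RT.
		 */
		local_lock(&pool->llock);
		unbuddied = this_cpu_ptr(pool->unbuddied);
		spin_lock(&pool->lock);
		list_add(&zhdr->buddy, &unbuddied[freechunks]);
		spin_unlock(&pool->lock);
		zhdr->cpu = smp_processor_id();
		local_unlock(&pool->llock);
	}
}
```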

Thanks.

-- 
  Oleksandr Natalenko (post-factum)


* Re: scheduling while atomic in z3fold
From: Oleksandr Natalenko @ 2020-11-28 14:09 UTC
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Sebastian Andrzej Siewior,
	Steven Rostedt, Mike Galbraith, Thomas Gleixner, linux-rt-users

On Sat, Nov 28, 2020 at 03:05:24PM +0100, Oleksandr Natalenko wrote:
> Hi.
> 
> While running v5.10-rc5-rt11 I bumped into the following:
> 
> ```
> BUG: scheduling while atomic: git/18695/0x00000002
> Preemption disabled at:
> [<ffffffffbb93fcb3>] z3fold_zpool_malloc+0x463/0x6e0
> …
> Call Trace:
>  dump_stack+0x6d/0x88
>  __schedule_bug.cold+0x88/0x96
>  __schedule+0x69e/0x8c0
>  preempt_schedule_lock+0x51/0x150
>  rt_spin_lock_slowlock_locked+0x117/0x2c0
>  rt_spin_lock_slowlock+0x58/0x80
>  rt_spin_lock+0x2a/0x40
>  z3fold_zpool_malloc+0x4c1/0x6e0
>  zswap_frontswap_store+0x39c/0x980
>  __frontswap_store+0x6e/0xf0
>  swap_writepage+0x39/0x70
>  shmem_writepage+0x31b/0x490
>  pageout+0xf4/0x350
>  shrink_page_list+0xa28/0xcc0
>  shrink_inactive_list+0x300/0x690
>  shrink_lruvec+0x59a/0x770
>  shrink_node+0x2d6/0x8d0
>  do_try_to_free_pages+0xda/0x530
>  try_to_free_pages+0xff/0x260
>  __alloc_pages_slowpath.constprop.0+0x3d5/0x1230
>  __alloc_pages_nodemask+0x2f6/0x350
>  allocate_slab+0x3da/0x660
>  ___slab_alloc+0x4ff/0x760
>  __slab_alloc.constprop.0+0x7a/0x100
>  kmem_cache_alloc+0x27b/0x2c0
>  __d_alloc+0x22/0x230
>  d_alloc_parallel+0x67/0x5e0
>  __lookup_slow+0x5c/0x150
>  path_lookupat+0x2ea/0x4d0
>  filename_lookup+0xbf/0x210
>  vfs_statx.constprop.0+0x4d/0x110
>  __do_sys_newlstat+0x3d/0x80
>  do_syscall_64+0x33/0x40
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> ```
> 
> The preemption seems to be disabled here:
> 
> ```
> $ scripts/faddr2line mm/z3fold.o z3fold_zpool_malloc+0x463
> z3fold_zpool_malloc+0x463/0x6e0:
> add_to_unbuddied at mm/z3fold.c:645
> (inlined by) z3fold_alloc at mm/z3fold.c:1195
> (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1737
> ```
> 
> The call to the rt_spin_lock() seems to be here:
> 
> ```
> $ scripts/faddr2line mm/z3fold.o z3fold_zpool_malloc+0x4c1
> z3fold_zpool_malloc+0x4c1/0x6e0:
> add_to_unbuddied at mm/z3fold.c:649
> (inlined by) z3fold_alloc at mm/z3fold.c:1195
> (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1737
> ```
> 
> Or, in source code:
> 
> ```
>  639 /* Add to the appropriate unbuddied list */
>  640 static inline void add_to_unbuddied(struct z3fold_pool *pool,
>  641                 struct z3fold_header *zhdr)
>  642 {
>  643     if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
>  644             zhdr->middle_chunks == 0) {
>  645         struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
>  646
>  647         int freechunks = num_free_chunks(zhdr);
>  648         spin_lock(&pool->lock);
>  649         list_add(&zhdr->buddy, &unbuddied[freechunks]);
>  650         spin_unlock(&pool->lock);
>  651         zhdr->cpu = smp_processor_id();
>  652         put_cpu_ptr(pool->unbuddied);
>  653     }
>  654 }
> ```
> 
> Shouldn't the list manipulation be protected with
> local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
> 
> Thanks.
> 
> -- 
>   Oleksandr Natalenko (post-factum)

Forgot to Cc linux-rt-users@, sorry.

-- 
  Oleksandr Natalenko (post-factum)


* Re: scheduling while atomic in z3fold
From: Oleksandr Natalenko @ 2020-11-28 14:27 UTC
  To: linux-kernel
  Cc: linux-mm, Andrew Morton, Sebastian Andrzej Siewior,
	Steven Rostedt, Mike Galbraith, Thomas Gleixner, linux-rt-users

On Sat, Nov 28, 2020 at 03:09:24PM +0100, Oleksandr Natalenko wrote:
> > While running v5.10-rc5-rt11 I bumped into the following:
> > 
> > ```
> > BUG: scheduling while atomic: git/18695/0x00000002
> > Preemption disabled at:
> > [<ffffffffbb93fcb3>] z3fold_zpool_malloc+0x463/0x6e0
> > …
> > Call Trace:
> >  dump_stack+0x6d/0x88
> >  __schedule_bug.cold+0x88/0x96
> >  __schedule+0x69e/0x8c0
> >  preempt_schedule_lock+0x51/0x150
> >  rt_spin_lock_slowlock_locked+0x117/0x2c0
> >  rt_spin_lock_slowlock+0x58/0x80
> >  rt_spin_lock+0x2a/0x40
> >  z3fold_zpool_malloc+0x4c1/0x6e0
> >  zswap_frontswap_store+0x39c/0x980
> >  __frontswap_store+0x6e/0xf0
> >  swap_writepage+0x39/0x70
> >  shmem_writepage+0x31b/0x490
> >  pageout+0xf4/0x350
> >  shrink_page_list+0xa28/0xcc0
> >  shrink_inactive_list+0x300/0x690
> >  shrink_lruvec+0x59a/0x770
> >  shrink_node+0x2d6/0x8d0
> >  do_try_to_free_pages+0xda/0x530
> >  try_to_free_pages+0xff/0x260
> >  __alloc_pages_slowpath.constprop.0+0x3d5/0x1230
> >  __alloc_pages_nodemask+0x2f6/0x350
> >  allocate_slab+0x3da/0x660
> >  ___slab_alloc+0x4ff/0x760
> >  __slab_alloc.constprop.0+0x7a/0x100
> >  kmem_cache_alloc+0x27b/0x2c0
> >  __d_alloc+0x22/0x230
> >  d_alloc_parallel+0x67/0x5e0
> >  __lookup_slow+0x5c/0x150
> >  path_lookupat+0x2ea/0x4d0
> >  filename_lookup+0xbf/0x210
> >  vfs_statx.constprop.0+0x4d/0x110
> >  __do_sys_newlstat+0x3d/0x80
> >  do_syscall_64+0x33/0x40
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > ```
> > 
> > The preemption seems to be disabled here:
> > 
> > ```
> > $ scripts/faddr2line mm/z3fold.o z3fold_zpool_malloc+0x463
> > z3fold_zpool_malloc+0x463/0x6e0:
> > add_to_unbuddied at mm/z3fold.c:645
> > (inlined by) z3fold_alloc at mm/z3fold.c:1195
> > (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1737
> > ```
> > 
> > The call to the rt_spin_lock() seems to be here:
> > 
> > ```
> > $ scripts/faddr2line mm/z3fold.o z3fold_zpool_malloc+0x4c1
> > z3fold_zpool_malloc+0x4c1/0x6e0:
> > add_to_unbuddied at mm/z3fold.c:649
> > (inlined by) z3fold_alloc at mm/z3fold.c:1195
> > (inlined by) z3fold_zpool_malloc at mm/z3fold.c:1737
> > ```
> > 
> > Or, in source code:
> > 
> > ```
> >  639 /* Add to the appropriate unbuddied list */
> >  640 static inline void add_to_unbuddied(struct z3fold_pool *pool,
> >  641                 struct z3fold_header *zhdr)
> >  642 {
> >  643     if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
> >  644             zhdr->middle_chunks == 0) {
> >  645         struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
> >  646
> >  647         int freechunks = num_free_chunks(zhdr);
> >  648         spin_lock(&pool->lock);
> >  649         list_add(&zhdr->buddy, &unbuddied[freechunks]);
> >  650         spin_unlock(&pool->lock);
> >  651         zhdr->cpu = smp_processor_id();
> >  652         put_cpu_ptr(pool->unbuddied);
> >  653     }
> >  654 }
> > ```
> > 
> > Shouldn't the list manipulation be protected with
> > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?

Totally untested:

```
diff --git a/mm/z3fold.c b/mm/z3fold.c
index 18feaa0bc537..53fcb80c6167 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -41,6 +41,7 @@
 #include <linux/workqueue.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/local_lock.h>
 #include <linux/zpool.h>
 #include <linux/magic.h>
 #include <linux/kmemleak.h>
@@ -156,6 +157,7 @@ struct z3fold_pool {
 	const char *name;
 	spinlock_t lock;
 	spinlock_t stale_lock;
+	local_lock_t llock;
 	struct list_head *unbuddied;
 	struct list_head lru;
 	struct list_head stale;
@@ -642,14 +644,17 @@ static inline void add_to_unbuddied(struct z3fold_pool *pool,
 {
 	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
 			zhdr->middle_chunks == 0) {
-		struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
+		struct list_head *unbuddied;
+		int freechunks;
+		local_lock(&pool->llock);
+		unbuddied = *this_cpu_ptr(&pool->unbuddied);
 
-		int freechunks = num_free_chunks(zhdr);
+		freechunks = num_free_chunks(zhdr);
 		spin_lock(&pool->lock);
 		list_add(&zhdr->buddy, &unbuddied[freechunks]);
 		spin_unlock(&pool->lock);
 		zhdr->cpu = smp_processor_id();
-		put_cpu_ptr(pool->unbuddied);
+		local_unlock(&pool->llock);
 	}
 }
 
@@ -887,7 +892,8 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 
 lookup:
 	/* First, try to find an unbuddied z3fold page. */
-	unbuddied = get_cpu_ptr(pool->unbuddied);
+	local_lock(&pool->llock);
+	unbuddied = *this_cpu_ptr(&pool->unbuddied);
 	for_each_unbuddied_list(i, chunks) {
 		struct list_head *l = &unbuddied[i];
 
@@ -905,7 +911,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		    !z3fold_page_trylock(zhdr)) {
 			spin_unlock(&pool->lock);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			local_unlock(&pool->llock);
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -919,7 +925,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		    test_bit(PAGE_CLAIMED, &page->private)) {
 			z3fold_page_unlock(zhdr);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			local_unlock(&pool->llock);
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -934,7 +940,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		kref_get(&zhdr->refcount);
 		break;
 	}
-	put_cpu_ptr(pool->unbuddied);
+	local_unlock(&pool->llock);
 
 	if (!zhdr) {
 		int cpu;
@@ -1005,6 +1011,7 @@ static struct z3fold_pool *z3fold_create_pool(const char *name, gfp_t gfp,
 		goto out_c;
 	spin_lock_init(&pool->lock);
 	spin_lock_init(&pool->stale_lock);
+	local_lock_init(&pool->llock);
 	pool->unbuddied = __alloc_percpu(sizeof(struct list_head)*NCHUNKS, 2);
 	if (!pool->unbuddied)
 		goto out_pool;

```

-- 
  Oleksandr Natalenko (post-factum)


* Re: scheduling while atomic in z3fold
From: Mike Galbraith @ 2020-11-29  6:41 UTC
  To: Oleksandr Natalenko, linux-kernel
  Cc: linux-mm, Andrew Morton, Sebastian Andrzej Siewior,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Sat, 2020-11-28 at 15:27 +0100, Oleksandr Natalenko wrote:
>
> > > Shouldn't the list manipulation be protected with
> > > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
>
> Totally untested:

Hrm, the thing doesn't seem to care deeply about preemption being
disabled, so adding another lock may be overkill.  It looks like you
could get the job done via migrate_disable()+this_cpu_ptr().

In the case of the list in __z3fold_alloc(), after the unlocked peek,
it double checks under the existing lock.  There's not a whole lot of
difference between another cpu a few lines down in the same function
diddling the list and a preemption doing the same, so why bother?

Ditto add_to_unbuddied().  Flow decisions are being made based on
zhdr->first/middle/last_chunks state prior to preemption being disabled
as the thing sits, so presumably their stability is not dependent
thereon, while the list is protected by the already existing lock,
making the preempt_disable() look more incidental than intentional.

Equally untested :)

---
 mm/z3fold.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -642,14 +642,16 @@ static inline void add_to_unbuddied(stru
 {
 	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
 			zhdr->middle_chunks == 0) {
-		struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
-
+		struct list_head *unbuddied;
 		int freechunks = num_free_chunks(zhdr);
+
+		migrate_disable();
+		unbuddied = this_cpu_ptr(pool->unbuddied);
 		spin_lock(&pool->lock);
 		list_add(&zhdr->buddy, &unbuddied[freechunks]);
 		spin_unlock(&pool->lock);
 		zhdr->cpu = smp_processor_id();
-		put_cpu_ptr(pool->unbuddied);
+		migrate_enable();
 	}
 }

@@ -886,8 +888,9 @@ static inline struct z3fold_header *__z3
 	int chunks = size_to_chunks(size), i;

 lookup:
+	migrate_disable();
 	/* First, try to find an unbuddied z3fold page. */
-	unbuddied = get_cpu_ptr(pool->unbuddied);
+	unbuddied = this_cpu_ptr(pool->unbuddied);
 	for_each_unbuddied_list(i, chunks) {
 		struct list_head *l = &unbuddied[i];

@@ -905,7 +908,7 @@ static inline struct z3fold_header *__z3
 		    !z3fold_page_trylock(zhdr)) {
 			spin_unlock(&pool->lock);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			migrate_enable();
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -919,7 +922,7 @@ static inline struct z3fold_header *__z3
 		    test_bit(PAGE_CLAIMED, &page->private)) {
 			z3fold_page_unlock(zhdr);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			migrate_enable();
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -934,7 +937,7 @@ static inline struct z3fold_header *__z3
 		kref_get(&zhdr->refcount);
 		break;
 	}
-	put_cpu_ptr(pool->unbuddied);
+	migrate_enable();

 	if (!zhdr) {
 		int cpu;


* Re: scheduling while atomic in z3fold
From: Mike Galbraith @ 2020-11-29  7:48 UTC
  To: Oleksandr Natalenko, linux-kernel
  Cc: linux-mm, Andrew Morton, Sebastian Andrzej Siewior,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Sun, 2020-11-29 at 07:41 +0100, Mike Galbraith wrote:
> On Sat, 2020-11-28 at 15:27 +0100, Oleksandr Natalenko wrote:
> >
> > > > Shouldn't the list manipulation be protected with
> > > > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
> >
> > Totally untested:
>
> Hrm, the thing doesn't seem to care deeply about preemption being
> disabled, so adding another lock may be overkill.  It looks like you
> could get the job done via migrate_disable()+this_cpu_ptr().

There is however an ever so tiny chance that I'm wrong about that :)

crash.rt> bt -s
PID: 6699   TASK: ffff913c464b5640  CPU: 0   COMMAND: "oom01"
 #0 [ffffb6b94adff6f0] machine_kexec+366 at ffffffffbd05f87e
 #1 [ffffb6b94adff738] __crash_kexec+210 at ffffffffbd14c052
 #2 [ffffb6b94adff7f8] crash_kexec+48 at ffffffffbd14d240
 #3 [ffffb6b94adff808] oops_end+202 at ffffffffbd02680a
 #4 [ffffb6b94adff828] no_context+333 at ffffffffbd06d7ed
 #5 [ffffb6b94adff888] exc_page_fault+696 at ffffffffbd8c0b68
 #6 [ffffb6b94adff8e0] asm_exc_page_fault+30 at ffffffffbda00ace
 #7 [ffffb6b94adff968] mark_wakeup_next_waiter+81 at ffffffffbd0ea1e1
 #8 [ffffb6b94adff9c8] rt_mutex_futex_unlock+79 at ffffffffbd8cc3cf
 #9 [ffffb6b94adffa08] z3fold_zpool_free+1319 at ffffffffbd2b6b17
#10 [ffffb6b94adffa68] zswap_free_entry+67 at ffffffffbd27c6f3
#11 [ffffb6b94adffa78] zswap_frontswap_invalidate_page+138 at ffffffffbd27c7fa
#12 [ffffb6b94adffaa0] __frontswap_invalidate_page+72 at ffffffffbd27bee8
#13 [ffffb6b94adffac8] swapcache_free_entries+494 at ffffffffbd276e1e
#14 [ffffb6b94adffb10] free_swap_slot+173 at ffffffffbd27b7dd
#15 [ffffb6b94adffb30] __swap_entry_free+112 at ffffffffbd2768d0
#16 [ffffb6b94adffb58] free_swap_and_cache+57 at ffffffffbd278939
#17 [ffffb6b94adffb80] unmap_page_range+1485 at ffffffffbd24c52d
#18 [ffffb6b94adffc40] __oom_reap_task_mm+178 at ffffffffbd218f02
#19 [ffffb6b94adffd10] exit_mmap+339 at ffffffffbd257da3
#20 [ffffb6b94adffdb0] mmput+78 at ffffffffbd07fe7e
#21 [ffffb6b94adffdc0] do_exit+822 at ffffffffbd089bc6
#22 [ffffb6b94adffe28] do_group_exit+71 at ffffffffbd08a547
#23 [ffffb6b94adffe50] get_signal+319 at ffffffffbd0979ff
#24 [ffffb6b94adffe98] arch_do_signal+30 at ffffffffbd022cbe
#25 [ffffb6b94adfff28] exit_to_user_mode_prepare+293 at ffffffffbd1223e5
#26 [ffffb6b94adfff48] irqentry_exit_to_user_mode+5 at ffffffffbd8c1675
#27 [ffffb6b94adfff50] asm_exc_page_fault+30 at ffffffffbda00ace
    RIP: 0000000000414300  RSP: 00007f5ddf065ec0  RFLAGS: 00010206
    RAX: 0000000000001000  RBX: 00000000c0000000  RCX: 00000000adf28000
    RDX: 00007f5d0bf8d000  RSI: 00000000c0000000  RDI: 0000000000000000
    RBP: 00007f5c5e065000   R8: ffffffffffffffff   R9: 0000000000000000
    R10: 0000000000000022  R11: 0000000000000246  R12: 0000000000001000
    R13: 0000000000000001  R14: 0000000000000001  R15: 00007ffc953ebcd0
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
crash.rt>




* Re: scheduling while atomic in z3fold
From: Mike Galbraith @ 2020-11-29  9:21 UTC
  To: Oleksandr Natalenko, linux-kernel
  Cc: linux-mm, Andrew Morton, Sebastian Andrzej Siewior,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Sun, 2020-11-29 at 08:48 +0100, Mike Galbraith wrote:
> On Sun, 2020-11-29 at 07:41 +0100, Mike Galbraith wrote:
> > On Sat, 2020-11-28 at 15:27 +0100, Oleksandr Natalenko wrote:
> > >
> > > > > Shouldn't the list manipulation be protected with
> > > > > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
> > >
> > > Totally untested:
> >
> > Hrm, the thing doesn't seem to care deeply about preemption being
> > disabled, so adding another lock may be overkill.  It looks like you
> > could get the job done via migrate_disable()+this_cpu_ptr().
>
> There is however an ever so tiny chance that I'm wrong about that :)

Or not, your local_lock+this_cpu_ptr version exploded too.

Perhaps there's a bit of non-rt related racy racy going on in zswap
thingy that makes swap an even less wonderful idea for RT than usual.

crash.rt> bt -s
PID: 32399  TASK: ffff8e4528cd8000  CPU: 4   COMMAND: "cc1"
 #0 [ffff9f0f1228f488] machine_kexec+366 at ffffffff8c05f87e
 #1 [ffff9f0f1228f4d0] __crash_kexec+210 at ffffffff8c14c052
 #2 [ffff9f0f1228f590] crash_kexec+48 at ffffffff8c14d240
 #3 [ffff9f0f1228f5a0] oops_end+202 at ffffffff8c02680a
 #4 [ffff9f0f1228f5c0] exc_general_protection+403 at ffffffff8c8be413
 #5 [ffff9f0f1228f650] asm_exc_general_protection+30 at ffffffff8ca00a0e
 #6 [ffff9f0f1228f6d8] __z3fold_alloc+118 at ffffffff8c2b4ea6
 #7 [ffff9f0f1228f760] z3fold_zpool_malloc+115 at ffffffff8c2b56c3
 #8 [ffff9f0f1228f7c8] zswap_frontswap_store+789 at ffffffff8c27d335
 #9 [ffff9f0f1228f828] __frontswap_store+110 at ffffffff8c27bafe
#10 [ffff9f0f1228f858] swap_writepage+55 at ffffffff8c273b17
#11 [ffff9f0f1228f870] shmem_writepage+612 at ffffffff8c232964
#12 [ffff9f0f1228f8a8] pageout+210 at ffffffff8c225f12
#13 [ffff9f0f1228f928] shrink_page_list+2428 at ffffffff8c22744c
#14 [ffff9f0f1228f9c0] shrink_inactive_list+534 at ffffffff8c229746
#15 [ffff9f0f1228fa68] shrink_lruvec+927 at ffffffff8c22a35f
#16 [ffff9f0f1228fb78] shrink_node+567 at ffffffff8c22a7d7
#17 [ffff9f0f1228fbf8] do_try_to_free_pages+185 at ffffffff8c22ad39
#18 [ffff9f0f1228fc40] try_to_free_pages+201 at ffffffff8c22c239
#19 [ffff9f0f1228fcd0] __alloc_pages_slowpath.constprop.111+1056 at ffffffff8c26eb70
#20 [ffff9f0f1228fda8] __alloc_pages_nodemask+786 at ffffffff8c26f7e2
#21 [ffff9f0f1228fe00] alloc_pages_vma+309 at ffffffff8c288f15
#22 [ffff9f0f1228fe40] handle_mm_fault+1687 at ffffffff8c24ee97
#23 [ffff9f0f1228fef8] exc_page_fault+821 at ffffffff8c8c1be5
#24 [ffff9f0f1228ff50] asm_exc_page_fault+30 at ffffffff8ca00ace
    RIP: 00000000010fea3b  RSP: 00007ffc88ad5a50  RFLAGS: 00010206
    RAX: 00007f4a548d1638  RBX: 00007f4a5c0c5000  RCX: 0000000000000000
    RDX: 0000000000020000  RSI: 000000000000000f  RDI: 0000000000018001
    RBP: 00007f4a5547a000   R8: 0000000000018000   R9: 0000000000240000
    R10: 00007f4a5523a000  R11: 0000000000098628  R12: 00007f4a548d1638
    R13: 00007f4a54ab1478  R14: 000000005002ac29  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
crash.rt>



* Re: scheduling while atomic in z3fold
From: Mike Galbraith @ 2020-11-29 10:56 UTC
  To: Oleksandr Natalenko, linux-kernel
  Cc: linux-mm, Andrew Morton, Sebastian Andrzej Siewior,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Sun, 2020-11-29 at 10:21 +0100, Mike Galbraith wrote:
> On Sun, 2020-11-29 at 08:48 +0100, Mike Galbraith wrote:
> > On Sun, 2020-11-29 at 07:41 +0100, Mike Galbraith wrote:
> > > On Sat, 2020-11-28 at 15:27 +0100, Oleksandr Natalenko wrote:
> > > >
> > > > > > Shouldn't the list manipulation be protected with
> > > > > > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
> > > >
> > > > Totally untested:
> > >
> > > Hrm, the thing doesn't seem to care deeply about preemption being
> > > disabled, so adding another lock may be overkill.  It looks like you
> > > could get the job done via migrate_disable()+this_cpu_ptr().
> >
> > There is however an ever so tiny chance that I'm wrong about that :)
>
> Or not, your local_lock+this_cpu_ptr version exploded too.
>
> Perhaps there's a bit of non-rt related racy racy going on in zswap
> thingy that makes swap an even less wonderful idea for RT than usual.

Raciness seems to be restricted to pool compressor.  "zbud" seems to be
solid, virgin "zsmalloc" explodes, as does "z3fold" regardless which of
us puts his grubby fingerprints on it.

Exploding compressors survived zero runs of runltp -f mm, I declared
zbud to be at least kinda sorta stable after box survived five runs.

	-Mike


* Re: scheduling while atomic in z3fold
From: Oleksandr Natalenko @ 2020-11-29 11:29 UTC
  To: Mike Galbraith
  Cc: linux-kernel, linux-mm, Andrew Morton, Sebastian Andrzej Siewior,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Sun, Nov 29, 2020 at 11:56:55AM +0100, Mike Galbraith wrote:
> On Sun, 2020-11-29 at 10:21 +0100, Mike Galbraith wrote:
> > On Sun, 2020-11-29 at 08:48 +0100, Mike Galbraith wrote:
> > > On Sun, 2020-11-29 at 07:41 +0100, Mike Galbraith wrote:
> > > > On Sat, 2020-11-28 at 15:27 +0100, Oleksandr Natalenko wrote:
> > > > >
> > > > > > > Shouldn't the list manipulation be protected with
> > > > > > > local_lock+this_cpu_ptr instead of get_cpu_ptr+spin_lock?
> > > > >
> > > > > Totally untested:
> > > >
> > > > Hrm, the thing doesn't seem to care deeply about preemption being
> > > > disabled, so adding another lock may be overkill.  It looks like you
> > > > could get the job done via migrate_disable()+this_cpu_ptr().
> > >
> > > There is however an ever so tiny chance that I'm wrong about that :)
> >
> > Or not, your local_lock+this_cpu_ptr version exploded too.
> >
> > Perhaps there's a bit of non-rt related racy racy going on in zswap
> > thingy that makes swap an even less wonderful idea for RT than usual.
> 
> Raciness seems to be restricted to pool compressor.  "zbud" seems to be
> solid, virgin "zsmalloc" explodes, as does "z3fold" regardless which of
> us puts his grubby fingerprints on it.
> 
> Exploding compressors survived zero runs of runltp -f mm, I declared
> zbud to be at least kinda sorta stable after box survived five runs.

Ummm so do compressors explode under a non-rt kernel in your tests as
well, or is it just -rt that triggers this?

I've never seen that on either -rt or non-rt, hence the question.

-- 
  Oleksandr Natalenko (post-factum)


* Re: scheduling while atomic in z3fold
From: Mike Galbraith @ 2020-11-29 11:41 UTC
  To: Oleksandr Natalenko
  Cc: linux-kernel, linux-mm, Andrew Morton, Sebastian Andrzej Siewior,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Sun, 2020-11-29 at 12:29 +0100, Oleksandr Natalenko wrote:
>
> Ummm so do compressors explode under a non-rt kernel in your tests as
> well, or is it just -rt that triggers this?

I only tested a non-rt kernel with z3fold, which worked just fine.

	-Mike


* Re: scheduling while atomic in z3fold
From: Sebastian Andrzej Siewior @ 2020-11-30 13:20 UTC
  To: Mike Galbraith
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On 2020-11-29 12:41:14 [+0100], Mike Galbraith wrote:
> On Sun, 2020-11-29 at 12:29 +0100, Oleksandr Natalenko wrote:
> >
> > Ummm so do compressors explode under a non-rt kernel in your tests as
> > well, or is it just -rt that triggers this?
> 
> I only tested a non-rt kernel with z3fold, which worked just fine.

I tried this and it did not explode yet. Mike, can you please
confirm?

diff --git a/mm/z3fold.c b/mm/z3fold.c
index 18feaa0bc5377..0bf70f624a4bd 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -642,14 +642,17 @@ static inline void add_to_unbuddied(struct z3fold_pool *pool,
 {
 	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
 			zhdr->middle_chunks == 0) {
-		struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
-
+		struct list_head *unbuddied;
 		int freechunks = num_free_chunks(zhdr);
+
+		migrate_disable();
+		unbuddied = this_cpu_ptr(pool->unbuddied);
+
 		spin_lock(&pool->lock);
 		list_add(&zhdr->buddy, &unbuddied[freechunks]);
 		spin_unlock(&pool->lock);
 		zhdr->cpu = smp_processor_id();
-		put_cpu_ptr(pool->unbuddied);
+		migrate_enable();
 	}
 }
 
@@ -887,7 +890,8 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 
 lookup:
 	/* First, try to find an unbuddied z3fold page. */
-	unbuddied = get_cpu_ptr(pool->unbuddied);
+	migrate_disable();
+	unbuddied = this_cpu_ptr(pool->unbuddied);
 	for_each_unbuddied_list(i, chunks) {
 		struct list_head *l = &unbuddied[i];
 
@@ -905,7 +909,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		    !z3fold_page_trylock(zhdr)) {
 			spin_unlock(&pool->lock);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			migrate_enable();
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -919,7 +923,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		    test_bit(PAGE_CLAIMED, &page->private)) {
 			z3fold_page_unlock(zhdr);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			migrate_enable();
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -934,7 +938,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		kref_get(&zhdr->refcount);
 		break;
 	}
-	put_cpu_ptr(pool->unbuddied);
+	migrate_enable();
 
 	if (!zhdr) {
 		int cpu;
diff --git a/mm/zswap.c b/mm/zswap.c
index 78a20f7b00f2c..b24f761b9241c 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -394,7 +394,9 @@ struct zswap_comp {
 	u8 *dstmem;
 };
 
-static DEFINE_PER_CPU(struct zswap_comp, zswap_comp);
+static DEFINE_PER_CPU(struct zswap_comp, zswap_comp) = {
+	.lock = INIT_LOCAL_LOCK(lock),
+};
 
 static int zswap_dstmem_prepare(unsigned int cpu)
 {

> 	-Mike

Sebastian


* Re: scheduling while atomic in z3fold
From: Oleksandr Natalenko @ 2020-11-30 13:53 UTC
  To: Sebastian Andrzej Siewior
  Cc: Mike Galbraith, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, Nov 30, 2020 at 02:20:14PM +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-29 12:41:14 [+0100], Mike Galbraith wrote:
> > On Sun, 2020-11-29 at 12:29 +0100, Oleksandr Natalenko wrote:
> > >
> > > Ummm so do compressors explode under a non-rt kernel in your tests as
> > > well, or is it just -rt that triggers this?
> > 
> > I only tested a non-rt kernel with z3fold, which worked just fine.
> 
> I tried this and it did not explode yet. Mike, can you please
> confirm?
> 
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index 18feaa0bc5377..0bf70f624a4bd 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -642,14 +642,17 @@ static inline void add_to_unbuddied(struct z3fold_pool *pool,
>  {
>  	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
>  			zhdr->middle_chunks == 0) {
> -		struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
> -
> +		struct list_head *unbuddied;
>  		int freechunks = num_free_chunks(zhdr);
> +
> +		migrate_disable();
> +		unbuddied = this_cpu_ptr(pool->unbuddied);
> +
>  		spin_lock(&pool->lock);
>  		list_add(&zhdr->buddy, &unbuddied[freechunks]);
>  		spin_unlock(&pool->lock);
>  		zhdr->cpu = smp_processor_id();
> -		put_cpu_ptr(pool->unbuddied);
> +		migrate_enable();
>  	}
>  }
>  
> @@ -887,7 +890,8 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
>  
>  lookup:
>  	/* First, try to find an unbuddied z3fold page. */
> -	unbuddied = get_cpu_ptr(pool->unbuddied);
> +	migrate_disable();
> +	unbuddied = this_cpu_ptr(pool->unbuddied);
>  	for_each_unbuddied_list(i, chunks) {
>  		struct list_head *l = &unbuddied[i];
>  
> @@ -905,7 +909,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
>  		    !z3fold_page_trylock(zhdr)) {
>  			spin_unlock(&pool->lock);
>  			zhdr = NULL;
> -			put_cpu_ptr(pool->unbuddied);
> +			migrate_enable();
>  			if (can_sleep)
>  				cond_resched();
>  			goto lookup;
> @@ -919,7 +923,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
>  		    test_bit(PAGE_CLAIMED, &page->private)) {
>  			z3fold_page_unlock(zhdr);
>  			zhdr = NULL;
> -			put_cpu_ptr(pool->unbuddied);
> +			migrate_enable();
>  			if (can_sleep)
>  				cond_resched();
>  			goto lookup;
> @@ -934,7 +938,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
>  		kref_get(&zhdr->refcount);
>  		break;
>  	}
> -	put_cpu_ptr(pool->unbuddied);
> +	migrate_enable();
>  
>  	if (!zhdr) {
>  		int cpu;
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 78a20f7b00f2c..b24f761b9241c 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -394,7 +394,9 @@ struct zswap_comp {
>  	u8 *dstmem;
>  };
>  
> -static DEFINE_PER_CPU(struct zswap_comp, zswap_comp);
> +static DEFINE_PER_CPU(struct zswap_comp, zswap_comp) = {
> +	.lock = INIT_LOCAL_LOCK(lock),

Is it a residue?

> +};
>  
>  static int zswap_dstmem_prepare(unsigned int cpu)
>  {
> 
> > 	-Mike
> 
> Sebastian

-- 
  Oleksandr Natalenko (post-factum)


* Re: scheduling while atomic in z3fold
From: Sebastian Andrzej Siewior @ 2020-11-30 14:28 UTC
  To: Oleksandr Natalenko
  Cc: Mike Galbraith, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On 2020-11-30 14:53:22 [+0100], Oleksandr Natalenko wrote:
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 78a20f7b00f2c..b24f761b9241c 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -394,7 +394,9 @@ struct zswap_comp {
> >  	u8 *dstmem;
> >  };
> >  
> > -static DEFINE_PER_CPU(struct zswap_comp, zswap_comp);
> > +static DEFINE_PER_CPU(struct zswap_comp, zswap_comp) = {
> > +	.lock = INIT_LOCAL_LOCK(lock),
> 
> Is it a residue?

It has been added on purpose.
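
For reference, a minimal sketch of the initialization pattern, with
hypothetical names: a statically allocated per-CPU local_lock_t takes
the INIT_LOCAL_LOCK() initializer, while a dynamically allocated one
would be set up at runtime with local_lock_init() instead.

```
#include <linux/local_lock.h>
#include <linux/percpu.h>
#include <linux/types.h>

struct comp_ctx {		/* hypothetical stand-in for zswap_comp */
	local_lock_t lock;
	u8 *dstmem;
};

/*
 * Statically allocated per-CPU instance: INIT_LOCAL_LOCK() provides a
 * valid initial lock state (and lockdep key) before first use.
 */
static DEFINE_PER_CPU(struct comp_ctx, comp_ctx) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

static void comp_ctx_use(void)
{
	struct comp_ctx *ctx;

	/*
	 * Disables preemption on !PREEMPT_RT; on PREEMPT_RT it takes a
	 * per-CPU sleeping lock, so this section stays preemptible.
	 */
	local_lock(&comp_ctx.lock);
	ctx = this_cpu_ptr(&comp_ctx);
	/* ... use ctx->dstmem ... */
	local_unlock(&comp_ctx.lock);
}
```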

Sebastian


* Re: scheduling while atomic in z3fold
From: Mike Galbraith @ 2020-11-30 14:42 UTC
  To: Sebastian Andrzej Siewior
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, 2020-11-30 at 14:20 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-29 12:41:14 [+0100], Mike Galbraith wrote:
> > On Sun, 2020-11-29 at 12:29 +0100, Oleksandr Natalenko wrote:
> > >
> > > Ummm so do compressors explode under a non-rt kernel in your tests as
> > > well, or is it just -rt that triggers this?
> >
> > I only tested a non-rt kernel with z3fold, which worked just fine.
>
> I tried this and it did not explode yet. Mike, can you please
> confirm?

This explodes in write_unlock() as mine did.  Oleksandr's local_lock()
variant explodes in the lock he added.  (ew, corruption)

I think I'll try a stable-rt tree.  This master tree _should_ be fine
given it seems to work just peachy for everything else, but my box is
the only one making boom... and also NOT making boom with the zbud
compressor.  Hrmph.

crash.rt> bt -sx
PID: 27961  TASK: ffff8f6ad5344500  CPU: 4   COMMAND: "oom01"
 #0 [ffffa439d4747708] machine_kexec+0x16e at ffffffffbb05eace
 #1 [ffffa439d4747750] __crash_kexec+0x5a at ffffffffbb145e5a
 #2 [ffffa439d4747810] crash_kexec+0x24 at ffffffffbb147004
 #3 [ffffa439d4747818] oops_end+0x93 at ffffffffbb025c03
 #4 [ffffa439d4747838] exc_page_fault+0x6b at ffffffffbb9011bb
 #5 [ffffa439d4747860] asm_exc_page_fault+0x1e at ffffffffbba00ace
 #6 [ffffa439d47478e8] mark_wakeup_next_waiter+0x51 at ffffffffbb0e6781
 #7 [ffffa439d4747950] rt_mutex_futex_unlock+0x50 at ffffffffbb90bae0
 #8 [ffffa439d4747990] z3fold_free+0x4a8 at ffffffffbb2bf8a8
 #9 [ffffa439d47479f0] zswap_free_entry+0x82 at ffffffffbb285dd2
#10 [ffffa439d4747a08] zswap_frontswap_invalidate_page+0x8c at ffffffffbb285edc
#11 [ffffa439d4747a30] __frontswap_invalidate_page+0x4e at ffffffffbb28548e
#12 [ffffa439d4747a58] swap_range_free.constprop.0+0x9e at ffffffffbb27fd4e
#13 [ffffa439d4747a78] swapcache_free_entries+0x10d at ffffffffbb28101d
#14 [ffffa439d4747ac0] free_swap_slot+0xac at ffffffffbb284d8c
#15 [ffffa439d4747ae0] __swap_entry_free+0x8f at ffffffffbb2808ff
#16 [ffffa439d4747b08] free_swap_and_cache+0x3b at ffffffffbb2829db
#17 [ffffa439d4747b38] zap_pte_range+0x164 at ffffffffbb258004
#18 [ffffa439d4747bc0] unmap_page_range+0x1dd at ffffffffbb258b6d
#19 [ffffa439d4747c38] __oom_reap_task_mm+0xd5 at ffffffffbb2235c5
#20 [ffffa439d4747d08] exit_mmap+0x154 at ffffffffbb264084
#21 [ffffa439d4747da0] mmput+0x4e at ffffffffbb07e66e
#22 [ffffa439d4747db0] exit_mm+0x172 at ffffffffbb088372
#23 [ffffa439d4747df0] do_exit+0x1a8 at ffffffffbb088588
#24 [ffffa439d4747e20] do_group_exit+0x39 at ffffffffbb0888a9
#25 [ffffa439d4747e48] get_signal+0x155 at ffffffffbb096ef5
#26 [ffffa439d4747e98] arch_do_signal+0x1a at ffffffffbb0224ba
#27 [ffffa439d4747f18] exit_to_user_mode_loop+0xc7 at ffffffffbb11c037
#28 [ffffa439d4747f38] exit_to_user_mode_prepare+0x6a at ffffffffbb11c12a
#29 [ffffa439d4747f48] irqentry_exit_to_user_mode+0x5 at ffffffffbb901ae5
#30 [ffffa439d4747f50] asm_exc_page_fault+0x1e at ffffffffbba00ace
    RIP: 0000000000414300  RSP: 00007f193033bec0  RFLAGS: 00010206
    RAX: 0000000000001000  RBX: 00000000c0000000  RCX: 00000000a6e6a000
    RDX: 00007f19159a4000  RSI: 00000000c0000000  RDI: 0000000000000000
    RBP: 00007f186eb3a000   R8: ffffffffffffffff   R9: 0000000000000000
    R10: 0000000000000022  R11: 0000000000000246  R12: 0000000000001000
    R13: 0000000000000001  R14: 0000000000000001  R15: 00007ffd0ffb96d0
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b

* Re: scheduling while atomic in z3fold
From: Sebastian Andrzej Siewior @ 2020-11-30 14:52 UTC
  To: Mike Galbraith
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On 2020-11-30 15:42:46 [+0100], Mike Galbraith wrote:
> This explodes in write_unlock() as mine did.   Oleksandr's local_lock()
> variant explodes in the lock he added.  (ew, corruption)
> 
> I think I'll try a stable-rt tree.  This master tree _should_ be fine
> given it seems to work just peachy for everything else, but my box is
> the only one making boom... and also NOT making boom with the zbud
> compressor.  Hrmph.

How do you test this? I triggered a few oom-killers and I have git
gc running here for a few hours now… Everything is fine.

Sebastian


* Re: scheduling while atomic in z3fold
From: Mike Galbraith @ 2020-11-30 15:01 UTC
  To: Sebastian Andrzej Siewior
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> How do you test this? I triggered a few oom-killers and I have git
> gc running here for a few hours now… Everything is fine.

In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
insanely, it explodes quite reliably here with either z3fold or
zsmalloc.. but not with zbud.

	-Mike


* Re: scheduling while atomic in z3fold
From: Mike Galbraith @ 2020-11-30 15:03 UTC
  To: Sebastian Andrzej Siewior
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, 2020-11-30 at 16:01 +0100, Mike Galbraith wrote:
> On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> > How do you test this? I triggered a few oom-killers and I have git
> > gc running here for a few hours now… Everything is fine.
>
> In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
> insanely, it explodes quite reliably here with either z3fold or
> zsmalloc.. but not with zbud.

My config attached in case it's relevant.

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 50792 bytes --]

* Re: scheduling while atomic in z3fold
  2020-11-30 15:01                         ` Mike Galbraith
  (?)
  (?)
@ 2020-11-30 16:03                         ` Sebastian Andrzej Siewior
  2020-11-30 16:27                             ` Mike Galbraith
  2020-12-02  2:30                             ` Mike Galbraith
  -1 siblings, 2 replies; 70+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-11-30 16:03 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On 2020-11-30 16:01:11 [+0100], Mike Galbraith wrote:
> On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> > How do you test this? I triggered a few oom-killer and I have here git
> > gc running for a few hours now… Everything is fine.
> 
> In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
> insanely, it explodes quite reliably here with either z3fold or
> zsmalloc.. but not with zbud.

This just passed. It however killed my git-gc task which wasn't done.
Let me try tomorrow with your config.

> 	-Mike

Sebastian

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-11-30 16:03                         ` Sebastian Andrzej Siewior
@ 2020-11-30 16:27                             ` Mike Galbraith
  2020-12-02  2:30                             ` Mike Galbraith
  1 sibling, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-11-30 16:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, 2020-11-30 at 17:03 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-30 16:01:11 [+0100], Mike Galbraith wrote:
> > On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> > > How do you test this? I triggered a few oom-killer and I have here git
> > > gc running for a few hours now… Everything is fine.
> >
> > In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
> > insanely, it explodes quite reliably here with either z3fold or
> > zsmalloc.. but not with zbud.
>
> This just passed. It however killed my git-gc task which wasn't done.
> Let me try tomorrow with your config.

FYI, I tried 5.9-rt (after fixing 5.9.11), it exploded in the same way,
so (as expected) it's not some devel tree oopsie.

	-Mike


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-11-30 16:27                             ` Mike Galbraith
  (?)
@ 2020-11-30 16:32                             ` Sebastian Andrzej Siewior
  2020-11-30 16:36                                 ` Mike Galbraith
  2020-11-30 19:09                                 ` Mike Galbraith
  -1 siblings, 2 replies; 70+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-11-30 16:32 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On 2020-11-30 17:27:17 [+0100], Mike Galbraith wrote:
> > This just passed. It however killed my git-gc task which wasn't done.
> > Let me try tomorrow with your config.
> 
> FYI, I tried 5.9-rt (after fixing 5.9.11), it exploded in the same way,
> so (as expected) it's not some devel tree oopsie.

v5.9 has the same missing local-lock init due to merged local-lock
upstream. I don't remember when this get_cpu_ptr() came in.
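
For readers who haven't tracked the local-lock work, the RT-friendly
pattern for guarding per-CPU data looks roughly like the sketch below.
It reuses z3fold's NCHUNKS and struct z3fold_header, but the rest
(z3fold_pcpu, add_to_unbuddied_sketch) is illustrative only, not the
actual patch:

#include <linux/local_lock.h>
#include <linux/list.h>
#include <linux/percpu.h>

/* Minimal sketch of the local_lock pattern; not the real z3fold fix. */
struct z3fold_pcpu {
	local_lock_t lock;
	struct list_head unbuddied[NCHUNKS];
};

static DEFINE_PER_CPU(struct z3fold_pcpu, z3fold_pcpu) = {
	.lock = INIT_LOCAL_LOCK(lock),
};

static void add_to_unbuddied_sketch(struct z3fold_header *zhdr, int freechunks)
{
	struct z3fold_pcpu *p;

	/* Sleeps on PREEMPT_RT, disables preemption on !RT; either way
	 * the task stays on this CPU while it touches the per-CPU list. */
	local_lock(&z3fold_pcpu.lock);
	p = this_cpu_ptr(&z3fold_pcpu);
	list_add(&zhdr->buddy, &p->unbuddied[freechunks]);
	local_unlock(&z3fold_pcpu.lock);
}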

> 	-Mike

Sebastian

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-11-30 16:32                             ` Sebastian Andrzej Siewior
@ 2020-11-30 16:36                                 ` Mike Galbraith
  2020-11-30 19:09                                 ` Mike Galbraith
  1 sibling, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-11-30 16:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, 2020-11-30 at 17:32 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-30 17:27:17 [+0100], Mike Galbraith wrote:
> > > This just passed. It however killed my git-gc task which wasn't done.
> > > Let me try tomorrow with your config.
> >
> > FYI, I tried 5.9-rt (after fixing 5.9.11), it exploded in the same way,
> > so (as expected) it's not some devel tree oopsie.
>
> v5.9 has the same missing local-lock init due to merged local-lock
> upstream. I don't remember when this get_cpu_ptr() came in.

This was with your test patch applied.

	-Mike


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-11-30 16:27                             ` Mike Galbraith
@ 2020-11-30 16:53                               ` Mike Galbraith
  -1 siblings, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-11-30 16:53 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, 2020-11-30 at 17:27 +0100, Mike Galbraith wrote:
> On Mon, 2020-11-30 at 17:03 +0100, Sebastian Andrzej Siewior wrote:
> > On 2020-11-30 16:01:11 [+0100], Mike Galbraith wrote:
> > > On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> > > > How do you test this? I triggered a few oom-killer and I have here git
> > > > gc running for a few hours now… Everything is fine.
> > >
> > > In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
> > > insanely, it explodes quite reliably here with either z3fold or
> > > zsmalloc.. but not with zbud.
> >
> > This just passed. It however killed my git-gc task which wasn't done.
> > Let me try tomorrow with your config.
>
> FYI, I tried 5.9-rt (after fixing 5.9.11), it exploded in the same way,
> so (as expected) it's not some devel tree oopsie.

It reproduces in full distro KVM too.

	-Mike


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-11-30 16:32                             ` Sebastian Andrzej Siewior
@ 2020-11-30 19:09                                 ` Mike Galbraith
  2020-11-30 19:09                                 ` Mike Galbraith
  1 sibling, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-11-30 19:09 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, 2020-11-30 at 17:32 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-30 17:27:17 [+0100], Mike Galbraith wrote:
> > > This just passed. It however killed my git-gc task which wasn't done.
> > > Let me try tomorrow with your config.
> >
> > FYI, I tried 5.9-rt (after fixing 5.9.11), it exploded in the same way,
> > so (as expected) it's not some devel tree oopsie.
>
> v5.9 has the same missing local-lock init due to merged local-lock
> upstream. I don't remember when this get_cpu_ptr() came in.

Looks like get_cpu_ptr() arrived in 4.14.

I patched up zswap/z3fold in my old 5.1-rt tree (old forward port of
4.19-rt), turned them on with a script, and it all worked fine, so the
RT zswap allergy lies somewhere upstream of there, it seems.

If I were to start merging zswap and friends forward, I'd likely
eventually meet whatever RT is allergic to.  I might try that, but
won't be the least surprised if it turns into a god awful mess of
dependencies.

Hopping forward in rt branches first and patching them up will bracket
the little bugger a lot quicker.

	-Mike


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-11-30 16:03                         ` Sebastian Andrzej Siewior
@ 2020-12-02  2:30                             ` Mike Galbraith
  2020-12-02  2:30                             ` Mike Galbraith
  1 sibling, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-12-02  2:30 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, 2020-11-30 at 17:03 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-11-30 16:01:11 [+0100], Mike Galbraith wrote:
> > On Mon, 2020-11-30 at 15:52 +0100, Sebastian Andrzej Siewior wrote:
> > > How do you test this? I triggered a few oom-killer and I have here git
> > > gc running for a few hours now… Everything is fine.
> >
> > In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
> > insanely, it explodes quite reliably here with either z3fold or
> > zsmalloc.. but not with zbud.
>
> This just passed. It however killed my git-gc task which wasn't done.
> Let me try tomorrow with your config.

What I'm seeing is the below.  rt_mutex_has_waiters() says yup we have
a waiter, rt_mutex_top_waiter() emits the missing cached leftmost, and
rt_mutex_dequeue_pi() chokes on it.  Lock is buggered.
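
For context: the waiters tree is an rb_root_cached, and the invariant
the dump below shows broken is that a non-empty tree must always carry
a valid cached leftmost node.  A sketch -- not the literal rtmutex
code -- of what the top-waiter lookup boils down to:

#include <linux/rbtree.h>

/* rt_mutex_has_waiters() checks waiters.rb_root.rb_node, while the
 * top-waiter lookup trusts waiters.rb_leftmost.  In the dump below,
 * rb_node is set but rb_leftmost is NULL, so the "top waiter" is
 * bogus and the subsequent dequeue faults at a small offset (0x18). */
static struct rt_mutex_waiter *top_waiter_sketch(struct rb_root_cached *waiters)
{
	struct rb_node *leftmost = rb_first_cached(waiters); /* rb_leftmost */

	return leftmost ? rb_entry(leftmost, struct rt_mutex_waiter,
				   tree_entry) : NULL;
}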

[  894.376962] BUG: kernel NULL pointer dereference, address: 0000000000000018
[  894.377639] #PF: supervisor read access in kernel mode
[  894.378130] #PF: error_code(0x0000) - not-present page
[  894.378735] PGD 0 P4D 0
[  894.378974] Oops: 0000 [#1] PREEMPT_RT SMP PTI
[  894.379384] CPU: 2 PID: 78 Comm: oom_reaper Kdump: loaded Tainted: G            E     5.9.11-rt20-rt #9
[  894.380253] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[  894.381352] RIP: 0010:mark_wakeup_next_waiter+0x51/0x150
[  894.381869] Code: 00 00 49 89 f5 e8 9f 1c 7c 00 48 8b 5d 10 48 85 db 74 0a 48 3b 6b 38 0f 85 00 01 00 00 65 4c 8b 3c 25 c0 8d 01 00 4c 8d 73 18 <4c> 39 73 18 0f 85 94 00 00 00 65 48 8b 3c 25 c0 8d 01 00 48 8b 87
[  894.383640] RSP: 0018:ffffb792802cfb18 EFLAGS: 00010046
[  894.384135] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[  894.384804] RDX: 0000000000000001 RSI: ffffb792802cfb68 RDI: 0000000000000001
[  894.385473] RBP: ffff997b4e508628 R08: ffff997b39075000 R09: ffff997a47800db0
[  894.386134] R10: 0000000000000000 R11: ffffffff8a58f4d8 R12: ffffb792802cfb58
[  894.387030] R13: ffffb792802cfb68 R14: 0000000000000018 R15: ffff997a7f1d3300
[  894.387715] FS:  0000000000000000(0000) GS:ffff997b77c80000(0000) knlGS:0000000000000000
[  894.388476] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  894.389209] CR2: 0000000000000018 CR3: 00000001cc156006 CR4: 00000000001706e0
[  894.389881] Call Trace:
[  894.390127]  rt_mutex_futex_unlock+0x4f/0x90
[  894.390547]  z3fold_zpool_free+0x539/0x5c0
[  894.390930]  zswap_free_entry+0x43/0x50
[  894.391193]  zswap_frontswap_invalidate_page+0x8a/0x90
[  894.391544]  __frontswap_invalidate_page+0x48/0x80
[  894.391875]  swapcache_free_entries+0x1ee/0x330
[  894.392189]  ? rt_mutex_futex_unlock+0x65/0x90
[  894.392502]  free_swap_slot+0xad/0xc0
[  894.392757]  __swap_entry_free+0x70/0x90
[  894.393046]  free_swap_and_cache+0x39/0xe0
[  894.393351]  unmap_page_range+0x5e1/0xb30
[  894.393646]  ? flush_tlb_mm_range+0xfb/0x170
[  894.393965]  __oom_reap_task_mm+0xb2/0x170
[  894.394254]  ? __switch_to+0x12a/0x520
[  894.394514]  oom_reaper+0x119/0x540
[  894.394756]  ? wait_woken+0xa0/0xa0
[  894.394997]  ? __oom_reap_task_mm+0x170/0x170
[  894.395297]  kthread+0x169/0x180
[  894.395535]  ? kthread_park+0x90/0x90
[  894.395867]  ret_from_fork+0x22/0x30
[  894.396252] Modules linked in: ebtable_filter(E) ebtables(E) uinput(E) fuse(E) rpcsec_gss_krb5(E) nfsv4(E) xt_comment(E) dns_resolver(E) nfs(E) nf_log_ipv6(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) nfs_ssc(E) fscache(E>
[  894.396280]  cryptd(E) glue_helper(E) pcspkr(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) sch_fq_codel(E) hid_generic(E) usbhid(E) ext4(E) crc16(E) mbcache(E) jbd2(E) ata_generic(E) virtio_console(E) virtio_blk(E)>
[  894.406791] Dumping ftrace buffer:
[  894.407037]    (ftrace buffer empty)
[  894.407293] CR2: 0000000000000018

crash> gdb list *mark_wakeup_next_waiter+0x51
0xffffffff810e87e1 is in mark_wakeup_next_waiter (kernel/locking/rtmutex.c:362).
357     }
358
359     static void
360     rt_mutex_dequeue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter)
361     {
362             if (RB_EMPTY_NODE(&waiter->pi_tree_entry))
363                     return;
364
365             rb_erase_cached(&waiter->pi_tree_entry, &task->pi_waiters);
366             RB_CLEAR_NODE(&waiter->pi_tree_entry);

crash> rwlock_t -x 0xffff997b4e508628
struct rwlock_t {
  rtmutex = {
    wait_lock = {
      raw_lock = {
        {
          val = {
            counter = 0x1
          },
          {
            locked = 0x1,
            pending = 0x0
          },
          {
            locked_pending = 0x1,
            tail = 0x0
          }
        }
      }
    },
    waiters = {
      rb_root = {
        rb_node = 0xffff997b4e508580
      },
      rb_leftmost = 0x0
    },
    owner = 0xffff997a7f1d3300,
    save_state = 0x1
  },
  readers = {
    counter = 0x80000000
  }
}
crash> rb_root 0xffff997b4e508580
struct rb_root {
  rb_node = 0x0
}


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-02  2:30                             ` Mike Galbraith
  (?)
@ 2020-12-02 22:08                             ` Sebastian Andrzej Siewior
  2020-12-03  2:16                                 ` Mike Galbraith
  -1 siblings, 1 reply; 70+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-12-02 22:08 UTC (permalink / raw)
  To: Mike Galbraith, Vitaly Wool
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On 2020-12-02 03:30:27 [+0100], Mike Galbraith wrote:
> > > In an LTP install, ./runltp -f mm.  Shortly after box starts swapping
> > > insanely, it explodes quite reliably here with either z3fold or
> > > zsmalloc.. but not with zbud.
> 
> What I'm seeing is the below.  rt_mutex_has_waiters() says yup we have
> a waiter, rt_mutex_top_waiter() emits the missing cached leftmost, and
> rt_mutex_dequeue_pi() chokes on it.  Lock is buggered.

correct. So this:

diff --git a/mm/z3fold.c b/mm/z3fold.c
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -342,7 +342,7 @@ static inline void free_handle(unsigned long handle)
 
 	if (is_free) {
 		struct z3fold_pool *pool = slots_to_pool(slots);
-
+		memset(slots, 0xee, sizeof(struct z3fold_buddy_slots));
 		kmem_cache_free(pool->c_handle, slots);
 	}
 }
@@ -548,8 +549,10 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
 		set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
 	read_unlock(&zhdr->slots->lock);
 
-	if (is_free)
+	if (is_free) {
+		memset(zhdr->slots, 0xdd, sizeof(struct z3fold_buddy_slots));
 		kmem_cache_free(pool->c_handle, zhdr->slots);
+	}
 
 	if (locked)
 		z3fold_page_unlock(zhdr);

resulted in:

|[  377.200696] Out of memory: Killed process 284358 (oom01) total-vm:15780488kB, anon-rss:150624kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:16760kB oom_score_adj:0
|[  377.205438] ------------[ cut here ]------------
|[  377.205441] pvqspinlock: lock 0xffff8880105c6828 has corrupted value 0xdddddddd!
|[  377.205448] WARNING: CPU: 6 PID: 72 at kernel/locking/qspinlock_paravirt.h:498 __pv_queued_spin_unlock_slowpath+0xb3/0xc0
|[  377.205455] Modules linked in:
|[  377.205456] CPU: 6 PID: 72 Comm: oom_reaper Not tainted 5.10.0-rc6-rt13-rt+ #103
|[  377.205458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1 04/01/2014
|[  377.205460] RIP: 0010:__pv_queued_spin_unlock_slowpath+0xb3/0xc0
…
|[  377.205475] Call Trace:
|[  377.205477]  __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
|[  377.205481]  .slowpath+0x9/0xe
|[  377.205483]  _raw_spin_unlock_irqrestore+0x5/0x50
|[  377.205486]  rt_mutex_futex_unlock+0x9e/0xb0
|[  377.205488]  z3fold_free+0x2b0/0x470
|[  377.205491]  zswap_free_entry+0x7d/0xc0
|[  377.205493]  zswap_frontswap_invalidate_page+0x87/0x90
|[  377.205495]  __frontswap_invalidate_page+0x58/0x90
|[  377.205496]  swap_range_free.constprop.0+0x99/0xb0
|[  377.205499]  swapcache_free_entries+0x131/0x390
|[  377.205501]  free_swap_slot+0x99/0xc0
|[  377.205502]  __swap_entry_free+0x8a/0xa0
|[  377.205504]  free_swap_and_cache+0x36/0xd0
|[  377.205506]  zap_pte_range+0x16a/0x940
|[  377.205509]  unmap_page_range+0x1d8/0x310
|[  377.205514]  __oom_reap_task_mm+0xe7/0x190
|[  377.205520]  oom_reap_task_mm+0x5a/0x280
|[  377.205521]  oom_reaper+0x98/0x1c0
|[  377.205525]  kthread+0x18c/0x1b0
|[  377.205528]  ret_from_fork+0x22/0x30
|[  377.205531] ---[ end trace 0000000000000002 ]---

Then I reverted commit
   4a3ac9311dac3 ("mm/z3fold.c: add inter-page compaction")

and it seems to work now. Any suggestions? It looks like a use-after-free.
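
The memset()s above are the classic poison-on-free trick, sketched
generically below (the pattern itself, not new z3fold code):

/* Overwrite an object with a recognizable pattern just before freeing
 * it; any later access through a stale pointer then reads 0xdd../0xee..
 * instead of plausible-looking data.  That is exactly what the splat
 * shows: the slots object -- embedded lock included -- was freed and
 * the lock then taken again through a stale reference. */
memset(zhdr->slots, 0xdd, sizeof(struct z3fold_buddy_slots));
kmem_cache_free(pool->c_handle, zhdr->slots);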

Sebastian

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-02 22:08                             ` Sebastian Andrzej Siewior
@ 2020-12-03  2:16                                 ` Mike Galbraith
  0 siblings, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-12-03  2:16 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Vitaly Wool
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Wed, 2020-12-02 at 23:08 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-02 03:30:27 [+0100], Mike Galbraith wrote:
>
> > What I'm seeing is the below.  rt_mutex_has_waiters() says yup we have
> > a waiter, rt_mutex_top_waiter() emits the missing cached leftmost, and
> > rt_mutex_dequeue_pi() chokes on it.  Lock is buggered.
>
> correct. So this:
>
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -342,7 +342,7 @@ static inline void free_handle(unsigned long handle)
>
>  	if (is_free) {
>  		struct z3fold_pool *pool = slots_to_pool(slots);
> -
> +		memset(slots, 0xee, sizeof(struct z3fold_buddy_slots));
>  		kmem_cache_free(pool->c_handle, slots);
>  	}
>  }
> @@ -548,8 +549,10 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
>  		set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
>  	read_unlock(&zhdr->slots->lock);
>
> -	if (is_free)
> +	if (is_free) {
> +		memset(zhdr->slots, 0xdd, sizeof(struct z3fold_buddy_slots));
>  		kmem_cache_free(pool->c_handle, zhdr->slots);
> +	}
>
>  	if (locked)
>  		z3fold_page_unlock(zhdr);
>
> resulted in:
>
> |[  377.200696] Out of memory: Killed process 284358 (oom01) total-vm:15780488kB, anon-rss:150624kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:16760kB oom_score_adj:0
> |[  377.205438] ------------[ cut here ]------------
> |[  377.205441] pvqspinlock: lock 0xffff8880105c6828 has corrupted value 0xdddddddd!
> |[  377.205448] WARNING: CPU: 6 PID: 72 at kernel/locking/qspinlock_paravirt.h:498 __pv_queued_spin_unlock_slowpath+0xb3/0xc0
> |[  377.205455] Modules linked in:
> |[  377.205456] CPU: 6 PID: 72 Comm: oom_reaper Not tainted 5.10.0-rc6-rt13-rt+ #103
> |[  377.205458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1 04/01/2014
> |[  377.205460] RIP: 0010:__pv_queued_spin_unlock_slowpath+0xb3/0xc0
> …
> |[  377.205475] Call Trace:
> |[  377.205477]  __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
> |[  377.205481]  .slowpath+0x9/0xe
> |[  377.205483]  _raw_spin_unlock_irqrestore+0x5/0x50
> |[  377.205486]  rt_mutex_futex_unlock+0x9e/0xb0
> |[  377.205488]  z3fold_free+0x2b0/0x470
> |[  377.205491]  zswap_free_entry+0x7d/0xc0
> |[  377.205493]  zswap_frontswap_invalidate_page+0x87/0x90
> |[  377.205495]  __frontswap_invalidate_page+0x58/0x90
> |[  377.205496]  swap_range_free.constprop.0+0x99/0xb0
> |[  377.205499]  swapcache_free_entries+0x131/0x390
> |[  377.205501]  free_swap_slot+0x99/0xc0
> |[  377.205502]  __swap_entry_free+0x8a/0xa0
> |[  377.205504]  free_swap_and_cache+0x36/0xd0
> |[  377.205506]  zap_pte_range+0x16a/0x940
> |[  377.205509]  unmap_page_range+0x1d8/0x310
> |[  377.205514]  __oom_reap_task_mm+0xe7/0x190
> |[  377.205520]  oom_reap_task_mm+0x5a/0x280
> |[  377.205521]  oom_reaper+0x98/0x1c0
> |[  377.205525]  kthread+0x18c/0x1b0
> |[  377.205528]  ret_from_fork+0x22/0x30
> |[  377.205531] ---[ end trace 0000000000000002 ]---
>
> Then I reverted commit
>    4a3ac9311dac3 ("mm/z3fold.c: add inter-page compaction")
>
> and it seems to work now. Any suggestions? It looks like use-after-free.

Looks like...

d8f117abb380 z3fold: fix use-after-free when freeing handles

...wasn't completely effective.  write_unlock() in free_handle() is
where I see explosions.  The only trouble is that 4a3ac9311dac3 arrived in
5.5, and my 5.[5-7]-rt refuse to reproduce (d8f117abb380 applied),
whereas 5.9 and 5.10 do so quite reliably.  There is a heisenbug aspect
though: one trace_printk() in free_handle() made the bug go hide, so the
heisen-fairies in earlier trees were probably just messing with me..
because they can, being the twisted little freaks they are ;-)

	-Mike


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-03  2:16                                 ` Mike Galbraith
@ 2020-12-03  8:18                                   ` Mike Galbraith
  -1 siblings, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-12-03  8:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Vitaly Wool
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Thu, 2020-12-03 at 03:16 +0100, Mike Galbraith wrote:
> On Wed, 2020-12-02 at 23:08 +0100, Sebastian Andrzej Siewior wrote:
> Looks like...
>
> d8f117abb380 z3fold: fix use-after-free when freeing handles
>
> ...wasn't completely effective...

The top two hunks seem to have rendered the thing RT-tolerant.
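
Two distinct fixes are mixed in the diff below: the write_lock hunk
makes the handle scan exclusive (presumably so the emptiness check
cannot race with a concurrent slot update -- the thread doesn't spell
that out), while the get_cpu_ptr() conversion addresses the original
splat.  A comment-only sketch of why the conversion matters on
PREEMPT_RT:

/*
 * Before (explodes on RT):
 *	unbuddied = get_cpu_ptr(pool->unbuddied);  // preempt_disable()
 *	spin_lock(&pool->lock);                    // sleeping lock on RT: BUG
 *	...
 *	put_cpu_ptr(pool->unbuddied);              // preempt_enable()
 *
 * After (RT-tolerant):
 *	migrate_disable();                         // pin task to this CPU,
 *	unbuddied = this_cpu_ptr(pool->unbuddied); // but stay preemptible
 *	spin_lock(&pool->lock);                    // sleeping here is legal
 *	...
 *	migrate_enable();
 */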

diff --git a/mm/z3fold.c b/mm/z3fold.c
index 18feaa0bc537..851d9f4f1644 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -537,7 +537,7 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
 	spin_unlock(&pool->lock);

 	/* If there are no foreign handles, free the handles array */
-	read_lock(&zhdr->slots->lock);
+	write_lock(&zhdr->slots->lock);
 	for (i = 0; i <= BUDDY_MASK; i++) {
 		if (zhdr->slots->slot[i]) {
 			is_free = false;
@@ -546,7 +546,7 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
 	}
 	if (!is_free)
 		set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
-	read_unlock(&zhdr->slots->lock);
+	write_unlock(&zhdr->slots->lock);

 	if (is_free)
 		kmem_cache_free(pool->c_handle, zhdr->slots);
@@ -642,14 +642,16 @@ static inline void add_to_unbuddied(struct z3fold_pool *pool,
 {
 	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
 			zhdr->middle_chunks == 0) {
-		struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
-
+		struct list_head *unbuddied;
 		int freechunks = num_free_chunks(zhdr);
+
+		migrate_disable();
+		unbuddied = this_cpu_ptr(pool->unbuddied);
 		spin_lock(&pool->lock);
 		list_add(&zhdr->buddy, &unbuddied[freechunks]);
 		spin_unlock(&pool->lock);
 		zhdr->cpu = smp_processor_id();
-		put_cpu_ptr(pool->unbuddied);
+		migrate_enable();
 	}
 }

@@ -886,8 +888,9 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 	int chunks = size_to_chunks(size), i;

 lookup:
+	migrate_disable();
 	/* First, try to find an unbuddied z3fold page. */
-	unbuddied = get_cpu_ptr(pool->unbuddied);
+	unbuddied = this_cpu_ptr(pool->unbuddied);
 	for_each_unbuddied_list(i, chunks) {
 		struct list_head *l = &unbuddied[i];

@@ -905,7 +908,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		    !z3fold_page_trylock(zhdr)) {
 			spin_unlock(&pool->lock);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			migrate_enable();
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -919,7 +922,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		    test_bit(PAGE_CLAIMED, &page->private)) {
 			z3fold_page_unlock(zhdr);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			migrate_enable();
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -934,7 +937,7 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
 		kref_get(&zhdr->refcount);
 		break;
 	}
-	put_cpu_ptr(pool->unbuddied);
+	migrate_enable();

 	if (!zhdr) {
 		int cpu;


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* zswap explosion when using zsmalloc pool compression
  2020-12-03  8:18                                   ` Mike Galbraith
  (?)
@ 2020-12-03  9:40                                   ` Mike Galbraith
  2020-12-03  9:48                                     ` Sebastian Andrzej Siewior
  -1 siblings, 1 reply; 70+ messages in thread
From: Mike Galbraith @ 2020-12-03  9:40 UTC (permalink / raw)
  To: RT; +Cc: Sebastian Andrzej Siewior

In thread "scheduling while atomic in z3fold", I mentioned that of the
three pool methods, only zbud did not explode.  Below is the explosion
encountered when zsmalloc is used in conjunction with runltp -f mm.  As
with the z3fold explosion, this one also seems to be highly repeatable,
this from one of my full distro "my box plays russian dolls [1]" KVMs.

[ 5763.720268] kernel BUG at kernel/locking/rtmutex.c:1057!
[ 5763.720273] invalid opcode: 0000 [#1] PREEMPT_RT SMP NOPTI
[ 5763.721831] CPU: 1 PID: 25163 Comm: AudioIPC0 Kdump: loaded Tainted: G            E     5.9.12-rt20-rt #18
[ 5763.722619] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[ 5763.723545] RIP: 0010:rt_spin_lock_slowlock_locked+0x29b/0x2d0
[ 5763.724027] Code: 7f 0f 85 23 ff ff ff 41 8b 46 0c 85 c0 0f 85 17 ff ff ff 49 8b 06 a9 00 00 08 00 0f 84 09 ff ff ff e9 ff fe ff ff 0f 0b 0f 0b <0f> 0b 48 89 df e8 fb 00 84 ff e9 fe fe ff ff 48 8b 43 10 48 85 c0
[ 5763.725587] RSP: 0000:ffff9e5a41b1faa8 EFLAGS: 00010086
[ 5763.726016] RAX: 00000000ffffffdd RBX: ffff903f77c68080 RCX: 0000000000000001
[ 5763.726592] RDX: 0000000000000001 RSI: 0000000000000400 RDI: 0000000000000000
[ 5763.727166] RBP: ffff903eadc919a8 R08: 0000000000000000 R09: 00000000000217e9
[ 5763.727820] R10: ffff9e5a41b1f910 R11: 00000000000003f0 R12: ffff903eadc91100
[ 5763.728500] R13: ffff9e5a41b1fae8 R14: ffff903f76189480 R15: 0000000000000286
[ 5763.729090] FS:  00007fbd86074700(0000) GS:ffff903f77c40000(0000) knlGS:0000000000000000
[ 5763.729985] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5763.730435] CR2: 00007fbd6d844000 CR3: 000000021dcbc005 CR4: 00000000001706e0
[ 5763.731028] Call Trace:
[ 5763.731243]  rt_spin_lock_slowlock+0x40/0x60
[ 5763.731603]  rt_spin_lock+0x45/0x50
[ 5763.731901]  zs_map_object+0x103/0x3a0
[ 5763.732224]  zswap_frontswap_load+0x10c/0x390
[ 5763.732596]  __frontswap_load+0x6e/0xd0
[ 5763.732933]  swap_readpage+0x88/0x240
[ 5763.733233]  read_swap_cache_async+0x2e/0x40
[ 5763.733579]  swap_cluster_readahead+0x202/0x290
[ 5763.733961]  ? swapin_readahead+0x85/0x480
[ 5763.734307]  swapin_readahead+0x85/0x480
[ 5763.734686]  ? find_get_entry+0xab/0x130
[ 5763.735018]  ? pagecache_get_page+0x30/0x2e0
[ 5763.735386]  ? do_swap_page+0x390/0xb00
[ 5763.735712]  do_swap_page+0x390/0xb00
[ 5763.736027]  handle_mm_fault+0x8db/0x13b0
[ 5763.736370]  exc_page_fault+0x335/0x630
[ 5763.736745]  ? asm_exc_page_fault+0x8/0x30
[ 5763.737237]  asm_exc_page_fault+0x1e/0x30
[ 5763.737594] RIP: 0033:0x7fbdae932e1f
[ 5763.737901] Code: 17 e0 c5 f8 77 c3 48 3b 15 26 c3 25 00 0f 83 25 01 00 00 48 39 f7 72 0f 74 12 4c 8d 0c 16 4c 39 cf 0f 82 c5 01 00 00 48 89 d1 <f3> a4 c3 80 fa 10 73 17 80 fa 08 73 27 80 fa 04 73 33 80 fa 01 77
[ 5763.739533] RSP: 002b:00007fbd86072dd8 EFLAGS: 00010287
[ 5763.739980] RAX: 00007fbd6ca00dc0 RBX: 00007fbd86072fa0 RCX: 0000000000000040
[ 5763.740571] RDX: 0000000000001040 RSI: 00007fbd6d844000 RDI: 00007fbd6ca01dc0
[ 5763.741421] RBP: 00007fbd87ae9fc0 R08: 0000000000000000 R09: 0000000000000000
[ 5763.742031] R10: 0000000000000001 R11: 00007fbd6ca00000 R12: 00007fbd86072de8
[ 5763.742615] R13: 00007fbd86072df0 R14: 0000000000000208 R15: 00007fbd86073100
[ 5763.743204] Modules linked in: ebtable_filter(E) ebtables(E) uinput(E) fuse(E) xt_comment(E) nf_log_ipv6(E) nf_log_ipv4(E) nf_log_common(E) xt_LOG(E) xt_limit(E) rpcsec_gss_krb5(E) nfsv4(E) dns_resolver(E) nfs(E) nfs_ssc(E) fscache(E) nfnetlink_cthelper(E) nfnetlink(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) xt_pkttype(E) xt_tcpudp(E) ip6t_REJECT(E) ipt_REJECT(E) iptable_filter(E) bpfilter(E) ip6table_mangle(E) nf_conntrack_netbios_ns(E) nf_conntrack_broadcast(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) ip6table_filter(E) ip6_tables(E) x_tables(E) joydev(E) intel_rapl_msr(E) intel_rapl_common(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) crypto_simd(E) snd_hda_codec_generic(E) cryptd(E) ledtrig_audio(E) glue_helper(E) virtio_balloon(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) tiny_power_button(E) snd_timer(E) snd(E) 8139too(E)
[ 5763.743239]  soundcore(E) i2c_piix4(E) button(E) pcspkr(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sch_fq_codel(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) ata_generic(E) hid_generic(E) usbhid(E) ata_piix(E) virtio_console(E) virtio_blk(E) virtio_rng(E) qxl(E) drm_ttm_helper(E) ttm(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) cec(E) crc32c_intel(E) uhci_hcd(E) ehci_pci(E) ahci(E) rc_core(E) ehci_hcd(E) virtio_pci(E) libahci(E) virtio_ring(E) serio_raw(E) 8139cp(E) drm(E) virtio(E) usbcore(E) mii(E) libata(E) floppy(E) qemu_fw_cfg(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E)

1. box in box in box is cute, but not all that useful :)


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: zswap explosion when using zsmalloc pool compression
  2020-12-03  9:40                                   ` zswap explosion when using zsmalloc pool compression Mike Galbraith
@ 2020-12-03  9:48                                     ` Sebastian Andrzej Siewior
  2020-12-03 11:03                                       ` Mike Galbraith
  0 siblings, 1 reply; 70+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-12-03  9:48 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: RT

On 2020-12-03 10:40:16 [+0100], Mike Galbraith wrote:
> In thread "scheduling while atomic in z3fold", I mentioned that of the
> three pool methods, only zbud did not explode.  Below is the explosion
> encountered when zsmalloc is used in conjunction with runltp -f mm.  As
> with the z3fold explosion, this one also seems to be highly repeatable,
> this from one of my full distro "my box plays russian dolls [1]" KVMs.

great.

> [ 5763.720268] kernel BUG at kernel/locking/rtmutex.c:1057!
> [ 5763.726016] RAX: 00000000ffffffdd RBX: ffff903f77c68080 RCX: 0000000000000001

That is a deadlock. Does lockdep say anything or does it disable the bug
from happening?
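
For anyone decoding the quoted dump, RAX is the tell:

/* RAX = 0x00000000ffffffdd: the low 32 bits are -35 == -EDEADLK
 * (include/uapi/asm-generic/errno.h), i.e. the rtmutex deadlock
 * detector returned -EDEADLK and a BUG_ON tripped on it. */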

Sebastian

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: zswap explosion when using zsmalloc pool compression
  2020-12-03  9:48                                     ` Sebastian Andrzej Siewior
@ 2020-12-03 11:03                                       ` Mike Galbraith
  2020-12-03 12:51                                         ` Mike Galbraith
  0 siblings, 1 reply; 70+ messages in thread
From: Mike Galbraith @ 2020-12-03 11:03 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: RT

On Thu, 2020-12-03 at 10:48 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-03 10:40:16 [+0100], Mike Galbraith wrote:
> > In thread "scheduling while atomic in z3fold", I mentioned that of the
> > three pool methods, only zbud did not explode.  Below is the explosion
> > encountered when zsmalloc is used in conjunction with runltp -f mm.  As
> > with the z3fold explosion, this one also seems to be highly repeatable,
> > this from one of my full distro "my box plays russian dolls [1]" KVMs.
>
> great.
>
> > [ 5763.720268] kernel BUG at kernel/locking/rtmutex.c:1057!
> …
> > [ 5763.726016] RAX: 00000000ffffffdd RBX: ffff903f77c68080 RCX: 0000000000000001
>
> That is a deadlock. Does lockdep say anything or does it disable the bug
> from happening?

The latter; for the first time it did not explode.  However...

[  458.195604] BUG: spinlock bad magic on CPU#1, oom01/9449
[  458.195609]  lock: 0xffff980db7c69560, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[  458.195613] CPU: 1 PID: 9449 Comm: oom01 Kdump: loaded Tainted: G            E     5.9.12-rt20-rt #19
[  458.195615] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
[  458.195616] Call Trace:
[  458.195617]  dump_stack+0x77/0x97
[  458.195633]  do_raw_spin_lock+0x71/0xc0
[  458.195644]  _raw_spin_lock_irqsave+0x45/0x50
[  458.195650]  ? rt_spin_lock_slowlock+0x3a/0x70
[  458.195652]  rt_spin_lock_slowlock+0x3a/0x70
[  458.195656]  rt_spin_lock+0x46/0xc0
[  458.195658]  ? zswap_frontswap_store+0x26e/0xa50
[  458.195665]  zswap_frontswap_store+0x26e/0xa50
[  458.195667]  ? rt_mutex_futex_unlock+0x56/0x80
[  458.195668]  __frontswap_store+0x6e/0x100
[  458.195670]  swap_writepage+0x37/0x70
[  458.195671]  shmem_writepage+0x267/0x400
[  458.195674]  pageout+0xd2/0x400
[  458.195677]  shrink_page_list+0xa6e/0xbd0
[  458.195679]  shrink_inactive_list+0x216/0x620
[  458.195683]  shrink_lruvec+0x39f/0x5e0
[  458.195689]  ? shrink_node+0x237/0x6e0
[  458.195694]  shrink_node+0x237/0x6e0
[  458.195697]  do_try_to_free_pages+0xb9/0x400
[  458.195698]  try_to_free_pages+0xe6/0x3b0
[  458.195701]  __alloc_pages_slowpath.constprop.108+0x3f8/0x1090
[  458.195705]  __alloc_pages_nodemask+0x3f6/0x480
[  458.195707]  alloc_pages_vma+0x135/0x190
[  458.195711]  handle_mm_fault+0x87c/0x16f0
[  458.195715]  ? lockdep_hardirqs_on+0x79/0x100
[  458.195718]  exc_page_fault+0x343/0x720
[  458.195720]  ? asm_exc_page_fault+0x1e/0x30
[  458.195722]  ? asm_exc_page_fault+0x8/0x30
[  458.195723]  asm_exc_page_fault+0x1e/0x30
[  458.195724] RIP: 0033:0x414300
[  458.195726] Code: e8 d5 f5 fe ff 48 83 f8 ff 0f 84 9c 00 00 00 41 83 fd 02 48 89 c5 74 32 41 83 fd 03 74 74 49 63 c4 48 89 ea 66 0f 1f 44 00 00 <c6> 02 07 48 01 c2 48 89 d1 48 29 e9 48 39 cb 7f ef 31 c0 48 83 c4
[  458.195730] RSP: 002b:00007f1e1d790ec0 EFLAGS: 00010206
[  458.195731] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00000000416a8000
[  458.195731] RDX: 00007f1cdde38000 RSI: 00000000c0000000 RDI: 0000000000000000
[  458.195732] RBP: 00007f1c9c790000 R08: ffffffffffffffff R09: 0000000000000000
[  458.195732] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000001000
[  458.195733] R13: 0000000000000001 R14: 0000000000000001 R15: 00007ffe1f97c860
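
A note on what "bad magic" means here:

/* With CONFIG_DEBUG_SPINLOCK, spin_lock_init() stamps lock->magic
 * with SPINLOCK_MAGIC (0xdead4ead).  ".magic: 00000000" means this
 * raw lock was never initialized at all -- consistent with the
 * missing local-lock init mentioned earlier in the thread. */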


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: zswap explosion when using zsmalloc pool compression
  2020-12-03 11:03                                       ` Mike Galbraith
@ 2020-12-03 12:51                                         ` Mike Galbraith
  2020-12-03 13:07                                           ` Mike Galbraith
  0 siblings, 1 reply; 70+ messages in thread
From: Mike Galbraith @ 2020-12-03 12:51 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: RT

On Thu, 2020-12-03 at 12:03 +0100, Mike Galbraith wrote:
> On Thu, 2020-12-03 at 10:48 +0100, Sebastian Andrzej Siewior wrote:
> > On 2020-12-03 10:40:16 [+0100], Mike Galbraith wrote:
> > > In thread "scheduling while atomic in z3fold", I mentioned that of the
> > > three pool methods, only zbud did not explode.  Below is the explosion
> > > encountered when zsmalloc is used in conjunction with runltp -f mm.  As
> > > with the z3fold explosion, this one also seems to be highly repeatable,
> > > this from one of my full distro "my box plays russian dolls [1]" KVMs.
> > 
> > great.
> > 
> > > [ 5763.720268] kernel BUG at kernel/locking/rtmutex.c:1057!
> > …
> > > [ 5763.726016] RAX: 00000000ffffffdd RBX: ffff903f77c68080 RCX: 0000000000000001
> > 
> > That is a deadlock. Does lockdep say anything or does it disable the bug
> > from happening?
> 
> The latter; for the first time it did not explode.  However...

Bare metal both moaned and went BUG().  zswap doesn't like RT much.

[  493.774076] BUG: spinlock bad magic on CPU#1, oom01/9152
[  493.774081]  lock: 0xffff8a33dec69560, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[  493.774084] CPU: 1 PID: 9152 Comm: oom01 Kdump: loaded Tainted: G S          E     5.9.12-rt20-rt #21
[  493.774086] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  493.774087] Call Trace:
[  493.774088]  dump_stack+0x77/0x97
[  493.774094]  do_raw_spin_lock+0x71/0xc0
[  493.774097]  _raw_spin_lock_irqsave+0x45/0x50
[  493.774100]  ? rt_spin_lock_slowlock+0x3a/0x70
[  493.774101]  rt_spin_lock_slowlock+0x3a/0x70
[  493.774104]  rt_spin_lock+0x46/0xc0
[  493.774106]  ? zswap_frontswap_store+0x26e/0xa50
[  493.774109]  zswap_frontswap_store+0x26e/0xa50
[  493.774111]  ? rt_mutex_futex_unlock+0x56/0x80
[  493.774113]  __frontswap_store+0x6e/0x100
[  493.774116]  swap_writepage+0x37/0x70
[  493.774118]  shmem_writepage+0x267/0x400
[  493.774121]  pageout+0xd2/0x400
[  493.774125]  shrink_page_list+0xa6e/0xbd0
[  493.774129]  shrink_inactive_list+0x216/0x620
[  493.774134]  shrink_lruvec+0x39f/0x5e0
[  493.774141]  ? shrink_node+0x237/0x6e0
[  493.774142]  shrink_node+0x237/0x6e0
[  493.774145]  do_try_to_free_pages+0xb9/0x400
[  493.774148]  try_to_free_pages+0xe6/0x3b0
[  493.774152]  __alloc_pages_slowpath.constprop.108+0x3f8/0x1090
[  493.774158]  __alloc_pages_nodemask+0x3f6/0x480
[  493.774161]  alloc_pages_vma+0x135/0x190
[  493.774164]  handle_mm_fault+0x87c/0x16f0
[  493.774168]  ? lockdep_hardirqs_on+0x79/0x100
[  493.774173]  exc_page_fault+0x343/0x720
[  493.774174]  ? asm_exc_page_fault+0x1e/0x30
[  493.774176]  ? asm_exc_page_fault+0x8/0x30
[  493.774178]  asm_exc_page_fault+0x1e/0x30
[  493.774180] RIP: 0033:0x414300
[  493.774181] Code: e8 d5 f5 fe ff 48 83 f8 ff 0f 84 9c 00 00 00 41 83 fd 02 48 89 c5 74 32 41 83 fd 03 74 74 49 63 c4 48 89 ea 66 0f 1f 44 00 00 <c6> 02 07 48 01 c2 48 89 d1 48 29 e9 48 39 cb 7f ef 31 c0 48 83 c4
[  493.774182] RSP: 002b:00007f17eb8ccec0 EFLAGS: 00010206
[  493.774184] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 000000000f7c5000
[  493.774185] RDX: 00007f12b888e000 RSI: 00000000c0000000 RDI: 0000000000000000
[  493.774185] RBP: 00007f12a90c9000 R08: ffffffffffffffff R09: 0000000000000000
[  493.774186] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000001000
[  493.774187] R13: 0000000000000001 R14: 0000000000000001 R15: 00007fffcfa8d4d0
[  495.792174] ------------[ cut here ]------------
[  495.792176] kernel BUG at kernel/locking/rtmutex.c:1057!
[  495.792181] invalid opcode: 0000 [#1] PREEMPT_RT SMP NOPTI
[  495.792184] CPU: 4 PID: 3466 Comm: kwin_x11 Kdump: loaded Tainted: G S          E     5.9.12-rt20-rt #21
[  495.792185] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  495.792186] RIP: 0010:rt_spin_lock_slowlock_locked+0x29c/0x2b0
[  495.792191] Code: 88 d0 00 01 e8 a5 a5 76 ff e9 ce fe ff ff 48 8b 43 48 48 85 c0 74 06 48 3b 58 38 75 0f 49 39 c4 0f 85 69 ff ff ff 0f 0b 0f 0b <0f> 0b e8 ed 23 77 ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f
[  495.792192] RSP: 0000:ffffa2524328b598 EFLAGS: 00010086
[  495.792194] RAX: 00000000ffffffdd RBX: ffff8a33ded29d20 RCX: 0000000000000000
[  495.792195] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8a33ded29d20
[  495.792196] RBP: ffff8a337a651b40 R08: 0000000000000001 R09: 0000000000000000
[  495.792196] R10: ffffa2524328b4b8 R11: ffff8a33defd5000 R12: ffffa2524328b5e0
[  495.792197] R13: ffff8a337a651b40 R14: ffff8a337a6524d0 R15: 0000000000000282
[  495.792198] FS:  00007fb8d31f3d80(0000) GS:ffff8a33ded00000(0000) knlGS:0000000000000000
[  495.792199] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  495.792200] CR2: 00007efcde0bd020 CR3: 000000010755c005 CR4: 00000000001706e0
[  495.792201] Call Trace:
[  495.792205]  rt_spin_lock_slowlock+0x4b/0x70
[  495.792209]  rt_spin_lock+0x46/0xc0
[  495.792211]  ? zs_map_object+0x10d/0x3d0
[  495.792215]  zs_map_object+0x10d/0x3d0
[  495.792218]  zswap_frontswap_store+0x716/0xa50
[  495.792223]  __frontswap_store+0x6e/0x100
[  495.792225]  swap_writepage+0x37/0x70
[  495.792227]  pageout+0xd2/0x400
[  495.792232]  shrink_page_list+0xa6e/0xbd0
[  495.792237]  shrink_inactive_list+0x216/0x620
[  495.792242]  shrink_lruvec+0x39f/0x5e0
[  495.792249]  ? shrink_node+0x237/0x6e0
[  495.792250]  shrink_node+0x237/0x6e0
[  495.792254]  do_try_to_free_pages+0xb9/0x400
[  495.792256]  try_to_free_pages+0xe6/0x3b0
[  495.792261]  __alloc_pages_slowpath.constprop.108+0x3f8/0x1090
[  495.792263]  ? trace_hardirqs_on+0x35/0x130
[  495.792265]  ? asm_sysvec_call_function_single+0x12/0x20
[  495.792269]  ? trace_hardirqs_on+0x35/0x130
[  495.792271]  ? lock_acquire+0x201/0x420
[  495.792273]  ? _raw_spin_unlock_irqrestore+0x34/0x80
[  495.792275]  ? xas_descend+0x2e/0x120
[  495.792278]  __alloc_pages_nodemask+0x3f6/0x480
[  495.792281]  page_cache_readahead_unbounded+0x8a/0x200
[  495.792286]  filemap_fault+0x770/0xd50
[  495.792289]  ? filemap_map_pages+0x1e5/0x4f0
[  495.792292]  ext4_filemap_fault+0x2f/0x40 [ext4]
[  495.792315]  __do_fault+0x31/0x140
[  495.792318]  handle_mm_fault+0xf37/0x16f0
[  495.792323]  exc_page_fault+0x343/0x720
[  495.792326]  ? asm_exc_page_fault+0x8/0x30
[  495.792328]  ? asm_exc_page_fault+0x8/0x30
[  495.792330]  asm_exc_page_fault+0x1e/0x30
[  495.792331] RIP: 0033:0x7fb8d27439f6
[  495.792333] Code: ff ba ff ff ff ff f2 0f 10 05 ae cd 0a 00 48 39 d0 74 1f 48 8b 7b 28 e8 08 97 ea ff 48 85 c0 78 13 66 0f ef c0 f2 48 0f 2a c0 <f2> 0f 5e 05 72 3b 0c 00 5b c3 48 89 c2 66 0f ef c0 48 d1 ea 83 e0
[  495.792334] RSP: 002b:00007ffc24b74db0 EFLAGS: 00010246
[  495.792335] RAX: 0000000000000000 RBX: 0000564b8f45af90 RCX: 00007ffc24b74eb0
[  495.792336] RDX: 00000000ffffffff RSI: 00007fb8b4009920 RDI: 00007fb8b40020f0
[  495.792337] RBP: 0000564b8f379e90 R08: 0000000000000000 R09: 00007fb8d0a44360
[  495.792338] R10: 0000000000000001 R11: 0000000000000000 R12: 0000564b8f45af90
[  495.792338] R13: 00007ffc24b74e68 R14: 00007ffc24b74e78 R15: 0000000000000006
[  495.792343] Modules linked in: fuse(E) msr(E) xt_comment(E) ebtable_filter(E) ebtables(E) br_netfilter(E) xt_physdev(E) nfnetlink_cthelper(E) nfnetlink(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) xt_pkttype(E) xt_tcpudp(E) ip6t_REJECT(E) ipt_REJECT(E) iptable_filter(E) bpfilter(E) ip6table_mangle(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) ip6table_filter(E) ip6_tables(E) x_tables(E) nls_iso8859_1(E) nls_cp437(E) joydev(E) intel_rapl_msr(E) usblp(E) intel_rapl_common(E) at24(E) x86_pkg_temp_thermal(E) iTCO_wdt(E) intel_powerclamp(E) iTCO_vendor_support(E) coretemp(E) mei_hdcp(E) kvm_intel(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) kvm(E) ledtrig_audio(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) snd_hda_codec_hdmi(E) aesni_intel(E) crypto_simd(E) cryptd(E) glue_helper(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) snd_hda_core(E) snd_hwdep(E)
[  495.792378]  pcspkr(E) snd_pcm(E) r8169(E) i2c_i801(E) snd_timer(E) i2c_smbus(E) lpc_ich(E) realtek(E) mdio_devres(E) snd(E) mei_me(E) libphy(E) soundcore(E) fan(E) thermal(E) mei(E) intel_smartconnect(E) tiny_power_button(E) sch_fq_codel(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) hid_logitech_hidpp(E) sr_mod(E) cdrom(E) hid_logitech_dj(E) ums_realtek(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) ehci_pci(E) fb_sys_fops(E) ehci_hcd(E) cec(E) xhci_pci(E) ahci(E) rc_core(E) libahci(E) xhci_hcd(E) ttm(E) libata(E) drm(E) usbcore(E) video(E) button(E) sd_mod(E) t10_pi(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) virtio_ring(E) virtio(E) ext4(E) crc32c_intel(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E)
[  495.792420] Dumping ftrace buffer:
[  495.792423]    (ftrace buffer empty)

bad magic
0xffffffff812bcf60 is in zswap_frontswap_store (mm/zswap.c:1087).
1082                    ret = -EINVAL;
1083                    goto freepage;
1084            }
1085
1086            /* compress */
1087            local_lock(&zswap_comp.lock);
1088            dst = *this_cpu_ptr(&zswap_comp.dstmem);
1089            tfm = *this_cpu_ptr(entry->pool->tfm);
1090            src = kmap_atomic(page);
1091            ret = crypto_comp_compress(tfm, src, PAGE_SIZE, dst, &dlen);
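
For reference, the "bad magic" complaint comes from the spinlock debug
code: on RT, local_lock() boils down to a sleeping spinlock whose raw
wait_lock has its debug fields checked in the slowpath.  The check looks
roughly like this (paraphrased from kernel/locking/spinlock_debug.c; the
exact shape in this tree is an assumption):

```
#define SPINLOCK_MAGIC		0xdead4ead

static inline void debug_spin_lock_before(raw_spinlock_t *lock)
{
	/* .magic: 00000000 in the splat above means this first test
	 * fired: the lock was never initialized, or sits in memory
	 * that has been recycled out from under us. */
	SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
	SPIN_BUG_ON(lock->owner == current, lock, "recursion");
	SPIN_BUG_ON(lock->owner_cpu == raw_smp_processor_id(),
							lock, "cpu recursion");
}
```

So a magic of 00000000 points at a lock that was never initialized.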

BUG()
0xffffffff812f6e90 is in zs_map_object (mm/zsmalloc.c:1401).
1396
1397            get_zspage_mapping(zspage, &class_idx, &fg);
1398            class = pool->size_class[class_idx];
1399            off = (class->size * obj_idx) & ~PAGE_MASK;
1400
1401            local_lock(&zs_map_area.lock);
1402            area = this_cpu_ptr(&zs_map_area);
1403            area->vm_mm = mm;
1404            if (off + class->size <= PAGE_SIZE) {
1405                    /* this object is contained entirely within a page */



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: zswap explosion when using zsmalloc pool compression
  2020-12-03 12:51                                         ` Mike Galbraith
@ 2020-12-03 13:07                                           ` Mike Galbraith
  2020-12-03 13:43                                             ` Mike Galbraith
  0 siblings, 1 reply; 70+ messages in thread
From: Mike Galbraith @ 2020-12-03 13:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: RT

On Thu, 2020-12-03 at 13:51 +0100, Mike Galbraith wrote:
>
> BUG()
> 0xffffffff812f6e90 is in zs_map_object (mm/zsmalloc.c:1401).
> 1396
> 1397            get_zspage_mapping(zspage, &class_idx, &fg);
> 1398            class = pool->size_class[class_idx];
> 1399            off = (class->size * obj_idx) & ~PAGE_MASK;
> 1400
> 1401            local_lock(&zs_map_area.lock);
> 1402            area = this_cpu_ptr(&zs_map_area);
> 1403            area->vm_mm = mm;
> 1404            if (off + class->size <= PAGE_SIZE) {
> 1405                    /* this object is contained entirely within a page */


crash> rt_mutex ffff8a33ded29d20
struct rt_mutex {
  wait_lock = {
    raw_lock = {
      {
        val = {
          counter = 1
        },
        {
          locked = 1 '\001',
          pending = 0 '\000'
        },
        {
          locked_pending = 1,
          tail = 0
        }
      }
    },
    magic = 3735899821,
    owner_cpu = 4,
    owner = 0xffff8a337a651b40,
    dep_map = {
      key = 0x0,
      class_cache = {0x0, 0x0},
      name = 0xffffffffba10c8da "(lock).lock.lock.wait_lock",
      wait_type_outer = 0,
      wait_type_inner = 2
    }
  },
  waiters = {
    rb_root = {
      rb_node = 0xffffa2524328b5e0
    },
    rb_leftmost = 0xffffa2524328b5e0
  },
  owner = 0xffff8a3359ff3681,
  save_state = 1,
  dep_map = {
    key = 0x0,
    class_cache = {0x0, 0x0},
    name = 0x0,
    wait_type_outer = 0,
    wait_type_inner = 0
  }
}
crash>

Oh dear, lock.owner is indeed the guy who made BUG() happen.
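
Worth noting: RAX is 0xffffffdd, i.e. -35, i.e. -EDEADLK, in both
register dumps.  In the RT spinlock slowpath that most plausibly comes
from this (paraphrased; the exact code at rtmutex.c:1057 in this tree is
an assumption):

```
	ret = task_blocks_on_rt_mutex(lock, waiter, self,
				      RT_MUTEX_MIN_CHAINWALK);
	/* task_blocks_on_rt_mutex() returns -EDEADLK when the lock's
	 * current owner is the task trying to block on it, i.e. a
	 * self-deadlock, which is what the BUG() above reports. */
	BUG_ON(ret);
```

That fits the struct dump: the owner field is the very task that
exploded.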


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-03  8:18                                   ` Mike Galbraith
  (?)
  (?)
@ 2020-12-03 13:39                                   ` Sebastian Andrzej Siewior
  2020-12-03 14:07                                       ` Vitaly Wool
  2020-12-06  9:18                                       ` Mike Galbraith
  -1 siblings, 2 replies; 70+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-12-03 13:39 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Vitaly Wool, Oleksandr Natalenko, linux-kernel, linux-mm,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

On 2020-12-03 09:18:21 [+0100], Mike Galbraith wrote:
> On Thu, 2020-12-03 at 03:16 +0100, Mike Galbraith wrote:
> > On Wed, 2020-12-02 at 23:08 +0100, Sebastian Andrzej Siewior wrote:
> > Looks like...
> >
> > d8f117abb380 z3fold: fix use-after-free when freeing handles
> >
> > ...wasn't completely effective...
> 
> The top two hunks seem to have rendered the thing RT tolerant.

Yes, it appears to. I have no idea if this is a proper fix or not.
Without your write lock, after a few attempts, KASAN says:

| BUG: KASAN: use-after-free in __pv_queued_spin_lock_slowpath+0x293/0x770
| Write of size 2 at addr ffff88800e0e10aa by task kworker/u16:3/237
| 
| CPU: 5 PID: 237 Comm: kworker/u16:3 Tainted: G        W         5.10.0-rc6-rt13-rt+
| Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1 04/01/2014
| Workqueue: zswap1 compact_page_work
| Call Trace:
|  dump_stack+0x7d/0xa3
|  print_address_description.constprop.0+0x19/0x120
|  __kasan_report.cold+0x1d/0x35
|  kasan_report+0x3a/0x50
|  check_memory_region+0x145/0x1a0
|  __pv_queued_spin_lock_slowpath+0x293/0x770
|  _raw_spin_lock_irq+0xca/0xe0
|  rt_read_unlock+0x2c/0x70
|  __release_z3fold_page.constprop.0+0x45e/0x620
|  do_compact_page+0x674/0xa50
|  process_one_work+0x63a/0x1130
|  worker_thread+0xd3/0xc80
|  kthread+0x401/0x4e0
|  ret_from_fork+0x22/0x30
| 
| Allocated by task 225 (systemd-journal):
|  kasan_save_stack+0x1b/0x40
|  __kasan_kmalloc.constprop.0+0xc2/0xd0
|  kmem_cache_alloc+0x103/0x2b0
|  z3fold_alloc+0x597/0x1970
|  zswap_frontswap_store+0x928/0x1bf0
|  __frontswap_store+0x117/0x320
|  swap_writepage+0x34/0x70
|  pageout+0x268/0x7c0
|  shrink_page_list+0x13e1/0x1e80
|  shrink_inactive_list+0x303/0xde0
|  shrink_lruvec+0x3dd/0x660
|  shrink_node_memcgs+0x3a1/0x600
|  shrink_node+0x3a7/0x1350
|  shrink_zones+0x1f1/0x7f0
|  do_try_to_free_pages+0x219/0xcc0
|  try_to_free_pages+0x1c5/0x4b0
|  __perform_reclaim+0x18f/0x2c0
|  __alloc_pages_slowpath.constprop.0+0x7ea/0x1790
|  __alloc_pages_nodemask+0x5f5/0x700
|  page_cache_ra_unbounded+0x30f/0x690
|  do_sync_mmap_readahead+0x3e3/0x640
|  filemap_fault+0x981/0x1110
|  __xfs_filemap_fault+0x12d/0x840
|  __do_fault+0xf3/0x4b0
|  do_fault+0x202/0x8c0
|  __handle_mm_fault+0x338/0x500
|  handle_mm_fault+0x1a8/0x670
|  do_user_addr_fault+0x409/0x8b0
|  exc_page_fault+0x60/0xc0
|  asm_exc_page_fault+0x1e/0x30
| 
| Freed by task 71 (oom_reaper):
|  kasan_save_stack+0x1b/0x40
|  kasan_set_track+0x1c/0x30
|  kasan_set_free_info+0x1b/0x30
|  __kasan_slab_free+0x110/0x150
|  kmem_cache_free+0x7f/0x450
|  z3fold_free+0x1f8/0xc90
|  zswap_free_entry+0x168/0x230
|  zswap_frontswap_invalidate_page+0x145/0x190
|  __frontswap_invalidate_page+0xe8/0x1a0
|  swap_range_free.constprop.0+0x266/0x300
|  swapcache_free_entries+0x1dc/0x970
|  free_swap_slot+0x19c/0x290
|  __swap_entry_free+0x139/0x160
|  free_swap_and_cache+0xda/0x230
|  zap_pte_range+0x275/0x1590
|  unmap_page_range+0x320/0x690
|  __oom_reap_task_mm+0x207/0x330
|  oom_reap_task_mm+0x78/0x7e0
|  oom_reap_task+0x6d/0x1a0
|  oom_reaper+0x103/0x290
|  kthread+0x401/0x4e0
|  ret_from_fork+0x22/0x30
| 
| The buggy address belongs to the object at ffff88800e0e1080
|  which belongs to the cache z3fold_handle of size 88
| The buggy address is located 42 bytes inside of
|  88-byte region [ffff88800e0e1080, ffff88800e0e10d8)
| The buggy address belongs to the page:
| page:000000002ba661bc refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xe0e1
| flags: 0xffff800000200(slab)
| raw: 000ffff800000200 dead000000000100 dead000000000122 ffff88800aa4eb40
| raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
| page dumped because: kasan: bad access detected
| 
| Memory state around the buggy address:
|  ffff88800e0e0f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|  ffff88800e0e1000: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
| >ffff88800e0e1080: fa fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
|                                   ^
|  ffff88800e0e1100: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
|  ffff88800e0e1180: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
| ==================================================================
| Disabling lock debugging due to kernel taint

With the lock in place, I haven't seen anything.
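
The write lock in question is, as far as I can tell, a change of roughly
this shape in __release_z3fold_page() (a sketch of the band-aid being
tested, not a final fix):

```
	/* Hold slots->lock for write, not read, while checking whether
	 * the handle array can be freed, so that a concurrent
	 * free_handle() can no longer run in parallel under read_lock. */
	write_lock(&zhdr->slots->lock);
	for (i = 0; i <= BUDDY_MASK; i++) {
		if (zhdr->slots->slot[i]) {
			is_free = false;
			break;
		}
	}
	if (!is_free)
		set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
	write_unlock(&zhdr->slots->lock);
```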

Sebastian

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: zswap explosion when using zsmalloc pool compression
  2020-12-03 13:07                                           ` Mike Galbraith
@ 2020-12-03 13:43                                             ` Mike Galbraith
  2020-12-03 14:04                                               ` Mike Galbraith
  0 siblings, 1 reply; 70+ messages in thread
From: Mike Galbraith @ 2020-12-03 13:43 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: RT

Well crap.  I lost the zswap local_lock initialization fix in my 5.9-rt
tree during an update, so I will have to retest.
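
For context, that lost fix is what explains the "spinlock bad magic"
splat earlier in the thread: on RT a local_lock_t is a real sleeping
lock and has to be initialized.  The missing hunk looked something like
this (a sketch from memory; the exact 5.9-rt code is an assumption):

```
struct zswap_comp {
	/* protects the per-CPU dstmem and tfm */
	local_lock_t lock;
	u8 *dstmem;
};

static DEFINE_PER_CPU(struct zswap_comp, zswap_comp) = {
	/* without this initializer the underlying lock keeps
	 * .magic == 0 and do_raw_spin_lock() yells "bad magic" */
	.lock = INIT_LOCAL_LOCK(lock),
};
```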

	-Mike


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: zswap explosion when using zsmalloc pool compression
  2020-12-03 13:43                                             ` Mike Galbraith
@ 2020-12-03 14:04                                               ` Mike Galbraith
  2020-12-04  5:22                                                 ` [patch] mm,zswap: fix zswap::zswap_comp.lock vs zsmalloc::zs_map_area.lock deadlock Mike Galbraith
  0 siblings, 1 reply; 70+ messages in thread
From: Mike Galbraith @ 2020-12-03 14:04 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: RT

On Thu, 2020-12-03 at 14:43 +0100, Mike Galbraith wrote:
> Well crap.  I lost the zswap local_lock initialization fix in my 5.9-rt
> tree during an update, so will have to retest.

:) tadaaaa

[  484.806634] ======================================================
[  484.806634] WARNING: possible circular locking dependency detected
[  484.806635] 5.9.12-rt20-rt #22 Tainted: G S          E    
[  484.806636] ------------------------------------------------------
[  484.806637] systemd-udevd/13633 is trying to acquire lock:
[  484.806638] ffff9441dede95f0 ((lock).lock#4){+.+.}-{2:2}, at: zswap_frontswap_load+0x19a/0x3c0
[  484.806645] 
               but task is already holding lock:
[  484.806646] ffff9441dede9db0 ((lock).lock#5){+.+.}-{2:2}, at: zs_map_object+0x10d/0x3d0
[  484.806651] 
               which lock already depends on the new lock.
               
[  484.806651] 
               the existing dependency chain (in reverse order) is:
[  484.806652] 
               -> #2 ((lock).lock#5){+.+.}-{2:2}:
[  484.806654]        lock_acquire+0x2b7/0x420
[  484.806656]        rt_spin_lock+0x2b/0xc0
[  484.806659]        zs_map_object+0x10d/0x3d0
[  484.806661]        zswap_frontswap_store+0x716/0xa50
[  484.806662]        __frontswap_store+0x6e/0x100
[  484.806663]        swap_writepage+0x37/0x70
[  484.806665]        shmem_writepage+0x267/0x400
[  484.806667]        pageout+0xd2/0x400
[  484.806668]        shrink_page_list+0xa6e/0xbd0
[  484.806669]        shrink_inactive_list+0x216/0x620
[  484.806670]        shrink_lruvec+0x39f/0x5e0
[  484.806672]        shrink_node+0x237/0x6e0
[  484.806673]        do_try_to_free_pages+0xb9/0x400
[  484.806674]        try_to_free_pages+0xe6/0x3b0
[  484.806675]        __alloc_pages_slowpath.constprop.108+0x3f8/0x1090
[  484.806676]        __alloc_pages_nodemask+0x3f6/0x480
[  484.806677]        alloc_pages_vma+0x135/0x190
[  484.806679]        handle_mm_fault+0x87c/0x16f0
[  484.806681]        exc_page_fault+0x343/0x720
[  484.806683]        asm_exc_page_fault+0x1e/0x30
[  484.806684] 
               -> #1 (&zh->lock){+.+.}-{0:0}:
[  484.806686]        lock_acquire+0x2b7/0x420
[  484.806687]        _mutex_lock+0x28/0x40
[  484.806688]        zs_map_object+0x53/0x3d0
[  484.806689]        zswap_frontswap_store+0x716/0xa50
[  484.806691]        __frontswap_store+0x6e/0x100
[  484.806692]        swap_writepage+0x37/0x70
[  484.806693]        shmem_writepage+0x267/0x400
[  484.806694]        pageout+0xd2/0x400
[  484.806695]        shrink_page_list+0xa6e/0xbd0
[  484.806696]        shrink_inactive_list+0x216/0x620
[  484.806697]        shrink_lruvec+0x39f/0x5e0
[  484.806698]        shrink_node+0x237/0x6e0
[  484.806699]        do_try_to_free_pages+0xb9/0x400
[  484.806700]        try_to_free_pages+0xe6/0x3b0
[  484.806701]        __alloc_pages_slowpath.constprop.108+0x3f8/0x1090
[  484.806702]        __alloc_pages_nodemask+0x3f6/0x480
[  484.806703]        alloc_pages_vma+0x135/0x190
[  484.806704]        handle_mm_fault+0x87c/0x16f0
[  484.806705]        exc_page_fault+0x343/0x720
[  484.806706]        asm_exc_page_fault+0x1e/0x30
[  484.806707] 
               -> #0 ((lock).lock#4){+.+.}-{2:2}:
[  484.806709]        validate_chain+0xa4a/0x13e0
[  484.806710]        __lock_acquire+0x862/0xbc0
[  484.806711]        lock_acquire+0x2b7/0x420
[  484.806712]        rt_spin_lock+0x2b/0xc0
[  484.806713]        zswap_frontswap_load+0x19a/0x3c0
[  484.806715]        __frontswap_load+0x6e/0xd0
[  484.806716]        swap_readpage+0x88/0x240
[  484.806717]        swap_cluster_readahead+0x1cc/0x290
[  484.806718]        shmem_swapin+0x93/0xd0
[  484.806719]        shmem_swapin_page+0x2a3/0x6b0
[  484.806720]        shmem_getpage_gfp+0x321/0x700
[  484.806721]        shmem_file_read_iter+0x1a0/0x3a0
[  484.806722]        new_sync_read+0x10d/0x180
[  484.806724]        vfs_read+0x1a8/0x1d0
[  484.806725]        ksys_read+0x87/0xc0
[  484.806727]        do_syscall_64+0x33/0x40
[  484.806728]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  484.806729] 
               other info that might help us debug this:
               
[  484.806730] Chain exists of:
                 (lock).lock#4 --> &zh->lock --> (lock).lock#5
               
[  484.806732]  Possible unsafe locking scenario:
               
[  484.806733]        CPU0                    CPU1
[  484.806733]        ----                    ----
[  484.806733]   lock((lock).lock#5);
[  484.806735]                                lock(&zh->lock);
[  484.806735]                                lock((lock).lock#5);
[  484.806736]   lock((lock).lock#4);
[  484.806736] 
                *** DEADLOCK ***
               
[  484.806737] 5 locks held by systemd-udevd/13633:
[  484.806737]  #0: ffff9441ae089370 (&zh->lock){+.+.}-{0:0}, at: zs_map_object+0x53/0x3d0
[  484.806741]  #1: ffff94412d6b0b38 (&zspage->lock){.+.+}-{0:0}, at: zs_map_object+0x93/0x3d0
[  484.806744]  #2: ffffffff825981a0 (rcu_read_lock){....}-{1:2}, at: rt_read_lock+0x5d/0x160
[  484.806747]  #3: ffff9441dede9db0 ((lock).lock#5){+.+.}-{2:2}, at: zs_map_object+0x10d/0x3d0
[  484.806750]  #4: ffffffff825981a0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xc0
[  484.806753] 
               stack backtrace:
[  484.806754] CPU: 7 PID: 13633 Comm: systemd-udevd Kdump: loaded Tainted: G S          E     5.9.12-rt20-rt #22
[  484.806755] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  484.806756] Call Trace:
[  484.806758]  dump_stack+0x77/0x97
[  484.806761]  check_noncircular+0x14d/0x160
[  484.806766]  ? validate_chain+0xa4a/0x13e0
[  484.806767]  validate_chain+0xa4a/0x13e0
[  484.806770]  __lock_acquire+0x862/0xbc0
[  484.806773]  lock_acquire+0x2b7/0x420
[  484.806774]  ? zswap_frontswap_load+0x19a/0x3c0
[  484.806778]  rt_spin_lock+0x2b/0xc0
[  484.806780]  ? zswap_frontswap_load+0x19a/0x3c0
[  484.806781]  zswap_frontswap_load+0x19a/0x3c0
[  484.806784]  __frontswap_load+0x6e/0xd0
[  484.806787]  swap_readpage+0x88/0x240
[  484.806789]  swap_cluster_readahead+0x1cc/0x290
[  484.806794]  ? shmem_swapin+0x93/0xd0
[  484.806796]  shmem_swapin+0x93/0xd0
[  484.806797]  ? find_held_lock+0x2d/0x90
[  484.806799]  ? find_get_entry+0x11a/0x2d0
[  484.806807]  shmem_swapin_page+0x2a3/0x6b0
[  484.806808]  ? find_get_entry+0x132/0x2d0
[  484.806812]  shmem_getpage_gfp+0x321/0x700
[  484.806816]  shmem_file_read_iter+0x1a0/0x3a0
[  484.806820]  new_sync_read+0x10d/0x180
[  484.806824]  vfs_read+0x1a8/0x1d0
[  484.806827]  ksys_read+0x87/0xc0
[  484.806828]  ? syscall_trace_enter.isra.15+0x8e/0x2b0
[  484.806832]  do_syscall_64+0x33/0x40
[  484.806834]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  484.806835] RIP: 0033:0x7f4c9ef9c1a1
[  484.806837] Code: fe ff ff 48 8d 3d 67 a1 09 00 48 83 ec 08 e8 e6 03 02 00 66 0f 1f 44 00 00 8b 05 0a d2 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89
[  484.806839] RSP: 002b:00007ffd400c69e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  484.806840] RAX: ffffffffffffffda RBX: 000055da146ae940 RCX: 00007f4c9ef9c1a1
[  484.806841] RDX: 0000000000001000 RSI: 000055da1467bd20 RDI: 000000000000000f
[  484.806842] RBP: 0000000000000d68 R08: 0000000000000000 R09: 0000000000000000
[  484.806843] R10: fffffffffffffe60 R11: 0000000000000246 R12: 00007f4c9f2647a0
[  484.806843] R13: 00007f4c9f2652e0 R14: 0000000000000d68 R15: 00000000000003cf
[  485.546571] ------------[ cut here ]------------
[  485.546574] kernel BUG at kernel/locking/rtmutex.c:1057!
[  485.546579] invalid opcode: 0000 [#1] PREEMPT_RT SMP NOPTI
[  485.546581] CPU: 4 PID: 13626 Comm: oom01 Kdump: loaded Tainted: G S          E     5.9.12-rt20-rt #22
[  485.546583] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[  485.546584] RIP: 0010:rt_spin_lock_slowlock_locked+0x29c/0x2b0
[  485.546589] Code: 88 d0 00 01 e8 a5 a5 76 ff e9 ce fe ff ff 48 8b 43 48 48 85 c0 74 06 48 3b 58 38 75 0f 49 39 c4 0f 85 69 ff ff ff 0f 0b 0f 0b <0f> 0b e8 ed 23 77 ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f
[  485.546590] RSP: 0000:ffffa5610be6f6d8 EFLAGS: 00010086
[  485.546592] RAX: 00000000ffffffdd RBX: ffff9441ded29d20 RCX: 0000000000000000
[  485.546593] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9441ded29d20
[  485.546593] RBP: ffff94412d509b40 R08: 0000000000000001 R09: 0000000000000000
[  485.546594] R10: ffffa5610be6f5f8 R11: 000000000000063e R12: ffffa5610be6f720
[  485.546595] R13: ffff94412d509b40 R14: ffff94412d50a4d0 R15: 0000000000000282
[  485.546595] FS:  00007ff63eb08700(0000) GS:ffff9441ded00000(0000) knlGS:0000000000000000
[  485.546597] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  485.546597] CR2: 00007ff1f9e2f000 CR3: 0000000313dd6004 CR4: 00000000001706e0
[  485.546598] Call Trace:
[  485.546603]  rt_spin_lock_slowlock+0x4b/0x70
[  485.546607]  rt_spin_lock+0x46/0xc0
[  485.546609]  ? zs_map_object+0x10d/0x3d0
[  485.546612]  zs_map_object+0x10d/0x3d0
[  485.546616]  zswap_frontswap_store+0x716/0xa50
[  485.546620]  __frontswap_store+0x6e/0x100
[  485.546623]  swap_writepage+0x37/0x70
[  485.546625]  pageout+0xd2/0x400
[  485.546630]  shrink_page_list+0xa6e/0xbd0
[  485.546634]  shrink_inactive_list+0x216/0x620
[  485.546639]  shrink_lruvec+0x39f/0x5e0
[  485.546641]  ? lock_acquire+0x201/0x420
[  485.546647]  ? shrink_node+0x237/0x6e0
[  485.546649]  shrink_node+0x237/0x6e0
[  485.546652]  do_try_to_free_pages+0xb9/0x400
[  485.546655]  try_to_free_pages+0xe6/0x3b0
[  485.546660]  __alloc_pages_slowpath.constprop.108+0x3f8/0x1090
[  485.546662]  ? lock_release+0xf6/0x300
[  485.546665]  ? try_to_wake_up+0x70/0x750
[  485.546668]  ? lock_release+0xf6/0x300
[  485.546671]  __alloc_pages_nodemask+0x3f6/0x480
[  485.546673]  alloc_pages_vma+0x135/0x190
[  485.546677]  handle_mm_fault+0x87c/0x16f0
[  485.546680]  ? trace_hardirqs_on+0x35/0x130
[  485.546686]  exc_page_fault+0x343/0x720
[  485.546689]  ? asm_exc_page_fault+0x8/0x30
[  485.546691]  ? asm_exc_page_fault+0x8/0x30
[  485.546693]  asm_exc_page_fault+0x1e/0x30
[  485.546694] RIP: 0033:0x414300
[  485.546696] Code: e8 d5 f5 fe ff 48 83 f8 ff 0f 84 9c 00 00 00 41 83 fd 02 48 89 c5 74 32 41 83 fd 03 74 74 49 63 c4 48 89 ea 66 0f 1f 44 00 00 <c6> 02 07 48 01 c2 48 89 d1 48 29 e9 48 39 cb 7f ef 31 c0 48 83 c4
[  485.546697] RSP: 002b:00007ff63eb07ec0 EFLAGS: 00010206
[  485.546698] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 000000003d32a000
[  485.546699] RDX: 00007ff1f9e2f000 RSI: 00000000c0000000 RDI: 0000000000000000
[  485.546700] RBP: 00007ff1bcb05000 R08: ffffffffffffffff R09: 0000000000000000
[  485.546700] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000001000
[  485.546701] R13: 0000000000000001 R14: 0000000000000001 R15: 00007ffe3fd317f0
[  485.546706] Modules linked in: fuse(E) xt_comment(E) ebtable_filter(E) ebtables(E) br_netfilter(E) xt_physdev(E) nfnetlink_cthelper(E) nfnetlink(E) msr(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) xt_pkttype(E) xt_tcpudp(E) ip6t_REJECT(E) ipt_REJECT(E) iptable_filter(E) bpfilter(E) ip6table_mangle(E) ip_tables(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) libcrc32c(E) ip6table_filter(E) ip6_tables(E) x_tables(E) nls_iso8859_1(E) nls_cp437(E) intel_rapl_msr(E) joydev(E) usblp(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) at24(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) snd_hda_codec_realtek(E) snd_hda_codec_generic(E) ghash_clmulni_intel(E) ledtrig_audio(E) aesni_intel(E) crypto_simd(E) cryptd(E) glue_helper(E) mei_hdcp(E) snd_hda_codec_hdmi(E) iTCO_wdt(E) iTCO_vendor_support(E) snd_hda_intel(E) snd_intel_dspcfg(E) snd_hda_codec(E) snd_hda_core(E) r8169(E) pcspkr(E)
[  485.546742]  snd_hwdep(E) i2c_i801(E) i2c_smbus(E) snd_pcm(E) realtek(E) mdio_devres(E) snd_timer(E) libphy(E) snd(E) mei_me(E) soundcore(E) lpc_ich(E) mei(E) fan(E) thermal(E) intel_smartconnect(E) tiny_power_button(E) sch_fq_codel(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) hid_logitech_hidpp(E) sr_mod(E) cdrom(E) hid_logitech_dj(E) ums_realtek(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) nouveau(E) wmi(E) i2c_algo_bit(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) ahci(E) cec(E) ehci_pci(E) libahci(E) rc_core(E) xhci_pci(E) ehci_hcd(E) ttm(E) libata(E) xhci_hcd(E) drm(E) usbcore(E) video(E) button(E) sd_mod(E) t10_pi(E) vfat(E) fat(E) virtio_blk(E) virtio_mmio(E) virtio_ring(E) virtio(E) ext4(E) crc32c_intel(E) crc16(E) mbcache(E) jbd2(E) loop(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E)
[  485.546783] Dumping ftrace buffer:
[  485.546787]    (ftrace buffer empty)
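
In short, my reading of the splat: zswap_frontswap_store() nests
zswap_comp.lock -> zs_map_object() -> zs_map_area.lock, while
zswap_frontswap_load() maps the zpool handle first and only takes
zswap_comp.lock afterwards, i.e. zs_map_area.lock -> zswap_comp.lock.
A classic ABBA inversion; the patch that follows reorders the load side.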


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-03 13:39                                   ` scheduling while atomic in z3fold Sebastian Andrzej Siewior
@ 2020-12-03 14:07                                       ` Vitaly Wool
  2020-12-06  9:18                                       ` Mike Galbraith
  1 sibling, 0 replies; 70+ messages in thread
From: Vitaly Wool @ 2020-12-03 14:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Mike Galbraith, Oleksandr Natalenko, LKML, Linux-MM,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

On Thu, Dec 3, 2020 at 2:39 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2020-12-03 09:18:21 [+0100], Mike Galbraith wrote:
> > On Thu, 2020-12-03 at 03:16 +0100, Mike Galbraith wrote:
> > > On Wed, 2020-12-02 at 23:08 +0100, Sebastian Andrzej Siewior wrote:
> > > Looks like...
> > >
> > > d8f117abb380 z3fold: fix use-after-free when freeing handles
> > >
> > > ...wasn't completely effective...
> >
> > The top two hunks seem to have rendered the thing RT tolerant.

Thanks for all your efforts. I promise to take a closer look at this
today; I have had my hands full with RISC-V up until now.

Best regards,
   Vitaly

> Yes, it appears to. I have no idea if this is a proper fix or not.
> Without your write lock, after a few attempts, KASAN says:
>
> | BUG: KASAN: use-after-free in __pv_queued_spin_lock_slowpath+0x293/0x770
> | Write of size 2 at addr ffff88800e0e10aa by task kworker/u16:3/237
> |
> | CPU: 5 PID: 237 Comm: kworker/u16:3 Tainted: G        W         5.10.0-rc6-rt13-rt+
> | Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1 04/01/2014
> | Workqueue: zswap1 compact_page_work
> | Call Trace:
> |  dump_stack+0x7d/0xa3
> |  print_address_description.constprop.0+0x19/0x120
> |  __kasan_report.cold+0x1d/0x35
> |  kasan_report+0x3a/0x50
> |  check_memory_region+0x145/0x1a0
> |  __pv_queued_spin_lock_slowpath+0x293/0x770
> |  _raw_spin_lock_irq+0xca/0xe0
> |  rt_read_unlock+0x2c/0x70
> |  __release_z3fold_page.constprop.0+0x45e/0x620
> |  do_compact_page+0x674/0xa50
> |  process_one_work+0x63a/0x1130
> |  worker_thread+0xd3/0xc80
> |  kthread+0x401/0x4e0
> |  ret_from_fork+0x22/0x30
> |
> | Allocated by task 225 (systemd-journal):
> |  kasan_save_stack+0x1b/0x40
> |  __kasan_kmalloc.constprop.0+0xc2/0xd0
> |  kmem_cache_alloc+0x103/0x2b0
> |  z3fold_alloc+0x597/0x1970
> |  zswap_frontswap_store+0x928/0x1bf0
> |  __frontswap_store+0x117/0x320
> |  swap_writepage+0x34/0x70
> |  pageout+0x268/0x7c0
> |  shrink_page_list+0x13e1/0x1e80
> |  shrink_inactive_list+0x303/0xde0
> |  shrink_lruvec+0x3dd/0x660
> |  shrink_node_memcgs+0x3a1/0x600
> |  shrink_node+0x3a7/0x1350
> |  shrink_zones+0x1f1/0x7f0
> |  do_try_to_free_pages+0x219/0xcc0
> |  try_to_free_pages+0x1c5/0x4b0
> |  __perform_reclaim+0x18f/0x2c0
> |  __alloc_pages_slowpath.constprop.0+0x7ea/0x1790
> |  __alloc_pages_nodemask+0x5f5/0x700
> |  page_cache_ra_unbounded+0x30f/0x690
> |  do_sync_mmap_readahead+0x3e3/0x640
> |  filemap_fault+0x981/0x1110
> |  __xfs_filemap_fault+0x12d/0x840
> |  __do_fault+0xf3/0x4b0
> |  do_fault+0x202/0x8c0
> |  __handle_mm_fault+0x338/0x500
> |  handle_mm_fault+0x1a8/0x670
> |  do_user_addr_fault+0x409/0x8b0
> |  exc_page_fault+0x60/0xc0
> |  asm_exc_page_fault+0x1e/0x30
> |
> | Freed by task 71 (oom_reaper):
> |  kasan_save_stack+0x1b/0x40
> |  kasan_set_track+0x1c/0x30
> |  kasan_set_free_info+0x1b/0x30
> |  __kasan_slab_free+0x110/0x150
> |  kmem_cache_free+0x7f/0x450
> |  z3fold_free+0x1f8/0xc90
> |  zswap_free_entry+0x168/0x230
> |  zswap_frontswap_invalidate_page+0x145/0x190
> |  __frontswap_invalidate_page+0xe8/0x1a0
> |  swap_range_free.constprop.0+0x266/0x300
> |  swapcache_free_entries+0x1dc/0x970
> |  free_swap_slot+0x19c/0x290
> |  __swap_entry_free+0x139/0x160
> |  free_swap_and_cache+0xda/0x230
> |  zap_pte_range+0x275/0x1590
> |  unmap_page_range+0x320/0x690
> |  __oom_reap_task_mm+0x207/0x330
> |  oom_reap_task_mm+0x78/0x7e0
> |  oom_reap_task+0x6d/0x1a0
> |  oom_reaper+0x103/0x290
> |  kthread+0x401/0x4e0
> |  ret_from_fork+0x22/0x30
> |
> | The buggy address belongs to the object at ffff88800e0e1080
> |  which belongs to the cache z3fold_handle of size 88
> | The buggy address is located 42 bytes inside of
> |  88-byte region [ffff88800e0e1080, ffff88800e0e10d8)
> | The buggy address belongs to the page:
> | page:000000002ba661bc refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xe0e1
> | flags: 0xffff800000200(slab)
> | raw: 000ffff800000200 dead000000000100 dead000000000122 ffff88800aa4eb40
> | raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
> | page dumped because: kasan: bad access detected
> |
> | Memory state around the buggy address:
> |  ffff88800e0e0f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> |  ffff88800e0e1000: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
> | >ffff88800e0e1080: fa fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
> |                                   ^
> |  ffff88800e0e1100: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
> |  ffff88800e0e1180: 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc
> | ==================================================================
> | Disabling lock debugging due to kernel taint
>
> with the lock I haven't seen anything.
>
> Sebastian

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [patch] mm,zswap: fix zswap::zswap_comp.lock vs zsmalloc::zs_map_area.lock deadlock
  2020-12-03 14:04                                               ` Mike Galbraith
@ 2020-12-04  5:22                                                 ` Mike Galbraith
  0 siblings, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-12-04  5:22 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: RT


When zswap_comp.lock replaced zswap's preempt disabled sections, the zpool_map/unmap_handle()
calls sat inside those sections in zswap_frontswap_store() but outside them in
zswap_frontswap_load(), and they kept that placement relative to the new zswap_comp.lock
locked sections.  With the later addition of zs_map_area.lock in the zsmalloc map/unmap
methods, these two locks collide in a zswap_frontswap_load() vs zswap_frontswap_store()
inversion/deadlock.

Call zpool map/unmap methods in zswap_frontswap_load() under zswap_comp.lock as
they are in zswap_frontswap_store() to prevent deadlock.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 mm/zswap.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1183,17 +1183,17 @@ static int zswap_frontswap_load(unsigned
 	}

 	/* decompress */
+	local_lock(&zswap_comp.lock);
 	dlen = PAGE_SIZE;
 	src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
 	if (zpool_evictable(entry->pool->zpool))
 		src += sizeof(struct zswap_header);
 	dst = kmap_atomic(page);
-	local_lock(&zswap_comp.lock);
 	tfm = *this_cpu_ptr(entry->pool->tfm);
 	ret = crypto_comp_decompress(tfm, src, entry->length, dst, &dlen);
-	local_unlock(&zswap_comp.lock);
 	kunmap_atomic(dst);
 	zpool_unmap_handle(entry->pool->zpool, entry->handle);
+	local_unlock(&zswap_comp.lock);
 	BUG_ON(ret);

 freeentry:
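
With this applied, both paths nest the two local locks in the same order
(my reading of the result, not part of the patch):

```
/*
 * zswap_frontswap_store(): local_lock(zswap_comp.lock)
 *                            -> zpool map -> local_lock(zs_map_area.lock)
 * zswap_frontswap_load():  local_lock(zswap_comp.lock)
 *                            -> zpool map -> local_lock(zs_map_area.lock)
 */
```

which removes the (lock).lock#4 vs (lock).lock#5 inversion lockdep
reported above.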


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-03 13:39                                   ` scheduling while atomic in z3fold Sebastian Andrzej Siewior
@ 2020-12-06  9:18                                       ` Mike Galbraith
  2020-12-06  9:18                                       ` Mike Galbraith
  1 sibling, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-12-06  9:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Vitaly Wool, Oleksandr Natalenko, linux-kernel, linux-mm,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

On Thu, 2020-12-03 at 14:39 +0100, Sebastian Andrzej Siewior wrote:
> On 2020-12-03 09:18:21 [+0100], Mike Galbraith wrote:
> > On Thu, 2020-12-03 at 03:16 +0100, Mike Galbraith wrote:
> > > On Wed, 2020-12-02 at 23:08 +0100, Sebastian Andrzej Siewior wrote:
> > > Looks like...
> > >
> > > d8f117abb380 z3fold: fix use-after-free when freeing handles
> > >
> > > ...wasn't completely effective...
> >
> > The top two hunks seem to have rendered the thing RT tolerant.
>
> Yes, it appears to. I have no idea if this is a proper fix or not.
> Without your write lock, after a few attempts, KASAN says:
>
> | BUG: KASAN: use-after-free in __pv_queued_spin_lock_slowpath+0x293/0x770
> | Write of size 2 at addr ffff88800e0e10aa by task kworker/u16:3/237

Things that make ya go hmmm...

I started poking at it thinking: OK, given that write_lock() fixes it,
bad juju must be happening under another read_lock().  So I poisoned the
structure about to be freed by __release_z3fold_page() under lock, and
opened a delay window for the bad juju to materialize, but it didn't, so
I dropped the delay, kept the poisoning, and sprinkled landmines all
over the place.

My landmines are not triggering, but the detonator is materializing
inside the structure containing the lock we explode on.  Somebody
blasts a recycled z3fold_buddy_slots into the RAM we were working on?

crash> bt -sx
PID: 11850  TASK: ffff933ee67f2280  CPU: 1   COMMAND: "swapoff"
 #0 [ffffa7bf4e7f3900] machine_kexec+0x16e at ffffffff8405f86e
 #1 [ffffa7bf4e7f3948] __crash_kexec+0xd2 at ffffffff8414c182
 #2 [ffffa7bf4e7f3a08] crash_kexec+0x30 at ffffffff8414d370
 #3 [ffffa7bf4e7f3a18] oops_end+0xca at ffffffff8402680a
 #4 [ffffa7bf4e7f3a38] no_context+0x14d at ffffffff8406d7ed
 #5 [ffffa7bf4e7f3a98] exc_page_fault+0x2b8 at ffffffff848bdb88
 #6 [ffffa7bf4e7f3af0] asm_exc_page_fault+0x1e at ffffffff84a00ace
 #7 [ffffa7bf4e7f3b78] mark_wakeup_next_waiter+0x51 at ffffffff840ea141
 #8 [ffffa7bf4e7f3bd8] rt_mutex_futex_unlock+0x4f at ffffffff848c93ef
 #9 [ffffa7bf4e7f3c18] z3fold_zpool_free+0x580 at ffffffffc0f37680 [z3fold]
#10 [ffffa7bf4e7f3c78] zswap_free_entry+0x43 at ffffffff8427c863
#11 [ffffa7bf4e7f3c88] zswap_frontswap_invalidate_page+0x8a at ffffffff8427c96a
#12 [ffffa7bf4e7f3cb0] __frontswap_invalidate_page+0x48 at ffffffff8427c058
#13 [ffffa7bf4e7f3cd8] swapcache_free_entries+0x1ee at ffffffff84276f8e
#14 [ffffa7bf4e7f3d20] free_swap_slot+0x9f at ffffffff8427b93f
#15 [ffffa7bf4e7f3d40] delete_from_swap_cache+0x61 at ffffffff84274651
#16 [ffffa7bf4e7f3d60] try_to_free_swap+0x70 at ffffffff84277550
#17 [ffffa7bf4e7f3d70] unuse_vma+0x55c at ffffffff842786cc
#18 [ffffa7bf4e7f3e90] try_to_unuse+0x139 at ffffffff84278eb9
#19 [ffffa7bf4e7f3ee8] __x64_sys_swapoff+0x1eb at ffffffff842798fb
#20 [ffffa7bf4e7f3f40] do_syscall_64+0x33 at ffffffff848b9ab3
#21 [ffffa7bf4e7f3f50] entry_SYSCALL_64_after_hwframe+0x44 at ffffffff84a0007c
    RIP: 00007fdd9ce26d17  RSP: 00007ffe99f98238  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 00005625219e8b60  RCX: 00007fdd9ce26d17
    RDX: 0000000000000001  RSI: 0000000000000001  RDI: 00005625219e8b60
    RBP: 0000000000000001   R8: 00007ffe99f982a0   R9: 0000000000000003
    R10: 00005625219e8721  R11: 0000000000000202  R12: 0000000000000001
    R13: 0000000000000000  R14: 00007ffe99f982a0  R15: 0000000000000000
    ORIG_RAX: 00000000000000a8  CS: 0033  SS: 002b
crash> bt -e
PID: 11850  TASK: ffff933ee67f2280  CPU: 1   COMMAND: "swapoff"

  KERNEL-MODE EXCEPTION FRAME AT: ffffa7bf4e7f3950
    [exception RIP: mark_wakeup_next_waiter+81]
    RIP: ffffffff840ea141  RSP: ffffa7bf4e7f3ba0  RFLAGS: 00010046
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000001
    RDX: 0000000000000001  RSI: ffffa7bf4e7f3bf0  RDI: 0000000000000001
    RBP: ffff933e5a7baba8   R8: ffff933f502c2000   R9: ffff933e5a7babc0
    R10: 0000000000000001  R11: ffffffff855938b8  R12: ffffa7bf4e7f3be0
    R13: ffffa7bf4e7f3bf0  R14: 0000000000000018  R15: ffff933ee67f2280
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
...
crash> gdb list *z3fold_zpool_free+0x580
0xffffffffc0f37680 is in z3fold_zpool_free (mm/z3fold.c:338).
warning: Source file is more recent than executable.
333
334             /* we are freeing a foreign handle if we are here */
335             zhdr->foreign_handles--;
336             is_free = true;
337             if (!test_bit(HANDLES_ORPHANED, &slots->pool)) {
338                     write_unlock(&slots->lock);
339                     return;
340             }
341             for (i = 0; i <= BUDDY_MASK; i++) {
342                     if (slots->slot[i]) {
crash> z3fold_buddy_slots -x ffff933e5a7bab80
struct z3fold_buddy_slots {
  slot = {0x0, 0x0, 0x0, 0x0},
  pool = 0xffff933e4b6d8e00,
  lock = {
    rtmutex = {
      wait_lock = {
        raw_lock = {
          {
            val = {
              counter = 0x1
            },
            {
              locked = 0x1,
              pending = 0x0
            },
            {
              locked_pending = 0x1,
              tail = 0x0
            }
          }
        }
      },
      waiters = {
        rb_root = {
          rb_node = 0xffff933e5a7ba700
        },
        rb_leftmost = 0x0          <== yup, that's why we've been exploding
      },
      owner = 0xffff933ee67f2280,  <== yup, that's our exploding task
      save_state = 0x1
    },
    readers = {
      counter = 0x80000000
    }
  },
  poison = 0xfeedbabedeadbeef      <== oh dear, that SHOULD be impossible
}
crash>

 326         write_lock(&slots->lock);
 327         BUG_ON(READ_ONCE(slots->poison));  <== we got past this landmine..
 328         *(unsigned long *)handle = 0;
 329         if (zhdr->slots == slots) {
 330                 write_unlock(&slots->lock);
 331                 return; /* simple case, nothing else to do */
 332         }
 333
 334         /* we are freeing a foreign handle if we are here */
 335         zhdr->foreign_handles--;
 336         is_free = true;
 337         if (!test_bit(HANDLES_ORPHANED, &slots->pool)) {
 338                 write_unlock(&slots->lock);  <== ..and exploded here due to a corrupt lock
                                                        AND poison written under read_lock()
                                                        has suddenly materialized as well!?!
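
Reading the two dumps together, my interpretation: the
0xfeedbabedeadbeef pattern is only ever written by
__release_z3fold_page() right before it frees the slots, so the object
containing the rwlock we are unlocking has been freed and recycled
underneath us.  That is the lock itself being used after free, which
squares with the earlier KASAN report of a use-after-free write in
__pv_queued_spin_lock_slowpath().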
---
 mm/z3fold.c |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -44,6 +44,7 @@
 #include <linux/zpool.h>
 #include <linux/magic.h>
 #include <linux/kmemleak.h>
+#include <linux/delay.h>

 /*
  * NCHUNKS_ORDER determines the internal allocation granularity, effectively
@@ -92,6 +93,7 @@ struct z3fold_buddy_slots {
 	unsigned long slot[BUDDY_MASK + 1];
 	unsigned long pool; /* back link + flags */
 	rwlock_t lock;
+	unsigned long poison;
 };
 #define HANDLE_FLAG_MASK	(0x03)

@@ -220,6 +222,7 @@ static inline struct z3fold_buddy_slots
 		kmemleak_not_leak(slots);
 		slots->pool = (unsigned long)pool;
 		rwlock_init(&slots->lock);
+		slots->poison = 0;
 	}

 	return slots;
@@ -267,11 +270,13 @@ static inline struct z3fold_header *__ge
 			unsigned long addr;

 			read_lock(&slots->lock);
+			BUG_ON(READ_ONCE(slots->poison));
 			addr = *(unsigned long *)handle;
 			zhdr = (struct z3fold_header *)(addr & PAGE_MASK);
 			if (lock)
 				locked = z3fold_page_trylock(zhdr);
 			read_unlock(&slots->lock);
+			BUG_ON(READ_ONCE(slots->poison));
 			if (locked)
 				break;
 			cpu_relax();
@@ -319,6 +324,7 @@ static inline void free_handle(unsigned
 	zhdr = handle_to_z3fold_header(handle);
 	slots = handle_to_slots(handle);
 	write_lock(&slots->lock);
+	BUG_ON(READ_ONCE(slots->poison));          <== poison was zero here
 	*(unsigned long *)handle = 0;
 	if (zhdr->slots == slots) {
 		write_unlock(&slots->lock);
@@ -338,6 +344,10 @@ static inline void free_handle(unsigned
 			break;
 		}
 	}
+	if (is_free)
+		slots->poison = 0xdeaddeaddeaddead;
+	else
+		BUG_ON(READ_ONCE(slots->poison));
 	write_unlock(&slots->lock);

 	if (is_free) {
@@ -475,8 +485,10 @@ static unsigned long __encode_handle(str
 		h |= (zhdr->last_chunks << BUDDY_SHIFT);

 	write_lock(&slots->lock);
+	BUG_ON(READ_ONCE(slots->poison));
 	slots->slot[idx] = h;
 	write_unlock(&slots->lock);
+	BUG_ON(READ_ONCE(slots->poison));
 	return (unsigned long)&slots->slot[idx];
 }

@@ -492,8 +504,10 @@ static unsigned short handle_to_chunks(u
 	unsigned long addr;

 	read_lock(&slots->lock);
+	BUG_ON(READ_ONCE(slots->poison));
 	addr = *(unsigned long *)handle;
 	read_unlock(&slots->lock);
+	BUG_ON(READ_ONCE(slots->poison));
 	return (addr & ~PAGE_MASK) >> BUDDY_SHIFT;
 }

@@ -509,10 +523,12 @@ static enum buddy handle_to_buddy(unsign
 	unsigned long addr;

 	read_lock(&slots->lock);
+	BUG_ON(READ_ONCE(slots->poison));
 	WARN_ON(handle & (1 << PAGE_HEADLESS));
 	addr = *(unsigned long *)handle;
 	read_unlock(&slots->lock);
 	zhdr = (struct z3fold_header *)(addr & PAGE_MASK);
+	BUG_ON(READ_ONCE(slots->poison));
 	return (addr - zhdr->first_num) & BUDDY_MASK;
 }

@@ -538,6 +554,7 @@ static void __release_z3fold_page(struct

 	/* If there are no foreign handles, free the handles array */
 	read_lock(&zhdr->slots->lock);
+	BUG_ON(READ_ONCE(zhdr->slots->poison));
 	for (i = 0; i <= BUDDY_MASK; i++) {
 		if (zhdr->slots->slot[i]) {
 			is_free = false;
@@ -546,6 +563,11 @@ static void __release_z3fold_page(struct
 	}
 	if (!is_free)
 		set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
+	else {
+		zhdr->slots->poison = 0xfeedbabedeadbeef;
+//		usleep_range(100, 200);
+//		zhdr->slots->poison = 0;
+	}
 	read_unlock(&zhdr->slots->lock);

 	if (is_free)
@@ -750,11 +772,14 @@ static struct z3fold_header *compact_sin
 		new_zhdr->foreign_handles++;
 		memcpy(q, p, sz);
 		write_lock(&zhdr->slots->lock);
+		BUG_ON(READ_ONCE(zhdr->slots->poison));
 		*(unsigned long *)old_handle = (unsigned long)new_zhdr +
 			__idx(new_zhdr, new_bud);
 		if (new_bud == LAST)
 			*(unsigned long *)old_handle |=
 					(new_zhdr->last_chunks << BUDDY_SHIFT);
+		BUG_ON(READ_ONCE(zhdr->slots->poison));
+		BUG_ON(READ_ONCE(new_zhdr->slots->poison));
 		write_unlock(&zhdr->slots->lock);
 		add_to_unbuddied(pool, new_zhdr);
 		z3fold_page_unlock(new_zhdr);


^ permalink raw reply	[flat|nested] 70+ messages in thread

 		if (zhdr->slots->slot[i]) {
 			is_free = false;
@@ -546,6 +563,11 @@ static void __release_z3fold_page(struct
 	}
 	if (!is_free)
 		set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
+	else {
+		zhdr->slots->poison = 0xfeedbabedeadbeef;
+//		usleep_range(100, 200);
+//		zhdr->slots->poison = 0;
+	}
 	read_unlock(&zhdr->slots->lock);

 	if (is_free)
@@ -750,11 +772,14 @@ static struct z3fold_header *compact_sin
 		new_zhdr->foreign_handles++;
 		memcpy(q, p, sz);
 		write_lock(&zhdr->slots->lock);
+		BUG_ON(READ_ONCE(zhdr->slots->poison));
 		*(unsigned long *)old_handle = (unsigned long)new_zhdr +
 			__idx(new_zhdr, new_bud);
 		if (new_bud == LAST)
 			*(unsigned long *)old_handle |=
 					(new_zhdr->last_chunks << BUDDY_SHIFT);
+		BUG_ON(READ_ONCE(zhdr->slots->poison));
+		BUG_ON(READ_ONCE(new_zhdr->slots->poison));
 		write_unlock(&zhdr->slots->lock);
 		add_to_unbuddied(pool, new_zhdr);
 		z3fold_page_unlock(new_zhdr);



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-06  9:18                                       ` Mike Galbraith
  (?)
@ 2020-12-07  1:05                                       ` Vitaly Wool
  2020-12-07  2:18                                           ` Mike Galbraith
  -1 siblings, 1 reply; 70+ messages in thread
From: Vitaly Wool @ 2020-12-07  1:05 UTC (permalink / raw)
  To: Mike Galbraith, Sebastian Andrzej Siewior
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On 2020-12-06 10:18, Mike Galbraith wrote:

> On Thu, 2020-12-03 at 14:39 +0100, Sebastian Andrzej Siewior wrote:
>> On 2020-12-03 09:18:21 [+0100], Mike Galbraith wrote:
>>> On Thu, 2020-12-03 at 03:16 +0100, Mike Galbraith wrote:
>>>> On Wed, 2020-12-02 at 23:08 +0100, Sebastian Andrzej Siewior wrote: 
>>>> Looks like... d8f117abb380 z3fold: fix use-after-free when freeing 
>>>> handles ...wasn't completely effective...
>>> The top two hunks seem to have rendered the thing RT tolerant.
>> Yes, it appears to. I have no idea if this is a proper fix or not.
>> Without your write lock, after a few attempts, KASAN says:
>> | BUG: KASAN: use-after-free in __pv_queued_spin_lock_slowpath+0x293/0x770
>> | Write of size 2 at addr ffff88800e0e10aa by task kworker/u16:3/237
> Things that make ya go hmmm... I started poking at it thinking ok, 
> given write_lock() fixes it, bad juju must be happening under another 
> read_lock(), so I poisoned the structure about to be freed by 
> __release_z3fold_page() under lock, and opened a delay window for bad 
> juju to materialize, but it didn't, so I just poisoned instead, and 
> sprinkled landmines all over the place. My landmines are not 
> triggering but the detonator is materializing inside the structure 
> containing the lock we explode on. Somebody blasts a recycled 
> z3fold_buddy_slots into ram we were working on?

Could you please try the following patch in your setup:

diff --git a/mm/z3fold.c b/mm/z3fold.c
index 18feaa0bc537..efe9a012643d 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -544,12 +544,17 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
  			break;
  		}
  	}
-	if (!is_free)
+	if (!is_free) {
  		set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
-	read_unlock(&zhdr->slots->lock);
-
-	if (is_free)
+		read_unlock(&zhdr->slots->lock);
+	} else {
+		zhdr->slots->slot[0] =
+			zhdr->slots->slot[1] =
+			zhdr->slots->slot[2] =
+			zhdr->slots->slot[3] = 0xdeadbeef;
+		read_unlock(&zhdr->slots->lock);
  		kmem_cache_free(pool->c_handle, zhdr->slots);
+	}
  
  	if (locked)
  		z3fold_page_unlock(zhdr);


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-07  1:05                                       ` Vitaly Wool
@ 2020-12-07  2:18                                           ` Mike Galbraith
  0 siblings, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-12-07  2:18 UTC (permalink / raw)
  To: Vitaly Wool, Sebastian Andrzej Siewior
  Cc: Oleksandr Natalenko, linux-kernel, linux-mm, Andrew Morton,
	Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, 2020-12-07 at 02:05 +0100, Vitaly Wool wrote:
>
> Could you please try the following patch in your setup:

crash> gdb list *z3fold_zpool_free+0x527
0xffffffffc0e14487 is in z3fold_zpool_free (mm/z3fold.c:341).
336                     if (slots->slot[i]) {
337                             is_free = false;
338                             break;
339                     }
340             }
341             write_unlock(&slots->lock);  <== boom
342
343             if (is_free) {
344                     struct z3fold_pool *pool = slots_to_pool(slots);
345
crash> z3fold_buddy_slots -x ffff99a3287b8780
struct z3fold_buddy_slots {
  slot = {0xdeadbeef, 0xdeadbeef, 0xdeadbeef, 0xdeadbeef},
  pool = 0xffff99a3146b8400,
  lock = {
    rtmutex = {
      wait_lock = {
        raw_lock = {
          {
            val = {
              counter = 0x1
            },
            {
              locked = 0x1,
              pending = 0x0
            },
            {
              locked_pending = 0x1,
              tail = 0x0
            }
          }
        }
      },
      waiters = {
        rb_root = {
          rb_node = 0xffff99a3287b8e00
        },
        rb_leftmost = 0x0
      },
      owner = 0xffff99a355c24500,
      save_state = 0x1
    },
    readers = {
      counter = 0x80000000
    }
  }
}

> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index 18feaa0bc537..efe9a012643d 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -544,12 +544,17 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
>  			break;
>  		}
>  	}
> -	if (!is_free)
> +	if (!is_free) {
>  		set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
> -	read_unlock(&zhdr->slots->lock);
> -
> -	if (is_free)
> +		read_unlock(&zhdr->slots->lock);
> +	} else {
> +		zhdr->slots->slot[0] =
> +			zhdr->slots->slot[1] =
> +			zhdr->slots->slot[2] =
> +			zhdr->slots->slot[3] = 0xdeadbeef;
> +		read_unlock(&zhdr->slots->lock);
>  		kmem_cache_free(pool->c_handle, zhdr->slots);
> +	}
>
>  	if (locked)
>  		z3fold_page_unlock(zhdr);


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-07  2:18                                           ` Mike Galbraith
@ 2020-12-07 11:52                                             ` Vitaly Wool
  -1 siblings, 0 replies; 70+ messages in thread
From: Vitaly Wool @ 2020-12-07 11:52 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sebastian Andrzej Siewior, Oleksandr Natalenko, LKML, Linux-MM,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, Dec 7, 2020 at 3:18 AM Mike Galbraith <efault@gmx.de> wrote:
>
> On Mon, 2020-12-07 at 02:05 +0100, Vitaly Wool wrote:
> >
> > Could you please try the following patch in your setup:
>
> crash> gdb list *z3fold_zpool_free+0x527
> 0xffffffffc0e14487 is in z3fold_zpool_free (mm/z3fold.c:341).
> 336                     if (slots->slot[i]) {
> 337                             is_free = false;
> 338                             break;
> 339                     }
> 340             }
> 341             write_unlock(&slots->lock);  <== boom
> 342
> 343             if (is_free) {
> 344                     struct z3fold_pool *pool = slots_to_pool(slots);
> 345
> crash> z3fold_buddy_slots -x ffff99a3287b8780
> struct z3fold_buddy_slots {
>   slot = {0xdeadbeef, 0xdeadbeef, 0xdeadbeef, 0xdeadbeef},
>   pool = 0xffff99a3146b8400,
>   lock = {
>     rtmutex = {
>       wait_lock = {
>         raw_lock = {
>           {
>             val = {
>               counter = 0x1
>             },
>             {
>               locked = 0x1,
>               pending = 0x0
>             },
>             {
>               locked_pending = 0x1,
>               tail = 0x0
>             }
>           }
>         }
>       },
>       waiters = {
>         rb_root = {
>           rb_node = 0xffff99a3287b8e00
>         },
>         rb_leftmost = 0x0
>       },
>       owner = 0xffff99a355c24500,
>       save_state = 0x1
>     },
>     readers = {
>       counter = 0x80000000
>     }
>   }
> }

Thanks. This trace beats me; I don't quite get how this could have
happened. Hitting write_unlock at line 341 would mean that the
HANDLES_ORPHANED bit is set, but obviously it isn't.
Could you please comment out the ".shrink = z3fold_zpool_shrink" line
and retry? Reclaim is the trickiest thing over there since I have to
drop the page lock while reclaiming.
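
For reference, that is the zpool_driver registration at the bottom of
mm/z3fold.c; with the line commented out it would look roughly like the
sketch below (surrounding fields per the v5.10 layout; double-check
against your tree):

static struct zpool_driver z3fold_zpool_driver = {
	.type =		"z3fold",
	.owner =	THIS_MODULE,
	.create =	z3fold_zpool_create,
	.destroy =	z3fold_zpool_destroy,
	.malloc =	z3fold_zpool_malloc,
	.free =		z3fold_zpool_free,
	/* .shrink =	z3fold_zpool_shrink,	disabled for this experiment */
	.map =		z3fold_zpool_map,
	.unmap =	z3fold_zpool_unmap,
	.total_size =	z3fold_zpool_total_size,
};

With no ->shrink, zswap should stop entering z3fold's reclaim path
altogether, which is the point of the experiment.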

Thanks,
   Vitaly

> > diff --git a/mm/z3fold.c b/mm/z3fold.c
> > index 18feaa0bc537..efe9a012643d 100644
> > --- a/mm/z3fold.c
> > +++ b/mm/z3fold.c
> > @@ -544,12 +544,17 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
> >                       break;
> >               }
> >       }
> > -     if (!is_free)
> > +     if (!is_free) {
> >               set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
> > -     read_unlock(&zhdr->slots->lock);
> > -
> > -     if (is_free)
> > +             read_unlock(&zhdr->slots->lock);
> > +     } else {
> > +             zhdr->slots->slot[0] =
> > +                     zhdr->slots->slot[1] =
> > +                     zhdr->slots->slot[2] =
> > +                     zhdr->slots->slot[3] = 0xdeadbeef;
> > +             read_unlock(&zhdr->slots->lock);
> >               kmem_cache_free(pool->c_handle, zhdr->slots);
> > +     }
> >
> >       if (locked)
> >               z3fold_page_unlock(zhdr);
>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-07 11:52                                             ` Vitaly Wool
@ 2020-12-07 12:34                                               ` Mike Galbraith
  -1 siblings, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-12-07 12:34 UTC (permalink / raw)
  To: Vitaly Wool
  Cc: Sebastian Andrzej Siewior, Oleksandr Natalenko, LKML, Linux-MM,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, 2020-12-07 at 12:52 +0100, Vitaly Wool wrote:
>
> Thanks. This trace beats me because I don't quite get how this could
> have happened.

I swear there's a mythical creature loose in there somewhere ;-)
Everything looks just peachy up to the instant it goes boom, then you
find in the wreckage that which was most definitely NOT there just a
few ns ago.

> Hitting write_unlock at line 341 would mean that HANDLES_ORPHANED bit
> is set but obviously it isn't.
> Could you please comment out the ".shrink = z3fold_zpool_shrink" line
> and retry?

Unfortunately, that made zero difference.

	-Mike


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-07 12:34                                               ` Mike Galbraith
@ 2020-12-07 15:21                                                 ` Vitaly Wool
  -1 siblings, 0 replies; 70+ messages in thread
From: Vitaly Wool @ 2020-12-07 15:21 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sebastian Andrzej Siewior, Oleksandr Natalenko, LKML, Linux-MM,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, Dec 7, 2020 at 1:34 PM Mike Galbraith <efault@gmx.de> wrote:
>
> On Mon, 2020-12-07 at 12:52 +0100, Vitaly Wool wrote:
> >
> > Thanks. This trace beats me because I don't quite get how this could
> > have happened.
>
> I swear there's a mythical creature loose in there somewhere ;-)
> Everything looks just peachy up to the instant it goes boom, then you
> find in the wreckage that which was most definitely NOT there just a
> few ns ago.
>
> > Hitting write_unlock at line 341 would mean that HANDLES_ORPHANED bit
> > is set but obviously it isn't.
> > Could you please comment out the ".shrink = z3fold_zpool_shrink" line
> > and retry?
>
> Unfortunately, that made zero difference.

Okay, I suggest that you submit the patch that changes read_lock() to
write_lock() in __release_z3fold_page() and I'll ack it then.
I would like to rewrite the code so that write_lock is not necessary
there but I don't want to hold you back and it isn't likely that I'll
complete this today.

Best regards,
   Vitaly

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-07 15:21                                                 ` Vitaly Wool
  (?)
@ 2020-12-07 15:41                                                 ` Sebastian Andrzej Siewior
  -1 siblings, 0 replies; 70+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-12-07 15:41 UTC (permalink / raw)
  To: Vitaly Wool
  Cc: Mike Galbraith, Oleksandr Natalenko, LKML, Linux-MM,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

On 2020-12-07 16:21:20 [+0100], Vitaly Wool wrote:
> On Mon, Dec 7, 2020 at 1:34 PM Mike Galbraith <efault@gmx.de> wrote:
> >
> > Unfortunately, that made zero difference.
> 
> Okay, I suggest that you submit the patch that changes read_lock() to
> write_lock() in __release_z3fold_page() and I'll ack it then.
> I would like to rewrite the code so that write_lock is not necessary
> there but I don't want to hold you back and it isn't likely that I'll
> complete this today.

Please with a

   Fixes: 4a3ac9311dac3 ("mm/z3fold.c: add inter-page compaction")

tag.

> Best regards,
>    Vitaly

Sebastian

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-07 15:21                                                 ` Vitaly Wool
@ 2020-12-07 15:41                                                   ` Mike Galbraith
  -1 siblings, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-12-07 15:41 UTC (permalink / raw)
  To: Vitaly Wool
  Cc: Sebastian Andrzej Siewior, Oleksandr Natalenko, LKML, Linux-MM,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

On Mon, 2020-12-07 at 16:21 +0100, Vitaly Wool wrote:
> On Mon, Dec 7, 2020 at 1:34 PM Mike Galbraith <efault@gmx.de> wrote:
> >
>
> > Unfortunately, that made zero difference.
>
> Okay, I suggest that you submit the patch that changes read_lock() to
> write_lock() in __release_z3fold_page() and I'll ack it then.
> I would like to rewrite the code so that write_lock is not necessary
> there but I don't want to hold you back and it isn't likely that I'll
> complete this today.

Nah, I'm in no rush... especially not to sign off on "Because the
little voices in my head said this bit should look like that bit over
yonder, and testing _seems_ to indicate they're right about that" :)

	-Mike


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-07 15:41                                                   ` Mike Galbraith
  (?)
@ 2020-12-08 23:26                                                   ` Vitaly Wool
  2020-12-09  6:13                                                       ` Mike Galbraith
  -1 siblings, 1 reply; 70+ messages in thread
From: Vitaly Wool @ 2020-12-08 23:26 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Sebastian Andrzej Siewior, Oleksandr Natalenko, LKML, Linux-MM,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

Hi Mike,

On 2020-12-07 16:41, Mike Galbraith wrote:
> On Mon, 2020-12-07 at 16:21 +0100, Vitaly Wool wrote:
>> On Mon, Dec 7, 2020 at 1:34 PM Mike Galbraith <efault@gmx.de> wrote:
>>>
>>
>>> Unfortunately, that made zero difference.
>>
>> Okay, I suggest that you submit the patch that changes read_lock() to
>> write_lock() in __release_z3fold_page() and I'll ack it then.
>> I would like to rewrite the code so that write_lock is not necessary
>> there but I don't want to hold you back and it isn't likely that I'll
>> complete this today.
> 
> Nah, I'm in no rush... especially not to sign off on "Because the
> little voices in my head said this bit should look like that bit over
> yonder, and testing _seems_ to indicate they're right about that" :)
> 
> 	-Mike
> 

Okay, thanks. Would this make things better? The idea is to drop the
HANDLES_ORPHANED machinery: free_handle() now frees the slots array
itself once the last slot in it is cleared (resetting zhdr->slots when
it was the page's own array), __release_z3fold_page() no longer touches
it, and __z3fold_alloc() reallocates a slots array for a reused page
that has lost its own:

diff --git a/mm/z3fold.c b/mm/z3fold.c
index 18feaa0bc537..340c38a5ffac 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -303,10 +303,9 @@ static inline void put_z3fold_header(struct z3fold_header *zhdr)
  		z3fold_page_unlock(zhdr);
  }

-static inline void free_handle(unsigned long handle)
+static inline void free_handle(unsigned long handle, struct z3fold_header *zhdr)
  {
  	struct z3fold_buddy_slots *slots;
-	struct z3fold_header *zhdr;
  	int i;
  	bool is_free;

@@ -316,22 +315,13 @@ static inline void free_handle(unsigned long handle)
  	if (WARN_ON(*(unsigned long *)handle == 0))
  		return;

-	zhdr = handle_to_z3fold_header(handle);
  	slots = handle_to_slots(handle);
  	write_lock(&slots->lock);
  	*(unsigned long *)handle = 0;
-	if (zhdr->slots == slots) {
-		write_unlock(&slots->lock);
-		return; /* simple case, nothing else to do */
-	}
+	if (zhdr->slots != slots)
+		zhdr->foreign_handles--;

-	/* we are freeing a foreign handle if we are here */
-	zhdr->foreign_handles--;
  	is_free = true;
-	if (!test_bit(HANDLES_ORPHANED, &slots->pool)) {
-		write_unlock(&slots->lock);
-		return;
-	}
  	for (i = 0; i <= BUDDY_MASK; i++) {
  		if (slots->slot[i]) {
  			is_free = false;
@@ -343,6 +333,8 @@ static inline void free_handle(unsigned long handle)
  	if (is_free) {
  		struct z3fold_pool *pool = slots_to_pool(slots);

+		if (zhdr->slots == slots)
+			zhdr->slots = NULL;
  		kmem_cache_free(pool->c_handle, slots);
  	}
  }
@@ -525,8 +517,6 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
  {
  	struct page *page = virt_to_page(zhdr);
  	struct z3fold_pool *pool = zhdr_to_pool(zhdr);
-	bool is_free = true;
-	int i;

  	WARN_ON(!list_empty(&zhdr->buddy));
  	set_bit(PAGE_STALE, &page->private);
@@ -536,21 +526,6 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
  		list_del_init(&page->lru);
  	spin_unlock(&pool->lock);

-	/* If there are no foreign handles, free the handles array */
-	read_lock(&zhdr->slots->lock);
-	for (i = 0; i <= BUDDY_MASK; i++) {
-		if (zhdr->slots->slot[i]) {
-			is_free = false;
-			break;
-		}
-	}
-	if (!is_free)
-		set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
-	read_unlock(&zhdr->slots->lock);
-
-	if (is_free)
-		kmem_cache_free(pool->c_handle, zhdr->slots);
-
  	if (locked)
  		z3fold_page_unlock(zhdr);

@@ -973,6 +948,9 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
  		}
  	}

+	if (zhdr && !zhdr->slots)
+		zhdr->slots = alloc_slots(pool,
+					can_sleep ? GFP_NOIO : GFP_ATOMIC);
  	return zhdr;
  }

@@ -1270,7 +1248,7 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
  	}

  	if (!page_claimed)
-		free_handle(handle);
+		free_handle(handle, zhdr);
  	if (kref_put(&zhdr->refcount, release_z3fold_page_locked_list)) {
  		atomic64_dec(&pool->pages_nr);
  		return;
@@ -1429,19 +1407,19 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
  			ret = pool->ops->evict(pool, middle_handle);
  			if (ret)
  				goto next;
-			free_handle(middle_handle);
+			free_handle(middle_handle, zhdr);
  		}
  		if (first_handle) {
  			ret = pool->ops->evict(pool, first_handle);
  			if (ret)
  				goto next;
-			free_handle(first_handle);
+			free_handle(first_handle, zhdr);
  		}
  		if (last_handle) {
  			ret = pool->ops->evict(pool, last_handle);
  			if (ret)
  				goto next;
-			free_handle(last_handle);
+			free_handle(last_handle, zhdr);
  		}
  next:
  		if (test_bit(PAGE_HEADLESS, &page->private)) {

--

Best regards,
    Vitaly

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-08 23:26                                                   ` Vitaly Wool
@ 2020-12-09  6:13                                                       ` Mike Galbraith
  0 siblings, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-12-09  6:13 UTC (permalink / raw)
  To: Vitaly Wool
  Cc: Sebastian Andrzej Siewior, Oleksandr Natalenko, LKML, Linux-MM,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

On Wed, 2020-12-09 at 00:26 +0100, Vitaly Wool wrote:
> Hi Mike,
>
> On 2020-12-07 16:41, Mike Galbraith wrote:
> > On Mon, 2020-12-07 at 16:21 +0100, Vitaly Wool wrote:
> >> On Mon, Dec 7, 2020 at 1:34 PM Mike Galbraith <efault@gmx.de> wrote:
> >>>
> >>
> >>> Unfortunately, that made zero difference.
> >>
> >> Okay, I suggest that you submit the patch that changes read_lock() to
> >> write_lock() in __release_z3fold_page() and I'll ack it then.
> >> I would like to rewrite the code so that write_lock is not necessary
> >> there but I don't want to hold you back and it isn't likely that I'll
> >> complete this today.
> >
> > Nah, I'm in no rush... especially not to sign off on "Because the
> > little voices in my head said this bit should look like that bit over
> > yonder, and testing _seems_ to indicate they're right about that" :)
> >
> > 	-Mike
> >
>
> okay, thanks. Would this make things better:

Yup, z3fold became RT tolerant with this (un-munged and) applied.

BTW, I don't think my write_lock() hacklet actually plugged the hole
that leads to the below.  I think it just reduced the odds of the two
meeting enough to make it look ~solid in fairly limited test time.

[ 3369.373023] kworker/-7413      4.....12 3368809247us : do_compact_page: zhdr: ffff95d93abd8000 zhdr->slots: ffff95d951f5df80 zhdr->slots->slot[0]: 0
[ 3369.373025] kworker/-7413      4.....12 3368809248us : do_compact_page: old_handle ffff95d951f5df98 *old_handle was ffff95d93abd804f now is ffff95da3ab8104c
[ 3369.373027] kworker/-7413      4.....11 3368809249us : __release_z3fold_page.constprop.25: freed ffff95d951f5df80
[ 3369.373028] ---------------------------------
[ 3369.373029] CR2: 0000000000000018
crash> p debug_handle | grep '\[2'
  [2]: ffff95dc1ecac0d0
crash> rd ffff95dc1ecac0d0
ffff95dc1ecac0d0:  ffff95d951f5df98                    ...Q....
crash> p debug_zhdr | grep '\[2'
  [2]: ffff95dc1ecac0c8
crash> rd ffff95dc1ecac0c8
ffff95dc1ecac0c8:  ffff95da3ab81000                    ...:....  <== kworker is not working on same zhdr as free_handle()..
crash> p debug_slots | grep '\[2'
  [2]: ffff95dc1ecac0c0
crash> rd ffff95dc1ecac0c0
ffff95dc1ecac0c0:  ffff95d951f5df80                    ...Q....  <== ..but is the same slots, and frees it under free_handle(),
                                                                      leading to use-after-free + corruption + explosion 1us later.
crash> bt -sx
PID: 9334   TASK: ffff95d95a1eb3c0  CPU: 2   COMMAND: "swapoff"
 #0 [ffffb4248847f8f0] machine_kexec+0x16e at ffffffffa605f8ce
 #1 [ffffb4248847f938] __crash_kexec+0xd2 at ffffffffa614c162
 #2 [ffffb4248847f9f8] crash_kexec+0x30 at ffffffffa614d350
 #3 [ffffb4248847fa08] oops_end+0xca at ffffffffa602680a
 #4 [ffffb4248847fa28] no_context+0x14d at ffffffffa606d7cd
 #5 [ffffb4248847fa88] exc_page_fault+0x2b8 at ffffffffa68bdb88
 #6 [ffffb4248847fae0] asm_exc_page_fault+0x1e at ffffffffa6a00ace
 #7 [ffffb4248847fb68] mark_wakeup_next_waiter+0x51 at ffffffffa60ea121
 #8 [ffffb4248847fbd0] rt_mutex_futex_unlock+0x4f at ffffffffa68c93ef
 #9 [ffffb4248847fc10] z3fold_zpool_free+0x593 at ffffffffc0ecb663 [z3fold]
#10 [ffffb4248847fc78] zswap_free_entry+0x43 at ffffffffa627c823
#11 [ffffb4248847fc88] zswap_frontswap_invalidate_page+0x8a at ffffffffa627c92a
#12 [ffffb4248847fcb0] __frontswap_invalidate_page+0x48 at ffffffffa627c018
#13 [ffffb4248847fcd8] swapcache_free_entries+0x1ee at ffffffffa6276f5e
#14 [ffffb4248847fd20] free_swap_slot+0x9f at ffffffffa627b8ff
#15 [ffffb4248847fd40] delete_from_swap_cache+0x61 at ffffffffa6274621
#16 [ffffb4248847fd60] try_to_free_swap+0x70 at ffffffffa6277520
#17 [ffffb4248847fd70] unuse_vma+0x55c at ffffffffa627869c
#18 [ffffb4248847fe90] try_to_unuse+0x139 at ffffffffa6278e89
#19 [ffffb4248847fee8] __x64_sys_swapoff+0x1eb at ffffffffa62798cb
#20 [ffffb4248847ff40] do_syscall_64+0x33 at ffffffffa68b9ab3
#21 [ffffb4248847ff50] entry_SYSCALL_64_after_hwframe+0x44 at ffffffffa6a0007c
    RIP: 00007fbd835a5d17  RSP: 00007ffd60634458  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000559540e34b60  RCX: 00007fbd835a5d17
    RDX: 0000000000000001  RSI: 0000000000000001  RDI: 0000559540e34b60
    RBP: 0000000000000001   R8: 00007ffd606344c0   R9: 0000000000000003
    R10: 0000559540e34721  R11: 0000000000000202  R12: 0000000000000001
    R13: 0000000000000000  R14: 00007ffd606344c0  R15: 0000000000000000
    ORIG_RAX: 00000000000000a8  CS: 0033  SS: 002b
crash>
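
Spelled out, my reading of the above:

 1. swapoff enters free_handle(h), where h points into slots array S.
 2. the kworker's do_compact_page() migrates *h over to the new zhdr.
 3. the kworker's __release_z3fold_page() then finds no live handles
    left in S and does kmem_cache_free(S).
 4. swapoff, still using S's embedded rwlock, unlocks a lock that now
    lives in freed (and possibly already recycled) memory:
    use-after-free, corruption, explosion ~1us later.

(Interleaving reconstructed from the trace, so take the exact ordering
with a grain of salt.)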

>
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index 18feaa0bc537..340c38a5ffac 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -303,10 +303,9 @@ static inline void put_z3fold_header(struct z3fold_header *zhdr)
>   		z3fold_page_unlock(zhdr);
>   }
>
> -static inline void free_handle(unsigned long handle)
> +static inline void free_handle(unsigned long handle, struct z3fold_header *zhdr)
>   {
>   	struct z3fold_buddy_slots *slots;
> -	struct z3fold_header *zhdr;
>   	int i;
>   	bool is_free;
>
> @@ -316,22 +315,13 @@ static inline void free_handle(unsigned long handle)
>   	if (WARN_ON(*(unsigned long *)handle == 0))
>   		return;
>
> -	zhdr = handle_to_z3fold_header(handle);
>   	slots = handle_to_slots(handle);
>   	write_lock(&slots->lock);
>   	*(unsigned long *)handle = 0;
> -	if (zhdr->slots == slots) {
> -		write_unlock(&slots->lock);
> -		return; /* simple case, nothing else to do */
> -	}
> +	if (zhdr->slots != slots)
> +		zhdr->foreign_handles--;
>
> -	/* we are freeing a foreign handle if we are here */
> -	zhdr->foreign_handles--;
>   	is_free = true;
> -	if (!test_bit(HANDLES_ORPHANED, &slots->pool)) {
> -		write_unlock(&slots->lock);
> -		return;
> -	}
>   	for (i = 0; i <= BUDDY_MASK; i++) {
>   		if (slots->slot[i]) {
>   			is_free = false;
> @@ -343,6 +333,8 @@ static inline void free_handle(unsigned long handle)
>   	if (is_free) {
>   		struct z3fold_pool *pool = slots_to_pool(slots);
>
> +		if (zhdr->slots == slots)
> +			zhdr->slots = NULL;
>   		kmem_cache_free(pool->c_handle, slots);
>   	}
>   }
> @@ -525,8 +517,6 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
>   {
>   	struct page *page = virt_to_page(zhdr);
>   	struct z3fold_pool *pool = zhdr_to_pool(zhdr);
> -	bool is_free = true;
> -	int i;
>
>   	WARN_ON(!list_empty(&zhdr->buddy));
>   	set_bit(PAGE_STALE, &page->private);
> @@ -536,21 +526,6 @@ static void __release_z3fold_page(struct z3fold_header *zhdr, bool locked)
>   		list_del_init(&page->lru);
>   	spin_unlock(&pool->lock);
>
> -	/* If there are no foreign handles, free the handles array */
> -	read_lock(&zhdr->slots->lock);
> -	for (i = 0; i <= BUDDY_MASK; i++) {
> -		if (zhdr->slots->slot[i]) {
> -			is_free = false;
> -			break;
> -		}
> -	}
> -	if (!is_free)
> -		set_bit(HANDLES_ORPHANED, &zhdr->slots->pool);
> -	read_unlock(&zhdr->slots->lock);
> -
> -	if (is_free)
> -		kmem_cache_free(pool->c_handle, zhdr->slots);
> -
>   	if (locked)
>   		z3fold_page_unlock(zhdr);
>
> @@ -973,6 +948,9 @@ static inline struct z3fold_header *__z3fold_alloc(struct z3fold_pool *pool,
>   		}
>   	}
>
> +	if (zhdr && !zhdr->slots)
> +		zhdr->slots = alloc_slots(pool,
> +					can_sleep ? GFP_NOIO : GFP_ATOMIC);
>   	return zhdr;
>   }
>
> @@ -1270,7 +1248,7 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
>   	}
>
>   	if (!page_claimed)
> -		free_handle(handle);
> +		free_handle(handle, zhdr);
>   	if (kref_put(&zhdr->refcount, release_z3fold_page_locked_list)) {
>   		atomic64_dec(&pool->pages_nr);
>   		return;
> @@ -1429,19 +1407,19 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
>   			ret = pool->ops->evict(pool, middle_handle);
>   			if (ret)
>   				goto next;
> -			free_handle(middle_handle);
> +			free_handle(middle_handle, zhdr);
>   		}
>   		if (first_handle) {
>   			ret = pool->ops->evict(pool, first_handle);
>   			if (ret)
>   				goto next;
> -			free_handle(first_handle);
> +			free_handle(first_handle, zhdr);
>   		}
>   		if (last_handle) {
>   			ret = pool->ops->evict(pool, last_handle);
>   			if (ret)
>   				goto next;
> -			free_handle(last_handle);
> +			free_handle(last_handle, zhdr);
>   		}
>   next:
>   		if (test_bit(PAGE_HEADLESS, &page->private)) {
>
> --
>
> Best regards,
>     Vitaly


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
  2020-12-09  6:13                                                       ` Mike Galbraith
@ 2020-12-09  6:31                                                         ` Mike Galbraith
  -1 siblings, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-12-09  6:31 UTC (permalink / raw)
  To: Vitaly Wool
  Cc: Sebastian Andrzej Siewior, Oleksandr Natalenko, LKML, Linux-MM,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

On Wed, 2020-12-09 at 07:13 +0100, Mike Galbraith wrote:
> On Wed, 2020-12-09 at 00:26 +0100, Vitaly Wool wrote:
> > Hi Mike,
> >
> > On 2020-12-07 16:41, Mike Galbraith wrote:
> > > On Mon, 2020-12-07 at 16:21 +0100, Vitaly Wool wrote:
> > >> On Mon, Dec 7, 2020 at 1:34 PM Mike Galbraith <efault@gmx.de> wrote:
> > >>>
> > >>
> > >>> Unfortunately, that made zero difference.
> > >>
> > >> Okay, I suggest that you submit the patch that changes read_lock() to
> > >> write_lock() in __release_z3fold_page() and I'll ack it then.
> > >> I would like to rewrite the code so that write_lock is not necessary
> > >> there but I don't want to hold you back and it isn't likely that I'll
> > >> complete this today.
> > >
> > > Nah, I'm in no rush... especially not to sign off on "Because the
> > > little voices in my head said this bit should look like that bit over
> > > yonder, and testing _seems_ to indicate they're right about that" :)
> > >
> > > 	-Mike
> > >
> >
> > okay, thanks. Would this make things better:
>
> Yup, z3fold became RT tolerant with this (un-munged and) applied.

Below is the other change that any RT users of z3fold will need.

mm, z3fold: Remove preempt disabled sections for RT

Replace get_cpu_ptr() with migrate_disable()+this_cpu_ptr() so RT can take
spinlocks that become sleeping locks.

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 mm/z3fold.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -617,14 +617,16 @@ static inline void add_to_unbuddied(stru
 {
 	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
 			zhdr->middle_chunks == 0) {
-		struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
-
+		struct list_head *unbuddied;
 		int freechunks = num_free_chunks(zhdr);
+
+		migrate_disable();
+		unbuddied = this_cpu_ptr(pool->unbuddied);
 		spin_lock(&pool->lock);
 		list_add(&zhdr->buddy, &unbuddied[freechunks]);
 		spin_unlock(&pool->lock);
 		zhdr->cpu = smp_processor_id();
-		put_cpu_ptr(pool->unbuddied);
+		migrate_enable();
 	}
 }

@@ -861,8 +863,9 @@ static inline struct z3fold_header *__z3
 	int chunks = size_to_chunks(size), i;

 lookup:
+	migrate_disable();
 	/* First, try to find an unbuddied z3fold page. */
-	unbuddied = get_cpu_ptr(pool->unbuddied);
+	unbuddied = this_cpu_ptr(pool->unbuddied);
 	for_each_unbuddied_list(i, chunks) {
 		struct list_head *l = &unbuddied[i];

@@ -880,7 +883,7 @@ static inline struct z3fold_header *__z3
 		    !z3fold_page_trylock(zhdr)) {
 			spin_unlock(&pool->lock);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			migrate_enable();
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -894,7 +897,7 @@ static inline struct z3fold_header *__z3
 		    test_bit(PAGE_CLAIMED, &page->private)) {
 			z3fold_page_unlock(zhdr);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			migrate_enable();
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -909,7 +912,7 @@ static inline struct z3fold_header *__z3
 		kref_get(&zhdr->refcount);
 		break;
 	}
-	put_cpu_ptr(pool->unbuddied);
+	migrate_enable();

 	if (!zhdr) {
 		int cpu;


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: scheduling while atomic in z3fold
@ 2020-12-09  6:31                                                         ` Mike Galbraith
  0 siblings, 0 replies; 70+ messages in thread
From: Mike Galbraith @ 2020-12-09  6:31 UTC (permalink / raw)
  To: Vitaly Wool
  Cc: Sebastian Andrzej Siewior, Oleksandr Natalenko, LKML, Linux-MM,
	Andrew Morton, Steven Rostedt, Thomas Gleixner, linux-rt-users

On Wed, 2020-12-09 at 07:13 +0100, Mike Galbraith wrote:
> On Wed, 2020-12-09 at 00:26 +0100, Vitaly Wool wrote:
> > Hi Mike,
> >
> > On 2020-12-07 16:41, Mike Galbraith wrote:
> > > On Mon, 2020-12-07 at 16:21 +0100, Vitaly Wool wrote:
> > >> On Mon, Dec 7, 2020 at 1:34 PM Mike Galbraith <efault@gmx.de> wrote:
> > >>>
> > >>
> > >>> Unfortunately, that made zero difference.
> > >>
> > >> Okay, I suggest that you submit the patch that changes read_lock() to
> > >> write_lock() in __release_z3fold_page() and I'll ack it then.
> > >> I would like to rewrite the code so that write_lock is not necessary
> > >> there but I don't want to hold you back and it isn't likely that I'll
> > >> complete this today.
> > >
> > > Nah, I'm in no rush... especially not to sign off on "Because the
> > > little voices in my head said this bit should look like that bit over
> > > yonder, and testing _seems_ to indicate they're right about that" :)
> > >
> > > 	-Mike
> > >
> >
> > okay, thanks. Would this make things better:
>
> Yup, z3fold became RT tolerant with this (un-munged and) applied.

Below is the other change that any RT users of z3fold will need.

mm, z3fold: Remove preempt disabled sections for RT

Replace get_cpu_ptr() with migrate_disable()+this_cpu_ptr() so RT can take
spinlocks that become sleeping locks.

Signed-off-by Mike Galbraith <efault@gmx.de>
---
 mm/z3fold.c |   17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -617,14 +617,16 @@ static inline void add_to_unbuddied(stru
 {
 	if (zhdr->first_chunks == 0 || zhdr->last_chunks == 0 ||
 			zhdr->middle_chunks == 0) {
-		struct list_head *unbuddied = get_cpu_ptr(pool->unbuddied);
-
+		struct list_head *unbuddied;
 		int freechunks = num_free_chunks(zhdr);
+
+		migrate_disable();
+		unbuddied = this_cpu_ptr(pool->unbuddied);
 		spin_lock(&pool->lock);
 		list_add(&zhdr->buddy, &unbuddied[freechunks]);
 		spin_unlock(&pool->lock);
 		zhdr->cpu = smp_processor_id();
-		put_cpu_ptr(pool->unbuddied);
+		migrate_enable();
 	}
 }

@@ -861,8 +863,9 @@ static inline struct z3fold_header *__z3
 	int chunks = size_to_chunks(size), i;

 lookup:
+	migrate_disable();
 	/* First, try to find an unbuddied z3fold page. */
-	unbuddied = get_cpu_ptr(pool->unbuddied);
+	unbuddied = this_cpu_ptr(pool->unbuddied);
 	for_each_unbuddied_list(i, chunks) {
 		struct list_head *l = &unbuddied[i];

@@ -880,7 +883,7 @@ static inline struct z3fold_header *__z3
 		    !z3fold_page_trylock(zhdr)) {
 			spin_unlock(&pool->lock);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			migrate_enable();
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -894,7 +897,7 @@ static inline struct z3fold_header *__z3
 		    test_bit(PAGE_CLAIMED, &page->private)) {
 			z3fold_page_unlock(zhdr);
 			zhdr = NULL;
-			put_cpu_ptr(pool->unbuddied);
+			migrate_enable();
 			if (can_sleep)
 				cond_resched();
 			goto lookup;
@@ -909,7 +912,7 @@ static inline struct z3fold_header *__z3
 		kref_get(&zhdr->refcount);
 		break;
 	}
-	put_cpu_ptr(pool->unbuddied);
+	migrate_enable();

 	if (!zhdr) {
 		int cpu;

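For reference, here is the pattern in isolation -- a minimal sketch with a
made-up per-CPU list, not the z3fold code itself. get_cpu_ptr() is
preempt_disable() plus this_cpu_ptr(), and on PREEMPT_RT a spin_lock()
taken inside such a section can sleep, hence the splat. migrate_disable()
merely pins the task to its current CPU: this_cpu_ptr() and
smp_processor_id() stay stable, but the task remains preemptible and may
legally block on the lock.

```c
/* Minimal sketch, hypothetical names -- not the z3fold code itself. */
#include <linux/list.h>
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/spinlock.h>

static DEFINE_PER_CPU(struct list_head, pcpu_list);
static DEFINE_SPINLOCK(pcpu_list_lock);

/* Splats on RT: spin_lock() may sleep, but preemption is off. */
static void add_item_atomic(struct list_head *item)
{
	struct list_head *l = get_cpu_ptr(&pcpu_list);	/* preempt_disable() */

	spin_lock(&pcpu_list_lock);
	list_add(item, l);
	spin_unlock(&pcpu_list_lock);
	put_cpu_ptr(&pcpu_list);			/* preempt_enable() */
}

/* RT-friendly: the task stays preemptible, only migration is off. */
static void add_item_rt(struct list_head *item)
{
	struct list_head *l;

	migrate_disable();
	l = this_cpu_ptr(&pcpu_list);	/* stable: we cannot change CPU */
	spin_lock(&pcpu_list_lock);	/* may sleep on RT -- now legal */
	list_add(item, l);
	spin_unlock(&pcpu_list_lock);
	migrate_enable();
}
```

Note that the per-CPU list is still protected by the ordinary spinlock;
migrate_disable() is only there to keep the CPU-local pointer and ID
consistent, not to provide exclusion.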


^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread

Thread overview: 70+ messages
2020-11-28 14:05 scheduling while atomic in z3fold Oleksandr Natalenko
2020-11-28 14:09 ` Oleksandr Natalenko
2020-11-28 14:27   ` Oleksandr Natalenko
2020-11-29  6:41     ` Mike Galbraith
2020-11-29  6:41       ` Mike Galbraith
2020-11-29  7:48       ` Mike Galbraith
2020-11-29  9:21         ` Mike Galbraith
2020-11-29  9:21           ` Mike Galbraith
2020-11-29 10:56           ` Mike Galbraith
2020-11-29 10:56             ` Mike Galbraith
2020-11-29 11:29             ` Oleksandr Natalenko
2020-11-29 11:41               ` Mike Galbraith
2020-11-29 11:41                 ` Mike Galbraith
2020-11-30 13:20                 ` Sebastian Andrzej Siewior
2020-11-30 13:53                   ` Oleksandr Natalenko
2020-11-30 14:28                     ` Sebastian Andrzej Siewior
2020-11-30 14:42                   ` Mike Galbraith
2020-11-30 14:42                     ` Mike Galbraith
2020-11-30 14:52                     ` Sebastian Andrzej Siewior
2020-11-30 15:01                       ` Mike Galbraith
2020-11-30 15:01                         ` Mike Galbraith
2020-11-30 15:03                         ` Mike Galbraith
2020-11-30 15:03                           ` Mike Galbraith
2020-11-30 16:03                         ` Sebastian Andrzej Siewior
2020-11-30 16:27                           ` Mike Galbraith
2020-11-30 16:27                             ` Mike Galbraith
2020-11-30 16:32                             ` Sebastian Andrzej Siewior
2020-11-30 16:36                               ` Mike Galbraith
2020-11-30 16:36                                 ` Mike Galbraith
2020-11-30 19:09                               ` Mike Galbraith
2020-11-30 19:09                                 ` Mike Galbraith
2020-11-30 16:53                             ` Mike Galbraith
2020-11-30 16:53                               ` Mike Galbraith
2020-12-02  2:30                           ` Mike Galbraith
2020-12-02  2:30                             ` Mike Galbraith
2020-12-02 22:08                             ` Sebastian Andrzej Siewior
2020-12-03  2:16                               ` Mike Galbraith
2020-12-03  2:16                                 ` Mike Galbraith
2020-12-03  8:18                                 ` Mike Galbraith
2020-12-03  8:18                                   ` Mike Galbraith
2020-12-03  9:40                                   ` zswap explosion when using zsmalloc pool compression Mike Galbraith
2020-12-03  9:48                                     ` Sebastian Andrzej Siewior
2020-12-03 11:03                                       ` Mike Galbraith
2020-12-03 12:51                                         ` Mike Galbraith
2020-12-03 13:07                                           ` Mike Galbraith
2020-12-03 13:43                                             ` Mike Galbraith
2020-12-03 14:04                                               ` Mike Galbraith
2020-12-04  5:22                                                 ` [patch] mm,zswap: fix zswap::zswap_comp.lock vs zsmalloc::zs_map_area.lock deadlock Mike Galbraith
2020-12-03 13:39                                   ` scheduling while atomic in z3fold Sebastian Andrzej Siewior
2020-12-03 14:07                                     ` Vitaly Wool
2020-12-03 14:07                                       ` Vitaly Wool
2020-12-06  9:18                                     ` Mike Galbraith
2020-12-06  9:18                                       ` Mike Galbraith
2020-12-07  1:05                                       ` Vitaly Wool
2020-12-07  2:18                                         ` Mike Galbraith
2020-12-07  2:18                                           ` Mike Galbraith
2020-12-07 11:52                                           ` Vitaly Wool
2020-12-07 11:52                                             ` Vitaly Wool
2020-12-07 12:34                                             ` Mike Galbraith
2020-12-07 12:34                                               ` Mike Galbraith
2020-12-07 15:21                                               ` Vitaly Wool
2020-12-07 15:21                                                 ` Vitaly Wool
2020-12-07 15:41                                                 ` Sebastian Andrzej Siewior
2020-12-07 15:41                                                 ` Mike Galbraith
2020-12-07 15:41                                                   ` Mike Galbraith
2020-12-08 23:26                                                   ` Vitaly Wool
2020-12-09  6:13                                                     ` Mike Galbraith
2020-12-09  6:13                                                       ` Mike Galbraith
2020-12-09  6:31                                                       ` Mike Galbraith
2020-12-09  6:31                                                         ` Mike Galbraith
