* Crashes/hung tasks with z3pool under memory pressure
@ 2018-04-12 22:01 Guenter Roeck
2018-04-13 5:21 ` Vitaly Wool
0 siblings, 1 reply; 13+ messages in thread
From: Guenter Roeck @ 2018-04-12 22:01 UTC (permalink / raw)
To: linux-kernel; +Cc: Vitaly Wool, Andrew Morton, Matthew Wilcox, asavery, gwendal
Hi all,
we are observing crashes with z3pool under memory pressure. The kernel version
used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
also seen with v4.14 based kernels.
For simplicity, here is a set of shortened logs. A more complete log is
available at [1].
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >= PREEMPT_MASK - 10)
WARNING: CPU: 2 PID: 594 at kernel/sched/core.c:3212 preempt_count_add+0x90/0xa0
Modules linked in:
CPU: 2 PID: 594 Comm: memory-eater Not tainted 4.16.0-yocto-standard+ #8
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
RIP: 0010:preempt_count_add+0x90/0xa0
RSP: 0000:ffffb12740db7750 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000f6
RDX: 00000000000000f6 RSI: 0000000000000082 RDI: 00000000ffffffff
RBP: fffff00480f357a0 R08: 000000000000004a R09: 00000000000001ad
R10: ffffb12740db77e0 R11: 0000000000000000 R12: ffff9cbc7e265d10
R13: ffff9cbc7cd5e000 R14: ffff9cbc7a7000d8 R15: fffff00480f35780
FS: 00007f5140791700(0000) GS:ffff9cbc7fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f513260f000 CR3: 0000000032086000 CR4: 00000000000006e0
Call Trace:
_raw_spin_trylock+0x13/0x30
z3fold_zpool_shrink+0xab/0x3a0
zswap_frontswap_store+0x10b/0x610
...
WARNING: CPU: 1 PID: 92 at mm/z3fold.c:278 release_z3fold_page_locked+0x25/0x40
Modules linked in:
...
INFO: rcu_preempt self-detected stall on CPU
2-...!: (20958 ticks this GP) idle=5da/1/4611686018427387906
softirq=4104/4113 fqs=11
...
RIP: 0010:queued_spin_lock_slowpath+0x132/0x190
RSP: 0000:ffffb12740db7750 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
RAX: 0000000000100101 RBX: ffff9cbc7a7000c8 RCX: 0000000000000001
RDX: 0000000000000101 RSI: 0000000000000001 RDI: 0000000000000101
RBP: 0000000000000000 R08: ffff9cbc7fc21240 R09: ffffffffaf19c900
R10: ffffb12740db75a0 R11: 0000000000000010 R12: fffff00480522d20
R13: ffff9cbc548b4000 R14: ffff9cbc7a7000d8 R15: fffff00480522d00
? __zswap_pool_current+0x80/0x90
z3fold_zpool_shrink+0x1d3/0x3a0
zswap_frontswap_store+0x10b/0x610
...
With lock debugging enabled, the log differs in detail but points to the same problem.
BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
depth: 48 max: 48!
48 locks held by memory-eater/619:
#0: 000000002da807ce (&mm->mmap_sem){++++}, at: __do_page_fault+0x122/0x5a0
#1: 0000000012fa6629 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
#2: 00000000c85f45dd (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
#3: 00000000876f5fdc (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
...
watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [memory-eater:613]
Modules linked in:
irq event stamp: 1435394
hardirqs last enabled at (1435393): [<ffffffff9b90ed81>] _raw_spin_unlock_irqrestore+0x51/0x60
hardirqs last disabled at (1435394): [<ffffffff9b907baa>] __schedule+0xba/0xbb0
softirqs last enabled at (1434508): [<ffffffff9bc0027c>] __do_softirq+0x27c/0x516
softirqs last disabled at (1434323): [<ffffffff9b069a29>] irq_exit+0xa9/0xc0
CPU: 0 PID: 613 Comm: memory-eater Tainted: G W 4.16.0-yocto-standard+ #9
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
RIP: 0010:queued_spin_lock_slowpath+0x177/0x1a0
RSP: 0000:ffffa61f80e074e0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: 0000000000000000 RBX: ffff9704379cba08 RCX: ffff97043fc22080
RDX: 0000000000000001 RSI: ffffffff9c05f6a0 RDI: 0000000000040000
RBP: 0000000000000000 R08: ffffffff9b1f2215 R09: 0000000000000000
R10: ffffa61f80e07490 R11: ffff9704379cba20 R12: ffff97043fc22080
R13: ffffde77404e0920 R14: ffff970413824000 R15: ffff9704379cba00
FS: 00007f8c6317f700(0000) GS:ffff97043fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8c43f3d010 CR3: 000000003aba0000 CR4: 00000000000006f0
Call Trace:
do_raw_spin_lock+0xad/0xb0
z3fold_zpool_malloc+0x595/0x790
...
The problem is easy to reproduce. Please see [2] and the other files
at [3] for details. Various additional crash logs, observed with
chromeos-4.14, are available at [4].
Please let me know if there is anything else I can do to help solve
or debug the problem. I had a look at the code, but I must admit
that its locking is a mystery to me.
Thanks,
Guenter
---
[1] http://server.roeck-us.net/qemu/z3pool/crashdump
[2] http://server.roeck-us.net/qemu/z3pool/README
[3] http://server.roeck-us.net/qemu/z3pool/
[4] https://bugs.chromium.org/p/chromium/issues/detail?id=822360
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Crashes/hung tasks with z3pool under memory pressure
2018-04-12 22:01 Crashes/hung tasks with z3pool under memory pressure Guenter Roeck
@ 2018-04-13 5:21 ` Vitaly Wool
2018-04-13 17:35 ` Guenter Roeck
0 siblings, 1 reply; 13+ messages in thread
From: Vitaly Wool @ 2018-04-13 5:21 UTC (permalink / raw)
To: linux; +Cc: LKML, Andrew Morton, mawilcox, asavery, gwendal
Hi Guenter,
Den fre 13 apr. 2018 kl 00:01 skrev Guenter Roeck <linux@roeck-us.net>:
> Hi all,
> we are observing crashes with z3pool under memory pressure. The kernel version
> used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
> also seen with v4.14 based kernels.
just before I dig into this, could you please try reproducing the errors
you see with https://patchwork.kernel.org/patch/10210459/ applied?
Thanks,
Vitaly
> For simplicity, here is a set of shortened logs. A more complete log is
> available at [1].
> ------------[ cut here ]------------
> DEBUG_LOCKS_WARN_ON((preempt_count() & PREEMPT_MASK) >= PREEMPT_MASK - 10)
> WARNING: CPU: 2 PID: 594 at kernel/sched/core.c:3212 preempt_count_add+0x90/0xa0
> Modules linked in:
> CPU: 2 PID: 594 Comm: memory-eater Not tainted 4.16.0-yocto-standard+ #8
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
> RIP: 0010:preempt_count_add+0x90/0xa0
> RSP: 0000:ffffb12740db7750 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000000001 RCX: 00000000000000f6
> RDX: 00000000000000f6 RSI: 0000000000000082 RDI: 00000000ffffffff
> RBP: fffff00480f357a0 R08: 000000000000004a R09: 00000000000001ad
> R10: ffffb12740db77e0 R11: 0000000000000000 R12: ffff9cbc7e265d10
> R13: ffff9cbc7cd5e000 R14: ffff9cbc7a7000d8 R15: fffff00480f35780
> FS: 00007f5140791700(0000) GS:ffff9cbc7fd00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f513260f000 CR3: 0000000032086000 CR4: 00000000000006e0
> Call Trace:
> _raw_spin_trylock+0x13/0x30
> z3fold_zpool_shrink+0xab/0x3a0
> zswap_frontswap_store+0x10b/0x610
> ...
> WARNING: CPU: 1 PID: 92 at mm/z3fold.c:278 release_z3fold_page_locked+0x25/0x40
> Modules linked in:
> ...
> INFO: rcu_preempt self-detected stall on CPU
> 2-...!: (20958 ticks this GP) idle=5da/1/4611686018427387906
> softirq=4104/4113 fqs=11
> ...
> RIP: 0010:queued_spin_lock_slowpath+0x132/0x190
> RSP: 0000:ffffb12740db7750 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000100101 RBX: ffff9cbc7a7000c8 RCX: 0000000000000001
> RDX: 0000000000000101 RSI: 0000000000000001 RDI: 0000000000000101
> RBP: 0000000000000000 R08: ffff9cbc7fc21240 R09: ffffffffaf19c900
> R10: ffffb12740db75a0 R11: 0000000000000010 R12: fffff00480522d20
> R13: ffff9cbc548b4000 R14: ffff9cbc7a7000d8 R15: fffff00480522d00
> ? __zswap_pool_current+0x80/0x90
> z3fold_zpool_shrink+0x1d3/0x3a0
> zswap_frontswap_store+0x10b/0x610
> ...
> With lock debugging enabled, the log differs in detail but points to the same problem.
> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.
> depth: 48 max: 48!
> 48 locks held by memory-eater/619:
> #0: 000000002da807ce (&mm->mmap_sem){++++}, at: __do_page_fault+0x122/0x5a0
> #1: 0000000012fa6629 (&(&pool->lock)->rlock#3){+.+.}, at: z3fold_zpool_shrink+0x47/0x3e0
> #2: 00000000c85f45dd (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
> #3: 00000000876f5fdc (&(&zhdr->page_lock)->rlock){+.+.}, at: z3fold_zpool_shrink+0xb7/0x3e0
> ...
> watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [memory-eater:613]
> Modules linked in:
> irq event stamp: 1435394
> hardirqs last enabled at (1435393): [<ffffffff9b90ed81>] _raw_spin_unlock_irqrestore+0x51/0x60
> hardirqs last disabled at (1435394): [<ffffffff9b907baa>] __schedule+0xba/0xbb0
> softirqs last enabled at (1434508): [<ffffffff9bc0027c>] __do_softirq+0x27c/0x516
> softirqs last disabled at (1434323): [<ffffffff9b069a29>] irq_exit+0xa9/0xc0
> CPU: 0 PID: 613 Comm: memory-eater Tainted: G W 4.16.0-yocto-standard+ #9
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1 04/01/2014
> RIP: 0010:queued_spin_lock_slowpath+0x177/0x1a0
> RSP: 0000:ffffa61f80e074e0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> RAX: 0000000000000000 RBX: ffff9704379cba08 RCX: ffff97043fc22080
> RDX: 0000000000000001 RSI: ffffffff9c05f6a0 RDI: 0000000000040000
> RBP: 0000000000000000 R08: ffffffff9b1f2215 R09: 0000000000000000
> R10: ffffa61f80e07490 R11: ffff9704379cba20 R12: ffff97043fc22080
> R13: ffffde77404e0920 R14: ffff970413824000 R15: ffff9704379cba00
> FS: 00007f8c6317f700(0000) GS:ffff97043fc00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f8c43f3d010 CR3: 000000003aba0000 CR4: 00000000000006f0
> Call Trace:
> do_raw_spin_lock+0xad/0xb0
> z3fold_zpool_malloc+0x595/0x790
> ...
> The problem is easy to reproduce. Please see [2] and the other files
> at [3] for details. Various additional crash logs, observed with
> chromeos-4.14, are available at [4].
> Please let me know if there is anything else I can do to help solve
> or debug the problem. I had a look at the code, but I must admit
> that its locking is a mystery to me.
> Thanks,
> Guenter
> ---
> [1] http://server.roeck-us.net/qemu/z3pool/crashdump
> [2] http://server.roeck-us.net/qemu/z3pool/README
> [3] http://server.roeck-us.net/qemu/z3pool/
> [4] https://bugs.chromium.org/p/chromium/issues/detail?id=822360
* Re: Crashes/hung tasks with z3pool under memory pressure
2018-04-13 5:21 ` Vitaly Wool
@ 2018-04-13 17:35 ` Guenter Roeck
[not found] ` <CAMJBoFPXObpXyQWz-zPJ7JnC-Z5FqqrLfr5BFWKdh+szZrPZ7A@mail.gmail.com>
0 siblings, 1 reply; 13+ messages in thread
From: Guenter Roeck @ 2018-04-13 17:35 UTC (permalink / raw)
To: Vitaly Wool; +Cc: LKML, Andrew Morton, mawilcox, asavery, gwendal
On Fri, Apr 13, 2018 at 05:21:02AM +0000, Vitaly Wool wrote:
> Hi Guenter,
>
>
> Den fre 13 apr. 2018 kl 00:01 skrev Guenter Roeck <linux@roeck-us.net>:
>
> > Hi all,
>
> > we are observing crashes with z3pool under memory pressure. The kernel version
> > used to reproduce the problem is v4.16-11827-g5d1365940a68, but the problem was
> > also seen with v4.14 based kernels.
>
>
> just before I dig into this, could you please try reproducing the errors
> you see with https://patchwork.kernel.org/patch/10210459/ applied?
>
As mentioned above, I tested with v4.16-11827-g5d1365940a68, which already
includes this patch.
$ git log --oneline v4.14..5d1365940a68 mm/z3fold.c
8a97ea546bb6 mm/z3fold.c: use gfpflags_allow_blocking
1ec6995d1290 z3fold: fix memory leak
5c9bab592f53 z3fold: limit use of stale list for allocation
f144c390f905 mm: docs: fix parameter names mismatch
5d03a6613957 mm/z3fold.c: use kref to prevent page free/compact race
Guenter
* Re: Crashes/hung tasks with z3pool under memory pressure
@ 2018-04-17 14:00 Vitaly Wool
2018-04-17 16:35 ` Guenter Roeck
2018-04-18 8:13 ` Vitaly Wool
0 siblings, 2 replies; 13+ messages in thread
From: Vitaly Wool @ 2018-04-17 14:00 UTC (permalink / raw)
To: Guenter Roeck; +Cc: LKML, Andrew Morton, mawilcox, asavery, gwendal
Hi Guenter,
> [ ... ]
> > Ugh. Could you please keep that patch and apply this on top:
> >
> > diff --git a/mm/z3fold.c b/mm/z3fold.c
> > index c0bca6153b95..e8a80d044d9e 100644
> > --- a/mm/z3fold.c
> > +++ b/mm/z3fold.c
> > @@ -840,6 +840,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
> > kref_get(&zhdr->refcount);
> > list_del_init(&zhdr->buddy);
> > zhdr->cpu = -1;
> > + break;
> > }
> > list_del_init(&page->lru);
> >
> Much better, in a way. The system now takes much longer to crash,
> and the crash reason is a bit different. The log is too long to attach,
> so I copied it to [1].
>
> crashdump.0002 Latest log
> 000[12]-Fix-attempt-[12].patch Patches applied on top of v4.17.0-rc1.
thanks for the update. Let's start from a clean sheet. I believe this patch has to be applied anyway, so could you please check if it solves the problem?
diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..059fb3d5ca86 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -144,7 +144,8 @@ enum z3fold_page_flags {
PAGE_HEADLESS = 0,
MIDDLE_CHUNK_MAPPED,
NEEDS_COMPACTING,
- PAGE_STALE
+ PAGE_STALE,
+ UNDER_RECLAIM
};
/*****************
@@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page,
clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
clear_bit(NEEDS_COMPACTING, &page->private);
clear_bit(PAGE_STALE, &page->private);
+ clear_bit(UNDER_RECLAIM, &page->private);
spin_lock_init(&zhdr->page_lock);
kref_init(&zhdr->refcount);
@@ -756,6 +758,10 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
atomic64_dec(&pool->pages_nr);
return;
}
+ if (test_bit(UNDER_RECLAIM, &page->private)) {
+ z3fold_page_unlock(zhdr);
+ return;
+ }
if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) {
z3fold_page_unlock(zhdr);
return;
@@ -840,6 +846,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
kref_get(&zhdr->refcount);
list_del_init(&zhdr->buddy);
zhdr->cpu = -1;
+ set_bit(UNDER_RECLAIM, &page->private);
+ break;
}
list_del_init(&page->lru);
@@ -888,6 +896,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
}
next:
spin_lock(&pool->lock);
+ clear_bit(UNDER_RECLAIM, &page->private);
if (test_bit(PAGE_HEADLESS, &page->private)) {
if (ret == 0) {
spin_unlock(&pool->lock);
* Re: Crashes/hung tasks with z3pool under memory pressure
2018-04-17 14:00 Vitaly Wool
@ 2018-04-17 16:35 ` Guenter Roeck
2018-04-18 8:13 ` Vitaly Wool
1 sibling, 0 replies; 13+ messages in thread
From: Guenter Roeck @ 2018-04-17 16:35 UTC (permalink / raw)
To: Vitaly Wool; +Cc: LKML, Andrew Morton, mawilcox, asavery, gwendal
On Tue, Apr 17, 2018 at 04:00:32PM +0200, Vitaly Wool wrote:
> Hi Guenter,
>
> > [ ... ]
> > > Ugh. Could you please keep that patch and apply this on top:
> > >
> > > diff --git a/mm/z3fold.c b/mm/z3fold.c
> > > index c0bca6153b95..e8a80d044d9e 100644
> > > --- a/mm/z3fold.c
> > > +++ b/mm/z3fold.c
> > > @@ -840,6 +840,7 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
> > > kref_get(&zhdr->refcount);
> > > list_del_init(&zhdr->buddy);
> > > zhdr->cpu = -1;
> > > + break;
> > > }
> > > list_del_init(&page->lru);
> > >
> > Much better, in a way. The system now takes much longer to crash,
> > and the crash reason is a bit different. The log is too long to attach,
> > so I copied it to [1].
> >
> > crashdump.0002 Latest log
> > 000[12]-Fix-attempt-[12].patch Patches applied on top of v4.17.0-rc1.
>
> thanks for the update. Let's start from a clean sheet. I believe this patch has to be applied anyway so could you please check if it solves the problem.
>
Getting better; the log is much less noisy. Unfortunately, there are still
locking problems, resulting in a hung task. I copied the log message to [1].
This is with [2] applied on top of v4.17-rc1.
Guenter
---
[1] http://server.roeck-us.net/qemu/z3pool/crashdump.0003
[2] http://server.roeck-us.net/qemu/z3pool/0001-Fix-attempt-3.patch
* Re: Crashes/hung tasks with z3pool under memory pressure
2018-04-17 14:00 Vitaly Wool
2018-04-17 16:35 ` Guenter Roeck
@ 2018-04-18 8:13 ` Vitaly Wool
2018-04-18 16:07 ` Guenter Roeck
1 sibling, 1 reply; 13+ messages in thread
From: Vitaly Wool @ 2018-04-18 8:13 UTC (permalink / raw)
To: Guenter Roeck; +Cc: LKML, Andrew Morton, mawilcox, asavery, gwendal
Den tis 17 apr. 2018 kl 18:35 skrev Guenter Roeck <linux@roeck-us.net>:
<snip>
> Getting better; the log is much less noisy. Unfortunately, there are still
> locking problems, resulting in a hung task. I copied the log message to [1].
> This is with [2] applied on top of v4.17-rc1.
Now this version (a full patch, to be applied instead of the previous one) should have the above problem resolved too:
diff --git a/mm/z3fold.c b/mm/z3fold.c
index c0bca6153b95..901c0b07cbda 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -144,7 +144,8 @@ enum z3fold_page_flags {
PAGE_HEADLESS = 0,
MIDDLE_CHUNK_MAPPED,
NEEDS_COMPACTING,
- PAGE_STALE
+ PAGE_STALE,
+ UNDER_RECLAIM
};
/*****************
@@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page,
clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
clear_bit(NEEDS_COMPACTING, &page->private);
clear_bit(PAGE_STALE, &page->private);
+ clear_bit(UNDER_RECLAIM, &page->private);
spin_lock_init(&zhdr->page_lock);
kref_init(&zhdr->refcount);
@@ -756,6 +758,10 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
atomic64_dec(&pool->pages_nr);
return;
}
+ if (test_bit(UNDER_RECLAIM, &page->private)) {
+ z3fold_page_unlock(zhdr);
+ return;
+ }
if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) {
z3fold_page_unlock(zhdr);
return;
@@ -840,6 +846,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
kref_get(&zhdr->refcount);
list_del_init(&zhdr->buddy);
zhdr->cpu = -1;
+ set_bit(UNDER_RECLAIM, &page->private);
+ break;
}
list_del_init(&page->lru);
@@ -887,25 +895,35 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
goto next;
}
next:
- spin_lock(&pool->lock);
if (test_bit(PAGE_HEADLESS, &page->private)) {
if (ret == 0) {
- spin_unlock(&pool->lock);
free_z3fold_page(page);
return 0;
}
- } else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
- atomic64_dec(&pool->pages_nr);
+ spin_lock(&pool->lock);
+ list_add(&page->lru, &pool->lru);
+ spin_unlock(&pool->lock);
+ } else {
+ z3fold_page_lock(zhdr);
+ clear_bit(UNDER_RECLAIM, &page->private);
+ if (kref_put(&zhdr->refcount,
+ release_z3fold_page_locked)) {
+ atomic64_dec(&pool->pages_nr);
+ return 0;
+ }
+ /*
+ * if we are here, the page is still not completely
+ * free. Take the global pool lock then to be able
+ * to add it back to the lru list
+ */
+ spin_lock(&pool->lock);
+ list_add(&page->lru, &pool->lru);
spin_unlock(&pool->lock);
- return 0;
+ z3fold_page_unlock(zhdr);
}
- /*
- * Add to the beginning of LRU.
- * Pool lock has to be kept here to ensure the page has
- * not already been released
- */
- list_add(&page->lru, &pool->lru);
+ /* We started off locked so we need to lock the pool back */
+ spin_lock(&pool->lock);
}
spin_unlock(&pool->lock);
return -EAGAIN;
* Re: Crashes/hung tasks with z3pool under memory pressure
2018-04-18 8:13 ` Vitaly Wool
@ 2018-04-18 16:07 ` Guenter Roeck
[not found] ` <CAMJBoFO9ktEHK=e=Dkq99tNNKHM1iPuxUJ5bmyoX_bzjGwOmig@mail.gmail.com>
0 siblings, 1 reply; 13+ messages in thread
From: Guenter Roeck @ 2018-04-18 16:07 UTC (permalink / raw)
To: Vitaly Wool; +Cc: LKML, Andrew Morton, mawilcox, asavery, gwendal
On Wed, Apr 18, 2018 at 10:13:17AM +0200, Vitaly Wool wrote:
> Den tis 17 apr. 2018 kl 18:35 skrev Guenter Roeck <linux@roeck-us.net>:
>
> <snip>
>
> > Getting better; the log is much less noisy. Unfortunately, there are still
> > locking problems, resulting in a hung task. I copied the log message to [1].
> > This is with [2] applied on top of v4.17-rc1.
>
> Now this version (a full patch, to be applied instead of the previous one) should have the above problem resolved too:
>
Excellent - I cannot reproduce the problem with this patch applied.
Guenter
> diff --git a/mm/z3fold.c b/mm/z3fold.c
> index c0bca6153b95..901c0b07cbda 100644
> --- a/mm/z3fold.c
> +++ b/mm/z3fold.c
> @@ -144,7 +144,8 @@ enum z3fold_page_flags {
> PAGE_HEADLESS = 0,
> MIDDLE_CHUNK_MAPPED,
> NEEDS_COMPACTING,
> - PAGE_STALE
> + PAGE_STALE,
> + UNDER_RECLAIM
> };
>
> /*****************
> @@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page,
> clear_bit(MIDDLE_CHUNK_MAPPED, &page->private);
> clear_bit(NEEDS_COMPACTING, &page->private);
> clear_bit(PAGE_STALE, &page->private);
> + clear_bit(UNDER_RECLAIM, &page->private);
>
> spin_lock_init(&zhdr->page_lock);
> kref_init(&zhdr->refcount);
> @@ -756,6 +758,10 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle)
> atomic64_dec(&pool->pages_nr);
> return;
> }
> + if (test_bit(UNDER_RECLAIM, &page->private)) {
> + z3fold_page_unlock(zhdr);
> + return;
> + }
> if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) {
> z3fold_page_unlock(zhdr);
> return;
> @@ -840,6 +846,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
> kref_get(&zhdr->refcount);
> list_del_init(&zhdr->buddy);
> zhdr->cpu = -1;
> + set_bit(UNDER_RECLAIM, &page->private);
> + break;
> }
>
> list_del_init(&page->lru);
> @@ -887,25 +895,35 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries)
> goto next;
> }
> next:
> - spin_lock(&pool->lock);
> if (test_bit(PAGE_HEADLESS, &page->private)) {
> if (ret == 0) {
> - spin_unlock(&pool->lock);
> free_z3fold_page(page);
> return 0;
> }
> - } else if (kref_put(&zhdr->refcount, release_z3fold_page)) {
> - atomic64_dec(&pool->pages_nr);
> + spin_lock(&pool->lock);
> + list_add(&page->lru, &pool->lru);
> + spin_unlock(&pool->lock);
> + } else {
> + z3fold_page_lock(zhdr);
> + clear_bit(UNDER_RECLAIM, &page->private);
> + if (kref_put(&zhdr->refcount,
> + release_z3fold_page_locked)) {
> + atomic64_dec(&pool->pages_nr);
> + return 0;
> + }
> + /*
> + * if we are here, the page is still not completely
> + * free. Take the global pool lock then to be able
extra then ?
> + * to add it back to the lru list
> + */
> + spin_lock(&pool->lock);
> + list_add(&page->lru, &pool->lru);
> spin_unlock(&pool->lock);
> - return 0;
> + z3fold_page_unlock(zhdr);
> }
>
> - /*
> - * Add to the beginning of LRU.
> - * Pool lock has to be kept here to ensure the page has
> - * not already been released
> - */
> - list_add(&page->lru, &pool->lru);
> + /* We started off locked so we need to lock the pool back */
> + spin_lock(&pool->lock);
> }
> spin_unlock(&pool->lock);
> return -EAGAIN;
Thread overview: 13+ messages
2018-04-12 22:01 Crashes/hung tasks with z3pool under memory pressure Guenter Roeck
2018-04-13 5:21 ` Vitaly Wool
2018-04-13 17:35 ` Guenter Roeck
[not found] ` <CAMJBoFPXObpXyQWz-zPJ7JnC-Z5FqqrLfr5BFWKdh+szZrPZ7A@mail.gmail.com>
2018-04-13 17:56 ` Guenter Roeck
2018-04-16 12:43 ` Vitaly Wool
2018-04-16 15:58 ` Guenter Roeck
2018-04-16 22:14 ` Vitaly Wool
2018-04-16 22:37 ` Guenter Roeck
2018-04-17 14:00 Vitaly Wool
2018-04-17 16:35 ` Guenter Roeck
2018-04-18 8:13 ` Vitaly Wool
2018-04-18 16:07 ` Guenter Roeck
[not found] ` <CAMJBoFO9ktEHK=e=Dkq99tNNKHM1iPuxUJ5bmyoX_bzjGwOmig@mail.gmail.com>
2018-04-27 17:18 ` Guenter Roeck