* [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios
@ 2024-03-07 3:19 Yafang Shao
2024-03-07 17:06 ` Andrew Morton
0 siblings, 1 reply; 6+ messages in thread
From: Yafang Shao @ 2024-03-07 3:19 UTC (permalink / raw)
To: akpm, yuzhao; +Cc: linux-mm, Yafang Shao, stable
After we enabled MGLRU on our production servers (384 CPUs, 1536 GB of memory), we
encountered frequent soft lockups attributed to scanning folios.
The soft lockup looks as follows:
[Sat Feb 24 02:29:42 2024] watchdog: BUG: soft lockup - CPU#215 stuck for 111s! [kworker/215:0:2200100]
[Sat Feb 24 02:29:42 2024] Call Trace:
[Sat Feb 24 02:29:42 2024] <IRQ>
[Sat Feb 24 02:29:42 2024] ? show_regs.cold+0x1a/0x1f
[Sat Feb 24 02:29:42 2024] ? watchdog_timer_fn+0x1c4/0x220
[Sat Feb 24 02:29:42 2024] ? softlockup_fn+0x30/0x30
[Sat Feb 24 02:29:42 2024] ? __hrtimer_run_queues+0xa2/0x2b0
[Sat Feb 24 02:29:42 2024] ? hrtimer_interrupt+0x109/0x220
[Sat Feb 24 02:29:42 2024] ? __sysvec_apic_timer_interrupt+0x5e/0x110
[Sat Feb 24 02:29:42 2024] ? sysvec_apic_timer_interrupt+0x7b/0x90
[Sat Feb 24 02:29:42 2024] </IRQ>
[Sat Feb 24 02:29:42 2024] <TASK>
[Sat Feb 24 02:29:42 2024] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[Sat Feb 24 02:29:42 2024] ? folio_end_writeback+0x73/0xa0
[Sat Feb 24 02:29:42 2024] ? folio_rotate_reclaimable+0x8c/0x90
[Sat Feb 24 02:29:42 2024] ? folio_rotate_reclaimable+0x57/0x90
[Sat Feb 24 02:29:42 2024] ? folio_rotate_reclaimable+0x8c/0x90
[Sat Feb 24 02:29:42 2024] folio_end_writeback+0x73/0xa0
[Sat Feb 24 02:29:42 2024] iomap_finish_ioend+0x1d4/0x420
[Sat Feb 24 02:29:42 2024] iomap_finish_ioends+0x5e/0xe0
[Sat Feb 24 02:29:42 2024] xfs_end_ioend+0x65/0x150 [xfs]
[Sat Feb 24 02:29:42 2024] xfs_end_io+0xbc/0xf0 [xfs]
[Sat Feb 24 02:29:42 2024] process_one_work+0x1ec/0x3c0
[Sat Feb 24 02:29:42 2024] worker_thread+0x4d/0x390
[Sat Feb 24 02:29:42 2024] ? process_one_work+0x3c0/0x3c0
[Sat Feb 24 02:29:42 2024] kthread+0xee/0x120
[Sat Feb 24 02:29:42 2024] ? kthread_complete_and_exit+0x20/0x20
[Sat Feb 24 02:29:42 2024] ret_from_fork+0x1f/0x30
[Sat Feb 24 02:29:42 2024] </TASK>
From our analysis of the vmcore captured at the time of the soft lockup, the
thread was waiting for the spinlock lruvec->lru_lock:
PID: 2200100 TASK: ffff9a221d8b4000 CPU: 215 COMMAND: "kworker/215:0"
#0 [fffffe000319ae20] crash_nmi_callback at ffffffff8e055419
#1 [fffffe000319ae58] nmi_handle at ffffffff8e0253c0
#2 [fffffe000319aea0] default_do_nmi at ffffffff8eae5985
#3 [fffffe000319aec8] exc_nmi at ffffffff8eae5b78
#4 [fffffe000319aef0] end_repeat_nmi at ffffffff8ec015f0
[exception RIP: queued_spin_lock_slowpath+59]
RIP: ffffffff8eaf9b8b RSP: ffffb58b01d4fc20 RFLAGS: 00000002
RAX: 0000000000000001 RBX: ffffb58b01d4fc90 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff99d2b6ff9050
RBP: ffffb58b01d4fc40 R8: 0000000000035b21 R9: 0000000000000040
R10: 0000000000035b00 R11: 0000000000000001 R12: ffff99d2b6ff9050
R13: 0000000000000046 R14: ffffffff8e28bd30 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffffb58b01d4fc20] queued_spin_lock_slowpath at ffffffff8eaf9b8b
#6 [ffffb58b01d4fc48] _raw_spin_lock_irqsave at ffffffff8eaf9b11
#7 [ffffb58b01d4fc68] folio_lruvec_lock_irqsave at ffffffff8e337a82
#8 [ffffb58b01d4fc88] folio_batch_move_lru at ffffffff8e28dbcf
#9 [ffffb58b01d4fcd0] folio_batch_add_and_move at ffffffff8e28dce7
#10 [ffffb58b01d4fce0] folio_rotate_reclaimable at ffffffff8e28eee7
#11 [ffffb58b01d4fcf8] folio_end_writeback at ffffffff8e27bfb3
#12 [ffffb58b01d4fd10] iomap_finish_ioend at ffffffff8e3d9d04
#13 [ffffb58b01d4fd98] iomap_finish_ioends at ffffffff8e3d9fae
#14 [ffffb58b01d4fde0] xfs_end_ioend at ffffffffc0fae835 [xfs]
#15 [ffffb58b01d4fe20] xfs_end_io at ffffffffc0fae9dc [xfs]
#16 [ffffb58b01d4fe60] process_one_work at ffffffff8e0ae08c
#17 [ffffb58b01d4feb0] worker_thread at ffffffff8e0ae2ad
#18 [ffffb58b01d4ff10] kthread at ffffffff8e0b671e
#19 [ffffb58b01d4ff50] ret_from_fork at ffffffff8e002dcf
The spinlock (RDI: ffff99d2b6ff9050) was held by a task that was
scanning folios:
PID: 2400713 TASK: ffff996be1d14000 CPU: 50 COMMAND: "chitu_main"
--- <NMI exception stack> ---
#5 [ffffb58b14ef76e8] __mod_zone_page_state at ffffffff8e2a9c36
#6 [ffffb58b14ef76f0] folio_inc_gen at ffffffff8e2990bd
#7 [ffffb58b14ef7740] sort_folio at ffffffff8e29afbb
#8 [ffffb58b14ef7748] sysvec_apic_timer_interrupt at ffffffff8eae79f0
#9 [ffffb58b14ef77b0] scan_folios at ffffffff8e29b49b
#10 [ffffb58b14ef7878] evict_folios at ffffffff8e29bb53
#11 [ffffb58b14ef7968] lru_gen_shrink_lruvec at ffffffff8e29cb57
#12 [ffffb58b14ef7a28] shrink_lruvec at ffffffff8e29e135
#13 [ffffb58b14ef7af0] shrink_node at ffffffff8e29e78c
#14 [ffffb58b14ef7b88] do_try_to_free_pages at ffffffff8e29ec08
#15 [ffffb58b14ef7bf8] try_to_free_mem_cgroup_pages at ffffffff8e2a17a6
#16 [ffffb58b14ef7ca8] try_charge_memcg at ffffffff8e338879
#17 [ffffb58b14ef7d48] charge_memcg at ffffffff8e3394f8
#18 [ffffb58b14ef7d70] __mem_cgroup_charge at ffffffff8e33aded
#19 [ffffb58b14ef7d98] do_anonymous_page at ffffffff8e2c6523
#20 [ffffb58b14ef7dd8] __handle_mm_fault at ffffffff8e2cc27d
#21 [ffffb58b14ef7e78] handle_mm_fault at ffffffff8e2cc3ba
#22 [ffffb58b14ef7eb8] do_user_addr_fault at ffffffff8e073a99
#23 [ffffb58b14ef7f20] exc_page_fault at ffffffff8eae82f7
#24 [ffffb58b14ef7f50] asm_exc_page_fault at ffffffff8ec00bb7
There were a total of 22 tasks waiting for this spinlock
(RDI: ffff99d2b6ff9050):
crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
22
Two other threads were also scanning folios, one with 19 waiters on its
lock and the other with 15.
To address this issue under heavy reclaim, we applied a hotfix that adds a
cond_resched() to scan_folios(). After the hotfix was rolled out to our
servers, the soft lockups stopped.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: stable@vger.kernel.org # 6.1+
---
mm/vmscan.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..8f2877285b9a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 
 		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
 			break;
+
+		spin_unlock_irq(&lruvec->lru_lock);
+		cond_resched();
+		spin_lock_irq(&lruvec->lru_lock);
 	}
 
 	if (skipped_zone) {
--
2.30.1 (Apple Git-130)
* Re: [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios
2024-03-07 3:19 [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios Yafang Shao
@ 2024-03-07 17:06 ` Andrew Morton
2024-03-08 8:57 ` Yafang Shao
0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2024-03-07 17:06 UTC (permalink / raw)
To: Yafang Shao; +Cc: yuzhao, linux-mm, stable
On Thu, 7 Mar 2024 11:19:52 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
> After we enabled mglru on our 384C1536GB production servers, we
> encountered frequent soft lockups attributed to scanning folios.
>
> The soft lockup as follows,
>
> ...
>
> There were a total of 22 tasks waiting for this spinlock
> (RDI: ffff99d2b6ff9050):
>
> crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
> 22
If we're holding the lock for this long then there's a possibility of
getting hit by the NMI watchdog also.
> Additionally, two other threads were also engaged in scanning folios, one
> with 19 waiters and the other with 15 waiters.
>
> To address this issue under heavy reclaim conditions, we introduced a
> hotfix version of the fix, incorporating cond_resched() in scan_folios().
> Following the application of this hotfix to our servers, the soft lockup
> issue ceased.
>
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
>
> if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
> break;
> +
> + spin_unlock_irq(&lruvec->lru_lock);
> + cond_resched();
> + spin_lock_irq(&lruvec->lru_lock);
> }
Presumably wrapping this with `if (need_resched())' will save some work.
This lock is held for a reason. I'd like to see an analysis of why
this change is safe.
* Re: [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios
2024-03-07 17:06 ` Andrew Morton
@ 2024-03-08 8:57 ` Yafang Shao
2024-03-12 20:29 ` Yu Zhao
0 siblings, 1 reply; 6+ messages in thread
From: Yafang Shao @ 2024-03-08 8:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: yuzhao, linux-mm, stable
On Fri, Mar 8, 2024 at 1:06 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 7 Mar 2024 11:19:52 +0800 Yafang Shao <laoar.shao@gmail.com> wrote:
>
> > After we enabled mglru on our 384C1536GB production servers, we
> > encountered frequent soft lockups attributed to scanning folios.
> >
> > The soft lockup as follows,
> >
> > ...
> >
> > There were a total of 22 tasks waiting for this spinlock
> > (RDI: ffff99d2b6ff9050):
> >
> > crash> foreach RU bt | grep -B 8 queued_spin_lock_slowpath | grep "RDI: ffff99d2b6ff9050" | wc -l
> > 22
>
> If we're holding the lock for this long then there's a possibility of
> getting hit by the NMI watchdog also.
The NMI watchdog is disabled because these servers are KVM guests:
kernel.nmi_watchdog = 0
kernel.soft_watchdog = 1
>
> > Additionally, two other threads were also engaged in scanning folios, one
> > with 19 waiters and the other with 15 waiters.
> >
> > To address this issue under heavy reclaim conditions, we introduced a
> > hotfix version of the fix, incorporating cond_resched() in scan_folios().
> > Following the application of this hotfix to our servers, the soft lockup
> > issue ceased.
> >
> > ...
> >
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4367,6 +4367,10 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
> >
> > if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
> > break;
> > +
> > + spin_unlock_irq(&lruvec->lru_lock);
> > + cond_resched();
> > + spin_lock_irq(&lruvec->lru_lock);
> > }
>
> Presumably wrapping this with `if (need_resched())' will save some work.
good suggestion.
>
> This lock is held for a reason. I'd like to see an analysis of why
> this change is safe.
I believe the key point here is whether we can reduce the scope of
this lock from:
evict_folios
spin_lock_irq(&lruvec->lru_lock);
scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
scanned += try_to_inc_min_seq(lruvec, swappiness);
if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
scanned = 0;
spin_unlock_irq(&lruvec->lru_lock);
to:
evict_folios
spin_lock_irq(&lruvec->lru_lock);
scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
spin_unlock_irq(&lruvec->lru_lock);
spin_lock_irq(&lruvec->lru_lock);
scanned += try_to_inc_min_seq(lruvec, swappiness);
if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
scanned = 0;
spin_unlock_irq(&lruvec->lru_lock);
isolate_folios() merely uses min_seq to retrieve the generation; it
does not modify it. If multiple tasks are running evict_folios()
concurrently, it seems inconsequential which task increments min_seq.
I'd appreciate Yu's confirmation on this.
--
Regards
Yafang
* Re: [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios
2024-03-08 8:57 ` Yafang Shao
@ 2024-03-12 20:29 ` Yu Zhao
2024-03-12 22:11 ` Yu Zhao
0 siblings, 1 reply; 6+ messages in thread
From: Yu Zhao @ 2024-03-12 20:29 UTC (permalink / raw)
To: Yafang Shao; +Cc: Andrew Morton, linux-mm, stable
On Fri, Mar 08, 2024 at 04:57:08PM +0800, Yafang Shao wrote:
> On Fri, Mar 8, 2024 at 1:06 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> > ...
> > This lock is held for a reason. I'd like to see an analysis of why
> > this change is safe.
>
> I believe the key point here is whether we can reduce the scope of
> this lock from:
>
> evict_folios
> spin_lock_irq(&lruvec->lru_lock);
> scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
> scanned += try_to_inc_min_seq(lruvec, swappiness);
> if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
> scanned = 0;
> spin_unlock_irq(&lruvec->lru_lock);
>
> to:
>
> evict_folios
> spin_lock_irq(&lruvec->lru_lock);
> scanned = isolate_folios(lruvec, sc, swappiness, &type, &list);
> spin_unlock_irq(&lruvec->lru_lock);
>
> spin_lock_irq(&lruvec->lru_lock);
> scanned += try_to_inc_min_seq(lruvec, swappiness);
> if (get_nr_gens(lruvec, !swappiness) == MIN_NR_GENS)
> scanned = 0;
> spin_unlock_irq(&lruvec->lru_lock);
>
> In isolate_folios(), it merely utilizes the min_seq to retrieve the
> generation without modifying it. If multiple tasks are running
> evict_folios() concurrently, it seems inconsequential whether min_seq
> is incremented by one task or another. I'd appreciate Yu's
> confirmation on this matter.
Hi Yafang,
Thanks for the patch!
Yes, your second analysis is correct -- we can't just drop the lock
as the original patch does, because min_seq can be updated in the
meantime. If that happens, the gen value becomes invalid, since it is
based on the stale min_seq:

sort_folio()
{
	...
	gen = lru_gen_from_seq(lrugen->min_seq[type]);
	...
}
The following might be a better approach (untested):
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4255619a1a31..6fe53cfa8ef8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4365,7 +4365,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			skipped_zone += delta;
 		}
 
-		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
+		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH ||
+		    spin_is_contended(&lruvec->lru_lock))
 			break;
 	}
 
@@ -4375,7 +4376,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 		skipped += skipped_zone;
 	}
 
-	if (!remaining || isolated >= MIN_LRU_BATCH)
+	if (!remaining || isolated >= MIN_LRU_BATCH ||
+	    (scanned && spin_is_contended(&lruvec->lru_lock)))
 		break;
 }
* Re: [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios
2024-03-12 20:29 ` Yu Zhao
@ 2024-03-12 22:11 ` Yu Zhao
2024-03-13 2:21 ` Yafang Shao
0 siblings, 1 reply; 6+ messages in thread
From: Yu Zhao @ 2024-03-12 22:11 UTC (permalink / raw)
To: Yafang Shao; +Cc: Andrew Morton, linux-mm, stable
On Tue, Mar 12, 2024 at 02:29:48PM -0600, Yu Zhao wrote:
> ...
> Hi Yafang,
>
> Thanks for the patch!
>
> Yes, your second analysis is correct -- we can't just drop the lock
> as the original patch does because min_seq can be updated in the mean
> time. If this happens, the gen value becomes invalid, since it's based
> on the expired min_seq:
>
> sort_folio()
> {
> ..
> gen = lru_gen_from_seq(lrugen->min_seq[type]);
> ..
> }
>
> The following might be a better approach (untested):
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4255619a1a31..6fe53cfa8ef8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4365,7 +4365,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
>  			skipped_zone += delta;
>  		}
>  
> -		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
> +		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH ||
> +		    spin_is_contended(&lruvec->lru_lock))
>  			break;
>  	}
>  
> @@ -4375,7 +4376,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
>  		skipped += skipped_zone;
>  	}
>  
> -	if (!remaining || isolated >= MIN_LRU_BATCH)
> +	if (!remaining || isolated >= MIN_LRU_BATCH ||
> +	    (scanned && spin_is_contended(&lruvec->lru_lock)))
>  		break;
>  }
A better way might be:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4255619a1a31..ac59f064c4e1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4367,6 +4367,11 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 
 		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
 			break;
+
+		if (need_resched() || spin_is_contended(&lruvec->lru_lock)) {
+			remaining = 0;
+			break;
+		}
 	}
 
 	if (skipped_zone) {
* Re: [PATCH] mm: mglru: Fix soft lockup attributed to scanning folios
2024-03-12 22:11 ` Yu Zhao
@ 2024-03-13 2:21 ` Yafang Shao
0 siblings, 0 replies; 6+ messages in thread
From: Yafang Shao @ 2024-03-13 2:21 UTC (permalink / raw)
To: Yu Zhao; +Cc: Andrew Morton, linux-mm, stable
On Wed, Mar 13, 2024 at 6:12 AM Yu Zhao <yuzhao@google.com> wrote:
>
> ...
> A better way might be:
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 4255619a1a31..ac59f064c4e1 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4367,6 +4367,11 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
>  
>  		if (!--remaining || max(isolated, skipped_zone) >= MIN_LRU_BATCH)
>  			break;
> +
> +		if (need_resched() || spin_is_contended(&lruvec->lru_lock)) {
> +			remaining = 0;
> +			break;
> +		}
>  	}
>  
>  	if (skipped_zone) {
That is better, thanks for the suggestion.
I will verify it on our production servers, which may take several days.
--
Regards
Yafang