* zone->nr_inactive race?
@ 2003-04-21 17:58 Nikita Danilov
2003-04-21 22:34 ` Andrew Morton
0 siblings, 1 reply; 3+ messages in thread
From: Nikita Danilov @ 2003-04-21 17:58 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Andrew Morton
Hello,
I am observing on 2.5.67 that sometimes shrink_slab() is called with
insanely huge total_scanned. In kgdb I see
(gdb) f
#1 0xc013734c in shrink_caches (classzone=0xc04ad680, priority=12,
total_scanned=0xddf15b60, gfp_mask=210, nr_pages=32, ps=0xddf15b64)
at mm/vmscan.c:794
(gdb) p/x max_scan
$20 = 0xfffff # which looks suspiciously similar to ((unsigned)-1) >> 12
(gdb) p priority
$21 = 12
(gdb) p zone->nr_inactive
$22 = 0
(gdb) info threads
....
16 Thread 17 0xc012e37c in unlock_page (page=0xc14d12c8)
at include/asm/bitops.h:175
...
(gdb) thread 16 # this is the only other thread doing something page cache related at the moment
(gdb) bt
#0 0xc012e37c in unlock_page (page=0xc14d12c8) at include/asm/bitops.h:175
#1 0xc01369d9 in shrink_list (page_list=0xdfd27e54, gfp_mask=208,
max_scan=0xdfd27eb8, nr_mapped=0xdfd27f0c, priority=7) at mm/vmscan.c:434
#2 0xc0136c56 in shrink_cache (nr_pages=513, zone=0xc04ad680, gfp_mask=208,
max_scan=637, nr_mapped=0xdfd27f0c, priority=7) at mm/vmscan.c:517
#3 0xc01372bd in shrink_zone (zone=0xc04ad680, max_scan=1026, gfp_mask=208,
nr_pages=513, nr_mapped=0xdfd27f0c, ps=0xdfd27f44, priority=7)
at mm/vmscan.c:746
#4 0xc0137527 in balance_pgdat (pgdat=0xc04ac280, nr_pages=0, ps=0xdfd27f44)
at mm/vmscan.c:909
#5 0xc01376a7 in kswapd (p=0xc04ac280) at mm/vmscan.c:969
Looks like zone->nr_inactive was negative.
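As a sanity check on the arithmetic, here is a minimal sketch. It assumes max_scan is derived as zone->nr_inactive >> priority and that the counter is 32 bits wide on this machine; the helper name scan_window() is hypothetical, not a kernel function. Under those assumptions a counter that has wrapped to -1 gives exactly the 0xfffff seen above at priority 12.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: models "max_scan = nr_inactive >> priority"
 * on a 32-bit unsigned counter (an assumption about the 2.5.67 code).
 */
static uint32_t scan_window(uint32_t nr_inactive, unsigned priority)
{
    assert(priority < 32);  /* shifting by >= the type width is undefined */
    return nr_inactive >> priority;
}
```

Calling scan_window((uint32_t)-1, 12) reproduces the observed max_scan, which is consistent with nr_inactive having been -1 shortly before gdb printed it as 0.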
After some more debugging I managed to break into kgdb when
zone->nr_inactive is 0xffffffc0 (again in shrink_caches()). On
another processor, a thread is running inside refill_inactive_zone():
while (!list_empty(&l_inactive)) {
page = list_entry(l_inactive.prev, struct page, lru);
prefetchw_prev_lru_page(page, &l_inactive, flags);
if (TestSetPageLRU(page))
BUG();
(somewhere near TestSetPageLRU(page)).
(gdb) bt
#0 0xc0137008 in refill_inactive_zone (zone=0xc04ad680, nr_pages_in=128,
ps=0xdd921a24, priority=9) at include/asm/bitops.h:136
#1 0xc01372a6 in shrink_zone (zone=0xc04ad680, max_scan=64, gfp_mask=210,
nr_pages=32, nr_mapped=0xdd9219e4, ps=0xdd921a24, priority=9)
at mm/vmscan.c:744
#2 0xc0137349 in shrink_caches (classzone=0xc04ad680, priority=9,
total_scanned=0xdd921a20, gfp_mask=210, nr_pages=32, ps=0xdd921a24)
at mm/vmscan.c:794
#3 0xc0137417 in try_to_free_pages (classzone=0xc04ad680, gfp_mask=210,
order=0) at mm/vmscan.c:836
#4 0xc0131848 in __alloc_pages (gfp_mask=210, order=0, zonelist=0xc04afea0)
at mm/page_alloc.c:610
...
This fragment of refill_inactive_zone() looks strange:
list_move(&page->lru, &zone->inactive_list);
if (!pagevec_add(&pvec, page)) {
spin_unlock_irq(&zone->lru_lock);
if (buffer_heads_over_limit)
pagevec_strip(&pvec);
__pagevec_release(&pvec);
spin_lock_irq(&zone->lru_lock);
}
At this point the page is already on the inactive list but not yet
accounted for in zone->nr_inactive, and the zone spin lock is released.
If another thread calls del_page_from_lru(page) at that moment,
zone->nr_inactive can go negative.
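The interleaving can be replayed deterministically in userspace. This is only a sketch: struct zone_model and both functions are hypothetical stand-ins, the counter is assumed to be a 32-bit unsigned value, and the "other thread" is simulated by an inline decrement rather than a real concurrent del_page_from_lru().

```c
#include <assert.h>
#include <stdint.h>

struct zone_model {
    uint32_t nr_inactive;   /* stand-in for zone->nr_inactive */
};

/*
 * Deferred accounting: pages are linked onto the list first, the lock is
 * dropped, and the counter is only updated at the very end.
 * Returns the counter value visible while the lock is dropped.
 */
static uint32_t deferred_window_value(uint32_t start, uint32_t moved,
                                      uint32_t deleted)
{
    struct zone_model z = { start };
    uint32_t seen;

    assert(deleted <= start + moved);  /* cannot delete pages that do not exist */
    /* `moved` pages are on the list here, but not yet counted */
    z.nr_inactive -= deleted;          /* other thread removes pages, lock dropped */
    seen = z.nr_inactive;              /* what a concurrent reader observes */
    z.nr_inactive += moved;            /* late accounting after relocking */
    return seen;
}

/* Same interleaving, but the counter is updated before the lock is dropped. */
static uint32_t early_window_value(uint32_t start, uint32_t moved,
                                   uint32_t deleted)
{
    struct zone_model z = { start };

    assert(deleted <= start + moved);
    z.nr_inactive += moved;            /* accounted while still holding the lock */
    z.nr_inactive -= deleted;          /* other thread removes pages */
    return z.nr_inactive;
}
```

With an empty list, 16 moved pages and one concurrent removal, the deferred variant exposes a wrapped counter in the unlocked window, while the early variant never drops below the true list length.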
Let me know if more info is needed; I can reproduce this.
Nikita.
* Re: zone->nr_inactive race?
2003-04-21 17:58 zone->nr_inactive race? Nikita Danilov
@ 2003-04-21 22:34 ` Andrew Morton
2003-04-22 6:43 ` Nikita Danilov
0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2003-04-21 22:34 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Linux-Kernel
Nikita Danilov <Nikita@Namesys.COM> wrote:
>
> This fragment of refill_inactive_zone() looks strange:
>
> list_move(&page->lru, &zone->inactive_list);
> if (!pagevec_add(&pvec, page)) {
> spin_unlock_irq(&zone->lru_lock);
> if (buffer_heads_over_limit)
> pagevec_strip(&pvec);
> __pagevec_release(&pvec);
> spin_lock_irq(&zone->lru_lock);
> }
>
Thanks, you're dead right. That's buggy.
I am fairly surprised that you were able to hit this. How are you doing
it? On a 1G machine with a teeny ZONE_HIGHMEM??
I haven't tested this yet, but it should fix it up.
diff -puN mm/vmscan.c~nr_inactive-race-fix mm/vmscan.c
--- 25/mm/vmscan.c~nr_inactive-race-fix Mon Apr 21 15:15:48 2003
+++ 25-akpm/mm/vmscan.c Mon Apr 21 15:30:33 2003
@@ -557,6 +557,7 @@ static void
refill_inactive_zone(struct zone *zone, const int nr_pages_in,
struct page_state *ps, int priority)
{
+ int pgmoved;
int pgdeactivate = 0;
int nr_pages = nr_pages_in;
LIST_HEAD(l_hold); /* The pages which were snipped off */
@@ -570,6 +571,7 @@ refill_inactive_zone(struct zone *zone,
long swap_tendency;
lru_add_drain();
+ pgmoved = 0;
spin_lock_irq(&zone->lru_lock);
while (nr_pages && !list_empty(&zone->active_list)) {
page = list_entry(zone->active_list.prev, struct page, lru);
@@ -584,9 +586,12 @@ refill_inactive_zone(struct zone *zone,
} else {
page_cache_get(page);
list_add(&page->lru, &l_hold);
+ pgmoved++;
}
nr_pages--;
}
+ zone->nr_active -= pgmoved;
+ zone->nr_inactive += pgmoved;
spin_unlock_irq(&zone->lru_lock);
/*
@@ -646,10 +651,10 @@ refill_inactive_zone(struct zone *zone,
continue;
}
list_add(&page->lru, &l_inactive);
- pgdeactivate++;
}
pagevec_init(&pvec, 1);
+ pgmoved = 0;
spin_lock_irq(&zone->lru_lock);
while (!list_empty(&l_inactive)) {
page = list_entry(l_inactive.prev, struct page, lru);
@@ -659,19 +664,27 @@ refill_inactive_zone(struct zone *zone,
if (!TestClearPageActive(page))
BUG();
list_move(&page->lru, &zone->inactive_list);
+ pgmoved++;
if (!pagevec_add(&pvec, page)) {
+ zone->nr_inactive += pgmoved;
spin_unlock_irq(&zone->lru_lock);
+ pgdeactivate += pgmoved;
+ pgmoved = 0;
if (buffer_heads_over_limit)
pagevec_strip(&pvec);
__pagevec_release(&pvec);
spin_lock_irq(&zone->lru_lock);
}
}
+ zone->nr_inactive += pgmoved;
+ pgdeactivate += pgmoved;
if (buffer_heads_over_limit) {
spin_unlock_irq(&zone->lru_lock);
pagevec_strip(&pvec);
spin_lock_irq(&zone->lru_lock);
}
+
+ pgmoved = 0;
while (!list_empty(&l_active)) {
page = list_entry(l_active.prev, struct page, lru);
prefetchw_prev_lru_page(page, &l_active, flags);
@@ -679,14 +692,16 @@ refill_inactive_zone(struct zone *zone,
BUG();
BUG_ON(!PageActive(page));
list_move(&page->lru, &zone->active_list);
+ pgmoved++;
if (!pagevec_add(&pvec, page)) {
+ zone->nr_active += pgmoved;
+ pgmoved = 0;
spin_unlock_irq(&zone->lru_lock);
__pagevec_release(&pvec);
spin_lock_irq(&zone->lru_lock);
}
}
- zone->nr_active -= pgdeactivate;
- zone->nr_inactive += pgdeactivate;
+ zone->nr_active += pgmoved;
spin_unlock_irq(&zone->lru_lock);
pagevec_release(&pvec);
_
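The accounting pattern in the patch, reduced to a userspace sketch: keep a local count of pages moved, and fold it into the shared counter before every point where the lock would be dropped. Everything here is a hypothetical model (struct lru_model, BATCH, unlock_point()); BATCH stands in for the pagevec capacity and no real locking is done. The assertion in unlock_point() expresses the invariant that the list length and the counter agree whenever the lock is not held.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH 4   /* hypothetical stand-in for the pagevec capacity */

struct lru_model {
    size_t on_list;     /* pages actually linked on the inactive list */
    size_t nr_counted;  /* stand-in for zone->nr_inactive */
};

/* Called wherever the real code would drop zone->lru_lock. */
static void unlock_point(const struct lru_model *m)
{
    assert(m->on_list == m->nr_counted);
}

/*
 * Move n pages onto the list, flushing the local count into the shared
 * counter before each unlock point. Returns the number of pages accounted.
 */
static size_t move_pages_batched(struct lru_model *m, size_t n)
{
    size_t pgmoved = 0, total = 0, in_batch = 0;

    for (size_t i = 0; i < n; i++) {
        m->on_list++;                 /* models list_move() */
        pgmoved++;
        if (++in_batch == BATCH) {    /* batch full: lock is about to be dropped */
            m->nr_counted += pgmoved;
            total += pgmoved;
            pgmoved = 0;
            in_batch = 0;
            unlock_point(m);
        }
    }
    m->nr_counted += pgmoved;         /* leftover partial batch */
    total += pgmoved;
    unlock_point(m);
    return total;
}
```

Moving the `m->nr_counted += pgmoved` line to after unlock_point() inside the loop makes the assertion fire, which is the userspace analogue of the window reported above.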
* Re: zone->nr_inactive race?
2003-04-21 22:34 ` Andrew Morton
@ 2003-04-22 6:43 ` Nikita Danilov
0 siblings, 0 replies; 3+ messages in thread
From: Nikita Danilov @ 2003-04-22 6:43 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linux-Kernel
Andrew Morton writes:
> Nikita Danilov <Nikita@Namesys.COM> wrote:
> >
> > This fragment of refill_inactive_zone() looks strange:
> >
> > list_move(&page->lru, &zone->inactive_list);
> > if (!pagevec_add(&pvec, page)) {
> > spin_unlock_irq(&zone->lru_lock);
> > if (buffer_heads_over_limit)
> > pagevec_strip(&pvec);
> > __pagevec_release(&pvec);
> > spin_lock_irq(&zone->lru_lock);
> > }
> >
>
> Thanks, you're dead right. That's buggy.
>
> I am fairly surprised that you were able to hit this. How are you doing
> it? On a 1G machine with a teeny ZONE_HIGHMEM??
:)
More modest:
Dual Xeon, 2.20GHz with hyper threading.
512M of ram, but with CONFIG_HIGHMEM4G=y.
I am running
ftp://ftp.namesys.com/pub/namesys-utils/nfs_fh_stale.c
with
./nfs -p 41 -i 100000000 -B -L 22000000 -F sync=0 -s 0 -f 1000000000 -M 1000000000
on reiser4. Its on-disk working set stabilizes somewhere around 14G, and
it produces large amounts of ->writepage() traffic.
>
> I haven't tested this yet, but it should fix it up.
>
OK, I shall try.
Nikita.