linux-kernel.vger.kernel.org archive mirror
* zone->nr_inactive race?
@ 2003-04-21 17:58 Nikita Danilov
  2003-04-21 22:34 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Nikita Danilov @ 2003-04-21 17:58 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Andrew Morton

Hello,

I am observing on 2.5.67 that shrink_slab() is sometimes called with an
insanely huge total_scanned. In kgdb I see:

(gdb) f
#1  0xc013734c in shrink_caches (classzone=0xc04ad680, priority=12, 
    total_scanned=0xddf15b60, gfp_mask=210, nr_pages=32, ps=0xddf15b64)
    at mm/vmscan.c:794
(gdb) p/x max_scan
$20 = 0xfffff # which looks suspiciously similar to ((unsigned)-1) >> 12
(gdb) p priority
$21 = 12
(gdb) p zone->nr_inactive
$22 = 0
(gdb) info threads
....
  16 Thread 17  0xc012e37c in unlock_page (page=0xc14d12c8)
    at include/asm/bitops.h:175
...
(gdb) thread 16 # this is the only other thread doing something page cache related at the moment
(gdb) bt
#0  0xc012e37c in unlock_page (page=0xc14d12c8) at include/asm/bitops.h:175
#1  0xc01369d9 in shrink_list (page_list=0xdfd27e54, gfp_mask=208, 
    max_scan=0xdfd27eb8, nr_mapped=0xdfd27f0c, priority=7) at mm/vmscan.c:434
#2  0xc0136c56 in shrink_cache (nr_pages=513, zone=0xc04ad680, gfp_mask=208, 
    max_scan=637, nr_mapped=0xdfd27f0c, priority=7) at mm/vmscan.c:517
#3  0xc01372bd in shrink_zone (zone=0xc04ad680, max_scan=1026, gfp_mask=208, 
    nr_pages=513, nr_mapped=0xdfd27f0c, ps=0xdfd27f44, priority=7)
    at mm/vmscan.c:746
#4  0xc0137527 in balance_pgdat (pgdat=0xc04ac280, nr_pages=0, ps=0xdfd27f44)
    at mm/vmscan.c:909
#5  0xc01376a7 in kswapd (p=0xc04ac280) at mm/vmscan.c:969

Looks like zone->nr_inactive was negative.

After some more debugging I managed to break into kgdb when
zone->nr_inactive is 0xffffffc0 (again in shrink_caches()). On
another processor a thread is running inside refill_inactive_zone():

	while (!list_empty(&l_inactive)) {
		page = list_entry(l_inactive.prev, struct page, lru);
		prefetchw_prev_lru_page(page, &l_inactive, flags);
		if (TestSetPageLRU(page))
			BUG();

(somewhere near TestSetPageLRU(page)).

(gdb) bt
#0  0xc0137008 in refill_inactive_zone (zone=0xc04ad680, nr_pages_in=128, 
    ps=0xdd921a24, priority=9) at include/asm/bitops.h:136
#1  0xc01372a6 in shrink_zone (zone=0xc04ad680, max_scan=64, gfp_mask=210, 
    nr_pages=32, nr_mapped=0xdd9219e4, ps=0xdd921a24, priority=9)
    at mm/vmscan.c:744
#2  0xc0137349 in shrink_caches (classzone=0xc04ad680, priority=9, 
    total_scanned=0xdd921a20, gfp_mask=210, nr_pages=32, ps=0xdd921a24)
    at mm/vmscan.c:794
#3  0xc0137417 in try_to_free_pages (classzone=0xc04ad680, gfp_mask=210, 
    order=0) at mm/vmscan.c:836
#4  0xc0131848 in __alloc_pages (gfp_mask=210, order=0, zonelist=0xc04afea0)
    at mm/page_alloc.c:610
...

This fragment of refill_inactive_zone() looks strange:

		list_move(&page->lru, &zone->inactive_list);
		if (!pagevec_add(&pvec, page)) {
			spin_unlock_irq(&zone->lru_lock);
			if (buffer_heads_over_limit)
				pagevec_strip(&pvec);
			__pagevec_release(&pvec);
			spin_lock_irq(&zone->lru_lock);
		}

The page is already on the inactive list (but not yet accounted for in
zone->nr_inactive), and the zone spin lock has been released. If at this
moment another thread calls del_page_from_lru(page), zone->nr_inactive
can go negative.

Let me know if more info is necessary; I can reproduce this.

Nikita.


* Re: zone->nr_inactive race?
  2003-04-21 17:58 zone->nr_inactive race? Nikita Danilov
@ 2003-04-21 22:34 ` Andrew Morton
  2003-04-22  6:43   ` Nikita Danilov
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2003-04-21 22:34 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Linux-Kernel

Nikita Danilov <Nikita@Namesys.COM> wrote:
>
> This fragment of refill_inactive_zone() looks strange:
> 
> 		list_move(&page->lru, &zone->inactive_list);
> 		if (!pagevec_add(&pvec, page)) {
> 			spin_unlock_irq(&zone->lru_lock);
> 			if (buffer_heads_over_limit)
> 				pagevec_strip(&pvec);
> 			__pagevec_release(&pvec);
> 			spin_lock_irq(&zone->lru_lock);
> 		}
> 

Thanks, you're dead right.  That's buggy.

I am fairly surprised that you were able to hit this.  How are you doing
it?  On a 1G machine with a teeny ZONE_HIGHMEM??

I haven't tested this yet, but it should fix it up.


diff -puN mm/vmscan.c~nr_inactive-race-fix mm/vmscan.c
--- 25/mm/vmscan.c~nr_inactive-race-fix	Mon Apr 21 15:15:48 2003
+++ 25-akpm/mm/vmscan.c	Mon Apr 21 15:30:33 2003
@@ -557,6 +557,7 @@ static void
 refill_inactive_zone(struct zone *zone, const int nr_pages_in,
 			struct page_state *ps, int priority)
 {
+	int pgmoved;
 	int pgdeactivate = 0;
 	int nr_pages = nr_pages_in;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
@@ -570,6 +571,7 @@ refill_inactive_zone(struct zone *zone, 
 	long swap_tendency;
 
 	lru_add_drain();
+	pgmoved = 0;
 	spin_lock_irq(&zone->lru_lock);
 	while (nr_pages && !list_empty(&zone->active_list)) {
 		page = list_entry(zone->active_list.prev, struct page, lru);
@@ -584,9 +586,12 @@ refill_inactive_zone(struct zone *zone, 
 		} else {
 			page_cache_get(page);
 			list_add(&page->lru, &l_hold);
+			pgmoved++;
 		}
 		nr_pages--;
 	}
+	zone->nr_active -= pgmoved;
+	zone->nr_inactive += pgmoved;
 	spin_unlock_irq(&zone->lru_lock);
 
 	/*
@@ -646,10 +651,10 @@ refill_inactive_zone(struct zone *zone, 
 			continue;
 		}
 		list_add(&page->lru, &l_inactive);
-		pgdeactivate++;
 	}
 
 	pagevec_init(&pvec, 1);
+	pgmoved = 0;
 	spin_lock_irq(&zone->lru_lock);
 	while (!list_empty(&l_inactive)) {
 		page = list_entry(l_inactive.prev, struct page, lru);
@@ -659,19 +664,27 @@ refill_inactive_zone(struct zone *zone, 
 		if (!TestClearPageActive(page))
 			BUG();
 		list_move(&page->lru, &zone->inactive_list);
+		pgmoved++;
 		if (!pagevec_add(&pvec, page)) {
+			zone->nr_inactive += pgmoved;
 			spin_unlock_irq(&zone->lru_lock);
+			pgdeactivate += pgmoved;
+			pgmoved = 0;
 			if (buffer_heads_over_limit)
 				pagevec_strip(&pvec);
 			__pagevec_release(&pvec);
 			spin_lock_irq(&zone->lru_lock);
 		}
 	}
+	zone->nr_inactive += pgmoved;
+	pgdeactivate += pgmoved;
 	if (buffer_heads_over_limit) {
 		spin_unlock_irq(&zone->lru_lock);
 		pagevec_strip(&pvec);
 		spin_lock_irq(&zone->lru_lock);
 	}
+
+	pgmoved = 0;
 	while (!list_empty(&l_active)) {
 		page = list_entry(l_active.prev, struct page, lru);
 		prefetchw_prev_lru_page(page, &l_active, flags);
@@ -679,14 +692,16 @@ refill_inactive_zone(struct zone *zone, 
 			BUG();
 		BUG_ON(!PageActive(page));
 		list_move(&page->lru, &zone->active_list);
+		pgmoved++;
 		if (!pagevec_add(&pvec, page)) {
+			zone->nr_active += pgmoved;
+			pgmoved = 0;
 			spin_unlock_irq(&zone->lru_lock);
 			__pagevec_release(&pvec);
 			spin_lock_irq(&zone->lru_lock);
 		}
 	}
-	zone->nr_active -= pgdeactivate;
-	zone->nr_inactive += pgdeactivate;
+	zone->nr_active += pgmoved;
 	spin_unlock_irq(&zone->lru_lock);
 	pagevec_release(&pvec);
 

_



* Re: zone->nr_inactive race?
  2003-04-21 22:34 ` Andrew Morton
@ 2003-04-22  6:43   ` Nikita Danilov
  0 siblings, 0 replies; 3+ messages in thread
From: Nikita Danilov @ 2003-04-22  6:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-Kernel

Andrew Morton writes:
 > Nikita Danilov <Nikita@Namesys.COM> wrote:
 > >
 > > This fragment of refill_inactive_zone() looks strange:
 > > 
 > > 		list_move(&page->lru, &zone->inactive_list);
 > > 		if (!pagevec_add(&pvec, page)) {
 > > 			spin_unlock_irq(&zone->lru_lock);
 > > 			if (buffer_heads_over_limit)
 > > 				pagevec_strip(&pvec);
 > > 			__pagevec_release(&pvec);
 > > 			spin_lock_irq(&zone->lru_lock);
 > > 		}
 > > 
 > 
 > Thanks, you're dead right.  That's buggy.
 > 
 > I am fairly surprised that you were able to hit this.  How are you doing
 > it?  On a 1G machine with a teeny ZONE_HIGHMEM??

:)

More modest:

Dual Xeon, 2.20GHz with hyperthreading.

512M of RAM, but with CONFIG_HIGHMEM4G=y.

I am running

ftp://ftp.namesys.com/pub/namesys-utils/nfs_fh_stale.c

with 

./nfs -p 41 -i 100000000 -B -L 22000000 -F sync=0 -s 0 -f 1000000000 -M 1000000000

on reiser4. Its on-disk working set stabilizes somewhere around 14G, and
it produces large amounts of ->writepage() traffic.

 > 
 > I haven't tested this yet, but it should fix it up.
 > 

OK, I shall try.

Nikita.

