linux-kernel.vger.kernel.org archive mirror
* [patch 0/4] mm: de-skew page refcount
@ 2006-01-18 10:40 Nick Piggin
  2006-01-18 10:40 ` [patch 1/4] mm: page refcount use atomic primitives Nick Piggin
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Nick Piggin @ 2006-01-18 10:40 UTC (permalink / raw)
  To: Linux Memory Management, Linux Kernel Mailing List
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, Nick Piggin,
	Linus Torvalds, David Miller

The following patchset (against 2.6.16-rc1 + migrate race fixes) uses the new
atomic ops to do away with the offset page refcounting, and to simplify
handling of the race that the offset was designed to cover.

This allows some nice optimisations, and in the page freeing path we end up
saving 2 atomic ops including a spin_lock_irqsave in the !PageLRU case, and 1
or 2 atomic ops in the PageLRU case.

Andrew's previous feedback has been incorporated (less BUG_ONs in fastpaths
and more detail in changelogs).

Anyone spot any holes or races?

Thanks,
Nick

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [patch 1/4] mm: page refcount use atomic primitives
  2006-01-18 10:40 [patch 0/4] mm: de-skew page refcount Nick Piggin
@ 2006-01-18 10:40 ` Nick Piggin
  2006-01-18 10:40 ` [patch 2/4] mm: PageLRU no testset Nick Piggin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Nick Piggin @ 2006-01-18 10:40 UTC (permalink / raw)
  To: Linux Memory Management, Linux Kernel Mailing List
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, Nick Piggin,
	Linus Torvalds, David Miller

The VM has an interesting race where a page refcount can drop to zero, but
it is still on the LRU lists for a short time. This was solved by testing
a 0->1 refcount transition when picking up pages from the LRU, and dropping
the refcount in that case.

Instead, use atomic_inc_not_zero to ensure we never pick up a 0 refcount
page from the LRU (ie. we guarantee that such a page will not be touched
by vmscan), so we need not hold ->lru_lock to stabilise PageLRU.

This ensures we can test PageLRU without taking the lru_lock, and allows
further optimisations (in later patches) -- we end up saving 2 atomic ops
including a spin_lock_irqsave in the !PageLRU case, and 1 or 2 atomic ops
in the PageLRU case.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/include/linux/mm.h
===================================================================
--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -286,43 +286,44 @@ struct page {
  *
  * Also, many kernel routines increase the page count before a critical
  * routine so they can be sure the page doesn't go away from under them.
- *
- * Since 2.6.6 (approx), a free page has ->_count = -1.  This is so that we
- * can use atomic_add_negative(-1, page->_count) to detect when the page
- * becomes free and so that we can also use atomic_inc_and_test to atomically
- * detect when we just tried to grab a ref on a page which some other CPU has
- * already deemed to be freeable.
- *
- * NO code should make assumptions about this internal detail!  Use the provided
- * macros which retain the old rules: page_count(page) == 0 is a free page.
  */
 
 /*
  * Drop a ref, return true if the logical refcount fell to zero (the page has
  * no users)
  */
-#define put_page_testzero(p)				\
-	({						\
-		BUG_ON(page_count(p) == 0);		\
-		atomic_add_negative(-1, &(p)->_count);	\
-	})
+static inline int put_page_testzero(struct page *page)
+{
+	BUG_ON(atomic_read(&page->_count) == 0);
+	return atomic_dec_and_test(&page->_count);
+}
 
 /*
  * Grab a ref, return true if the page previously had a logical refcount of
  * zero.  ie: returns true if we just grabbed an already-deemed-to-be-free page
  */
-#define get_page_testone(p)	atomic_inc_and_test(&(p)->_count)
+static inline int get_page_unless_zero(struct page *page)
+{
+	return atomic_inc_not_zero(&page->_count);
+}
 
-#define set_page_count(p,v) 	atomic_set(&(p)->_count, (v) - 1)
-#define __put_page(p)		atomic_dec(&(p)->_count)
+static inline void set_page_count(struct page *page, int v)
+{
+	atomic_set(&page->_count, v);
+}
+
+static inline void __put_page(struct page *page)
+{
+	atomic_dec(&page->_count);
+}
 
 extern void FASTCALL(__page_cache_release(struct page *));
 
 static inline int page_count(struct page *page)
 {
-	if (PageCompound(page))
+	if (unlikely(PageCompound(page)))
 		page = (struct page *)page_private(page);
-	return atomic_read(&page->_count) + 1;
+	return atomic_read(&page->_count);
 }
 
 static inline void get_page(struct page *page)
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -815,24 +815,20 @@ static int isolate_lru_pages(int nr_to_s
 	int scan = 0;
 
 	while (scan++ < nr_to_scan && !list_empty(src)) {
+		struct list_head *target;
 		page = lru_to_page(src);
 		prefetchw_prev_lru_page(page, src, flags);
 
-		if (!TestClearPageLRU(page))
-			BUG();
+		BUG_ON(!PageLRU(page));
 		list_del(&page->lru);
-		if (get_page_testone(page)) {
-			/*
-			 * It is being freed elsewhere
-			 */
-			__put_page(page);
-			SetPageLRU(page);
-			list_add(&page->lru, src);
-			continue;
-		} else {
-			list_add(&page->lru, dst);
+		target = src;
+		if (get_page_unless_zero(page)) {
+			ClearPageLRU(page);
+			target = dst;
 			nr_taken++;
-		}
+		} /* else it is being freed elsewhere */
+
+		list_add(&page->lru, target);
 	}
 
 	*scanned = scan;
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c
+++ linux-2.6/mm/swap.c
@@ -198,17 +198,18 @@ int lru_add_drain_all(void)
  */
 void fastcall __page_cache_release(struct page *page)
 {
-	unsigned long flags;
-	struct zone *zone = page_zone(page);
+	if (PageLRU(page)) {
+		unsigned long flags;
 
-	spin_lock_irqsave(&zone->lru_lock, flags);
-	if (TestClearPageLRU(page))
+		struct zone *zone = page_zone(page);
+		spin_lock_irqsave(&zone->lru_lock, flags);
+		if (!TestClearPageLRU(page))
+			BUG();
 		del_page_from_lru(zone, page);
-	if (page_count(page) != 0)
-		page = NULL;
-	spin_unlock_irqrestore(&zone->lru_lock, flags);
-	if (page)
-		free_hot_page(page);
+		spin_unlock_irqrestore(&zone->lru_lock, flags);
+	}
+
+	free_hot_page(page);
 }
 
 EXPORT_SYMBOL(__page_cache_release);
@@ -234,27 +235,30 @@ void release_pages(struct page **pages, 
 	pagevec_init(&pages_to_free, cold);
 	for (i = 0; i < nr; i++) {
 		struct page *page = pages[i];
-		struct zone *pagezone;
 
 		if (!put_page_testzero(page))
 			continue;
 
-		pagezone = page_zone(page);
-		if (pagezone != zone) {
-			if (zone)
-				spin_unlock_irq(&zone->lru_lock);
-			zone = pagezone;
-			spin_lock_irq(&zone->lru_lock);
-		}
-		if (TestClearPageLRU(page))
+		if (PageLRU(page)) {
+			struct zone *pagezone = page_zone(page);
+			if (pagezone != zone) {
+				if (zone)
+					spin_unlock_irq(&zone->lru_lock);
+				zone = pagezone;
+				spin_lock_irq(&zone->lru_lock);
+			}
+			if (!TestClearPageLRU(page))
+				BUG();
 			del_page_from_lru(zone, page);
-		if (page_count(page) == 0) {
-			if (!pagevec_add(&pages_to_free, page)) {
+		}
+
+		if (!pagevec_add(&pages_to_free, page)) {
+			if (zone) {
 				spin_unlock_irq(&zone->lru_lock);
-				__pagevec_free(&pages_to_free);
-				pagevec_reinit(&pages_to_free);
-				zone = NULL;	/* No lock is held */
+				zone = NULL;
 			}
+			__pagevec_free(&pages_to_free);
+			pagevec_reinit(&pages_to_free);
 		}
 	}
 	if (zone)


* [patch 2/4] mm: PageLRU no testset
  2006-01-18 10:40 [patch 0/4] mm: de-skew page refcount Nick Piggin
  2006-01-18 10:40 ` [patch 1/4] mm: page refcount use atomic primitives Nick Piggin
@ 2006-01-18 10:40 ` Nick Piggin
  2006-01-19 17:48   ` Nikita Danilov
  2006-01-18 10:40 ` [patch 3/3] mm: PageActive " Nick Piggin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2006-01-18 10:40 UTC (permalink / raw)
  To: Linux Memory Management, Linux Kernel Mailing List
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, Nick Piggin,
	Linus Torvalds, David Miller

PG_lru is protected by zone->lru_lock. It does not need TestSet/TestClear
operations.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -775,9 +775,10 @@ int isolate_lru_page(struct page *page)
 	if (PageLRU(page)) {
 		struct zone *zone = page_zone(page);
 		spin_lock_irq(&zone->lru_lock);
-		if (TestClearPageLRU(page)) {
+		if (PageLRU(page)) {
 			ret = 1;
 			get_page(page);
+			ClearPageLRU(page);
 			if (PageActive(page))
 				del_page_from_active_list(zone, page);
 			else
@@ -881,8 +882,8 @@ static void shrink_cache(struct zone *zo
 		 */
 		while (!list_empty(&page_list)) {
 			page = lru_to_page(&page_list);
-			if (TestSetPageLRU(page))
-				BUG();
+			BUG_ON(PageLRU(page));
+			SetPageLRU(page);
 			list_del(&page->lru);
 			if (PageActive(page))
 				add_page_to_active_list(zone, page);
@@ -994,8 +995,8 @@ refill_inactive_zone(struct zone *zone, 
 	while (!list_empty(&l_inactive)) {
 		page = lru_to_page(&l_inactive);
 		prefetchw_prev_lru_page(page, &l_inactive, flags);
-		if (TestSetPageLRU(page))
-			BUG();
+		BUG_ON(PageLRU(page));
+		SetPageLRU(page);
 		if (!TestClearPageActive(page))
 			BUG();
 		list_move(&page->lru, &zone->inactive_list);
@@ -1023,8 +1024,8 @@ refill_inactive_zone(struct zone *zone, 
 	while (!list_empty(&l_active)) {
 		page = lru_to_page(&l_active);
 		prefetchw_prev_lru_page(page, &l_active, flags);
-		if (TestSetPageLRU(page))
-			BUG();
+		BUG_ON(PageLRU(page));
+		SetPageLRU(page);
 		BUG_ON(!PageActive(page));
 		list_move(&page->lru, &zone->active_list);
 		pgmoved++;
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c
+++ linux-2.6/mm/swap.c
@@ -203,8 +203,8 @@ void fastcall __page_cache_release(struc
 
 		struct zone *zone = page_zone(page);
 		spin_lock_irqsave(&zone->lru_lock, flags);
-		if (!TestClearPageLRU(page))
-			BUG();
+		BUG_ON(!PageLRU(page));
+		ClearPageLRU(page);
 		del_page_from_lru(zone, page);
 		spin_unlock_irqrestore(&zone->lru_lock, flags);
 	}
@@ -247,8 +247,8 @@ void release_pages(struct page **pages, 
 				zone = pagezone;
 				spin_lock_irq(&zone->lru_lock);
 			}
-			if (!TestClearPageLRU(page))
-				BUG();
+			BUG_ON(!PageLRU(page));
+			ClearPageLRU(page);
 			del_page_from_lru(zone, page);
 		}
 
@@ -327,8 +327,8 @@ void __pagevec_lru_add(struct pagevec *p
 			zone = pagezone;
 			spin_lock_irq(&zone->lru_lock);
 		}
-		if (TestSetPageLRU(page))
-			BUG();
+		BUG_ON(PageLRU(page));
+		SetPageLRU(page);
 		add_page_to_inactive_list(zone, page);
 	}
 	if (zone)
@@ -354,8 +354,8 @@ void __pagevec_lru_add_active(struct pag
 			zone = pagezone;
 			spin_lock_irq(&zone->lru_lock);
 		}
-		if (TestSetPageLRU(page))
-			BUG();
+		BUG_ON(PageLRU(page));
+		SetPageLRU(page);
 		if (TestSetPageActive(page))
 			BUG();
 		add_page_to_active_list(zone, page);
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -244,10 +244,9 @@ extern void __mod_page_state_offset(unsi
 #define __ClearPageDirty(page)	__clear_bit(PG_dirty, &(page)->flags)
 #define TestClearPageDirty(page) test_and_clear_bit(PG_dirty, &(page)->flags)
 
-#define SetPageLRU(page)	set_bit(PG_lru, &(page)->flags)
 #define PageLRU(page)		test_bit(PG_lru, &(page)->flags)
-#define TestSetPageLRU(page)	test_and_set_bit(PG_lru, &(page)->flags)
-#define TestClearPageLRU(page)	test_and_clear_bit(PG_lru, &(page)->flags)
+#define SetPageLRU(page)	set_bit(PG_lru, &(page)->flags)
+#define ClearPageLRU(page)	clear_bit(PG_lru, &(page)->flags)
 
 #define PageActive(page)	test_bit(PG_active, &(page)->flags)
 #define SetPageActive(page)	set_bit(PG_active, &(page)->flags)


* [patch 3/3] mm: PageActive no testset
  2006-01-18 10:40 [patch 0/4] mm: de-skew page refcount Nick Piggin
  2006-01-18 10:40 ` [patch 1/4] mm: page refcount use atomic primitives Nick Piggin
  2006-01-18 10:40 ` [patch 2/4] mm: PageLRU no testset Nick Piggin
@ 2006-01-18 10:40 ` Nick Piggin
  2006-01-18 14:13   ` Marcelo Tosatti
  2006-01-18 10:41 ` [patch 4/4] mm: less atomic ops Nick Piggin
  2006-01-18 16:38 ` [patch 0/4] mm: de-skew page refcount Linus Torvalds
  4 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2006-01-18 10:40 UTC (permalink / raw)
  To: Linux Memory Management, Linux Kernel Mailing List
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, Nick Piggin,
	Linus Torvalds, David Miller

PG_active is protected by zone->lru_lock, it does not need TestSet/TestClear
operations.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -997,8 +997,9 @@ refill_inactive_zone(struct zone *zone, 
 		prefetchw_prev_lru_page(page, &l_inactive, flags);
 		BUG_ON(PageLRU(page));
 		SetPageLRU(page);
-		if (!TestClearPageActive(page))
-			BUG();
+		BUG_ON(!PageActive(page));
+		ClearPageActive(page);
+
 		list_move(&page->lru, &zone->inactive_list);
 		pgmoved++;
 		if (!pagevec_add(&pvec, page)) {
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c
+++ linux-2.6/mm/swap.c
@@ -356,8 +356,8 @@ void __pagevec_lru_add_active(struct pag
 		}
 		BUG_ON(PageLRU(page));
 		SetPageLRU(page);
-		if (TestSetPageActive(page))
-			BUG();
+		BUG_ON(PageActive(page));
+		SetPageActive(page);
 		add_page_to_active_list(zone, page);
 	}
 	if (zone)
Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -251,8 +251,6 @@ extern void __mod_page_state_offset(unsi
 #define PageActive(page)	test_bit(PG_active, &(page)->flags)
 #define SetPageActive(page)	set_bit(PG_active, &(page)->flags)
 #define ClearPageActive(page)	clear_bit(PG_active, &(page)->flags)
-#define TestClearPageActive(page) test_and_clear_bit(PG_active, &(page)->flags)
-#define TestSetPageActive(page) test_and_set_bit(PG_active, &(page)->flags)
 
 #define PageSlab(page)		test_bit(PG_slab, &(page)->flags)
 #define SetPageSlab(page)	set_bit(PG_slab, &(page)->flags)


* [patch 4/4] mm: less atomic ops
  2006-01-18 10:40 [patch 0/4] mm: de-skew page refcount Nick Piggin
                   ` (2 preceding siblings ...)
  2006-01-18 10:40 ` [patch 3/3] mm: PageActive " Nick Piggin
@ 2006-01-18 10:41 ` Nick Piggin
  2006-01-18 16:38 ` [patch 0/4] mm: de-skew page refcount Linus Torvalds
  4 siblings, 0 replies; 20+ messages in thread
From: Nick Piggin @ 2006-01-18 10:41 UTC (permalink / raw)
  To: Linux Memory Management, Linux Kernel Mailing List
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, Nick Piggin,
	Linus Torvalds, David Miller

In the page release paths, we can be sure that nobody will mess with our
page->flags because the refcount has dropped to 0. So no need for atomic
operations here.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/include/linux/page-flags.h
===================================================================
--- linux-2.6.orig/include/linux/page-flags.h
+++ linux-2.6/include/linux/page-flags.h
@@ -247,10 +247,12 @@ extern void __mod_page_state_offset(unsi
 #define PageLRU(page)		test_bit(PG_lru, &(page)->flags)
 #define SetPageLRU(page)	set_bit(PG_lru, &(page)->flags)
 #define ClearPageLRU(page)	clear_bit(PG_lru, &(page)->flags)
+#define __ClearPageLRU(page)	__clear_bit(PG_lru, &(page)->flags)
 
 #define PageActive(page)	test_bit(PG_active, &(page)->flags)
 #define SetPageActive(page)	set_bit(PG_active, &(page)->flags)
 #define ClearPageActive(page)	clear_bit(PG_active, &(page)->flags)
+#define __ClearPageActive(page)	__clear_bit(PG_active, &(page)->flags)
 
 #define PageSlab(page)		test_bit(PG_slab, &(page)->flags)
 #define SetPageSlab(page)	set_bit(PG_slab, &(page)->flags)
Index: linux-2.6/mm/swap.c
===================================================================
--- linux-2.6.orig/mm/swap.c
+++ linux-2.6/mm/swap.c
@@ -204,7 +204,7 @@ void fastcall __page_cache_release(struc
 		struct zone *zone = page_zone(page);
 		spin_lock_irqsave(&zone->lru_lock, flags);
 		BUG_ON(!PageLRU(page));
-		ClearPageLRU(page);
+		__ClearPageLRU(page);
 		del_page_from_lru(zone, page);
 		spin_unlock_irqrestore(&zone->lru_lock, flags);
 	}
@@ -248,7 +248,7 @@ void release_pages(struct page **pages, 
 				spin_lock_irq(&zone->lru_lock);
 			}
 			BUG_ON(!PageLRU(page));
-			ClearPageLRU(page);
+			__ClearPageLRU(page);
 			del_page_from_lru(zone, page);
 		}
 
Index: linux-2.6/include/linux/mm_inline.h
===================================================================
--- linux-2.6.orig/include/linux/mm_inline.h
+++ linux-2.6/include/linux/mm_inline.h
@@ -32,7 +32,7 @@ del_page_from_lru(struct zone *zone, str
 {
 	list_del(&page->lru);
 	if (PageActive(page)) {
-		ClearPageActive(page);
+		__ClearPageActive(page);
 		zone->nr_active--;
 	} else {
 		zone->nr_inactive--;


* Re: [patch 3/3] mm: PageActive no testset
  2006-01-18 10:40 ` [patch 3/3] mm: PageActive " Nick Piggin
@ 2006-01-18 14:13   ` Marcelo Tosatti
  2006-01-19 14:50     ` Nick Piggin
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2006-01-18 14:13 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linux Memory Management, Linux Kernel Mailing List, Hugh Dickins,
	Andrew Morton, Andrea Arcangeli, Linus Torvalds, David Miller

Hi Nick,

On Wed, Jan 18, 2006 at 11:40:58AM +0100, Nick Piggin wrote:
> PG_active is protected by zone->lru_lock, it does not need TestSet/TestClear
> operations.

page->flags bits (including PG_active and PG_lru bits) are touched by
several codepaths which do not hold zone->lru_lock. 

AFAICT zone->lru_lock guards access to the LRU list, and no more than
that.

Moreover, what about consistency of the rest of page->flags bits?

PPC for example implements test_and_set_bit() with:

	lwarx	reg, addr   (load and create reservation for 32-bit addr)
	or 	reg, BITOP_MASK(nr)	
	stwcx	reg, addr  (store word upon reservation validation, otherwise loop)

If you don't use atomic operations on page->flags, unrelated bits other
than that you're working with can have their updates lost, given that 
in reality page->flags is not protected by the lru_lock.

For example:

/*
 * shrink_list adds the number of reclaimed pages to sc->nr_reclaimed
 */
static int shrink_list(struct list_head *page_list, struct scan_control *sc)
{
...
                BUG_ON(PageActive(page));
...
activate_locked:
                SetPageActive(page);
                pgactivate++;
keep_locked:
                unlock_page(page);
keep:
                list_add(&page->lru, &ret_pages);
                BUG_ON(PageLRU(page));
}

And recently:

#ifdef CONFIG_MIGRATION
static inline void move_to_lru(struct page *page)
{
        list_del(&page->lru);
        if (PageActive(page)) {
                /*
                 * lru_cache_add_active checks that
                 * the PG_active bit is off.
                 */ 
                ClearPageActive(page);
                lru_cache_add_active(page);

Not relying on zone->lru_lock allows interesting optimizations
such as moving active/inactive pgflag bit setting from inside
__pagevec_lru_add/__pagevec_lru_add_active to the caller, and merging
the two.

Comments?


> Signed-off-by: Nick Piggin <npiggin@suse.de>
> 
> Index: linux-2.6/mm/vmscan.c
> ===================================================================
> --- linux-2.6.orig/mm/vmscan.c
> +++ linux-2.6/mm/vmscan.c
> @@ -997,8 +997,9 @@ refill_inactive_zone(struct zone *zone, 
>  		prefetchw_prev_lru_page(page, &l_inactive, flags);
>  		BUG_ON(PageLRU(page));
>  		SetPageLRU(page);
> -		if (!TestClearPageActive(page))
> -			BUG();
> +		BUG_ON(!PageActive(page));
> +		ClearPageActive(page);
> +
>  		list_move(&page->lru, &zone->inactive_list);
>  		pgmoved++;
>  		if (!pagevec_add(&pvec, page)) {
> Index: linux-2.6/mm/swap.c
> ===================================================================
> --- linux-2.6.orig/mm/swap.c
> +++ linux-2.6/mm/swap.c
> @@ -356,8 +356,8 @@ void __pagevec_lru_add_active(struct pag
>  		}
>  		BUG_ON(PageLRU(page));
>  		SetPageLRU(page);
> -		if (TestSetPageActive(page))
> -			BUG();
> +		BUG_ON(PageActive(page));
> +		SetPageActive(page);
>  		add_page_to_active_list(zone, page);
>  	}
>  	if (zone)
> Index: linux-2.6/include/linux/page-flags.h
> ===================================================================
> --- linux-2.6.orig/include/linux/page-flags.h
> +++ linux-2.6/include/linux/page-flags.h
> @@ -251,8 +251,6 @@ extern void __mod_page_state_offset(unsi
>  #define PageActive(page)	test_bit(PG_active, &(page)->flags)
>  #define SetPageActive(page)	set_bit(PG_active, &(page)->flags)
>  #define ClearPageActive(page)	clear_bit(PG_active, &(page)->flags)
> -#define TestClearPageActive(page) test_and_clear_bit(PG_active, &(page)->flags)
> -#define TestSetPageActive(page) test_and_set_bit(PG_active, &(page)->flags)
>  
>  #define PageSlab(page)		test_bit(PG_slab, &(page)->flags)
>  #define SetPageSlab(page)	set_bit(PG_slab, &(page)->flags)
> 


* Re: [patch 0/4] mm: de-skew page refcount
  2006-01-18 10:40 [patch 0/4] mm: de-skew page refcount Nick Piggin
                   ` (3 preceding siblings ...)
  2006-01-18 10:41 ` [patch 4/4] mm: less atomic ops Nick Piggin
@ 2006-01-18 16:38 ` Linus Torvalds
  2006-01-18 17:05   ` Nick Piggin
  4 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2006-01-18 16:38 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linux Memory Management, Linux Kernel Mailing List, Hugh Dickins,
	Andrew Morton, Andrea Arcangeli, David Miller



On Wed, 18 Jan 2006, Nick Piggin wrote:
>
> The following patchset (against 2.6.16-rc1 + migrate race fixes) uses the new
> atomic ops to do away with the offset page refcounting, and simplify the race
> that it was designed to cover.
> 
> This allows some nice optimisations

Why?

The real downside is that "atomic_inc_not_zero()" is a lot more expensive
than checking for zero on x86 (and x86-64).

The reason it's offset is that on architectures that automatically test 
the _result_ of an atomic op (ie x86[-64]), it's easy to see when 
something _becomes_ negative or _becomes_ zero, and that's what

	atomic_add_negative
	atomic_inc_and_test

are optimized for (there's also "atomic_dec_and_test()" which reacts on 
the count becoming zero, but that doesn't have a pairing: there's no way 
to react to the count becoming one for the increment operation, so the 
"atomic_dec_and_test()" is used for things where zero means "free it").

Nothing else can be done that fast on x86. Everything else requires an 
insane "load, update, cmpxchg" sequence.

So I disagree with this patch series. It has real downsides. There's a 
reason we have the offset.

I suspect that whatever "nice optimizations" you have are quite doable 
without doing this count pessimization.

		Linus


* Re: [patch 0/4] mm: de-skew page refcount
  2006-01-18 16:38 ` [patch 0/4] mm: de-skew page refcount Linus Torvalds
@ 2006-01-18 17:05   ` Nick Piggin
  2006-01-18 19:27     ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2006-01-18 17:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Linux Memory Management, Linux Kernel Mailing List,
	Hugh Dickins, Andrew Morton, Andrea Arcangeli, David Miller

On Wed, Jan 18, 2006 at 08:38:44AM -0800, Linus Torvalds wrote:
> 
> 
> On Wed, 18 Jan 2006, Nick Piggin wrote:
> >
> > The following patchset (against 2.6.16-rc1 + migrate race fixes) uses the new
> > atomic ops to do away with the offset page refcounting, and simplify the race
> > that it was designed to cover.
> > 
> > This allows some nice optimisations
> 
> Why?
> 
> The real downside is that "atomic_inc_not_zero()" is a lot more expensive
> than checking for zero on x86 (and x86-64).
> 
> The reason it's offset is that on architectures that automatically test 
> the _result_ of an atomic op (ie x86[-64]), it's easy to see when 
> something _becomes_ negative or _becomes_ zero, and that's what
> 
> 	atomic_add_negative
> 	atomic_inc_and_test
> 
> are optimized for (there's also "atomic_dec_and_test()" which reacts on 
> the count becoming zero, but that doesn't have a pairing: there's no way 
> to react to the count becoming one for the increment operation, so the 
> "atomic_dec_and_test()" is used for things where zero means "free it").
> 
> Nothing else can be done that fast on x86. Everything else requires an 
> insane "load, update, cmpxchg" sequence.
> 

Yes, I realise inc_not_zero isn't as fast as dec_and_test on x86(-64).
In this case when the cacheline will already be exclusive I bet it isn't
that much of a difference (in the noise from my testing).

> So I disagree with this patch series. It has real downsides. There's a 
> reason we have the offset.
> 

Yes, there is a reason, I detailed it in the changelog and got rid of it.

> "nice optimizations"

They're in this patchset.

Nick


* Re: [patch 0/4] mm: de-skew page refcount
  2006-01-18 17:05   ` Nick Piggin
@ 2006-01-18 19:27     ` Linus Torvalds
  2006-01-19 14:00       ` Nick Piggin
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2006-01-18 19:27 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linux Memory Management, Linux Kernel Mailing List, Hugh Dickins,
	Andrew Morton, Andrea Arcangeli, David Miller



On Wed, 18 Jan 2006, Nick Piggin wrote:
> 
> > So I disagree with this patch series. It has real downsides. There's a 
> > reason we have the offset.
> 
> Yes, there is a reason, I detailed it in the changelog and got rid of it.

And I'm not applying it. I'd be crazy to replace good code by code that is 
objectively _worse_.

The fact that you _document_ that it's worse doesn't make it any better.

The places that you improve (in the other patches) seem to have nothing at 
all to do with the counter skew issue, so I don't see the point.

So let me repeat: WHY DID YOU MAKE THE CODE WORSE?

		Linus


* Re: [patch 0/4] mm: de-skew page refcount
  2006-01-18 19:27     ` Linus Torvalds
@ 2006-01-19 14:00       ` Nick Piggin
  2006-01-19 16:36         ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2006-01-19 14:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Linux Memory Management, Linux Kernel Mailing List,
	Hugh Dickins, Andrew Morton, Andrea Arcangeli, David Miller

On Wed, Jan 18, 2006 at 11:27:13AM -0800, Linus Torvalds wrote:
> 
> 
> On Wed, 18 Jan 2006, Nick Piggin wrote:
> > 
> > > So I disagree with this patch series. It has real downsides. There's a 
> > > reason we have the offset.
> > 
> > Yes, there is a reason, I detailed it in the changelog and got rid of it.
> 
> And I'm not applying it. I'd be crazy to replace good code by code that is 
> objectively _worse_.
> 

And you're not? Damn.

> The fact that you _document_ that it's worse doesn't make it any better.
> 
> The places that you improve (in the other patches) seem to have nothing at 
> all to do with the counter skew issue, so I don't see the point.
> 

You know, I believe you're right. I needed the de-skewing patch for
something unrelated and it seemed that it opened the possibility for
the following optimisations (ie. because we no longer touch a page
after its refcount goes to zero).

But actually it doesn't matter that we might touch page_count, only
that we not clear PageLRU. So the enabler is simply moving the
TestClearPageLRU after the get_page_testone.

So I'll respin the patches without the de-skewing and the series
will become much smaller and neater.

> So let me repeat: WHY DID YOU MAKE THE CODE WORSE?
> 

You've never bothered me about that until now...

Thanks for the feedback!

Nick


* Re: [patch 3/3] mm: PageActive no testset
  2006-01-18 14:13   ` Marcelo Tosatti
@ 2006-01-19 14:50     ` Nick Piggin
  2006-01-19 16:52       ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2006-01-19 14:50 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Nick Piggin, Linux Memory Management, Linux Kernel Mailing List,
	Hugh Dickins, Andrew Morton, Andrea Arcangeli, Linus Torvalds,
	David Miller

Hi Marcelo,

On Wed, Jan 18, 2006 at 12:13:46PM -0200, Marcelo Tosatti wrote:
> Hi Nick,
> 
> On Wed, Jan 18, 2006 at 11:40:58AM +0100, Nick Piggin wrote:
> > PG_active is protected by zone->lru_lock, it does not need TestSet/TestClear
> > operations.
> 
> page->flags bits (including PG_active and PG_lru bits) are touched by
> several codepaths which do not hold zone->lru_lock. 
> 
> AFAICT zone->lru_lock guards access to the LRU list, and no more than
> that.
> 

Yep.

> Moreover, what about consistency of the rest of page->flags bits?
> 

That's OK, set_bit and clear_bit are atomic as well, they just don't
imply memory barriers and can be implemented a bit more simply.

The test-set / test-clear operations also kind of imply that they are
being used for locking, or without other synchronisation (usually).

> PPC for example implements test_and_set_bit() with:
> 
> 	lwarx	reg, addr   (load and create reservation for 32-bit addr)
> 	or 	reg, BITOP_MASK(nr)	
> 	stwcx	reg, addr  (store word upon reservation validation, otherwise loop)
> 

Thanks,
Nick


* Re: [patch 0/4] mm: de-skew page refcount
  2006-01-19 14:00       ` Nick Piggin
@ 2006-01-19 16:36         ` Linus Torvalds
  2006-01-19 17:06           ` Nick Piggin
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2006-01-19 16:36 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linux Memory Management, Linux Kernel Mailing List, Hugh Dickins,
	Andrew Morton, Andrea Arcangeli, David Miller



On Thu, 19 Jan 2006, Nick Piggin wrote:
> 
> But actually it doesn't matter that we might touch page_count, only
> that we not clear PageLRU. So the enabler is simply moving the
> TestClearPageLRU after the get_page_testone.

One side note on your patch: the pure bit _test_ operation is very cheap, 
but the "bit change" operation is very expensive (and not really any less 
expensive than the "test-and-change" one).

So the patch to avoid "test_and_clear_bit()" really helps only if the test 
usually results in us not doing the clear. Is that the case? Hmm..

So I _think_ that at least the case in "isolate_lru_page()", you'd 
actually be better off doing the "test-and-clear" instead of separate 
"test" and "clear-bit" ops, no? In that one, it would seem that 99+% of 
the time, the bit is set (because we tested it just before getting the 
lock).

No?

> I needed the de-skewing patch for something unrelated and it seemed that 
> it opened the possibility for the following optimisations (ie. because 
> we no longer touch a page after its refcount goes to zero).
>
> But actually it doesn't matter that we might touch page_count, only
> that we not clear PageLRU. So the enabler is simply moving the
> TestClearPageLRU after the get_page_testone.

Yes.

Now, that whole "we might touch the page count" thing does actually worry 
me a bit. The locking rules are subtle (but they -seem- safe: before we 
actually really put the page on the free-list in the freeing path, we'll 
have locked the LRU list if it was on one).

But if you were to change _that_ one to a

	atomic_add_unless(&page->counter, 1, -1);

I think that would be a real cleanup. And at that point I won't even 
complain that "atomic_inc_test()" is faster - that "get_page_testone()" 
thing is just fundamentally a bit scary, so I'd applaud it regardless.

(The difference: the "counter skewing" may be unexpected, but it's just a 
simple trick. In contrast, the "touch the count after the page may be 
already in the freeing stage" is a scary subtle thing. Even if I can't 
see any actual bug in it, it just worries me in a way that offsetting a 
counter by one does not..)

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 3/3] mm: PageActive no testset
  2006-01-19 14:50     ` Nick Piggin
@ 2006-01-19 16:52       ` Marcelo Tosatti
  2006-01-19 20:02         ` Nick Piggin
  0 siblings, 1 reply; 20+ messages in thread
From: Marcelo Tosatti @ 2006-01-19 16:52 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linux Memory Management, Linux Kernel Mailing List, Hugh Dickins,
	Andrew Morton, Andrea Arcangeli, Linus Torvalds, David Miller

On Thu, Jan 19, 2006 at 03:50:08PM +0100, Nick Piggin wrote:
> Hi Marcelo,
> 
> On Wed, Jan 18, 2006 at 12:13:46PM -0200, Marcelo Tosatti wrote:
> > Hi Nick,
> > 
> > On Wed, Jan 18, 2006 at 11:40:58AM +0100, Nick Piggin wrote:
> > > PG_active is protected by zone->lru_lock, it does not need TestSet/TestClear
> > > operations.
> > 
> > page->flags bits (including PG_active and PG_lru bits) are touched by
> > several codepaths which do not hold zone->lru_lock. 
> > 
> > AFAICT zone->lru_lock guards access to the LRU list, and no more than
> > that.
> > 
> 
> Yep.
> 
> > Moreover, what about consistency of the rest of page->flags bits?
> > 
> 
> That's OK, set_bit and clear_bit are atomic as well, they just don't
> imply memory barriers and can be implemented a bit more simply.

Indeed!

> The test-set / test-clear operations also kind of imply that it is
> being used for locking or without other synchronisation (usually).

Non-atomic versions such as __ClearPageLRU()/__ClearPageActive() are 
not usable, though.

PPC:

static __inline__ void __clear_bit(unsigned long nr,
                                   volatile unsigned long *addr)
{
        unsigned long mask = BITOP_MASK(nr);
        unsigned long *p = ((unsigned long *)addr) + BITOP_WORD(nr);

        *p &= ~mask;
}

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 0/4] mm: de-skew page refcount
  2006-01-19 16:36         ` Linus Torvalds
@ 2006-01-19 17:06           ` Nick Piggin
  2006-01-19 17:27             ` Linus Torvalds
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2006-01-19 17:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Linux Memory Management, Linux Kernel Mailing List,
	Hugh Dickins, Andrew Morton, Andrea Arcangeli, David Miller

On Thu, Jan 19, 2006 at 08:36:14AM -0800, Linus Torvalds wrote:
> 
> So I _think_ that at least the case in "isolate_lru_page()", you'd 
> actually be better off doing the "test-and-clear" instead of separate 
> "test" and "clear-bit" ops, no? In that one, it would seem that 99+% of 
> the time, the bit is set (because we tested it just before getting the 
> lock).
> 
> No?
> 

Well in isolate_lru_page, the test operation is actually optional
(ie. it is the conditional for a BUG). And I have plans for making
some of those configurable....

But at least on the G5, test_and_clear can be noticeable (although
IIRC it was in the noise for _this_ particular case) because of the
memory barriers required.

> 
> Now, that whole "we might touch the page count" thing does actually worry 
> me a bit. The locking rules are subtle (but they -seem- safe: before we 
> actually really put the page on the free-list in the freeing path, we'll 
> have locked the LRU list if it was on one).
> 

Yes, I think Andrew did his homework. I thought it through quite a bit
before sending the patches and again after your feedback. Subtle though,
no doubt.

> But if you were to change _that_ one to a
> 
> 	atomic_add_unless(&page->counter, 1, -1);
> 
> I think that would be a real cleanup. And at that point I won't even 
> complain that "atomic_inc_test()" is faster - that "get_page_testone()" 
> thing is just fundamentally a bit scary, so I'd applaud it regardless.
> 

Hmm... this is what the de-skew patch _did_ (although it was wrapped
in a function called get_page_unless_zero), in fact the main aim was
to prevent this twiddling and the de-skewing was just a nice side effect
(I guess the patch title is misleading).

So I'm confused...

> (The difference: the "counter skewing" may be unexpected, but it's just a 
> simple trick. In contrast, the "touch the count after the page may be 
> already in the freeing stage" is a scary subtle thing. Even if I can't 
> see any actual bug in it, it just worries me in a way that offsetting a 
> counter by one does not..)
> 

Don't worry, you'll be seeing that patch again -- it is required for
lockless pagecache.

Nick


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 0/4] mm: de-skew page refcount
  2006-01-19 17:06           ` Nick Piggin
@ 2006-01-19 17:27             ` Linus Torvalds
  2006-01-19 17:38               ` Nick Piggin
  0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2006-01-19 17:27 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linux Memory Management, Linux Kernel Mailing List, Hugh Dickins,
	Andrew Morton, Andrea Arcangeli, David Miller



On Thu, 19 Jan 2006, Nick Piggin wrote:
> 
> Hmm... this is what the de-skew patch _did_ (although it was wrapped
> in a function called get_page_unless_zero), in fact the main aim was
> to prevent this twiddling and the de-skewing was just a nice side effect
> (I guess the patch title is misleading).
> 
> So I'm confused...

The thing I minded was the _other_ changes, namely the de-skewing itself. 
It seemed totally unnecessary to what you claimed was the point of the 
patch.

So I objected to the patch on the grounds that it did what you claimed 
badly. All the _optimization_ was totally independent of that de-skewing, 
and the de-skewing was a potential un-optimization.

But if you do the optimizations as one independent set of patches, and 
_then_ do the counter thing as a "simplify logic" patch, I don't see that 
as a problem.

Side note: I may be crazy, but for me when merging, one of the biggest 
things is "does this pass my 'makes sense' detector". I look less at the 
end result, than I actually look at the _change_. See?

That's why two separate patches that do the same thing as one combined 
patch may make sense, even if the _combined_ one does not (it could go the 
other way too, obviously).

		Linus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 0/4] mm: de-skew page refcount
  2006-01-19 17:27             ` Linus Torvalds
@ 2006-01-19 17:38               ` Nick Piggin
  0 siblings, 0 replies; 20+ messages in thread
From: Nick Piggin @ 2006-01-19 17:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Piggin, Linux Memory Management, Linux Kernel Mailing List,
	Hugh Dickins, Andrew Morton, Andrea Arcangeli, David Miller

On Thu, Jan 19, 2006 at 09:27:14AM -0800, Linus Torvalds wrote:
> 
> 
> On Thu, 19 Jan 2006, Nick Piggin wrote:
> > 
> > Hmm... this is what the de-skew patch _did_ (although it was wrapped
> > in a function called get_page_unless_zero), in fact the main aim was
> > to prevent this twiddling and the de-skewing was just a nice side effect
> > (I guess the patch title is misleading).
> > 
> > So I'm confused...
> 
> The thing I minded was the _other_ changes, namely the de-skewing itself. 
> It seemed totally unnecessary to what you claimed was the point of the 
> patch.
> 
> So I objected to the patch on the grounds that it did what you claimed 
> badly. All the _optimization_ was totally independent of that de-skewing, 
> and the de-skewing was a potential un-optimization.
> 

No longer confused...

> But if you do the optimizations as one independent set of patches, and 
> _then_ do the counter thing as a "simplify logic" patch, I don't see that 
> as a problem.
> 
> Side note: I may be crazy, but for me when merging, one of the biggest 
> things is "does this pass my 'makes sense' detector". I look less at the 
> end result, than I actually look at the _change_. See?
> 
> That's why two separate patches that do the same thing as one combined 
> patch may make sense, even if the _combined_ one does not (it could go the 
> other way too, obviously).
> 

I agree, and the patches really are cleaner this way too, so again,
thanks for the input on them.

I'll resend soonish (with a trimmed cc list).

Nick

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/4] mm: PageLRU no testset
  2006-01-18 10:40 ` [patch 2/4] mm: PageLRU no testset Nick Piggin
@ 2006-01-19 17:48   ` Nikita Danilov
  2006-01-19 18:10     ` Nick Piggin
  0 siblings, 1 reply; 20+ messages in thread
From: Nikita Danilov @ 2006-01-19 17:48 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, Linus Torvalds,
	David Miller, Linux Kernel Mailing List

Nick Piggin writes:
 > PG_lru is protected by zone->lru_lock. It does not need TestSet/TestClear
 > operations.
 > 
 > Signed-off-by: Nick Piggin <npiggin@suse.de>
 > 
 > Index: linux-2.6/mm/vmscan.c
 > ===================================================================
 > --- linux-2.6.orig/mm/vmscan.c
 > +++ linux-2.6/mm/vmscan.c
 > @@ -775,9 +775,10 @@ int isolate_lru_page(struct page *page)
 >  	if (PageLRU(page)) {
 >  		struct zone *zone = page_zone(page);
 >  		spin_lock_irq(&zone->lru_lock);
 > -		if (TestClearPageLRU(page)) {
 > +		if (PageLRU(page)) {
 >  			ret = 1;
 >  			get_page(page);
 > +			ClearPageLRU(page);

Why is that better? ClearPageLRU() is also an "atomic" operation (in
the sense of using LOCK_PREFIX on x86), so it seems this change strictly
adds cycles to the hot path when the page is on the LRU.

Nikita.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 2/4] mm: PageLRU no testset
  2006-01-19 17:48   ` Nikita Danilov
@ 2006-01-19 18:10     ` Nick Piggin
  0 siblings, 0 replies; 20+ messages in thread
From: Nick Piggin @ 2006-01-19 18:10 UTC (permalink / raw)
  To: Nikita Danilov
  Cc: Nick Piggin, Hugh Dickins, Andrew Morton, Andrea Arcangeli,
	Linus Torvalds, David Miller, Linux Kernel Mailing List

On Thu, Jan 19, 2006 at 08:48:52PM +0300, Nikita Danilov wrote:
> Nick Piggin writes:
>  > PG_lru is protected by zone->lru_lock. It does not need TestSet/TestClear
>  > operations.
>  > 
>  > Signed-off-by: Nick Piggin <npiggin@suse.de>
>  > 
>  > Index: linux-2.6/mm/vmscan.c
>  > ===================================================================
>  > --- linux-2.6.orig/mm/vmscan.c
>  > +++ linux-2.6/mm/vmscan.c
>  > @@ -775,9 +775,10 @@ int isolate_lru_page(struct page *page)
>  >  	if (PageLRU(page)) {
>  >  		struct zone *zone = page_zone(page);
>  >  		spin_lock_irq(&zone->lru_lock);
>  > -		if (TestClearPageLRU(page)) {
>  > +		if (PageLRU(page)) {
>  >  			ret = 1;
>  >  			get_page(page);
>  > +			ClearPageLRU(page);
> 
> Why is that better? ClearPageLRU() is also an "atomic" operation (in
> the sense of using LOCK_PREFIX on x86), so it seems this change strictly
> adds cycles to the hot path when the page is on the LRU.
> 

Less restrictive memory ordering requirements: bitops that return a
value imply full memory barriers, while the plain clear does not.

Nick
 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 3/3] mm: PageActive no testset
  2006-01-19 16:52       ` Marcelo Tosatti
@ 2006-01-19 20:02         ` Nick Piggin
  2006-01-19 21:41           ` Marcelo Tosatti
  0 siblings, 1 reply; 20+ messages in thread
From: Nick Piggin @ 2006-01-19 20:02 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Nick Piggin, Linux Memory Management, Linux Kernel Mailing List,
	Hugh Dickins, Andrew Morton, Andrea Arcangeli, Linus Torvalds,
	David Miller

On Thu, Jan 19, 2006 at 02:52:22PM -0200, Marcelo Tosatti wrote:
> On Thu, Jan 19, 2006 at 03:50:08PM +0100, Nick Piggin wrote:
> 
> > The test-set / test-clear operations also kind of imply that it is
> > being used for locking or without other synchronisation (usually).
> 
> Non-atomic versions such as __ClearPageLRU()/__ClearPageActive() are 
> not usable, though.
> 

Correct. Although I was able to use them in a couple of other places
in a subsequent patch in the series. I trust you don't see a problem
with those usages?

Thanks,
Nick


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [patch 3/3] mm: PageActive no testset
  2006-01-19 20:02         ` Nick Piggin
@ 2006-01-19 21:41           ` Marcelo Tosatti
  0 siblings, 0 replies; 20+ messages in thread
From: Marcelo Tosatti @ 2006-01-19 21:41 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Linux Memory Management, Linux Kernel Mailing List, Hugh Dickins,
	Andrew Morton, Andrea Arcangeli, Linus Torvalds, David Miller

On Thu, Jan 19, 2006 at 09:02:26PM +0100, Nick Piggin wrote:
> On Thu, Jan 19, 2006 at 02:52:22PM -0200, Marcelo Tosatti wrote:
> > On Thu, Jan 19, 2006 at 03:50:08PM +0100, Nick Piggin wrote:
> > 
> > > The test-set / test-clear operations also kind of imply that it is
> > > being used for locking or without other synchronisation (usually).
> > 
> > Non-atomic versions such as __ClearPageLRU()/__ClearPageActive() are 
> > not usable, though.
> > 
> 
> Correct. Although I was able to use them in a couple of other places
> in a subsequent patch in the series. I trust you don't see a problem
> with those usages?

Indeed, sorry. Would you mind adding a comment that page->flags must
otherwise be accessed atomically, and that the __ versions are special
in that they are only safe once the page cannot be referenced anymore?
(it's really not obvious)

Also these comments on top of page-flags.h could be updated:

 * During disk I/O, PG_locked is used. This bit is set before I/O and
 * reset when I/O completes. page_waitqueue(page) is a wait queue of all tasks
 * waiting for the I/O on this page to complete.

s/PG_locked/PG_writeback/

 * Note that the referenced bit, the page->lru list_head and the active,
 * inactive_dirty and inactive_clean lists are protected by the
 * zone->lru_lock, and *NOT* by the usual PG_locked bit!

inactive_dirty and inactive_clean do not exist anymore





^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2006-01-19 23:42 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-18 10:40 [patch 0/4] mm: de-skew page refcount Nick Piggin
2006-01-18 10:40 ` [patch 1/4] mm: page refcount use atomic primitives Nick Piggin
2006-01-18 10:40 ` [patch 2/4] mm: PageLRU no testset Nick Piggin
2006-01-19 17:48   ` Nikita Danilov
2006-01-19 18:10     ` Nick Piggin
2006-01-18 10:40 ` [patch 3/3] mm: PageActive " Nick Piggin
2006-01-18 14:13   ` Marcelo Tosatti
2006-01-19 14:50     ` Nick Piggin
2006-01-19 16:52       ` Marcelo Tosatti
2006-01-19 20:02         ` Nick Piggin
2006-01-19 21:41           ` Marcelo Tosatti
2006-01-18 10:41 ` [patch 4/4] mm: less atomic ops Nick Piggin
2006-01-18 16:38 ` [patch 0/4] mm: de-skew page refcount Linus Torvalds
2006-01-18 17:05   ` Nick Piggin
2006-01-18 19:27     ` Linus Torvalds
2006-01-19 14:00       ` Nick Piggin
2006-01-19 16:36         ` Linus Torvalds
2006-01-19 17:06           ` Nick Piggin
2006-01-19 17:27             ` Linus Torvalds
2006-01-19 17:38               ` Nick Piggin
