* [PATCH 0/6] close pageblock_migratetype and pageblock_skip races
From: Vlastimil Babka @ 2014-02-28 14:14 UTC
  To: Andrew Morton, Mel Gorman, Joonsoo Kim
  Cc: linux-mm, linux-kernel, Rik van Riel, Johannes Weiner,
	Minchan Kim, Kirill A. Shutemov, Vlastimil Babka

Hello,

This series follows up on the discussion of Joonsoo Kim's series
"improve robustness on handling migratetype" https://lkml.org/lkml/2014/1/9/29
The goal is to close the races between get/set_pageblock_migratetype (and
_skip), which Joonsoo found in the code and which I have observed during
further development of my compaction series.

Instead of a new seqlock for the pageblock bitmap, my series extends the
coverage of zone->lock where possible (patch 1) and deals with the races where
it's not feasible to lock (patches 2-4), as suggested by Mel in the original
thread.

Testing of patches 1-4 made me realize that a race between setting the
migratetype and set/clear_pageblock_skip is also an issue, because each
pageblock's 4 bits are packed within the same byte (one byte covers a pair of
pageblocks) and the bit operations are not atomic. Thus an update to the skip
bit may lose racing updates to the bits comprising the migratetype and corrupt
it. Patch 5 reduces the number of unneeded set_pageblock_skip() calls, and
patch 6 fixes the race by making the bit operations atomic; its changelog also
gives the reasons for picking this solution instead of taking zone->lock in
set_pageblock_skip().
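
For illustration, a minimal sketch (not the actual kernel code) of why a
non-atomic bit update can lose a racing update to the other bits packed in
the same word:

/*
 * Illustrative sketch only: a non-atomic bit op is a plain
 * read-modify-write of the whole word containing the bit.
 */
static void nonatomic_set_bit(int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;		/* load, OR, store - not atomic */
}

/*
 * The skip bit and the migratetype bits share a word w:
 *
 *   CPU A: set_pageblock_skip()        -> nonatomic_set_bit(SKIP, &w)
 *   CPU B: set_pageblock_migratetype() -> updates a migratetype bit of w
 *
 * One possible interleaving:
 *   A loads w
 *   B loads w, flips its bit, stores w
 *   A sets SKIP in its stale copy, stores w  <- B's update is lost
 *
 * The atomic set_bit()/clear_bit() (a locked RMW) close this race.
 */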

Vlastimil

Vlastimil Babka (6):
  mm: call get_pageblock_migratetype() under zone->lock where possible
  mm: add get_pageblock_migratetype_nolock() for cases where locking is
    undesirable
  mm: add is_migrate_isolate_page_nolock() for cases where locking is
    undesirable
  mm: add set_pageblock_migratetype_nolock() for calls outside
    zone->lock
  mm: compaction: do not set pageblock skip bit when already set
  mm: use atomic bit operations in set_pageblock_flags_group()

 include/linux/mmzone.h         | 24 +++++++++++++++
 include/linux/page-isolation.h | 24 +++++++++++++++
 mm/compaction.c                | 18 ++++++++---
 mm/hugetlb.c                   |  2 +-
 mm/memory-failure.c            |  3 +-
 mm/page_alloc.c                | 69 ++++++++++++++++++++++++++++--------------
 mm/page_isolation.c            | 23 ++++++++------
 mm/vmstat.c                    |  2 +-
 8 files changed, 126 insertions(+), 39 deletions(-)

-- 
1.8.4.5



* [PATCH 1/6] mm: call get_pageblock_migratetype() under zone->lock where possible
From: Vlastimil Babka @ 2014-02-28 14:14 UTC
  To: Andrew Morton, Mel Gorman, Joonsoo Kim
  Cc: linux-mm, linux-kernel, Rik van Riel, Johannes Weiner,
	Minchan Kim, Kirill A. Shutemov, Vlastimil Babka

Joonsoo Kim has found a theoretical race between get/set_pageblock_migratetype
which can result in wrong values being read, including values higher than
MIGRATE_TYPES, due to the unsynchronized get/set operations which read or
update the individual bits in a loop.
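
For illustration, here is a simplified sketch of the reader (not the exact
get_pageblock_flags_group() code) and one interleaving that produces a bogus
value:

/* Simplified sketch: the 3 migratetype bits are read one at a time */
static int read_migratetype_bits(unsigned long *bitmap, int bitidx)
{
	int i, mt = 0;

	for (i = 0; i < 3; i++)		/* one test_bit() per bit */
		if (test_bit(bitidx + i, bitmap))
			mt |= 1 << i;
	return mt;
}

/*
 * A writer changing MIGRATE_MOVABLE (010b) to MIGRATE_ISOLATE (100b)
 * bit by bit can be observed half-done: if the reader samples bit 1
 * before it is cleared and bit 2 after it is set, it returns
 * 110b = 6 >= MIGRATE_TYPES - exactly the invalid value mentioned below.
 */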

My own testing with a debug patch showed that the race occurs about once per
run of mmtests' stress-highalloc benchmark (although not necessarily in the
same pageblock). However, during development of unrelated compaction patches,
I have observed that with frequent calls to {start,undo}_isolate_page_range()
the race occurs several thousand times and results in NULL pointer dereferences
in move_freepages() and free_one_page(), in places where free_list[migratetype]
is manipulated by e.g. list_move(). Further debugging confirmed that
migratetype had the invalid value of 6, causing out-of-bounds access to the
free_list array.

This shows that the race exists, and while it may be extremely rare and
harmless unless page isolation is performed frequently (races between the
lower migratetypes do not set the highest-order bit and thus cannot produce an
invalid value), it could get worse if e.g. new migratetypes are added.

We want to close this race while avoiding extra overhead as much as possible,
as pageblock operations occur in fast paths and should be cheap. We observe
that all (non-init) set_pageblock_migratetype() calls are done under zone->lock
and most get_pageblock_migratetype() calls can be moved under the lock as well
since it's taken in those paths anyway.

Therefore, this patch makes get_pageblock_migratetype() safe where possible:

 - in __free_pages_ok() by moving it to free_one_page() which already locks
 - in buffered_rmqueue() by moving it inside the locked section
 - in undo_isolate_page_range() by removing the call, as the call to
   unset_migratetype_isolate() locks and calls it again anyway
 - in test_pages_isolated() by extending the locked section

The remaining unsafe calls to get_pageblock_migratetype() are dealt with in
the next patch.

Reported-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c     | 20 +++++++++++---------
 mm/page_isolation.c | 23 +++++++++++++----------
 2 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9d6892c..0cb41ec 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -714,12 +714,16 @@ static void free_pcppages_bulk(struct zone *zone, int count,
 	spin_unlock(&zone->lock);
 }
 
-static void free_one_page(struct zone *zone, struct page *page, int order,
-				int migratetype)
+static void free_one_page(struct zone *zone, struct page *page, int order)
 {
+	int migratetype;
+
 	spin_lock(&zone->lock);
 	zone->pages_scanned = 0;
 
+	migratetype = get_pageblock_migratetype(page);
+	set_freepage_migratetype(page, migratetype);
+
 	__free_one_page(page, zone, order, migratetype);
 	if (unlikely(!is_migrate_isolate(migratetype)))
 		__mod_zone_freepage_state(zone, 1 << order, migratetype);
@@ -756,16 +760,13 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
 static void __free_pages_ok(struct page *page, unsigned int order)
 {
 	unsigned long flags;
-	int migratetype;
 
 	if (!free_pages_prepare(page, order))
 		return;
 
 	local_irq_save(flags);
 	__count_vm_events(PGFREE, 1 << order);
-	migratetype = get_pageblock_migratetype(page);
-	set_freepage_migratetype(page, migratetype);
-	free_one_page(page_zone(page), page, order, migratetype);
+	free_one_page(page_zone(page), page, order);
 	local_irq_restore(flags);
 }
 
@@ -1387,7 +1388,7 @@ void free_hot_cold_page(struct page *page, int cold)
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
-			free_one_page(zone, page, 0, migratetype);
+			free_one_page(zone, page, 0);
 			goto out;
 		}
 		migratetype = MIGRATE_MOVABLE;
@@ -1570,11 +1571,12 @@ again:
 		}
 		spin_lock_irqsave(&zone->lock, flags);
 		page = __rmqueue(zone, order, migratetype);
+		if (page)
+			__mod_zone_freepage_state(zone, -(1 << order),
+					  get_pageblock_migratetype(page));
 		spin_unlock(&zone->lock);
 		if (!page)
 			goto failed;
-		__mod_zone_freepage_state(zone, -(1 << order),
-					  get_pageblock_migratetype(page));
 	}
 
 	__mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index d1473b2..d66f8ce 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -157,9 +157,8 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	     pfn < end_pfn;
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
-		if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
-			continue;
-		unset_migratetype_isolate(page, migratetype);
+		if (page)
+			unset_migratetype_isolate(page, migratetype);
 	}
 	return 0;
 }
@@ -223,9 +222,16 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 {
 	unsigned long pfn, flags;
 	struct page *page;
+	struct page *first_page;
 	struct zone *zone;
-	int ret;
+	int ret = 0;
+
+	first_page = __first_valid_page(start_pfn, end_pfn - start_pfn);
+	if (!first_page)
+		return -EBUSY;
 
+	zone = page_zone(first_page);
+	spin_lock_irqsave(&zone->lock, flags);
 	/*
 	 * Note: pageblock_nr_pages != MAX_ORDER. Then, chunks of free pages
 	 * are not aligned to pageblock_nr_pages.
@@ -234,16 +240,13 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
 		if (page && get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
-			break;
+			goto unlock_out;
 	}
-	page = __first_valid_page(start_pfn, end_pfn - start_pfn);
-	if ((pfn < end_pfn) || !page)
-		return -EBUSY;
+
 	/* Check all pages are free or marked as ISOLATED */
-	zone = page_zone(page);
-	spin_lock_irqsave(&zone->lock, flags);
 	ret = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
 						skip_hwpoisoned_pages);
+unlock_out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 	return ret ? 0 : -EBUSY;
 }
-- 
1.8.4.5



* [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases where locking is undesirable
From: Vlastimil Babka @ 2014-02-28 14:15 UTC
  To: Andrew Morton, Mel Gorman, Joonsoo Kim
  Cc: linux-mm, linux-kernel, Rik van Riel, Johannes Weiner,
	Minchan Kim, Kirill A. Shutemov, Vlastimil Babka

In order to prevent races with set_pageblock_migratetype(), most calls to
get_pageblock_migratetype() have been moved under zone->lock. For the remaining
call sites, the extra locking is undesirable, notably in free_hot_cold_page().

This patch introduces a _nolock version to be used at these call sites, where
a wrong value does not affect correctness. The function makes sure that the
value does not exceed the valid migratetype numbers; too-high values are
assumed to be the result of a race, and the caller-supplied fallback value is
returned instead.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/mmzone.h | 24 ++++++++++++++++++++++++
 mm/compaction.c        | 14 +++++++++++---
 mm/memory-failure.c    |  3 ++-
 mm/page_alloc.c        | 22 +++++++++++++++++-----
 mm/vmstat.c            |  2 +-
 5 files changed, 55 insertions(+), 10 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index fac5509..7c3f678 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -75,6 +75,30 @@ enum {
 
 extern int page_group_by_mobility_disabled;
 
+/*
+ * When called without zone->lock held, a race with set_pageblock_migratetype
+ * may result in bogus values. Use this variant only when this does not affect
+ * correctness, and taking zone->lock would be costly. Values >= MIGRATE_TYPES
+ * are considered to be a result of this race and the value of race_fallback
+ * argument is returned instead.
+ */
+static inline int get_pageblock_migratetype_nolock(struct page *page,
+	int race_fallback)
+{
+	int ret = get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
+
+	if (unlikely(ret >= MIGRATE_TYPES))
+		ret = race_fallback;
+
+	return ret;
+}
+
+/*
+ * Should be called only with zone->lock held. In cases where locking overhead
+ * is undesirable, consider the _nolock version.
+ * Note that VM_BUG_ON(locked) here would require e.g. moving the function to a
+ * .c file to be able to include page_zone() definition.
+ */
 static inline int get_pageblock_migratetype(struct page *page)
 {
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
diff --git a/mm/compaction.c b/mm/compaction.c
index 5142920..f0db73b 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -217,12 +217,17 @@ static inline bool compact_trylock_irqsave(spinlock_t *lock,
 /* Returns true if the page is within a block suitable for migration to */
 static bool suitable_migration_target(struct page *page)
 {
+	int migratetype;
+
 	/* If the page is a large free page, then disallow migration */
 	if (PageBuddy(page) && page_order(page) >= pageblock_order)
 		return false;
 
+	/* If someone races on the pageblock, just assume it's not suitable */
+	migratetype = get_pageblock_migratetype_nolock(page, MIGRATE_RESERVE);
+
 	/* If the block is MIGRATE_MOVABLE or MIGRATE_CMA, allow migration */
-	if (migrate_async_suitable(get_pageblock_migratetype(page)))
+	if (migrate_async_suitable(migratetype))
 		return true;
 
 	/* Otherwise skip the block */
@@ -530,9 +535,12 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			/*
 			 * For async migration, also only scan in MOVABLE
 			 * blocks. Async migration is optimistic to see if
-			 * the minimum amount of work satisfies the allocation
+			 * the minimum amount of work satisfies the allocation.
+			 * If we race on the migratetype, just assume it's an
+			 * unsuitable one.
 			 */
-			mt = get_pageblock_migratetype(page);
+			mt = get_pageblock_migratetype_nolock(page,
+					MIGRATE_RESERVE);
 			if (!cc->sync && !migrate_async_suitable(mt)) {
 				cc->finished_update_migrate = true;
 				skipped_async_unsuitable = true;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 35ef28a..d0625f6 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1672,7 +1672,8 @@ int soft_offline_page(struct page *page, int flags)
 	 * was free. This flag should be kept set until the source page
 	 * is freed and PG_hwpoison on it is set.
 	 */
-	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
+	if (get_pageblock_migratetype_nolock(page, MIGRATE_RESERVE)
+			!= MIGRATE_ISOLATE)
 		set_migratetype_isolate(page, true);
 
 	ret = get_any_page(page, pfn, flags);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0cb41ec..de5b419 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1374,7 +1374,16 @@ void free_hot_cold_page(struct page *page, int cold)
 	if (!free_pages_prepare(page, 0))
 		return;
 
-	migratetype = get_pageblock_migratetype(page);
+	/*
+	 * We don't want to take zone->lock here just to determine pageblock
+	 * migratetype safely. So we allow a race, which will be detected if
+	 * the migratetype appears to be >= MIGRATE_TYPES.
+	 * In case of a detected race, defer to free_one_page() below, which
+	 * will re-read the pageblock migratetype under zone->lock and re-set
+	 * freepage migratetype accordingly.
+	 * We use MIGRATE_TYPES as MIGRATE_ISOLATE may not be enabled.
+	 */
+	migratetype = get_pageblock_migratetype_nolock(page, MIGRATE_TYPES);
 	set_freepage_migratetype(page, migratetype);
 	local_irq_save(flags);
 	__count_vm_event(PGFREE);
@@ -1387,7 +1396,8 @@ void free_hot_cold_page(struct page *page, int cold)
 	 * excessively into the page allocator
 	 */
 	if (migratetype >= MIGRATE_PCPTYPES) {
-		if (unlikely(is_migrate_isolate(migratetype))) {
+		if (unlikely(is_migrate_isolate(migratetype)
+			|| migratetype == MIGRATE_TYPES)) {
 			free_one_page(zone, page, 0);
 			goto out;
 		}
@@ -6080,8 +6090,9 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags,
  * If @count is not zero, it is okay to include less @count unmovable pages
  *
  * PageLRU check without isolation or lru_lock could race so that
- * MIGRATE_MOVABLE block might include unmovable pages. It means you can't
- * expect this function should be exact.
+ * MIGRATE_MOVABLE block might include unmovable pages. The detection of
+ * pageblock migratetype can race as well. It means you can't expect this
+ * function to be exact.
  */
 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 			 bool skip_hwpoisoned_pages)
@@ -6095,7 +6106,8 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 	 */
 	if (zone_idx(zone) == ZONE_MOVABLE)
 		return false;
-	mt = get_pageblock_migratetype(page);
+	/* In case of a detected race, try to reduce false positives */
+	mt = get_pageblock_migratetype_nolock(page, MIGRATE_UNMOVABLE);
 	if (mt == MIGRATE_MOVABLE || is_migrate_cma(mt))
 		return false;
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 2592010..1f08bf6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -956,7 +956,7 @@ static void pagetypeinfo_showblockcount_print(struct seq_file *m,
 		if (!memmap_valid_within(pfn, page, zone))
 			continue;
 
-		mtype = get_pageblock_migratetype(page);
+		mtype = get_pageblock_migratetype_nolock(page, MIGRATE_TYPES);
 
 		if (mtype < MIGRATE_TYPES)
 			count[mtype]++;
-- 
1.8.4.5



* [PATCH 3/6] mm: add is_migrate_isolate_page_nolock() for cases where locking is undesirable
From: Vlastimil Babka @ 2014-02-28 14:15 UTC
  To: Andrew Morton, Mel Gorman, Joonsoo Kim
  Cc: linux-mm, linux-kernel, Rik van Riel, Johannes Weiner,
	Minchan Kim, Kirill A. Shutemov, Vlastimil Babka

This patch complements the addition of get_pageblock_migratetype_nolock() for
cases where is_migrate_isolate_page() cannot be called with zone->lock held.
A race with set_pageblock_migratetype() may be detected, in which case the
caller-supplied fallback argument is returned.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 include/linux/page-isolation.h | 24 ++++++++++++++++++++++++
 mm/hugetlb.c                   |  2 +-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 3fff8e7..f7bd491 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -2,10 +2,30 @@
 #define __LINUX_PAGEISOLATION_H
 
 #ifdef CONFIG_MEMORY_ISOLATION
+/*
+ * Should be called only with zone->lock held. In cases where locking overhead
+ * is undesirable, consider the _nolock version.
+ */
 static inline bool is_migrate_isolate_page(struct page *page)
 {
 	return get_pageblock_migratetype(page) == MIGRATE_ISOLATE;
 }
+/*
+ * When called without zone->lock held, a race with set_pageblock_migratetype
+ * may result in bogus values. The race may be detected, in which case the
+ * value of race_fallback argument is returned. For details, see
+ * get_pageblock_migratetype_nolock().
+ */
+static inline bool is_migrate_isolate_page_nolock(struct page *page,
+		bool race_fallback)
+{
+	int migratetype = get_pageblock_migratetype_nolock(page, MIGRATE_TYPES);
+
+	if (unlikely(migratetype == MIGRATE_TYPES))
+		return race_fallback;
+
+	return migratetype == MIGRATE_ISOLATE;
+}
 static inline bool is_migrate_isolate(int migratetype)
 {
 	return migratetype == MIGRATE_ISOLATE;
@@ -15,6 +35,10 @@ static inline bool is_migrate_isolate_page(struct page *page)
 {
 	return false;
 }
+static inline bool is_migrate_isolate_page_nolock(struct page *page, bool race_fallback)
+{
+	return false;
+}
 static inline bool is_migrate_isolate(int migratetype)
 {
 	return false;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2252cac..fac4003 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -525,7 +525,7 @@ static struct page *dequeue_huge_page_node(struct hstate *h, int nid)
 	struct page *page;
 
 	list_for_each_entry(page, &h->hugepage_freelists[nid], lru)
-		if (!is_migrate_isolate_page(page))
+		if (!is_migrate_isolate_page_nolock(page, true))
 			break;
 	/*
 	 * if 'non-isolated free hugepage' not found on the list,
-- 
1.8.4.5



* [PATCH 4/6] mm: add set_pageblock_migratetype_nolock() for calls outside zone->lock
From: Vlastimil Babka @ 2014-02-28 14:15 UTC
  To: Andrew Morton, Mel Gorman, Joonsoo Kim
  Cc: linux-mm, linux-kernel, Rik van Riel, Johannes Weiner,
	Minchan Kim, Kirill A. Shutemov, Vlastimil Babka

To prevent races, set_pageblock_migratetype() should be called with zone->lock
held. This patch adds a debugging assertion and introduces a _nolock variant
for zone init functions.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index de5b419..fd6a64c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -232,7 +232,7 @@ EXPORT_SYMBOL(nr_online_nodes);
 
 int page_group_by_mobility_disabled __read_mostly;
 
-void set_pageblock_migratetype(struct page *page, int migratetype)
+static void set_pageblock_migratetype_nolock(struct page *page, int migratetype)
 {
 	if (unlikely(page_group_by_mobility_disabled &&
 		     migratetype < MIGRATE_PCPTYPES))
@@ -242,6 +242,13 @@ void set_pageblock_migratetype(struct page *page, int migratetype)
 					PB_migrate, PB_migrate_end);
 }
 
+void set_pageblock_migratetype(struct page *page, int migratetype)
+{
+	VM_BUG_ON(!spin_is_locked(&page_zone(page)->lock));
+
+	set_pageblock_migratetype_nolock(page, migratetype);
+}
+
 bool oom_killer_disabled __read_mostly;
 
 #ifdef CONFIG_DEBUG_VM
@@ -803,7 +810,7 @@ void __init init_cma_reserved_pageblock(struct page *page)
 	} while (++p, --i);
 
 	set_page_refcounted(page);
-	set_pageblock_migratetype(page, MIGRATE_CMA);
+	set_pageblock_migratetype_nolock(page, MIGRATE_CMA);
 	__free_pages(page, pageblock_order);
 	adjust_managed_page_count(page, pageblock_nr_pages);
 }
@@ -4100,7 +4107,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 		if ((z->zone_start_pfn <= pfn)
 		    && (pfn < zone_end_pfn(z))
 		    && !(pfn & (pageblock_nr_pages - 1)))
-			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+			set_pageblock_migratetype_nolock(page, MIGRATE_MOVABLE);
 
 		INIT_LIST_HEAD(&page->lru);
 #ifdef WANT_PAGE_VIRTUAL
-- 
1.8.4.5



* [PATCH 5/6] mm: compaction: do not set pageblock skip bit when already set
From: Vlastimil Babka @ 2014-02-28 14:15 UTC
  To: Andrew Morton, Mel Gorman, Joonsoo Kim
  Cc: linux-mm, linux-kernel, Rik van Riel, Johannes Weiner,
	Minchan Kim, Kirill A. Shutemov, Vlastimil Babka

The compaction migratepages scanner calls update_pageblock_skip() for blocks
where isolation failed. Currently it does that also for blocks where no
isolation was even attempted, because the skip bit was already set. This is
wasteful, so this patch reuses the existing skipped_async_unsuitable flag to
avoid setting the skip bit again in that case.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/compaction.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index f0db73b..20a75ee 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -529,8 +529,10 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
 			int mt;
 
 			last_pageblock_nr = pageblock_nr;
-			if (!isolation_suitable(cc, page))
+			if (!isolation_suitable(cc, page)) {
+				skipped_async_unsuitable = true;
 				goto next_pageblock;
+			}
 
 			/*
 			 * For async migration, also only scan in MOVABLE
-- 
1.8.4.5



* [PATCH 6/6] mm: use atomic bit operations in set_pageblock_flags_group()
From: Vlastimil Babka @ 2014-02-28 14:15 UTC
  To: Andrew Morton, Mel Gorman, Joonsoo Kim
  Cc: linux-mm, linux-kernel, Rik van Riel, Johannes Weiner,
	Minchan Kim, Kirill A. Shutemov, Vlastimil Babka

set_pageblock_flags_group() is used to set either the migratetype or the skip
bit of a pageblock. Setting the migratetype is done under zone->lock (except
from __init code); however, changing the skip bit is not protected, and the
pageblock flags bitmap packs the migratetype and skip bits together and uses
non-atomic bit ops. Therefore, races between setting the migratetype and the
skip bit are possible, and the non-atomic read-modify-write of the skip bit
may cause lost updates to the migratetype bits, resulting in invalid
migratetype values, which are in turn used to e.g. index the free_list array.

The race has been observed to happen and cause panics, albeit during the
development of a series that increases the frequency of migratetype changes
through {start,undo}_isolate_page_range() calls.

Two possible solutions were investigated: 1) using zone->lock for changing the
pageblock_skip bit, and 2) making the bitmap operations atomic. The problem
with 1) is that zone->lock is already contended and is almost never already
held in the compaction code that updates the pageblock_skip bits. Solution 2)
should scale better, but adds atomic operations also to migratetype changes,
which are already protected by zone->lock.

Using mmtests' stress-highalloc benchmark, little difference was found between
the two solutions. The base is 3.13 with the recent compaction series by
myself and Joonsoo Kim applied.

                3.13        3.13        3.13
                base     2)atomic     1)lock
User         6103.92     6072.09     6178.79
System       1039.68     1033.96     1042.92
Elapsed      2114.27     2090.20     2110.23

For 1), the stats below show how many times the compaction code had to take
zone->lock during the benchmark, or failed to take it due to contention.

update_pageblock_skip stats:

mig scanner already locked        0
mig scanner had to lock           172985
mig scanner skip bit already set  1
mig scanner failed to lock        43
free scanner already locked       42
free scanner had to lock          499631
free scanner skip bit already set 87
free scanner failed to lock       79

For 2), profiling found no measurable increase in the time spent in the
pageblock update operations.

Therefore, solution 2) was selected and is implemented by this patch. To
minimize dirtied cachelines and the number of atomic ops, the bitmap bits are
only changed when needed, i.e. each set_bit()/clear_bit() is preceded by a
test_bit(). For the migratetype bits this check is not racy thanks to
zone->lock protection. For the pageblock_skip bit, the raciness is not an
issue, as the bit is just a heuristic for memory compaction.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index fd6a64c..050bf5e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6085,11 +6085,15 @@ void set_pageblock_flags_group(struct page *page, unsigned long flags,
 	bitidx = pfn_to_bitidx(zone, pfn);
 	VM_BUG_ON_PAGE(!zone_spans_pfn(zone, pfn), page);
 
-	for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1)
-		if (flags & value)
-			__set_bit(bitidx + start_bitidx, bitmap);
-		else
-			__clear_bit(bitidx + start_bitidx, bitmap);
+	for (; start_bitidx <= end_bitidx; start_bitidx++, value <<= 1) {
+		int oldbit = test_bit(bitidx + start_bitidx, bitmap);
+		unsigned long newbit = flags & value;
+
+		if (!oldbit && newbit)
+			set_bit(bitidx + start_bitidx, bitmap);
+		else if (oldbit && !newbit)
+			clear_bit(bitidx + start_bitidx, bitmap);
+	}
 }
 
 /*
-- 
1.8.4.5



* Re: [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases where locking is undesirable
From: Joonsoo Kim @ 2014-03-03  8:22 UTC
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
	Johannes Weiner, Minchan Kim, Kirill A. Shutemov

On Fri, Feb 28, 2014 at 03:15:00PM +0100, Vlastimil Babka wrote:
> In order to prevent race with set_pageblock_migratetype, most of calls to
> get_pageblock_migratetype have been moved under zone->lock. For the remaining
> call sites, the extra locking is undesirable, notably in free_hot_cold_page().
> 
> This patch introduces a _nolock version to be used on these call sites, where
> a wrong value does not affect correctness. The function makes sure that the
> value does not exceed valid migratetype numbers. Such too-high values are
> assumed to be a result of race and caller-supplied fallback value is returned
> instead.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/mmzone.h | 24 ++++++++++++++++++++++++
>  mm/compaction.c        | 14 +++++++++++---
>  mm/memory-failure.c    |  3 ++-
>  mm/page_alloc.c        | 22 +++++++++++++++++-----
>  mm/vmstat.c            |  2 +-
>  5 files changed, 55 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fac5509..7c3f678 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -75,6 +75,30 @@ enum {
>  
>  extern int page_group_by_mobility_disabled;
>  
> +/*
> + * When called without zone->lock held, a race with set_pageblock_migratetype
> + * may result in bogus values. Use this variant only when this does not affect
> + * correctness, and taking zone->lock would be costly. Values >= MIGRATE_TYPES
> + * are considered to be a result of this race and the value of race_fallback
> + * argument is returned instead.
> + */
> +static inline int get_pageblock_migratetype_nolock(struct page *page,
> +	int race_fallback)
> +{
> +	int ret = get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
> +
> +	if (unlikely(ret >= MIGRATE_TYPES))
> +		ret = race_fallback;
> +
> +	return ret;
> +}

Hello, Vlastimil.

First of all, thanks for the nice work!
I have a different opinion about this implementation. I may be wrong, so if
that is the case, please let me know.

Although this implementation would close the race which triggers the NULL
dereference, I think it isn't enough if you have a plan to add more
{start,undo}_isolate_page_range() calls.

Consider that there are lots of {start,undo}_isolate_page_range() calls on a
system without CMA.

The bit representation of the migratetypes is as follows:

MIGRATE_MOVABLE = 010
MIGRATE_ISOLATE = 100

We could read any of the following values as the migratetype of a page in a
movable pageblock if the race occurs.

start_isolate_page_range() case: 010 -> 100
010, 000, 100

undo_isolate_page_range() case: 100 -> 010
100, 110, 010

The above implementation prevents us from getting 110, but it can't prevent
us from getting 000, that is, MIGRATE_UNMOVABLE. If this race occurs in
free_hot_cold_page(), the page would go onto the unmovable pcp list and then
be allocated as that migratetype. This results in more fragmented memory.


Consider another case, a system with CONFIG_CMA enabled:

MIGRATE_MOVABLE = 010
MIGRATE_ISOLATE = 101

start_isolate_page_range() case: 010 -> 101
010, 011, 001, 101

undo_isolate_page_range() case: 101 -> 010
101, 100, 110, 010

This can result in totally different values, which also causes the problem
mentioned above. And although this doesn't cause any problem for CMA for now,
if another migratetype is introduced or some migratetype is removed, it can
cause a CMA-typed page to go into another migratetype's list and make CMA
fail permanently.

To close this kind of race regardless of how many pageblock isolations occur,
I recommend that you use separate pageblock bits for MIGRATE_CMA and
MIGRATE_ISOLATE and use an accessor function whenever we need to check the
migratetype. IMHO, it may not impose much overhead.
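
A rough and untested sketch of the accessor (the PB_isolate/PB_cma bit names
are made up):

/*
 * Keep MIGRATE_ISOLATE and MIGRATE_CMA in dedicated per-pageblock bits,
 * so that isolation flips a single bit and never rewrites the base
 * migratetype bits.
 */
static inline int get_pageblock_migratetype(struct page *page)
{
	if (get_pageblock_flags_group(page, PB_isolate, PB_isolate))
		return MIGRATE_ISOLATE;
	if (get_pageblock_flags_group(page, PB_cma, PB_cma))
		return MIGRATE_CMA;
	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
}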

How about it?

Thanks.



* Re: [PATCH 6/6] mm: use atomic bit operations in set_pageblock_flags_group()
From: Joonsoo Kim @ 2014-03-03  8:28 UTC
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
	Johannes Weiner, Minchan Kim, Kirill A. Shutemov

On Fri, Feb 28, 2014 at 03:15:04PM +0100, Vlastimil Babka wrote:
> set_pageblock_flags_group() is used to set either migratetype or skip bit of a
> pageblock. Setting migratetype is done under zone->lock (except from __init
> code), however changing the skip bits is not protected and the pageblock flags
> bitmap packs migratetype and skip bits together and uses non-atomic bit ops.
> Therefore, races between setting migratetype and skip bit are possible and the
> non-atomic read-modify-update of the skip bit may cause lost updates to
> migratetype bits, resulting in invalid migratetype values, which are in turn
> used to e.g. index free_list array.
> 
> The race has been observed to happen and cause panics, albeit during
> development of series that increases frequency of migratetype changes through
> {start,undo}_isolate_page_range() calls.
> 
> Two possible solutions were investigated: 1) using zone->lock for changing
> pageblock_skip bit and 2) changing the bitmap operations to be atomic. The
> problem of 1) is that zone->lock is already contended and almost never held in
> the compaction code that updates pageblock_skip bits. Solution 2) should scale
> better, but adds atomic operations also to migratype changes which are already
> protected by zone->lock.

How about 3), introducing a new, separate bitmap for pageblock_skip?
I guess that the migratetype bitmap is read-intensive, and atomic set/clear of
pageblock_skip in the same cachelines could cause performance degradation.
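
An untested sketch of 3) (pfn_to_skip_bitidx() is a made-up helper):

/*
 * Give the skip bit its own bitmap in struct zone, so that
 * set/clear_pageblock_skip() never touch the words holding the packed
 * migratetype bits.
 */
struct zone {
	...
	unsigned long	*pageblock_flags;	/* migratetype bits only */
	unsigned long	*pageblock_skip;	/* one bit per pageblock */
	...
};

static inline void set_pageblock_skip(struct page *page)
{
	struct zone *zone = page_zone(page);

	set_bit(pfn_to_skip_bitidx(zone, page_to_pfn(page)),
		zone->pageblock_skip);
}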

> 
> Using mmtests' stress-highalloc benchmark, little difference was found between
> the two solutions. The base is 3.13 with recent compaction series by myself and
> Joonsoo Kim applied.
> 
>                 3.13        3.13        3.13
>                 base     2)atomic     1)lock
> User         6103.92     6072.09     6178.79
> System       1039.68     1033.96     1042.92
> Elapsed      2114.27     2090.20     2110.23
> 

I really wonder why 2) is better than base, although the difference is small.
Is it the average of 10 runs? Do you have any idea what is happening?

Thanks.



* Re: [PATCH 6/6] mm: use atomic bit operations in set_pageblock_flags_group()
From: Vlastimil Babka @ 2014-03-03 12:46 UTC
  To: Joonsoo Kim
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
	Johannes Weiner, Minchan Kim, Kirill A. Shutemov

On 03/03/2014 09:28 AM, Joonsoo Kim wrote:
> On Fri, Feb 28, 2014 at 03:15:04PM +0100, Vlastimil Babka wrote:
>> set_pageblock_flags_group() is used to set either migratetype or skip bit of a
>> pageblock. Setting migratetype is done under zone->lock (except from __init
>> code), however changing the skip bits is not protected and the pageblock flags
>> bitmap packs migratetype and skip bits together and uses non-atomic bit ops.
>> Therefore, races between setting migratetype and skip bit are possible and the
>> non-atomic read-modify-update of the skip bit may cause lost updates to
>> migratetype bits, resulting in invalid migratetype values, which are in turn
>> used to e.g. index free_list array.
>>
>> The race has been observed to happen and cause panics, albeit during
>> development of series that increases frequency of migratetype changes through
>> {start,undo}_isolate_page_range() calls.
>>
>> Two possible solutions were investigated: 1) using zone->lock for changing
>> pageblock_skip bit and 2) changing the bitmap operations to be atomic. The
>> problem of 1) is that zone->lock is already contended and almost never held in
>> the compaction code that updates pageblock_skip bits. Solution 2) should scale
>> better, but adds atomic operations also to migratype changes which are already
>> protected by zone->lock.
>
> How about 3) introduce new bitmap for pageblock_skip?
> I guess that migratetype bitmap is read-intensive and set/clear pageblock_skip
> could make performance degradation.

Yes, that would also be possible, but it was deemed too ugly, and it might
become even uglier in case some new pageblock bits are introduced. But it
seems no performance degradation was observed for either 1) or 2).

I guess if we abandoned the whole idea of a packed bitmap, we could also make
the update of the whole migratetype atomic instead of processing each bit
separately. But that would mean at least 8 bits per pageblock for the
migratetype (and I have no idea about the specifics of archs other than x86
here). Maybe 4 bits, if the code were even uglier and distinguished odd and
even pageblocks...
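
A rough sketch of the 8-bit variant (hypothetical; it assumes a plain byte
store/load is single-copy atomic, which holds at least on x86):

unsigned char *pageblock_migratetype;	/* one byte per pageblock */

static inline void set_pb_migratetype(unsigned long pb_idx, int mt)
{
	/* the whole migratetype changes at once, no torn values */
	ACCESS_ONCE(pageblock_migratetype[pb_idx]) = mt;
}

static inline int get_pb_migratetype(unsigned long pb_idx)
{
	return ACCESS_ONCE(pageblock_migratetype[pb_idx]);
}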

>>
>> Using mmtests' stress-highalloc benchmark, little difference was found between
>> the two solutions. The base is 3.13 with recent compaction series by myself and
>> Joonsoo Kim applied.
>>
>>                  3.13        3.13        3.13
>>                  base     2)atomic     1)lock
>> User         6103.92     6072.09     6178.79
>> System       1039.68     1033.96     1042.92
>> Elapsed      2114.27     2090.20     2110.23
>>
>
> I really wonder how 2) is better than base although there is a little difference.
> Is it the avg result of 10 runs? Do you have any idea what happens?

It is the avg of 10 runs, but I guess this just means 10 runs are not enough
to get sufficiently precise results. One difference is that the atomic version
does not clear/set bits that don't need it, but the profiles show the whole
operation is pretty negligible. And if at least one bit is changed (I guess it
usually is, unless migratetypes are somewhere set to the same value they
already have), the cache line becomes dirty anyway. And again, the profiles
suggest that very little cache is dirtied here.

Vlastimil

> Thanks.
>



* Re: [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases where locking is undesirable
From: Vlastimil Babka @ 2014-03-03 13:54 UTC
  To: Joonsoo Kim
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
	Johannes Weiner, Minchan Kim, Kirill A. Shutemov

On 03/03/2014 09:22 AM, Joonsoo Kim wrote:
> On Fri, Feb 28, 2014 at 03:15:00PM +0100, Vlastimil Babka wrote:
>> In order to prevent race with set_pageblock_migratetype, most of calls to
>> get_pageblock_migratetype have been moved under zone->lock. For the remaining
>> call sites, the extra locking is undesirable, notably in free_hot_cold_page().
>>
>> This patch introduces a _nolock version to be used on these call sites, where
>> a wrong value does not affect correctness. The function makes sure that the
>> value does not exceed valid migratetype numbers. Such too-high values are
>> assumed to be a result of race and caller-supplied fallback value is returned
>> instead.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>>   include/linux/mmzone.h | 24 ++++++++++++++++++++++++
>>   mm/compaction.c        | 14 +++++++++++---
>>   mm/memory-failure.c    |  3 ++-
>>   mm/page_alloc.c        | 22 +++++++++++++++++-----
>>   mm/vmstat.c            |  2 +-
>>   5 files changed, 55 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index fac5509..7c3f678 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -75,6 +75,30 @@ enum {
>>
>>   extern int page_group_by_mobility_disabled;
>>
>> +/*
>> + * When called without zone->lock held, a race with set_pageblock_migratetype
>> + * may result in bogus values. Use this variant only when this does not affect
>> + * correctness, and taking zone->lock would be costly. Values >= MIGRATE_TYPES
>> + * are considered to be a result of this race and the value of race_fallback
>> + * argument is returned instead.
>> + */
>> +static inline int get_pageblock_migratetype_nolock(struct page *page,
>> +	int race_fallback)
>> +{
>> +	int ret = get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
>> +
>> +	if (unlikely(ret >= MIGRATE_TYPES))
>> +		ret = race_fallback;
>> +
>> +	return ret;
>> +}
>
> Hello, Vlastimil.
>
> First of all, thanks for nice work!
> I have another opinion about this implementation. It can be wrong, so if it
> is wrong, please let me know.

Thanks, all opinions/reviews are welcome :)

> Although this implementation would close the race which triggers NULL dereference,
> I think that this isn't enough if you have a plan to add more
> {start,undo}_isolate_page_range().
>
> Consider that there are lots of {start,undo}_isolate_page_range() calls
> on the system without CMA.
>
> bit representation of migratetype is like as following.
>
> MIGRATE_MOVABLE = 010
> MIGRATE_ISOLATE = 100
>
> We could read following values as migratetype of the page on movable pageblock
> if race occurs.
>
> start_isolate_page_range() case: 010 -> 100
> 010, 000, 100
>
> undo_isolate_page_range() case: 100 -> 010
> 100, 110, 010
>
> Above implementation prevents us from getting 110, but, it can't prevent us from
> getting 000, that is, MIGRATE_UNMOVABLE. If this race occurs in free_hot_cold_page(),
> this page would go into unmovable pcp and then allocated for that migratetype.
> It results in more fragmented memory.

Yes, that can happen. But I would expect it to be negligible compared to
other causes of fragmentation. I'm also not sure at this moment how often
{start,undo}_isolate_page_range() would be called in the end. Certainly not as
often as in the development patch, which is only there to see if that can
improve anything, because the calls will have their own overhead (mostly for
zone->lock) that might be too large. But good point, I will try to quantify
this.

>
> Consider another case that system enables CONFIG_CMA,
>
> MIGRATE_MOVABLE = 010
> MIGRATE_ISOLATE = 101
>
> start_isolate_page_range() case: 010 -> 101
> 010, 011, 001, 101
>
> undo_isolate_page_range() case: 101 -> 010
> 101, 100, 110, 010
>
> This can results in totally different values and this also makes the problem
> mentioned above. And, although this doesn't cause any problem on CMA for now,
> if another migratetype is introduced or some migratetype is removed, it can cause
> CMA typed page to go into other migratetype and makes CMA permanently failed.

This should actually be no problem for free_hot_cold_page(), as any
migratetype >= MIGRATE_PCPTYPES will defer to free_one_page(), which will
reread the migratetype under zone->lock. So as long as the migratetypes below
MIGRATE_PCPTYPES do not include one with such dangerous "permanently failed"
properties, it should be fine. And I doubt such a migratetype would be added
to the pcptypes. But of course, anyone adding a new migratetype would have to
reconsider each get_pageblock_migratetype_nolock() call for such potential
problems.

> To close this kind of races without dependency how many pageblock isolation occurs,
> I recommend that you use separate pageblock bits for MIGRATE_CMA, MIGRATE_ISOLATE
> and use accessor function whenver we need to check migratetype. IMHO, it may not
> impose much overhead.

That could work if the fragmentation is confirmed to be a problem.

Thanks,
Vlastimil

> How about it?
>
> Thanks.
>



* Re: [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases where locking is undesirable
From: Joonsoo Kim @ 2014-03-04  0:55 UTC
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
	Johannes Weiner, Minchan Kim, Kirill A. Shutemov

On Mon, Mar 03, 2014 at 02:54:09PM +0100, Vlastimil Babka wrote:
> On 03/03/2014 09:22 AM, Joonsoo Kim wrote:
> >On Fri, Feb 28, 2014 at 03:15:00PM +0100, Vlastimil Babka wrote:
> >>In order to prevent race with set_pageblock_migratetype, most of calls to
> >>get_pageblock_migratetype have been moved under zone->lock. For the remaining
> >>call sites, the extra locking is undesirable, notably in free_hot_cold_page().
> >>
> >>This patch introduces a _nolock version to be used on these call sites, where
> >>a wrong value does not affect correctness. The function makes sure that the
> >>value does not exceed valid migratetype numbers. Such too-high values are
> >>assumed to be a result of race and caller-supplied fallback value is returned
> >>instead.
> >>
> >>Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >>---
> >>  include/linux/mmzone.h | 24 ++++++++++++++++++++++++
> >>  mm/compaction.c        | 14 +++++++++++---
> >>  mm/memory-failure.c    |  3 ++-
> >>  mm/page_alloc.c        | 22 +++++++++++++++++-----
> >>  mm/vmstat.c            |  2 +-
> >>  5 files changed, 55 insertions(+), 10 deletions(-)
> >>
> >>diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >>index fac5509..7c3f678 100644
> >>--- a/include/linux/mmzone.h
> >>+++ b/include/linux/mmzone.h
> >>@@ -75,6 +75,30 @@ enum {
> >>
> >>  extern int page_group_by_mobility_disabled;
> >>
> >>+/*
> >>+ * When called without zone->lock held, a race with set_pageblock_migratetype
> >>+ * may result in bogus values. Use this variant only when this does not affect
> >>+ * correctness, and taking zone->lock would be costly. Values >= MIGRATE_TYPES
> >>+ * are considered to be a result of this race and the value of race_fallback
> >>+ * argument is returned instead.
> >>+ */
> >>+static inline int get_pageblock_migratetype_nolock(struct page *page,
> >>+	int race_fallback)
> >>+{
> >>+	int ret = get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
> >>+
> >>+	if (unlikely(ret >= MIGRATE_TYPES))
> >>+		ret = race_fallback;
> >>+
> >>+	return ret;
> >>+}
> >
> >Hello, Vlastimil.
> >
> >First of all, thanks for nice work!
> >I have another opinion about this implementation. It can be wrong, so if it
> >is wrong, please let me know.
> 
> Thanks, all opinions/reviewing is welcome :)
> 
> >Although this implementation would close the race which triggers NULL dereference,
> >I think that this isn't enough if you have a plan to add more
> >{start,undo}_isolate_page_range().
> >
> >Consider that there are lots of {start,undo}_isolate_page_range() calls
> >on the system without CMA.
> >
> >bit representation of migratetype is like as following.
> >
> >MIGRATE_MOVABLE = 010
> >MIGRATE_ISOLATE = 100
> >
> >We could read following values as migratetype of the page on movable pageblock
> >if race occurs.
> >
> >start_isolate_page_range() case: 010 -> 100
> >010, 000, 100
> >
> >undo_isolate_page_range() case: 100 -> 010
> >100, 110, 010
> >
> >Above implementation prevents us from getting 110, but, it can't prevent us from
> >getting 000, that is, MIGRATE_UNMOVABLE. If this race occurs in free_hot_cold_page(),
> >this page would go into unmovable pcp and then allocated for that migratetype.
> >It results in more fragmented memory.
> 
> Yes, that can happen. But I would expect it to be negligible to
> other causes of fragmentation. But I'm not at this moment sure how
> often {start,undo}_isolate_page_range() would be called in the end.
> Certainly
> not as often as in the development patch which is just to see if
> that can improve anything. Because it will have its own overhead
> (mostly for zone->lock) that might be too large. But good point, I
> will try to quantify this.
> 
> >
> >Consider another case that system enables CONFIG_CMA,
> >
> >MIGRATE_MOVABLE = 010
> >MIGRATE_ISOLATE = 101
> >
> >start_isolate_page_range() case: 010 -> 101
> >010, 011, 001, 101
> >
> >undo_isolate_page_range() case: 101 -> 010
> >101, 100, 110, 010
> >
> >This can results in totally different values and this also makes the problem
> >mentioned above. And, although this doesn't cause any problem on CMA for now,
> >if another migratetype is introduced or some migratetype is removed, it can cause
> >CMA typed page to go into other migratetype and makes CMA permanently failed.
> 
> This should actually be no problem for free_hot_cold_page() as any
> migratetype >= MIGRATE_PCPTYPES will defer to free_one_page() which
> will reread migratetype under zone->lock. So as long as
> MIGRATE_PCPTYPES does not include a migratetype with such dangerous
> "permanently failed" properties, it should be good. And I doubt such
> migratetype would be added to pcptypes. But of course, anyone adding
> new migratetype would have to reconsider each
> get_pageblock_migratetype_nolock() call for such potential problems.

Please let me explain more.
Currently, a CMA page can be observed with the following raced values.

MIGRATE_CMA = 100
MIGRATE_ISOLATE = 101

start_isolate_page_range(): 100 -> 101
100, 101
undo_isolate_page_range(): 101 -> 100
101, 100

So the race doesn't cause any big problem.

But, as you mentioned in an earlier patch, it could get worse if
MIGRATE_RESERVE is removed. That hasn't happened so far, but it is possible.

In that case,

MIGRATE_CMA = 011
MIGRATE_ISOLATE = 100

start_isolate_page_range(): 011 -> 100
011, 010, 000, 100
undo_isolate_page_range(): 100 -> 011
100, 101, 111, 011

If this race happens, a CMA page can go onto a MIGRATE_UNMOVABLE list,
because the "migratetype >= MIGRATE_PCPTYPES" check can't prevent it, and this
could make CMA fail permanently.

I think that dumping the responsibility on developers who want to add/remove a
migratetype is not reasonable and won't work well, because they may not have
enough background knowledge. I hope we can close the possible race more
completely this time.

Thanks.

> 
> >To close this kind of races without dependency how many pageblock isolation occurs,
> >I recommend that you use separate pageblock bits for MIGRATE_CMA, MIGRATE_ISOLATE
> >and use accessor function whenver we need to check migratetype. IMHO, it may not
> >impose much overhead.
> 
> That could work in case the fragmentation is confirmed to be a problem.
> 
> Thanks,
> Vlastimil
> 
> >How about it?
> >
> >Thanks.
> >


* Re: [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases where locking is undesirable
From: Vlastimil Babka @ 2014-03-04 12:16 UTC
  To: Joonsoo Kim
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
	Johannes Weiner, Minchan Kim, Kirill A. Shutemov

On 03/04/2014 01:55 AM, Joonsoo Kim wrote:
> On Mon, Mar 03, 2014 at 02:54:09PM +0100, Vlastimil Babka wrote:
>> On 03/03/2014 09:22 AM, Joonsoo Kim wrote:
>>> On Fri, Feb 28, 2014 at 03:15:00PM +0100, Vlastimil Babka wrote:
>>>> In order to prevent race with set_pageblock_migratetype, most of calls to
>>>> get_pageblock_migratetype have been moved under zone->lock. For the remaining
>>>> call sites, the extra locking is undesirable, notably in free_hot_cold_page().
>>>>
>>>> This patch introduces a _nolock version to be used on these call sites, where
>>>> a wrong value does not affect correctness. The function makes sure that the
>>>> value does not exceed valid migratetype numbers. Such too-high values are
>>>> assumed to be a result of race and caller-supplied fallback value is returned
>>>> instead.
>>>>
>>>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>>>> ---
>>>>   include/linux/mmzone.h | 24 ++++++++++++++++++++++++
>>>>   mm/compaction.c        | 14 +++++++++++---
>>>>   mm/memory-failure.c    |  3 ++-
>>>>   mm/page_alloc.c        | 22 +++++++++++++++++-----
>>>>   mm/vmstat.c            |  2 +-
>>>>   5 files changed, 55 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>> index fac5509..7c3f678 100644
>>>> --- a/include/linux/mmzone.h
>>>> +++ b/include/linux/mmzone.h
>>>> @@ -75,6 +75,30 @@ enum {
>>>>
>>>>   extern int page_group_by_mobility_disabled;
>>>>
>>>> +/*
>>>> + * When called without zone->lock held, a race with set_pageblock_migratetype
>>>> + * may result in bogus values. Use this variant only when this does not affect
>>>> + * correctness, and taking zone->lock would be costly. Values >= MIGRATE_TYPES
>>>> + * are considered to be a result of this race and the value of race_fallback
>>>> + * argument is returned instead.
>>>> + */
>>>> +static inline int get_pageblock_migratetype_nolock(struct page *page,
>>>> +	int race_fallback)
>>>> +{
>>>> +	int ret = get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
>>>> +
>>>> +	if (unlikely(ret >= MIGRATE_TYPES))
>>>> +		ret = race_fallback;
>>>> +
>>>> +	return ret;
>>>> +}
>>>
>>> Hello, Vlastimil.
>>>
>>> First of all, thanks for nice work!
>>> I have another opinion about this implementation. It can be wrong, so if it
>>> is wrong, please let me know.
>>
>> Thanks, all opinions/reviewing is welcome :)
>>
>>> Although this implementation would close the race which triggers NULL dereference,
>>> I think that this isn't enough if you have a plan to add more
>>> {start,undo}_isolate_page_range().
>>>
>>> Consider that there are lots of {start,undo}_isolate_page_range() calls
>>> on the system without CMA.
>>>
>>> bit representation of migratetype is like as following.
>>>
>>> MIGRATE_MOVABLE = 010
>>> MIGRATE_ISOLATE = 100
>>>
>>> If the race occurs, we could read the following values as the migratetype
>>> of a page in a movable pageblock.
>>>
>>> start_isolate_page_range() case: 010 -> 100
>>> 010, 000, 100
>>>
>>> undo_isolate_page_range() case: 100 -> 010
>>> 100, 110, 010
>>>
>>> The above implementation prevents us from getting 110, but it can't prevent
>>> us from getting 000, that is, MIGRATE_UNMOVABLE. If this race occurs in
>>> free_hot_cold_page(), the page would go onto the unmovable pcp list and then
>>> be allocated for that migratetype. This results in more fragmented memory.
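
For reference, those torn values come from the writer updating the packed
bits one at a time. Roughly what set_pageblock_flags_group() does before
patch 6/6 (a paraphrased sketch, not a verbatim copy):

static void sketch_set_pageblock_flags(unsigned long *bitmap,
				       unsigned long bitidx,
				       unsigned long flags, int nr_bits)
{
	unsigned long value = 1;
	int i;

	/*
	 * Each bit of the packed field is updated with a separate
	 * non-atomic bitop, so a lockless reader can observe any
	 * intermediate combination, e.g. 010 -> 000 -> 100.
	 */
	for (i = 0; i < nr_bits; i++, value <<= 1) {
		if (flags & value)
			__set_bit(bitidx + i, bitmap);
		else
			__clear_bit(bitidx + i, bitmap);
	}
}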
>>
>> Yes, that can happen. But I would expect it to be negligible compared to
>> other causes of fragmentation. I'm not at this moment sure how often
>> {start,undo}_isolate_page_range() would be called in the end. Certainly
>> not as often as in the development patch, which exists just to see if
>> that can improve anything, because the calls will have their own overhead
>> (mostly for zone->lock) that might be too large. But good point, I will
>> try to quantify this.
>>
>>>
>>> Consider another case, where the system enables CONFIG_CMA:
>>>
>>> MIGRATE_MOVABLE = 010
>>> MIGRATE_ISOLATE = 101
>>>
>>> start_isolate_page_range() case: 010 -> 101
>>> 010, 011, 001, 101
>>>
>>> undo_isolate_page_range() case: 101 -> 010
>>> 101, 100, 110, 010
>>>
>>> This can result in totally different values, which also causes the problem
>>> mentioned above. And although this doesn't cause any problem for CMA now,
>>> if another migratetype is introduced or some migratetype is removed, it can
>>> cause a CMA-typed page to go into another migratetype and make CMA
>>> permanently fail.
>>
>> This should actually be no problem for free_hot_cold_page() as any
>> migratetype >= MIGRATE_PCPTYPES will defer to free_one_page() which
>> will reread migratetype under zone->lock. So as long as
>> MIGRATE_PCPTYPES does not include a migratetype with such dangerous
>> "permanently failed" properties, it should be good. And I doubt such
>> migratetype would be added to pcptypes. But of course, anyone adding
>> new migratetype would have to reconsider each
>> get_pageblock_migratetype_nolock() call for such potential problems.
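
A condensed sketch of the free_hot_cold_page() logic I am describing
(paraphrased; details such as the exact free_one_page() arguments may differ):

	migratetype = get_pageblock_migratetype_nolock(page, MIGRATE_MOVABLE);
	if (migratetype >= MIGRATE_PCPTYPES) {
		if (unlikely(is_migrate_isolate(migratetype))) {
			/* freed under zone->lock, where the type is re-read */
			free_one_page(zone, page, 0, migratetype);
			goto out;
		}
		/* e.g. a CMA page just goes onto the movable pcp list */
		migratetype = MIGRATE_MOVABLE;
	}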
>
> Please let me explain more.
> Now a CMA page can have the following racy values.
>
> MIGRATE_CMA = 100
> MIGRATE_ISOLATE = 101
>
> start_isolate_page_range(): 100 -> 101
> 100, 101
> undo_isolate_page_range(): 101 -> 100
> 101, 100
>
> So, this race doesn't cause any big problem.
>
> But, as you mentioned in an earlier patch, it could get worse if MIGRATE_RESERVE
> is removed. That hasn't happened so far, but it is possible.
>
> In that case,
>
> MIGRATE_CMA = 011
> MIGRATE_ISOLATE = 100
>
> start_isolate_page_range(): 011 -> 100
> 011, 010, 000, 100
> undo_isolate_page_range(): 100 -> 011
> 100, 101, 111, 011
>
> If this race happens, a CMA page can go onto the MIGRATE_UNMOVABLE list, because
> the "migratetype >= MIGRATE_PCPTYPES" check can't prevent it, and this could make
> CMA permanently fail.

Oh, I understand what you mean now, sorry. And since
alloc_contig_range() is already doing just this kind of
CMA->ISOLATE->CMA transition, it could really be a problem even without
my pending work. But maybe it would be enough to add just an extra
PG_isolate bit to prevent this, and CMA could stay within the current
migratetype bits, no? The separate bit could even be useful to simplify
my work (see the sketch below).

Would you agree that this can be postponed a bit while I develop the
further compaction series, since CMA currently doesn't have the
dangerous value? Maybe there will be new concerns that will lead to a
different solution. This series already prevents a possible panic,
which is worse than this issue.
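
Rough sketch of the extra bit, mirroring how the existing pageblock skip
bit accessors work (PB_isolate is hypothetical, nothing defines it today):

#define get_pageblock_isolate(page) \
	get_pageblock_flags_group(page, PB_isolate, PB_isolate)
#define set_pageblock_isolate(page) \
	set_pageblock_flags_group(page, 1, PB_isolate, PB_isolate)
#define clear_pageblock_isolate(page) \
	set_pageblock_flags_group(page, 0, PB_isolate, PB_isolate)

With that, toggling isolation would never rewrite the migratetype bits
at all.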

> I think that dumping the responsibility on developers who want to add or remove
> a migratetype is not reasonable and won't work well, because they may not have
> enough background knowledge. I hope to close the possible races more completely
> this time.

Right, but even if we now added separate bits for ISOLATE and CMA,
anyone adding a new migratetype would still have to think about how to
handle it (common migratetype bits, or also some new bit?) and what the
consequences are.

Vlastimil

> Thanks.
>
>>
>>> To close this kind of race regardless of how many pageblock isolations occur,
>>> I recommend using separate pageblock bits for MIGRATE_CMA and MIGRATE_ISOLATE
>>> and an accessor function whenever we need to check the migratetype. IMHO, it
>>> may not impose much overhead.
>>
>> That could work in case the fragmentation is confirmed to be a problem.
>>
>> Thanks,
>> Vlastimil
>>
>>> How about it?
>>>
>>> Thanks.
>>>
>>> --
>>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>>> the body to majordomo@kvack.org.  For more info on Linux MM,
>>> see: http://www.linux-mm.org/ .
>>> Don't email: email@kvack.org
>>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: email@kvack.org


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases where locking is undesirable
  2014-03-04 12:16         ` Vlastimil Babka
@ 2014-03-05  0:29           ` Joonsoo Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Joonsoo Kim @ 2014-03-05  0:29 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
	Johannes Weiner, Minchan Kim, Kirill A. Shutemov

On Tue, Mar 04, 2014 at 01:16:56PM +0100, Vlastimil Babka wrote:
> On 03/04/2014 01:55 AM, Joonsoo Kim wrote:
> >On Mon, Mar 03, 2014 at 02:54:09PM +0100, Vlastimil Babka wrote:
> >>On 03/03/2014 09:22 AM, Joonsoo Kim wrote:
> >>>On Fri, Feb 28, 2014 at 03:15:00PM +0100, Vlastimil Babka wrote:
> >>>>In order to prevent a race with set_pageblock_migratetype(), most of the
> >>>>calls to get_pageblock_migratetype() have been moved under zone->lock. For
> >>>>the remaining call sites, the extra locking is undesirable, notably in
> >>>>free_hot_cold_page().
> >>>>
> >>>>This patch introduces a _nolock version to be used at these call sites,
> >>>>where a wrong value does not affect correctness. The function makes sure
> >>>>that the value does not exceed valid migratetype numbers. Such too-high
> >>>>values are assumed to be a result of the race, and the caller-supplied
> >>>>fallback value is returned instead.
> >>>>
> >>>>Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >>>>---
> >>>>  include/linux/mmzone.h | 24 ++++++++++++++++++++++++
> >>>>  mm/compaction.c        | 14 +++++++++++---
> >>>>  mm/memory-failure.c    |  3 ++-
> >>>>  mm/page_alloc.c        | 22 +++++++++++++++++-----
> >>>>  mm/vmstat.c            |  2 +-
> >>>>  5 files changed, 55 insertions(+), 10 deletions(-)
> >>>>
> >>>>diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >>>>index fac5509..7c3f678 100644
> >>>>--- a/include/linux/mmzone.h
> >>>>+++ b/include/linux/mmzone.h
> >>>>@@ -75,6 +75,30 @@ enum {
> >>>>
> >>>>  extern int page_group_by_mobility_disabled;
> >>>>
> >>>>+/*
> >>>>+ * When called without zone->lock held, a race with set_pageblock_migratetype
> >>>>+ * may result in bogus values. Use this variant only when this does not affect
> >>>>+ * correctness, and taking zone->lock would be costly. Values >= MIGRATE_TYPES
> >>>>+ * are considered to be a result of this race and the value of race_fallback
> >>>>+ * argument is returned instead.
> >>>>+ */
> >>>>+static inline int get_pageblock_migratetype_nolock(struct page *page,
> >>>>+	int race_fallback)
> >>>>+{
> >>>>+	int ret = get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
> >>>>+
> >>>>+	if (unlikely(ret >= MIGRATE_TYPES))
> >>>>+		ret = race_fallback;
> >>>>+
> >>>>+	return ret;
> >>>>+}
> >>>
> >>>Hello, Vlastimil.
> >>>
> >>>First of all, thanks for the nice work!
> >>>I have a different opinion about this implementation. I may be wrong, so if
> >>>I am, please let me know.
> >>
> >>Thanks, all opinions/reviews are welcome :)
> >>
> >>>Although this implementation would close the race which triggers the NULL
> >>>dereference, I don't think it is enough if you plan to add more
> >>>{start,undo}_isolate_page_range() calls.
> >>>
> >>>Consider a system without CMA where lots of {start,undo}_isolate_page_range()
> >>>calls occur.
> >>>
> >>>The bit representation of each migratetype is as follows.
> >>>
> >>>MIGRATE_MOVABLE = 010
> >>>MIGRATE_ISOLATE = 100
> >>>
> >>>If the race occurs, we could read the following values as the migratetype
> >>>of a page in a movable pageblock.
> >>>
> >>>start_isolate_page_range() case: 010 -> 100
> >>>010, 000, 100
> >>>
> >>>undo_isolate_page_range() case: 100 -> 010
> >>>100, 110, 010
> >>>
> >>>The above implementation prevents us from getting 110, but it can't prevent
> >>>us from getting 000, that is, MIGRATE_UNMOVABLE. If this race occurs in
> >>>free_hot_cold_page(), the page would go onto the unmovable pcp list and then
> >>>be allocated for that migratetype. This results in more fragmented memory.
> >>
> >>Yes, that can happen. But I would expect it to be negligible compared to
> >>other causes of fragmentation. I'm not at this moment sure how often
> >>{start,undo}_isolate_page_range() would be called in the end. Certainly
> >>not as often as in the development patch, which exists just to see if
> >>that can improve anything, because the calls will have their own overhead
> >>(mostly for zone->lock) that might be too large. But good point, I will
> >>try to quantify this.
> >>
> >>>
> >>>Consider another case, where the system enables CONFIG_CMA:
> >>>
> >>>MIGRATE_MOVABLE = 010
> >>>MIGRATE_ISOLATE = 101
> >>>
> >>>start_isolate_page_range() case: 010 -> 101
> >>>010, 011, 001, 101
> >>>
> >>>undo_isolate_page_range() case: 101 -> 010
> >>>101, 100, 110, 010
> >>>
> >>>This can result in totally different values, which also causes the problem
> >>>mentioned above. And although this doesn't cause any problem for CMA now,
> >>>if another migratetype is introduced or some migratetype is removed, it can
> >>>cause a CMA-typed page to go into another migratetype and make CMA
> >>>permanently fail.
> >>
> >>This should actually be no problem for free_hot_cold_page() as any
> >>migratetype >= MIGRATE_PCPTYPES will defer to free_one_page() which
> >>will reread migratetype under zone->lock. So as long as
> >>MIGRATE_PCPTYPES does not include a migratetype with such dangerous
> >>"permanently failed" properties, it should be good. And I doubt such
> >>migratetype would be added to pcptypes. But of course, anyone adding
> >>new migratetype would have to reconsider each
> >>get_pageblock_migratetype_nolock() call for such potential problems.
> >
> >Please let me explain more.
> >Now a CMA page can have the following racy values.
> >
> >MIGRATE_CMA = 100
> >MIGRATE_ISOLATE = 101
> >
> >start_isolate_page_range(): 100 -> 101
> >100, 101
> >undo_isolate_page_range(): 101 -> 100
> >101, 100
> >
> >So, this race doesn't cause any big problem.
> >
> >But, as you mentioned in an earlier patch, it could get worse if MIGRATE_RESERVE
> >is removed. That hasn't happened so far, but it is possible.
> >
> >In that case,
> >
> >MIGRATE_CMA = 011
> >MIGRATE_ISOLATE = 100
> >
> >start_isolate_page_range(): 011 -> 100
> >011, 010, 000, 100
> >undo_isolate_page_range(): 100 -> 011
> >100, 101, 111, 011
> >
> >If this race happens, a CMA page can go onto the MIGRATE_UNMOVABLE list, because
> >the "migratetype >= MIGRATE_PCPTYPES" check can't prevent it, and this could make
> >CMA permanently fail.
> 
> Oh, I understand what you mean now, sorry. And since
> alloc_contig_range() is already doing just this kind of
> CMA->ISOLATE->CMA transition, it could really be a problem even
> without my pending work. But maybe it would be enough to add just an
> extra PG_isolate bit to prevent this, and CMA could stay within the
> current migratetype bits, no? The separate bit could even be useful
> to simplify my work (see the sketch below).

Yes, CMA could stay within the current migratetype bits.

> Would you agree that this can be postponed a bit while I develop the
> further compaction series, since CMA currently doesn't have the
> dangerous value? Maybe there will be new concerns that will lead to a
> different solution. This series already prevents a possible panic,
> which is worse than this issue.

If we introduce a separate bit for MIGRATE_ISOLATE and make the bit operations
atomic (6/6), this patchset could be changed. Although I think that it is better
to distinguish the lock/nolock cases, the nolock variant would have no effect,
since with a separate bit for MIGRATE_ISOLATE there is no possibility of getting
the strange racy values this patch guards against. So I would like to see that
change this time. But I don't have a strong objection to the current
implementation. :)
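
What 6/6 does, sketched (not the exact hunk): each per-bit update becomes an
atomic RMW, so concurrent updates to neighbouring bits in the same word are
not lost:

	for (i = 0; i < nr_bits; i++, value <<= 1) {
		if (flags & value)
			set_bit(bitidx + i, bitmap);	/* atomic */
		else
			clear_bit(bitidx + i, bitmap);	/* atomic */
	}

And with isolation in its own bit, changing it would not touch the migratetype
bits at all, so no intermediate migratetype value could be observed.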

> 
> >I think that dumping the responsibility on developers who want to add or remove
> >a migratetype is not reasonable and won't work well, because they may not have
> >enough background knowledge. I hope to close the possible races more completely
> >this time.
> 
> Right, but even if we now added separate bits for ISOLATE and CMA,
> anyone adding a new migratetype would still have to think about how to
> handle it (common migratetype bits, or also some new bit?) and what the
> consequences are.

Okay.

Thanks.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases where locking is undesirable
  2014-02-28 14:15 ` [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases where locking is undesirable Vlastimil Babka
  2014-03-03  8:22   ` Joonsoo Kim
@ 2014-03-05  0:37   ` Joonsoo Kim
  1 sibling, 0 replies; 16+ messages in thread
From: Joonsoo Kim @ 2014-03-05  0:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
	Johannes Weiner, Minchan Kim, Kirill A. Shutemov

On Fri, Feb 28, 2014 at 03:15:00PM +0100, Vlastimil Babka wrote:
> In order to prevent a race with set_pageblock_migratetype(), most of the
> calls to get_pageblock_migratetype() have been moved under zone->lock. For
> the remaining call sites, the extra locking is undesirable, notably in
> free_hot_cold_page().
> 
> This patch introduces a _nolock version to be used at these call sites,
> where a wrong value does not affect correctness. The function makes sure
> that the value does not exceed valid migratetype numbers. Such too-high
> values are assumed to be a result of the race, and the caller-supplied
> fallback value is returned instead.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/mmzone.h | 24 ++++++++++++++++++++++++
>  mm/compaction.c        | 14 +++++++++++---
>  mm/memory-failure.c    |  3 ++-
>  mm/page_alloc.c        | 22 +++++++++++++++++-----
>  mm/vmstat.c            |  2 +-
>  5 files changed, 55 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fac5509..7c3f678 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -75,6 +75,30 @@ enum {
>  
>  extern int page_group_by_mobility_disabled;
>  
> +/*
> + * When called without zone->lock held, a race with set_pageblock_migratetype
> + * may result in bogus values. Use this variant only when this does not affect
> + * correctness, and taking zone->lock would be costly. Values >= MIGRATE_TYPES
> + * are considered to be a result of this race and the value of race_fallback
> + * argument is returned instead.
> + */
> +static inline int get_pageblock_migratetype_nolock(struct page *page,
> +	int race_fallback)
> +{
> +	int ret = get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
> +
> +	if (unlikely(ret >= MIGRATE_TYPES))
> +		ret = race_fallback;
> +
> +	return ret;
> +}

How about the forms below?

get_pageblock_migratetype_locked(struct page *page)
get_pageblock_migratetype(struct page *page, int race_fallback)

The get_pageblock_migratetype()/_nolock pair looks error-prone, because a
developer who tries to use get_pageblock_migratetype() may not know that it
needs the lock.
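
A hypothetical sketch of the locked variant (not a posted patch; the lockdep
assertion is just one way the explicit name could be enforced):

static inline int get_pageblock_migratetype_locked(struct page *page)
{
	lockdep_assert_held(&page_zone(page)->lock);
	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
}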

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] mm: add is_migrate_isolate_page_nolock() for cases where locking is undesirable
  2014-02-28 14:15 ` [PATCH 3/6] mm: add is_migrate_isolate_page_nolock() " Vlastimil Babka
@ 2014-03-05  0:39   ` Joonsoo Kim
  0 siblings, 0 replies; 16+ messages in thread
From: Joonsoo Kim @ 2014-03-05  0:39 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Mel Gorman, linux-mm, linux-kernel, Rik van Riel,
	Johannes Weiner, Minchan Kim, Kirill A. Shutemov

On Fri, Feb 28, 2014 at 03:15:01PM +0100, Vlastimil Babka wrote:
> This patch complements the addition of get_pageblock_migratetype_nolock() for
> the case where is_migrate_isolate_page() cannot be called with zone->lock held.
> A race with set_pageblock_migratetype() may be detected, in which case a
> caller-supplied argument is returned.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/page-isolation.h | 24 ++++++++++++++++++++++++
>  mm/hugetlb.c                   |  2 +-
>  2 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
> index 3fff8e7..f7bd491 100644
> --- a/include/linux/page-isolation.h
> +++ b/include/linux/page-isolation.h
> @@ -2,10 +2,30 @@
>  #define __LINUX_PAGEISOLATION_H
>  
>  #ifdef CONFIG_MEMORY_ISOLATION
> +/*
> + * Should be called only with zone->lock held. In cases where locking overhead
> + * is undesirable, consider the _nolock version.
> + */
>  static inline bool is_migrate_isolate_page(struct page *page)
>  {
>  	return get_pageblock_migratetype(page) == MIGRATE_ISOLATE;
>  }
> +/*
> + * When called without zone->lock held, a race with set_pageblock_migratetype
> + * may result in bogus values. The race may be detected, in which case the
> + * value of race_fallback argument is returned. For details, see
> + * get_pageblock_migratetype_nolock().
> + */
> +static inline bool is_migrate_isolate_page_nolock(struct page *page,
> +		bool race_fallback)
> +{
> +	int migratetype = get_pageblock_migratetype_nolock(page, MIGRATE_TYPES);
> +
> +	if (unlikely(migratetype == MIGRATE_TYPES))
> +		return race_fallback;
> +
> +	return migratetype == MIGRATE_ISOLATE;
> +}
>  static inline bool is_migrate_isolate(int migratetype)
>  {
>  	return migratetype == MIGRATE_ISOLATE;
> @@ -15,6 +35,10 @@ static inline bool is_migrate_isolate_page(struct page *page)
>  {
>  	return false;
>  }
> +static inline bool is_migrate_isolate_page_nolock(struct page *page)
> +{
> +	return false;
> +}
>  static inline bool is_migrate_isolate(int migratetype)
>  {
>  	return false;

Nitpick:
You need the race_fallback parameter in the !CONFIG_MEMORY_ISOLATION stub of
is_migrate_isolate_page_nolock() as well.
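
That is, the stub should take (and ignore) the same arguments, something like:

static inline bool is_migrate_isolate_page_nolock(struct page *page,
		bool race_fallback)
{
	return false;
}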


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-03-05  0:39 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-28 14:14 [PATCH 0/6] close pageblock_migratetype and pageblock_skip races Vlastimil Babka
2014-02-28 14:14 ` [PATCH 1/6] mm: call get_pageblock_migratetype() under zone->lock where possible Vlastimil Babka
2014-02-28 14:15 ` [PATCH 2/6] mm: add get_pageblock_migratetype_nolock() for cases where locking is undesirable Vlastimil Babka
2014-03-03  8:22   ` Joonsoo Kim
2014-03-03 13:54     ` Vlastimil Babka
2014-03-04  0:55       ` Joonsoo Kim
2014-03-04 12:16         ` Vlastimil Babka
2014-03-05  0:29           ` Joonsoo Kim
2014-03-05  0:37   ` Joonsoo Kim
2014-02-28 14:15 ` [PATCH 3/6] mm: add is_migrate_isolate_page_nolock() " Vlastimil Babka
2014-03-05  0:39   ` Joonsoo Kim
2014-02-28 14:15 ` [PATCH 4/6] mm: add set_pageblock_migratetype_nolock() for calls outside zone->lock Vlastimil Babka
2014-02-28 14:15 ` [PATCH 5/6] mm: compaction: do not set pageblock skip bit when already set Vlastimil Babka
2014-02-28 14:15 ` [PATCH 6/6] mm: use atomic bit operations in set_pageblock_flags_group() Vlastimil Babka
2014-03-03  8:28   ` Joonsoo Kim
2014-03-03 12:46     ` Vlastimil Babka
