linux-mm.kvack.org archive mirror
* [PATCH 0/4 v2] mm/swap: Add locking for pagevec
@ 2019-04-24 11:12 Sebastian Andrzej Siewior
  2019-04-24 11:12 ` [PATCH 1/4] mm/page_alloc: Split drain_local_pages() Sebastian Andrzej Siewior
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-04-24 11:12 UTC (permalink / raw)
  To: linux-mm; +Cc: tglx, frederic, Christoph Lameter, anna-maria


The swap code synchronizes access to its (four) per-CPU pagevec
structs by disabling preemption. This works, and the one struct that
needs to be accessed from interrupt context is protected by disabling
interrupts instead. This was audited manually and there is no lockdep
coverage for it.
There is one case where the per-CPU struct of a remote CPU needs to be
accessed; this is solved by starting a worker on the remote CPU and
waiting for it to finish.

In v1 [0] the attempt was to add per-CPU spinlocks for access to the
struct. This would add lockdep coverage and allow access from a remote
CPU, so the worker would no longer be required.
Concerns were raised about the cost of the uncontended spin_lock() and
that the benefit of avoiding the per-CPU worker would be rare because
the worker path is hardly ever used.
A static key was suggested which enables the per-CPU locking only
under certain circumstances, such as the NOHZ_FULL case; it is
implemented as part of this series.
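
As a sketch of the difference (illustrative only, condensed from the
patches in this series), the current preempt-disable scheme versus the
static-key-gated spinlock for the lru_add pagevec:

	/* Current scheme: pin the task to this CPU while touching the
	 * per-CPU pagevec. No lockdep coverage, local access only. */
	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
	pagevec_add(pvec, page);
	put_cpu_var(lru_add_pvec);

	/* With the static key enabled: take a per-CPU spinlock instead.
	 * Uncontended in the common case, lockdep-covered, and usable
	 * from a remote CPU. */
	if (static_branch_likely(&use_pvec_lock)) {
		struct swap_pagevec *swpvec = raw_cpu_ptr(&lru_add_pvec);

		spin_lock(&swpvec->lock);
		pagevec_add(&swpvec->pvec, page);
		spin_unlock(&swpvec->lock);
	}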

Sebastian



* [PATCH 1/4] mm/page_alloc: Split drain_local_pages()
  2019-04-24 11:12 [PATCH 0/4 v2] mm/swap: Add locking for pagevec Sebastian Andrzej Siewior
@ 2019-04-24 11:12 ` Sebastian Andrzej Siewior
  2019-04-24 11:12 ` [PATCH 2/4] mm/swap: Add static key dependent pagevec locking Sebastian Andrzej Siewior
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-04-24 11:12 UTC (permalink / raw)
  To: linux-mm
  Cc: tglx, frederic, Christoph Lameter, anna-maria, Sebastian Andrzej Siewior

From: Anna-Maria Gleixner <anna-maria@linutronix.de>

Split the functionality of drain_local_pages() into a separate
function, drain_cpu_pages(). This is preparatory work for introducing
the static key dependent locking mechanism.

No functional change.
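
A sketch of the intended use (the actual caller is added later in this
series): with the explicit CPU argument, the drain can be driven for
any CPU once the locking introduced by the later patches is in place:

	/* Drain the pcp lists of 'cpu'; a NULL zone drains all zones. */
	drain_cpu_pages(cpu, zone);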

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/gfp.h |  1 +
 mm/page_alloc.c     | 13 +++++++++----
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index fdab7de7490df..fcad3a07c9b04 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -557,6 +557,7 @@ extern void page_frag_free(void *addr);
 void page_alloc_init(void);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(struct zone *zone);
+void drain_cpu_pages(unsigned int cpu, struct zone *zone);
 void drain_local_pages(struct zone *zone);
 
 void page_alloc_init_late(void);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c6ce20aaf80bb..cf120e0700035 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2713,6 +2713,14 @@ static void drain_pages(unsigned int cpu)
 	}
 }
 
+void drain_cpu_pages(unsigned int cpu, struct zone *zone)
+{
+	if (zone)
+		drain_pages_zone(cpu, zone);
+	else
+		drain_pages(cpu);
+}
+
 /*
  * Spill all of this CPU's per-cpu pages back into the buddy allocator.
  *
@@ -2723,10 +2731,7 @@ void drain_local_pages(struct zone *zone)
 {
 	int cpu = smp_processor_id();
 
-	if (zone)
-		drain_pages_zone(cpu, zone);
-	else
-		drain_pages(cpu);
+	drain_cpu_pages(cpu, zone);
 }
 
 static void drain_local_pages_wq(struct work_struct *work)
-- 
2.20.1



* [PATCH 2/4] mm/swap: Add static key dependent pagevec locking
  2019-04-24 11:12 [PATCH 0/4 v2] mm/swap: Add locking for pagevec Sebastian Andrzej Siewior
  2019-04-24 11:12 ` [PATCH 1/4] mm/page_alloc: Split drain_local_pages() Sebastian Andrzej Siewior
@ 2019-04-24 11:12 ` Sebastian Andrzej Siewior
  2019-04-24 11:12 ` [PATCH 3/4] mm/swap: Access struct pagevec remotely Sebastian Andrzej Siewior
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-04-24 11:12 UTC (permalink / raw)
  To: linux-mm
  Cc: tglx, frederic, Christoph Lameter, anna-maria, Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

The locking of struct pagevec is done by disabling preemption. If the
struct has to be accessed from interrupt context, interrupts are
disabled as well. This means the struct can only be accessed locally,
from its own CPU. There is also no lockdep coverage which would scream
if it were accessed from the wrong context.

Create struct swap_pagevec, which consists of a pagevec member and a
spinlock_t. Introduce a static key which changes the locking behavior
only if the key is set, in the following way: before the struct is
accessed, the spinlock has to be acquired instead of using
preempt_disable(). Since the struct is used CPU-locally there is no
spinning on the lock; the lock is acquired immediately. If the struct
is accessed from interrupt context, spin_lock_irqsave() is used.

No functional change yet because the static key is not enabled.
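
The resulting usage pattern, condensed from the __lru_cache_add() hunk
below (a sketch, not an additional change):

	struct swap_pagevec *swpvec = lock_swap_pvec(&lru_add_pvec);
	struct pagevec *pvec = &swpvec->pvec;

	/* Key set: runs under swpvec->lock. Key unset: runs under
	 * preempt_disable() via get_cpu_var(), as before. */
	if (!pagevec_add(pvec, page) || PageCompound(page))
		__pagevec_lru_add(pvec);

	unlock_swap_pvec(swpvec, &lru_add_pvec);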

[anna-maria: introduce static key]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/compaction.c |  14 ++--
 mm/internal.h   |   2 +
 mm/swap.c       | 186 +++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 165 insertions(+), 37 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 3319e0872d014..ec47c96186771 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2224,10 +2224,16 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 				block_start_pfn(cc->migrate_pfn, cc->order);
 
 			if (last_migrated_pfn < current_block_start) {
-				cpu = get_cpu();
-				lru_add_drain_cpu(cpu);
-				drain_local_pages(cc->zone);
-				put_cpu();
+				if (static_branch_unlikely(&use_pvec_lock)) {
+					cpu = raw_smp_processor_id();
+					lru_add_drain_cpu(cpu);
+					drain_cpu_pages(cpu, cc->zone);
+				} else {
+					cpu = get_cpu();
+					lru_add_drain_cpu(cpu);
+					drain_local_pages(cc->zone);
+					put_cpu();
+				}
 				/* No more flushing until we migrate again */
 				last_migrated_pfn = 0;
 			}
diff --git a/mm/internal.h b/mm/internal.h
index 9eeaf2b95166f..ddfa760e61652 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -36,6 +36,8 @@
 /* Do not use these with a slab allocator */
 #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK)
 
+extern struct static_key_false use_pvec_lock;
+
 void page_writeback_init(void);
 
 vm_fault_t do_swap_page(struct vm_fault *vmf);
diff --git a/mm/swap.c b/mm/swap.c
index 301ed4e043205..136c80480dbde 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -43,14 +43,107 @@
 /* How many pages do we try to swap or page in/out together? */
 int page_cluster;
 
-static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
-static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
-static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
-static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
+DEFINE_STATIC_KEY_FALSE(use_pvec_lock);
+
+struct swap_pagevec {
+	spinlock_t	lock;
+	struct pagevec	pvec;
+};
+
+#define DEFINE_PER_CPU_PAGEVEC(lvar)				\
+	DEFINE_PER_CPU(struct swap_pagevec, lvar) = {		\
+		.lock = __SPIN_LOCK_UNLOCKED((lvar).lock) }
+
+static DEFINE_PER_CPU_PAGEVEC(lru_add_pvec);
+static DEFINE_PER_CPU_PAGEVEC(lru_rotate_pvecs);
+static DEFINE_PER_CPU_PAGEVEC(lru_deactivate_file_pvecs);
+static DEFINE_PER_CPU_PAGEVEC(lru_lazyfree_pvecs);
 #ifdef CONFIG_SMP
-static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
+static DEFINE_PER_CPU_PAGEVEC(activate_page_pvecs);
 #endif
 
+static inline
+struct swap_pagevec *lock_swap_pvec(struct swap_pagevec __percpu *p)
+{
+	struct swap_pagevec *swpvec;
+
+	if (static_branch_likely(&use_pvec_lock)) {
+		swpvec = raw_cpu_ptr(p);
+
+		spin_lock(&swpvec->lock);
+	} else {
+		swpvec = &get_cpu_var(*p);
+	}
+	return swpvec;
+}
+
+static inline struct swap_pagevec *
+lock_swap_pvec_cpu(struct swap_pagevec __percpu *p, int cpu)
+{
+	struct swap_pagevec *swpvec = per_cpu_ptr(p, cpu);
+
+	if (static_branch_likely(&use_pvec_lock))
+		spin_lock(&swpvec->lock);
+
+	return swpvec;
+}
+
+static inline struct swap_pagevec *
+lock_swap_pvec_irqsave(struct swap_pagevec __percpu *p, unsigned long *flags)
+{
+	struct swap_pagevec *swpvec;
+
+	if (static_branch_likely(&use_pvec_lock)) {
+		swpvec = raw_cpu_ptr(p);
+
+		spin_lock_irqsave(&swpvec->lock, (*flags));
+	} else {
+		local_irq_save(*flags);
+
+		swpvec = this_cpu_ptr(p);
+	}
+	return swpvec;
+}
+
+static inline struct swap_pagevec *
+lock_swap_pvec_cpu_irqsave(struct swap_pagevec __percpu *p, int cpu,
+			   unsigned long *flags)
+{
+	struct swap_pagevec *swpvec = per_cpu_ptr(p, cpu);
+
+	if (static_branch_likely(&use_pvec_lock))
+		spin_lock_irqsave(&swpvec->lock, *flags);
+	else
+		local_irq_save(*flags);
+
+	return swpvec;
+}
+
+static inline void unlock_swap_pvec(struct swap_pagevec *swpvec,
+				    struct swap_pagevec __percpu *p)
+{
+	if (static_branch_likely(&use_pvec_lock))
+		spin_unlock(&swpvec->lock);
+	else
+		put_cpu_var(*p);
+
+}
+
+static inline void unlock_swap_pvec_cpu(struct swap_pagevec *swpvec)
+{
+	if (static_branch_likely(&use_pvec_lock))
+		spin_unlock(&swpvec->lock);
+}
+
+static inline void
+unlock_swap_pvec_irqrestore(struct swap_pagevec *swpvec, unsigned long flags)
+{
+	if (static_branch_likely(&use_pvec_lock))
+		spin_unlock_irqrestore(&swpvec->lock, flags);
+	else
+		local_irq_restore(flags);
+}
+
 /*
  * This path almost never happens for VM activity - pages are normally
  * freed via pagevecs.  But it gets used by networking.
@@ -248,15 +341,17 @@ void rotate_reclaimable_page(struct page *page)
 {
 	if (!PageLocked(page) && !PageDirty(page) &&
 	    !PageUnevictable(page) && PageLRU(page)) {
+		struct swap_pagevec *swpvec;
 		struct pagevec *pvec;
 		unsigned long flags;
 
 		get_page(page);
-		local_irq_save(flags);
-		pvec = this_cpu_ptr(&lru_rotate_pvecs);
+
+		swpvec = lock_swap_pvec_irqsave(&lru_rotate_pvecs, &flags);
+		pvec = &swpvec->pvec;
 		if (!pagevec_add(pvec, page) || PageCompound(page))
 			pagevec_move_tail(pvec);
-		local_irq_restore(flags);
+		unlock_swap_pvec_irqrestore(swpvec, flags);
 	}
 }
 
@@ -291,27 +386,32 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
 #ifdef CONFIG_SMP
 static void activate_page_drain(int cpu)
 {
-	struct pagevec *pvec = &per_cpu(activate_page_pvecs, cpu);
+	struct swap_pagevec *swpvec = lock_swap_pvec_cpu(&activate_page_pvecs, cpu);
+	struct pagevec *pvec = &swpvec->pvec;
 
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, __activate_page, NULL);
+	unlock_swap_pvec_cpu(swpvec);
 }
 
 static bool need_activate_page_drain(int cpu)
 {
-	return pagevec_count(&per_cpu(activate_page_pvecs, cpu)) != 0;
+	return pagevec_count(per_cpu_ptr(&activate_page_pvecs.pvec, cpu)) != 0;
 }
 
 void activate_page(struct page *page)
 {
 	page = compound_head(page);
 	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
-		struct pagevec *pvec = &get_cpu_var(activate_page_pvecs);
+		struct swap_pagevec *swpvec;
+		struct pagevec *pvec;
 
 		get_page(page);
+		swpvec = lock_swap_pvec(&activate_page_pvecs);
+		pvec = &swpvec->pvec;
 		if (!pagevec_add(pvec, page) || PageCompound(page))
 			pagevec_lru_move_fn(pvec, __activate_page, NULL);
-		put_cpu_var(activate_page_pvecs);
+		unlock_swap_pvec(swpvec, &activate_page_pvecs);
 	}
 }
 
@@ -333,7 +433,8 @@ void activate_page(struct page *page)
 
 static void __lru_cache_activate_page(struct page *page)
 {
-	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+	struct swap_pagevec *swpvec = lock_swap_pvec(&lru_add_pvec);
+	struct pagevec *pvec = &swpvec->pvec;
 	int i;
 
 	/*
@@ -355,7 +456,7 @@ static void __lru_cache_activate_page(struct page *page)
 		}
 	}
 
-	put_cpu_var(lru_add_pvec);
+	unlock_swap_pvec(swpvec, &lru_add_pvec);
 }
 
 /*
@@ -397,12 +498,13 @@ EXPORT_SYMBOL(mark_page_accessed);
 
 static void __lru_cache_add(struct page *page)
 {
-	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
+	struct swap_pagevec *swpvec = lock_swap_pvec(&lru_add_pvec);
+	struct pagevec *pvec = &swpvec->pvec;
 
 	get_page(page);
 	if (!pagevec_add(pvec, page) || PageCompound(page))
 		__pagevec_lru_add(pvec);
-	put_cpu_var(lru_add_pvec);
+	unlock_swap_pvec(swpvec, &lru_add_pvec);
 }
 
 /**
@@ -570,28 +672,34 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
  */
 void lru_add_drain_cpu(int cpu)
 {
-	struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu);
+	struct swap_pagevec *swpvec = lock_swap_pvec_cpu(&lru_add_pvec, cpu);
+	struct pagevec *pvec = &swpvec->pvec;
+	unsigned long flags;
 
 	if (pagevec_count(pvec))
 		__pagevec_lru_add(pvec);
+	unlock_swap_pvec_cpu(swpvec);
 
-	pvec = &per_cpu(lru_rotate_pvecs, cpu);
+	swpvec = lock_swap_pvec_cpu_irqsave(&lru_rotate_pvecs, cpu, &flags);
+	pvec = &swpvec->pvec;
 	if (pagevec_count(pvec)) {
-		unsigned long flags;
 
 		/* No harm done if a racing interrupt already did this */
-		local_irq_save(flags);
 		pagevec_move_tail(pvec);
-		local_irq_restore(flags);
 	}
+	unlock_swap_pvec_irqrestore(swpvec, flags);
 
-	pvec = &per_cpu(lru_deactivate_file_pvecs, cpu);
+	swpvec = lock_swap_pvec_cpu(&lru_deactivate_file_pvecs, cpu);
+	pvec = &swpvec->pvec;
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
+	unlock_swap_pvec_cpu(swpvec);
 
-	pvec = &per_cpu(lru_lazyfree_pvecs, cpu);
+	swpvec = lock_swap_pvec_cpu(&lru_lazyfree_pvecs, cpu);
+	pvec = &swpvec->pvec;
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
+	unlock_swap_pvec_cpu(swpvec);
 
 	activate_page_drain(cpu);
 }
@@ -606,6 +714,9 @@ void lru_add_drain_cpu(int cpu)
  */
 void deactivate_file_page(struct page *page)
 {
+	struct swap_pagevec *swpvec;
+	struct pagevec *pvec;
+
 	/*
 	 * In a workload with many unevictable page such as mprotect,
 	 * unevictable page deactivation for accelerating reclaim is pointless.
@@ -614,11 +725,12 @@ void deactivate_file_page(struct page *page)
 		return;
 
 	if (likely(get_page_unless_zero(page))) {
-		struct pagevec *pvec = &get_cpu_var(lru_deactivate_file_pvecs);
+		swpvec = lock_swap_pvec(&lru_deactivate_file_pvecs);
+		pvec = &swpvec->pvec;
 
 		if (!pagevec_add(pvec, page) || PageCompound(page))
 			pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
-		put_cpu_var(lru_deactivate_file_pvecs);
+		unlock_swap_pvec(swpvec, &lru_deactivate_file_pvecs);
 	}
 }
 
@@ -631,21 +743,29 @@ void deactivate_file_page(struct page *page)
  */
 void mark_page_lazyfree(struct page *page)
 {
+	struct swap_pagevec *swpvec;
+	struct pagevec *pvec;
+
 	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
 	    !PageSwapCache(page) && !PageUnevictable(page)) {
-		struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);
+		swpvec = lock_swap_pvec(&lru_lazyfree_pvecs);
+		pvec = &swpvec->pvec;
 
 		get_page(page);
 		if (!pagevec_add(pvec, page) || PageCompound(page))
 			pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
-		put_cpu_var(lru_lazyfree_pvecs);
+		unlock_swap_pvec(swpvec, &lru_lazyfree_pvecs);
 	}
 }
 
 void lru_add_drain(void)
 {
-	lru_add_drain_cpu(get_cpu());
-	put_cpu();
+	if (static_branch_likely(&use_pvec_lock)) {
+		lru_add_drain_cpu(raw_smp_processor_id());
+	} else {
+		lru_add_drain_cpu(get_cpu());
+		put_cpu();
+	}
 }
 
 #ifdef CONFIG_SMP
@@ -683,10 +803,10 @@ void lru_add_drain_all(void)
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
-		if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
-		    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
-		    pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
-		    pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) ||
+		if (pagevec_count(&per_cpu(lru_add_pvec.pvec, cpu)) ||
+		    pagevec_count(&per_cpu(lru_rotate_pvecs.pvec, cpu)) ||
+		    pagevec_count(&per_cpu(lru_deactivate_file_pvecs.pvec, cpu)) ||
+		    pagevec_count(&per_cpu(lru_lazyfree_pvecs.pvec, cpu)) ||
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
 			queue_work_on(cpu, mm_percpu_wq, work);
-- 
2.20.1



* [PATCH 3/4] mm/swap: Access struct pagevec remotely
  2019-04-24 11:12 [PATCH 0/4 v2] mm/swap: Add locking for pagevec Sebastian Andrzej Siewior
  2019-04-24 11:12 ` [PATCH 1/4] mm/page_alloc: Split drain_local_pages() Sebastian Andrzej Siewior
  2019-04-24 11:12 ` [PATCH 2/4] mm/swap: Add static key dependent pagevec locking Sebastian Andrzej Siewior
@ 2019-04-24 11:12 ` Sebastian Andrzej Siewior
  2019-04-24 11:12 ` [PATCH 4/4] mm/swap: Enable "use_pvec_lock" nohz_full dependent Sebastian Andrzej Siewior
  2019-04-24 12:15 ` [PATCH 0/4 v2] mm/swap: Add locking for pagevec Matthew Wilcox
  4 siblings, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-04-24 11:12 UTC (permalink / raw)
  To: linux-mm
  Cc: tglx, frederic, Christoph Lameter, anna-maria, Sebastian Andrzej Siewior

From: Thomas Gleixner <tglx@linutronix.de>

When the newly introduced static key is enabled, struct pagevec is
locked during access, so it becomes possible to access it from a
remote CPU. The advantage is that the work can be done from the
"requesting" CPU without firing a worker on the remote CPU and waiting
for it to complete the work.

No functional change because the static key is not enabled.
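
Condensed from the lru_add_drain_all() hunk below, the two draining
strategies side by side (a sketch, not an additional change):

	if (static_branch_likely(&use_pvec_lock)) {
		/* Locked pagevecs: drain the remote CPU directly. */
		lru_add_drain_cpu(cpu);
	} else {
		/* Preempt-disable scheme: the drain must run on the
		 * target CPU, so queue a worker there and wait for it. */
		queue_work_on(cpu, mm_percpu_wq, work);
		flush_work(work);
	}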

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/swap.c | 75 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 45 insertions(+), 30 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 136c80480dbde..ea623255cd305 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -774,7 +774,8 @@ static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);
 
 static void lru_add_drain_per_cpu(struct work_struct *dummy)
 {
-	lru_add_drain();
+	if (!static_branch_unlikely(&use_pvec_lock))
+		lru_add_drain();
 }
 
 /*
@@ -786,38 +787,52 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
  */
 void lru_add_drain_all(void)
 {
-	static DEFINE_MUTEX(lock);
-	static struct cpumask has_work;
-	int cpu;
+	if (static_branch_likely(&use_pvec_lock)) {
+		int cpu;
 
-	/*
-	 * Make sure nobody triggers this path before mm_percpu_wq is fully
-	 * initialized.
-	 */
-	if (WARN_ON(!mm_percpu_wq))
-		return;
-
-	mutex_lock(&lock);
-	cpumask_clear(&has_work);
-
-	for_each_online_cpu(cpu) {
-		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
-
-		if (pagevec_count(&per_cpu(lru_add_pvec.pvec, cpu)) ||
-		    pagevec_count(&per_cpu(lru_rotate_pvecs.pvec, cpu)) ||
-		    pagevec_count(&per_cpu(lru_deactivate_file_pvecs.pvec, cpu)) ||
-		    pagevec_count(&per_cpu(lru_lazyfree_pvecs.pvec, cpu)) ||
-		    need_activate_page_drain(cpu)) {
-			INIT_WORK(work, lru_add_drain_per_cpu);
-			queue_work_on(cpu, mm_percpu_wq, work);
-			cpumask_set_cpu(cpu, &has_work);
+		for_each_online_cpu(cpu) {
+			if (pagevec_count(&per_cpu(lru_add_pvec.pvec, cpu)) ||
+			    pagevec_count(&per_cpu(lru_rotate_pvecs.pvec, cpu)) ||
+			    pagevec_count(&per_cpu(lru_deactivate_file_pvecs.pvec, cpu)) ||
+			    pagevec_count(&per_cpu(lru_lazyfree_pvecs.pvec, cpu)) ||
+			    need_activate_page_drain(cpu)) {
+				lru_add_drain_cpu(cpu);
+			}
 		}
+	} else {
+		static DEFINE_MUTEX(lock);
+		static struct cpumask has_work;
+		int cpu;
+
+		/*
+		 * Make sure nobody triggers this path before mm_percpu_wq
+		 * is fully initialized.
+		 */
+		if (WARN_ON(!mm_percpu_wq))
+			return;
+
+		mutex_lock(&lock);
+		cpumask_clear(&has_work);
+
+		for_each_online_cpu(cpu) {
+			struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
+
+			if (pagevec_count(&per_cpu(lru_add_pvec.pvec, cpu)) ||
+			    pagevec_count(&per_cpu(lru_rotate_pvecs.pvec, cpu)) ||
+			    pagevec_count(&per_cpu(lru_deactivate_file_pvecs.pvec, cpu)) ||
+			    pagevec_count(&per_cpu(lru_lazyfree_pvecs.pvec, cpu)) ||
+			    need_activate_page_drain(cpu)) {
+				INIT_WORK(work, lru_add_drain_per_cpu);
+				queue_work_on(cpu, mm_percpu_wq, work);
+				cpumask_set_cpu(cpu, &has_work);
+			}
+		}
+
+		for_each_cpu(cpu, &has_work)
+			flush_work(&per_cpu(lru_add_drain_work, cpu));
+
+		mutex_unlock(&lock);
 	}
-
-	for_each_cpu(cpu, &has_work)
-		flush_work(&per_cpu(lru_add_drain_work, cpu));
-
-	mutex_unlock(&lock);
 }
 #else
 void lru_add_drain_all(void)
-- 
2.20.1



* [PATCH 4/4] mm/swap: Enable "use_pvec_lock" nohz_full dependent
  2019-04-24 11:12 [PATCH 0/4 v2] mm/swap: Add locking for pagevec Sebastian Andrzej Siewior
                   ` (2 preceding siblings ...)
  2019-04-24 11:12 ` [PATCH 3/4] mm/swap: Access struct pagevec remotely Sebastian Andrzej Siewior
@ 2019-04-24 11:12 ` Sebastian Andrzej Siewior
  2019-04-24 12:15 ` [PATCH 0/4 v2] mm/swap: Add locking for pagevec Matthew Wilcox
  4 siblings, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-04-24 11:12 UTC (permalink / raw)
  To: linux-mm
  Cc: tglx, frederic, Christoph Lameter, anna-maria, Sebastian Andrzej Siewior

From: Anna-Maria Gleixner <anna-maria@linutronix.de>

When a system runs with CONFIG_NO_HZ_FULL enabled, the tick of CPUs
listed in the 'nohz_full=' kernel command line parameter should be
stopped whenever possible. The tick stays stopped for longer when work
for such a CPU is handled by another CPU.

The already introduced static key 'use_pvec_lock' makes it possible to
avoid firing a worker for mm/swap work on a remote CPU whose tick is
stopped.

Therefore, enable the static key when setup of the 'nohz_full=' kernel
command line parameter was successful, which implies that
CONFIG_NO_HZ_FULL is set.
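
A hypothetical example (the CPU range is made up): booting with

	nohz_full=2-5

makes housekeeping_nohz_full_setup() return success, which with this
patch flips 'use_pvec_lock', so all pagevec access system-wide goes
through the per-CPU spinlocks from then on.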

Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/sched/isolation.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index b02d148e76727..b532f448cab42 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -7,6 +7,7 @@
  *
  */
 #include "sched.h"
+#include "../../mm/internal.h"
 
 DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
 EXPORT_SYMBOL_GPL(housekeeping_overridden);
@@ -116,10 +117,21 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
 static int __init housekeeping_nohz_full_setup(char *str)
 {
 	unsigned int flags;
+	int ret;
 
 	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU | HK_FLAG_MISC;
 
-	return housekeeping_setup(str, flags);
+	ret = housekeeping_setup(str, flags);
+
+	/*
+	 * Protect struct pagevec with a lock instead of using preemption
+	 * disabling; with lock protection, handling events remotely instead
+	 * of queueing work on the remote CPU is the default behavior.
+	 */
+	if (ret)
+		static_branch_enable(&use_pvec_lock);
+
+	return ret;
 }
 __setup("nohz_full=", housekeeping_nohz_full_setup);
 
-- 
2.20.1



* Re: [PATCH 0/4 v2] mm/swap: Add locking for pagevec
  2019-04-24 11:12 [PATCH 0/4 v2] mm/swap: Add locking for pagevec Sebastian Andrzej Siewior
                   ` (3 preceding siblings ...)
  2019-04-24 11:12 ` [PATCH 4/4] mm/swap: Enable "use_pvec_lock" nohz_full dependent Sebastian Andrzej Siewior
@ 2019-04-24 12:15 ` Matthew Wilcox
  2019-04-26  8:00   ` Sebastian Andrzej Siewior
  2020-06-16 16:55   ` Marcelo Tosatti
  4 siblings, 2 replies; 8+ messages in thread
From: Matthew Wilcox @ 2019-04-24 12:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-mm, tglx, frederic, Christoph Lameter, anna-maria

On Wed, Apr 24, 2019 at 01:12:04PM +0200, Sebastian Andrzej Siewior wrote:
> The swap code synchronizes access to its (four) per-CPU pagevec
> structs by disabling preemption. This works, and the one struct that
> needs to be accessed from interrupt context is protected by disabling
> interrupts instead. This was audited manually and there is no lockdep
> coverage for it.
> There is one case where the per-CPU struct of a remote CPU needs to be
> accessed; this is solved by starting a worker on the remote CPU and
> waiting for it to finish.
> 
> In v1 [0] the attempt was to add per-CPU spinlocks for access to the
> struct. This would add lockdep coverage and allow access from a remote
> CPU, so the worker would no longer be required.

From my point of view, what is missing from this description is why we
want to be able to access these structs from a remote CPU.  It's explained
a little better in the 4/4 changelog, but I don't see any numbers that
suggest what kinds of gains we might see (eg "reduces power consumption
by x% on a particular setup", or even "average length of time in idle
extended from x ms to y ms").



* Re: [PATCH 0/4 v2] mm/swap: Add locking for pagevec
  2019-04-24 12:15 ` [PATCH 0/4 v2] mm/swap: Add locking for pagevec Matthew Wilcox
@ 2019-04-26  8:00   ` Sebastian Andrzej Siewior
  2020-06-16 16:55   ` Marcelo Tosatti
  1 sibling, 0 replies; 8+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-04-26  8:00 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, tglx, frederic, Christoph Lameter, anna-maria

On 2019-04-24 05:15:52 [-0700], Matthew Wilcox wrote:
> On Wed, Apr 24, 2019 at 01:12:04PM +0200, Sebastian Andrzej Siewior wrote:
> > The swap code synchronizes access to its (four) per-CPU pagevec
> > structs by disabling preemption. This works, and the one struct that
> > needs to be accessed from interrupt context is protected by disabling
> > interrupts instead. This was audited manually and there is no lockdep
> > coverage for it.
> > There is one case where the per-CPU struct of a remote CPU needs to be
> > accessed; this is solved by starting a worker on the remote CPU and
> > waiting for it to finish.
> > 
> > In v1 [0] the attempt was to add per-CPU spinlocks for access to the
> > struct. This would add lockdep coverage and allow access from a remote
> > CPU, so the worker would no longer be required.
> 
> From my point of view, what is missing from this description is why we
> want to be able to access these structs from a remote CPU.  It's explained
> a little better in the 4/4 changelog, but I don't see any numbers that
> suggest what kinds of gains we might see (eg "reduces power consumption
> by x% on a particular setup", or even "average length of time in idle
> extended from x ms to y ms").

Pulling a CPU out of idle or userland computation looks bad. In the
first series I had numbers for how long it takes to run the loop over
all the per-CPU data from one CPU vs. via the workqueue. The
uncontended lock was bad per the krobot report, while I never got
stable numbers from that test myself.
The other motivation is RT, where we need proper locking and can't use
that preempt-disable based locking.

Sebastian



* Re: [PATCH 0/4 v2] mm/swap: Add locking for pagevec
  2019-04-24 12:15 ` [PATCH 0/4 v2] mm/swap: Add locking for pagevec Matthew Wilcox
  2019-04-26  8:00   ` Sebastian Andrzej Siewior
@ 2020-06-16 16:55   ` Marcelo Tosatti
  1 sibling, 0 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2020-06-16 16:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Sebastian Andrzej Siewior, linux-mm, tglx, frederic,
	Christoph Lameter, anna-maria

On Wed, Apr 24, 2019 at 05:15:52AM -0700, Matthew Wilcox wrote:
> On Wed, Apr 24, 2019 at 01:12:04PM +0200, Sebastian Andrzej Siewior wrote:
> > The swap code synchronizes access to its (four) per-CPU pagevec
> > structs by disabling preemption. This works, and the one struct that
> > needs to be accessed from interrupt context is protected by disabling
> > interrupts instead. This was audited manually and there is no lockdep
> > coverage for it.
> > There is one case where the per-CPU struct of a remote CPU needs to be
> > accessed; this is solved by starting a worker on the remote CPU and
> > waiting for it to finish.
> > 
> > In v1 [0] the attempt was to add per-CPU spinlocks for access to the
> > struct. This would add lockdep coverage and allow access from a remote
> > CPU, so the worker would no longer be required.
> 
> From my point of view, what is missing from this description is why we
> want to be able to access these structs from a remote CPU.  It's explained
> a little better in the 4/4 changelog, but I don't see any numbers that
> suggest what kinds of gains we might see (eg "reduces power consumption
> by x% on a particular setup", or even "average length of time in idle
> extended from x ms to y ms").

Willy, 

This is to avoid:

1) Interrupting CPUs which can't afford to execute a workqueue item.

2) A system lockup when running a SCHED_FIFO application that
busy-spins on a network RX descriptor:

[ 7475.821066] INFO: task ld:274531 blocked for more than 600 seconds.
[ 7475.822157]       Not tainted 4.18.0-208.rt5.20.el8.x86_64 #1
[ 7475.823094] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[ 7475.824392] ld              D    0 274531 274530 0x00084080
[ 7475.825307] Call Trace:
[ 7475.825761]  __schedule+0x342/0x850
[ 7475.826377]  schedule+0x39/0xd0
[ 7475.826923]  schedule_timeout+0x20e/0x410
[ 7475.827610]  ? __schedule+0x34a/0x850
[ 7475.828247]  ? ___preempt_schedule+0x16/0x18
[ 7475.828953]  wait_for_completion+0x85/0xe0
[ 7475.829653]  flush_work+0x11a/0x1c0
[ 7475.830313]  ? flush_workqueue_prep_pwqs+0x130/0x130
[ 7475.831148]  drain_all_pages+0x140/0x190
[ 7475.831803]  __alloc_pages_slowpath+0x3f8/0xe20
[ 7475.832571]  ? mem_cgroup_commit_charge+0xcb/0x510
[ 7475.833371]  __alloc_pages_nodemask+0x1ca/0x2b0
[ 7475.834134]  pagecache_get_page+0xb5/0x2d0
[ 7475.834814]  ? account_page_dirtied+0x11a/0x220
[ 7475.835579]  grab_cache_page_write_begin+0x1f/0x40
[ 7475.836379]  iomap_write_begin.constprop.44+0x1c1/0x370
[ 7475.837241]  ? iomap_write_end+0x91/0x290
[ 7475.837911]  iomap_write_actor+0x92/0x170
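
For illustration, a minimal userspace sketch of the kind of task that
starves the per-CPU workqueue so that the flush_work() in
drain_all_pages() above never completes (poll_rx() is a hypothetical
stand-in for the real RX polling loop):

	#include <sched.h>

	/* Hypothetical: spin on a network RX descriptor, never block. */
	static void poll_rx(void) { }

	int main(void)
	{
		struct sched_param p = { .sched_priority = 1 };

		/* SCHED_FIFO: this task now preempts all normal tasks,
		 * including the kworkers, on its CPU. */
		sched_setscheduler(0, SCHED_FIFO, &p);
		for (;;)
			poll_rx();	/* never yields the CPU */
	}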






Thread overview: 8+ messages
2019-04-24 11:12 [PATCH 0/4 v2] mm/swap: Add locking for pagevec Sebastian Andrzej Siewior
2019-04-24 11:12 ` [PATCH 1/4] mm/page_alloc: Split drain_local_pages() Sebastian Andrzej Siewior
2019-04-24 11:12 ` [PATCH 2/4] mm/swap: Add static key dependent pagevec locking Sebastian Andrzej Siewior
2019-04-24 11:12 ` [PATCH 3/4] mm/swap: Access struct pagevec remotely Sebastian Andrzej Siewior
2019-04-24 11:12 ` [PATCH 4/4] mm/swap: Enable "use_pvec_lock" nohz_full dependent Sebastian Andrzej Siewior
2019-04-24 12:15 ` [PATCH 0/4 v2] mm/swap: Add locking for pagevec Matthew Wilcox
2019-04-26  8:00   ` Sebastian Andrzej Siewior
2020-06-16 16:55   ` Marcelo Tosatti
