All of lore.kernel.org
* [RFC 0/2] do not depend on cpuhotplug locks in lru_add_drain_all
@ 2017-11-02  9:36 ` Michal Hocko
  0 siblings, 0 replies; 27+ messages in thread
From: Michal Hocko @ 2017-11-02  9:36 UTC (permalink / raw)
  To: linux-mm
  Cc: Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Mel Gorman,
	Tejun Heo, LKML

Hi,
this is an RFC to drop get_online_cpus from lru_add_drain_all as this
has caused a very subtle lockdep splat recently [1]. I haven't even had
a chance to test this properly yet and I am sending it early to check
whether the thinking behind it is sound. I am basically following the
same pattern we used for removing get_online_cpus from drain_all_pages,
which should be a similar case.

Does anybody see any obvious problem?

[1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

* [PATCH 1/2] shmem: drop lru_add_drain_all from shmem_wait_for_pins
  2017-11-02  9:36 ` Michal Hocko
@ 2017-11-02  9:36   ` Michal Hocko
  -1 siblings, 0 replies; 27+ messages in thread
From: Michal Hocko @ 2017-11-02  9:36 UTC (permalink / raw)
  To: linux-mm
  Cc: Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Mel Gorman,
	Tejun Heo, LKML, Michal Hocko, David Herrmann, Hugh Dickins

From: Michal Hocko <mhocko@suse.com>

syzkaller has reported the following lockdep splat
======================================================
WARNING: possible circular locking dependency detected
4.13.0-next-20170911+ #19 Not tainted
------------------------------------------------------
syz-executor5/6914 is trying to acquire lock:
  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff818c1b3e>] get_online_cpus  include/linux/cpu.h:126 [inline]
  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff818c1b3e>] lru_add_drain_all+0xe/0x20 mm/swap.c:729

but task is already holding lock:
  (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818fbef7>] inode_lock include/linux/fs.h:712 [inline]
  (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818fbef7>] shmem_add_seals+0x197/0x1060 mm/shmem.c:2768

more details in [1] and the dependencies explained in [2]. The problem
seems to be the usage of lru_add_drain_all from shmem_wait_for_pins.
While the lock dependency is subtle as hell and we might want to make
lru_add_drain_all less dependent on the hotplug locks, the usage of
lru_add_drain_all seems dubious here in the first place. The whole
function cares only about radix tree tags, page count and page mapcount.
None of those are touched from the draining context, so it doesn't make
much sense to drain the pcp caches. Moreover this looks like the wrong
thing to do because it induces unpredictable latency into the call, as
draining is not free (especially on larger machines with many cpus).

Let's simply drop the call to lru_add_drain_all to address both issues.

[1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com
[2] http://lkml.kernel.org/r/20171030151009.ip4k7nwan7muouca@hirez.programming.kicks-ass.net

Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/shmem.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index d6947d21f66c..e784f311d4ed 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2668,9 +2668,7 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 		if (!radix_tree_tagged(&mapping->page_tree, SHMEM_TAG_PINNED))
 			break;
 
-		if (!scan)
-			lru_add_drain_all();
-		else if (schedule_timeout_killable((HZ << scan) / 200))
+		if (scan && schedule_timeout_killable((HZ << scan) / 200))
 			scan = LAST_SCAN;
 
 		start = 0;
-- 
2.14.2

^ permalink raw reply	[flat|nested] 27+ messages in thread


* [PATCH 2/2] mm: drop hotplug lock from lru_add_drain_all
  2017-11-02  9:36 ` Michal Hocko
@ 2017-11-02  9:36   ` Michal Hocko
  -1 siblings, 0 replies; 27+ messages in thread
From: Michal Hocko @ 2017-11-02  9:36 UTC (permalink / raw)
  To: linux-mm
  Cc: Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Mel Gorman,
	Tejun Heo, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Pulling the cpu hotplug lock inside core mm functions like
lru_add_drain_all just asks for problems and the recent lockdep splat
[1] proves this. While the usage in that particular case might be
wrong, we should avoid taking the lock there altogether because
lru_add_drain_all is used in many places. It seems that this is not
all that hard to achieve actually.

We have already done the same thing for drain_all_pages, which is an
analogous case, in a459eeb7b852 ("mm, page_alloc: do not depend on cpu
hotplug locks inside the allocator"). All we have to take care of is
      - the work item might be executed on a different cpu by a worker
        from the unbound pool, so it is not pinned to the cpu it is
        draining

      - we have to make sure that we do not race with page_alloc_cpu_dead
        calling lru_add_drain_cpu

The first part is already handled because the worker calls lru_add_drain,
which disables preemption while calling lru_add_drain_cpu on the local
cpu it is draining. The latter is achieved by disabling IRQs around
lru_add_drain_cpu in the hotplug callback.

[1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/swap.h | 1 -
 mm/memory_hotplug.c  | 2 +-
 mm/page_alloc.c      | 4 ++++
 mm/swap.c            | 9 +--------
 4 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 84255b3da7c1..cfc200673e13 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -331,7 +331,6 @@ extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
-extern void lru_add_drain_all_cpuslocked(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 832a042134f8..c9f6b418be79 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1641,7 +1641,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal;
 
 	cond_resched();
-	lru_add_drain_all_cpuslocked();
+	lru_add_drain_all();
 	drain_all_pages(zone);
 
 	pfn = scan_movable_pages(start_pfn, end_pfn);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 67330a438525..8c6e9c6d194c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6830,8 +6830,12 @@ void __init free_area_init(unsigned long *zones_size)
 
 static int page_alloc_cpu_dead(unsigned int cpu)
 {
+	unsigned long flags;
 
+	local_irq_save(flags);
 	lru_add_drain_cpu(cpu);
+	local_irq_restore(flags);
+
 	drain_pages(cpu);
 
 	/*
diff --git a/mm/swap.c b/mm/swap.c
index 381e0fe9efbf..6c4e77517bd2 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -688,7 +688,7 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
 
 static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);
 
-void lru_add_drain_all_cpuslocked(void)
+void lru_add_drain_all(void)
 {
 	static DEFINE_MUTEX(lock);
 	static struct cpumask has_work;
@@ -724,13 +724,6 @@ void lru_add_drain_all_cpuslocked(void)
 	mutex_unlock(&lock);
 }
 
-void lru_add_drain_all(void)
-{
-	get_online_cpus();
-	lru_add_drain_all_cpuslocked();
-	put_online_cpus();
-}
-
 /**
  * release_pages - batched put_page()
  * @pages: array of pages to release
-- 
2.14.2

^ permalink raw reply	[flat|nested] 27+ messages in thread


* Re: [PATCH 2/2] mm: drop hotplug lock from lru_add_drain_all
  2017-11-02  9:36   ` Michal Hocko
@ 2017-11-02 12:37     ` Michal Hocko
  -1 siblings, 0 replies; 27+ messages in thread
From: Michal Hocko @ 2017-11-02 12:37 UTC (permalink / raw)
  To: linux-mm
  Cc: Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Mel Gorman,
	Tejun Heo, LKML

On Thu 02-11-17 10:36:13, Michal Hocko wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 67330a438525..8c6e9c6d194c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6830,8 +6830,12 @@ void __init free_area_init(unsigned long *zones_size)
>  
>  static int page_alloc_cpu_dead(unsigned int cpu)
>  {
> +	unsigned long flags;
>  
> +	local_irq_save(flags);
>  	lru_add_drain_cpu(cpu);
> +	local_irq_restore(flags);
> +
>  	drain_pages(cpu);
  
I was staring into the hotplug code and tried to understand the context
this callback runs in, and AFAIU disabling IRQs is not needed at all
because cpuhp_thread_fun runs with IRQs disabled when offlining an online
cpu. I have a bit of a hard time following the code through all the
indirection, so please correct me if I am wrong.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread


* Re: [PATCH 2/2] mm: drop hotplug lock from lru_add_drain_all
  2017-11-02 12:37     ` Michal Hocko
@ 2017-11-02 13:02       ` Thomas Gleixner
  -1 siblings, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2017-11-02 13:02 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Peter Zijlstra, Johannes Weiner, Mel Gorman, Tejun Heo, LKML

On Thu, 2 Nov 2017, Michal Hocko wrote:
> On Thu 02-11-17 10:36:13, Michal Hocko wrote:
> [...]
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 67330a438525..8c6e9c6d194c 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -6830,8 +6830,12 @@ void __init free_area_init(unsigned long *zones_size)
> >  
> >  static int page_alloc_cpu_dead(unsigned int cpu)
> >  {
> > +	unsigned long flags;
> >  
> > +	local_irq_save(flags);
> >  	lru_add_drain_cpu(cpu);
> > +	local_irq_restore(flags);
> > +
> >  	drain_pages(cpu);
>   
> I was staring into the hotplug code and tried to understand the context
> this callback runs in and AFAIU IRQ disabling is not needed at all
> because cpuhp_thread_fun runs with IRQ disabled when offlining an online
> cpu. I have a bit hard time to follow the code due to all the
> indirection so please correct me if I am wrong.

No. That function neither runs from the cpu hotplug thread of the
outgoing CPU nor is it called with interrupts disabled.

The callback is in the DEAD section, i.e. it is called on the controlling
CPU _after_ the hotplugged CPU has vanished completely.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread


* Re: [PATCH 2/2] mm: drop hotplug lock from lru_add_drain_all
  2017-11-02 13:02       ` Thomas Gleixner
@ 2017-11-02 13:16         ` Michal Hocko
  -1 siblings, 0 replies; 27+ messages in thread
From: Michal Hocko @ 2017-11-02 13:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-mm, Peter Zijlstra, Johannes Weiner, Mel Gorman, Tejun Heo, LKML

On Thu 02-11-17 14:02:53, Thomas Gleixner wrote:
> On Thu, 2 Nov 2017, Michal Hocko wrote:
> > On Thu 02-11-17 10:36:13, Michal Hocko wrote:
> > [...]
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 67330a438525..8c6e9c6d194c 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -6830,8 +6830,12 @@ void __init free_area_init(unsigned long *zones_size)
> > >  
> > >  static int page_alloc_cpu_dead(unsigned int cpu)
> > >  {
> > > +	unsigned long flags;
> > >  
> > > +	local_irq_save(flags);
> > >  	lru_add_drain_cpu(cpu);
> > > +	local_irq_restore(flags);
> > > +
> > >  	drain_pages(cpu);
> >   
> > I was staring into the hotplug code and tried to understand the context
> > this callback runs in and AFAIU IRQ disabling is not needed at all
> > because cpuhp_thread_fun runs with IRQ disabled when offlining an online
> > cpu. I have a bit hard time to follow the code due to all the
> > indirection so please correct me if I am wrong.
> 
> No. That function does neither run from the cpu hotplug thread of the
> outgoing CPU nor its called with interrupts disabled.
> 
> The callback is in the DEAD section, i.e. its called on the controlling CPU
> _after_ the hotplugged CPU vanished completely.

OK, so IIUC there is no race possible because kworkers simply do not
run on that cpu anymore.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread


* Re: [PATCH 2/2] mm: drop hotplug lock from lru_add_drain_all
  2017-11-02 13:16         ` Michal Hocko
@ 2017-11-02 13:42           ` Thomas Gleixner
  -1 siblings, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2017-11-02 13:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Peter Zijlstra, Johannes Weiner, Mel Gorman, Tejun Heo, LKML

On Thu, 2 Nov 2017, Michal Hocko wrote:
> On Thu 02-11-17 14:02:53, Thomas Gleixner wrote:
> > On Thu, 2 Nov 2017, Michal Hocko wrote:
> > > On Thu 02-11-17 10:36:13, Michal Hocko wrote:
> > > [...]
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 67330a438525..8c6e9c6d194c 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -6830,8 +6830,12 @@ void __init free_area_init(unsigned long *zones_size)
> > > >  
> > > >  static int page_alloc_cpu_dead(unsigned int cpu)
> > > >  {
> > > > +	unsigned long flags;
> > > >  
> > > > +	local_irq_save(flags);
> > > >  	lru_add_drain_cpu(cpu);
> > > > +	local_irq_restore(flags);
> > > > +
> > > >  	drain_pages(cpu);
> > >   
> > > I was staring into the hotplug code and tried to understand the context
> > > this callback runs in and AFAIU IRQ disabling is not needed at all
> > > because cpuhp_thread_fun runs with IRQ disabled when offlining an online
> > > cpu. I have a bit hard time to follow the code due to all the
> > > indirection so please correct me if I am wrong.
> > 
> > No. That function does neither run from the cpu hotplug thread of the
> > outgoing CPU nor its called with interrupts disabled.
> > 
> > The callback is in the DEAD section, i.e. its called on the controlling CPU
> > _after_ the hotplugged CPU vanished completely.
> 
> OK, so IIUC there is no race possible because kworkders simply do not
> run on that cpu anymore.

The CPU is gone and the kworker should have been drained and
stopped. Tejun?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread


* Re: [PATCH 1/2] shmem: drop lru_add_drain_all from shmem_wait_for_pins
  2017-11-02  9:36   ` Michal Hocko
@ 2017-11-03  7:46     ` Hugh Dickins
  -1 siblings, 0 replies; 27+ messages in thread
From: Hugh Dickins @ 2017-11-03  7:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Peter Zijlstra, Thomas Gleixner, Johannes Weiner,
	Mel Gorman, Tejun Heo, LKML, Michal Hocko, David Herrmann,
	Hugh Dickins

On Thu, 2 Nov 2017, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> syzkaller has reported the following lockdep splat
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.13.0-next-20170911+ #19 Not tainted
> ------------------------------------------------------
> syz-executor5/6914 is trying to acquire lock:
>   (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff818c1b3e>] get_online_cpus  include/linux/cpu.h:126 [inline]
>   (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff818c1b3e>] lru_add_drain_all+0xe/0x20 mm/swap.c:729
> 
> but task is already holding lock:
>   (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818fbef7>] inode_lock include/linux/fs.h:712 [inline]
>   (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818fbef7>] shmem_add_seals+0x197/0x1060 mm/shmem.c:2768
> 
> more details in [1] and the dependencies explained in [2]. The problem
> seems to be the usage of lru_add_drain_all from shmem_wait_for_pins.
> While the lock dependency is subtle as hell and we might want to make
> lru_add_drain_all less dependent on the hotplug locks, the usage of
> lru_add_drain_all seems dubious here in the first place. The whole
> function cares only about radix tree tags, page count and page mapcount.
> None of those are touched from the draining context, so it doesn't make
> much sense to drain the pcp caches. Moreover this looks like the wrong
> thing to do because it induces unpredictable latency into the call, as
> draining is not free (especially on larger machines with many cpus).
> 
> Let's simply drop the call to lru_add_drain_all to address both issues.
> 
> [1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com
> [2] http://lkml.kernel.org/r/20171030151009.ip4k7nwan7muouca@hirez.programming.kicks-ass.net
> 
> Cc: David Herrmann <dh.herrmann@gmail.com>
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

NAK.  shmem_wait_for_pins() is waiting for temporary pins on the pages
to go away, and using lru_add_drain_all() in the usual way, to lower
the refcount of pages temporarily pinned in a pagevec somewhere.  Page
count is touched by draining pagevecs: I'm surprised to see you say
that it isn't - or have pagevec page references been eliminated by
a recent commit that I missed?

I hope your other patch, or another cpu hotplug locking fix, can deal
with this.  If not, I might be forced to spend some hours understanding
the story that lockdep is telling us there - you're probably way ahead
of me on that.  Maybe a separate inode lock initializer for shmem
inodes would offer a way out.

Hugh

> ---
>  mm/shmem.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d6947d21f66c..e784f311d4ed 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2668,9 +2668,7 @@ static int shmem_wait_for_pins(struct address_space *mapping)
>  		if (!radix_tree_tagged(&mapping->page_tree, SHMEM_TAG_PINNED))
>  			break;
>  
> -		if (!scan)
> -			lru_add_drain_all();
> -		else if (schedule_timeout_killable((HZ << scan) / 200))
> +		if (scan && schedule_timeout_killable((HZ << scan) / 200))
>  			scan = LAST_SCAN;
>  
>  		start = 0;
> -- 
> 2.14.2
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] shmem: drop lru_add_drain_all from shmem_wait_for_pins
@ 2017-11-03  7:46     ` Hugh Dickins
  0 siblings, 0 replies; 27+ messages in thread
From: Hugh Dickins @ 2017-11-03  7:46 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Peter Zijlstra, Thomas Gleixner, Johannes Weiner,
	Mel Gorman, Tejun Heo, LKML, Michal Hocko, David Herrmann,
	Hugh Dickins

On Thu, 2 Nov 2017, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> syzkaller has reported the following lockdep splat
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.13.0-next-20170911+ #19 Not tainted
> ------------------------------------------------------
> syz-executor5/6914 is trying to acquire lock:
>   (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff818c1b3e>] get_online_cpus  include/linux/cpu.h:126 [inline]
>   (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff818c1b3e>] lru_add_drain_all+0xe/0x20 mm/swap.c:729
> 
> but task is already holding lock:
>   (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818fbef7>] inode_lock include/linux/fs.h:712 [inline]
>   (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818fbef7>] shmem_add_seals+0x197/0x1060 mm/shmem.c:2768
> 
> More details in [1] and the lock dependencies are explained in [2]. The
> problem seems to be the usage of lru_add_drain_all from
> shmem_wait_for_pins. While the lock dependency is subtle as hell, and
> we might want to make lru_add_drain_all less dependent on the hotplug
> locks, the usage of lru_add_drain_all seems dubious here. The whole
> function cares only about radix tree tags, page count and page
> mapcount. None of those are touched from the draining context, so it
> doesn't make much sense to drain pcp caches. Moreover, this looks like
> the wrong thing to do because it induces unpredictable latency, since
> draining is not free (especially on larger machines with many cpus).
> 
> Let's simply drop the call to lru_add_drain_all to address both issues.
> 
> [1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com
> [2] http://lkml.kernel.org/r/20171030151009.ip4k7nwan7muouca@hirez.programming.kicks-ass.net
> 
> Cc: David Herrmann <dh.herrmann@gmail.com>
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

NAK.  shmem_wait_for_pins() is waiting for temporary pins on the pages
to go away, and using lru_add_drain_all() in the usual way, to lower
the refcount of pages temporarily pinned in a pagevec somewhere.  Page
count is touched by draining pagevecs: I'm surprised to see you say
that it isn't - or have pagevec page references been eliminated by
a recent commit that I missed?

I hope your other patch, or another cpu hotplug locking fix, can deal
with this.  If not, I might be forced to spend some hours understanding
the story that lockdep is telling us there - you're probably way ahead
of me on that.  Maybe a separate inode lock initializer for shmem
inodes would offer a way out.

Hugh

> ---
>  mm/shmem.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d6947d21f66c..e784f311d4ed 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2668,9 +2668,7 @@ static int shmem_wait_for_pins(struct address_space *mapping)
>  		if (!radix_tree_tagged(&mapping->page_tree, SHMEM_TAG_PINNED))
>  			break;
>  
> -		if (!scan)
> -			lru_add_drain_all();
> -		else if (schedule_timeout_killable((HZ << scan) / 200))
> +		if (scan && schedule_timeout_killable((HZ << scan) / 200))
>  			scan = LAST_SCAN;
>  
>  		start = 0;
> -- 
> 2.14.2
> 
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] shmem: drop lru_add_drain_all from shmem_wait_for_pins
  2017-11-03  7:46     ` Hugh Dickins
@ 2017-11-03  8:24       ` Michal Hocko
  -1 siblings, 0 replies; 27+ messages in thread
From: Michal Hocko @ 2017-11-03  8:24 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, Peter Zijlstra, Thomas Gleixner, Johannes Weiner,
	Mel Gorman, Tejun Heo, LKML, David Herrmann

On Fri 03-11-17 00:46:18, Hugh Dickins wrote:
> On Thu, 2 Nov 2017, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > syzkaller has reported the following lockdep splat
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 4.13.0-next-20170911+ #19 Not tainted
> > ------------------------------------------------------
> > syz-executor5/6914 is trying to acquire lock:
> >   (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff818c1b3e>] get_online_cpus  include/linux/cpu.h:126 [inline]
> >   (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff818c1b3e>] lru_add_drain_all+0xe/0x20 mm/swap.c:729
> > 
> > but task is already holding lock:
> >   (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818fbef7>] inode_lock include/linux/fs.h:712 [inline]
> >   (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818fbef7>] shmem_add_seals+0x197/0x1060 mm/shmem.c:2768
> > 
> > More details in [1] and the lock dependencies are explained in [2]. The
> > problem seems to be the usage of lru_add_drain_all from
> > shmem_wait_for_pins. While the lock dependency is subtle as hell, and
> > we might want to make lru_add_drain_all less dependent on the hotplug
> > locks, the usage of lru_add_drain_all seems dubious here. The whole
> > function cares only about radix tree tags, page count and page
> > mapcount. None of those are touched from the draining context, so it
> > doesn't make much sense to drain pcp caches. Moreover, this looks like
> > the wrong thing to do because it induces unpredictable latency, since
> > draining is not free (especially on larger machines with many cpus).
> > 
> > Let's simply drop the call to lru_add_drain_all to address both issues.
> > 
> > [1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com
> > [2] http://lkml.kernel.org/r/20171030151009.ip4k7nwan7muouca@hirez.programming.kicks-ass.net
> > 
> > Cc: David Herrmann <dh.herrmann@gmail.com>
> > Cc: Hugh Dickins <hughd@google.com>
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> NAK.  shmem_wait_for_pins() is waiting for temporary pins on the pages
> to go away, and using lru_add_drain_all() in the usual way, to lower
> the refcount of pages temporarily pinned in a pagevec somewhere.  Page
> count is touched by draining pagevecs: I'm surprised to see you say
> that it isn't - or have pagevec page references been eliminated by
> a recent commit that I missed?

I must be missing something here. __pagevec_lru_add_fn is merely about
moving the page onto the appropriate LRU list, pagevec_move_tail only
rotates, lru_deactivate_file_fn moves from active to inactive LRUs,
lru_lazyfree_fn moves from anon to file LRUs and activate_page_drain
just moves to the active list. None of those operations touch the page
count AFAICS. So I would agree that some pages might be pinned outside
of the LRU (lru_add_pvec) and thus unreclaimable, but does this really
matter? Or what else am I missing?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] shmem: drop lru_add_drain_all from shmem_wait_for_pins
  2017-11-03  8:24       ` Michal Hocko
@ 2017-11-03  9:52         ` David Herrmann
  -1 siblings, 0 replies; 27+ messages in thread
From: David Herrmann @ 2017-11-03  9:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Hugh Dickins, linux-mm, Peter Zijlstra, Thomas Gleixner,
	Johannes Weiner, Mel Gorman, Tejun Heo, LKML

Hi

On Fri, Nov 3, 2017 at 9:24 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Fri 03-11-17 00:46:18, Hugh Dickins wrote:
>> On Thu, 2 Nov 2017, Michal Hocko wrote:
>> > From: Michal Hocko <mhocko@suse.com>
>> >
>> > syzkaller has reported the following lockdep splat
>> > ======================================================
>> > WARNING: possible circular locking dependency detected
>> > 4.13.0-next-20170911+ #19 Not tainted
>> > ------------------------------------------------------
>> > syz-executor5/6914 is trying to acquire lock:
>> >   (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff818c1b3e>] get_online_cpus  include/linux/cpu.h:126 [inline]
>> >   (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff818c1b3e>] lru_add_drain_all+0xe/0x20 mm/swap.c:729
>> >
>> > but task is already holding lock:
>> >   (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818fbef7>] inode_lock include/linux/fs.h:712 [inline]
>> >   (&sb->s_type->i_mutex_key#9){++++}, at: [<ffffffff818fbef7>] shmem_add_seals+0x197/0x1060 mm/shmem.c:2768
>> >
>> > More details in [1] and the lock dependencies are explained in [2]. The
>> > problem seems to be the usage of lru_add_drain_all from
>> > shmem_wait_for_pins. While the lock dependency is subtle as hell, and
>> > we might want to make lru_add_drain_all less dependent on the hotplug
>> > locks, the usage of lru_add_drain_all seems dubious here. The whole
>> > function cares only about radix tree tags, page count and page
>> > mapcount. None of those are touched from the draining context, so it
>> > doesn't make much sense to drain pcp caches. Moreover, this looks like
>> > the wrong thing to do because it induces unpredictable latency, since
>> > draining is not free (especially on larger machines with many cpus).
>> >
>> > Let's simply drop the call to lru_add_drain_all to address both issues.
>> >
>> > [1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com
>> > [2] http://lkml.kernel.org/r/20171030151009.ip4k7nwan7muouca@hirez.programming.kicks-ass.net
>> >
>> > Cc: David Herrmann <dh.herrmann@gmail.com>
>> > Cc: Hugh Dickins <hughd@google.com>
>> > Signed-off-by: Michal Hocko <mhocko@suse.com>
>>
>> NAK.  shmem_wait_for_pins() is waiting for temporary pins on the pages
>> to go away, and using lru_add_drain_all() in the usual way, to lower
>> the refcount of pages temporarily pinned in a pagevec somewhere.  Page
>> count is touched by draining pagevecs: I'm surprised to see you say
>> that it isn't - or have pagevec page references been eliminated by
>> a recent commit that I missed?
>
> I must be missing something here. __pagevec_lru_add_fn is merely about
> moving the page onto the appropriate LRU list, pagevec_move_tail only
> rotates, lru_deactivate_file_fn moves from active to inactive LRUs,
> lru_lazyfree_fn moves from anon to file LRUs and activate_page_drain
> just moves to the active list. None of those operations touch the page
> count AFAICS. So I would agree that some pages might be pinned outside
> of the LRU (lru_add_pvec) and thus unreclaimable, but does this really
> matter? Or what else am I missing?

Yes, we need to make sure those page-pins are dropped.
shmem_wait_for_pins() literally just waits for all those to be
cleared, since there is no way to tell whether a page is still
inflight for some pending async WRITE operation. Hence, if the
pagevecs keep pinning those pages, we must fail the shmem-seal
operation, as we cannot guarantee there are no further WRITEs to this
file. The refcount is our only way to tell.

I think the caller could just call lru_add_drain_all() between
mapping_deny_writable() and shmem_wait_for_pins(), releasing the
inode-lock in between. But that means we drain it even if
shmem_tag_pins() does not find anything (presumably the common case).
It would also have weird interactions with parallel inode-operations,
in case the seal-operation fails and is reverted. Not sure I like
that.

Thanks
David

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] mm: drop hotplug lock from lru_add_drain_all
  2017-11-02  9:36   ` Michal Hocko
  (?)
  (?)
@ 2017-11-03 20:02   ` kbuild test robot
  -1 siblings, 0 replies; 27+ messages in thread
From: kbuild test robot @ 2017-11-03 20:02 UTC (permalink / raw)
  To: Michal Hocko
  Cc: kbuild-all, linux-mm, Peter Zijlstra, Thomas Gleixner,
	Johannes Weiner, Mel Gorman, Tejun Heo, LKML, Michal Hocko

[-- Attachment #1: Type: text/plain, Size: 1533 bytes --]

Hi Michal,

I love your patch! Yet something to improve:

[auto build test ERROR on mmotm/master]
[also build test ERROR on next-20171103]
[cannot apply to v4.14-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Michal-Hocko/do-not-depend-on-cpuhotplug-logs-in-lru_add_drain_all/20171104-023818
base:   git://git.cmpxchg.org/linux-mmotm.git master
config: i386-randconfig-s1-201744 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   mm/compaction.o: In function `sysctl_compaction_handler':
   compaction.c:(.text+0x18f0): undefined reference to `lru_add_drain_all'
   mm/mlock.o: In function `do_mlock':
   mlock.c:(.text+0x6f7): undefined reference to `lru_add_drain_all'
   mm/mlock.o: In function `SyS_mlockall':
   mlock.c:(.text+0xa20): undefined reference to `lru_add_drain_all'
   mm/ksm.o: In function `ksm_scan_thread':
>> ksm.c:(.text+0x139f): undefined reference to `lru_add_drain_all'
   mm/migrate.o: In function `migrate_prep':
>> migrate.c:(.text+0x92e): undefined reference to `lru_add_drain_all'
   fs/block_dev.o:block_dev.c:(.text+0x176): more undefined references to `lru_add_drain_all' follow

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28282 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] shmem: drop lru_add_drain_all from shmem_wait_for_pins
  2017-11-03  8:24       ` Michal Hocko
@ 2017-11-05  0:28         ` Hugh Dickins
  -1 siblings, 0 replies; 27+ messages in thread
From: Hugh Dickins @ 2017-11-05  0:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Hugh Dickins, linux-mm, Peter Zijlstra, Thomas Gleixner,
	Johannes Weiner, Mel Gorman, Tejun Heo, LKML, David Herrmann

On Fri, 3 Nov 2017, Michal Hocko wrote:
> On Fri 03-11-17 00:46:18, Hugh Dickins wrote:
> > 
> > NAK.  shmem_wait_for_pins() is waiting for temporary pins on the pages
> > to go away, and using lru_add_drain_all() in the usual way, to lower
> > the refcount of pages temporarily pinned in a pagevec somewhere.  Page
> > count is touched by draining pagevecs: I'm surprised to see you say
> > that it isn't - or have pagevec page references been eliminated by
> > a recent commit that I missed?
> 
> > I must be missing something here. __pagevec_lru_add_fn is merely about
> > moving the page onto the appropriate LRU list, pagevec_move_tail only
> > rotates, lru_deactivate_file_fn moves from active to inactive LRUs,
> > lru_lazyfree_fn moves from anon to file LRUs and activate_page_drain
> > just moves to the active list. None of those operations touch the page
> > count AFAICS. So I would agree that some pages might be pinned outside
> > of the LRU (lru_add_pvec) and thus unreclaimable, but does this really
> > matter? Or what else am I missing?

Line 213 of mm/swap.c?  Where pagevec_lru_move_fn() calls release_pages()
to release the extra references (which each page came in with when added).
Think of the mayhem that would follow from a page being freed while still
on a pagevec: of course it must hold a reference.  The only surprise is
that the extra reference is not needed while on the LRU: one can think of
PageLRU as an extension of the page count.

Hugh

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/2] shmem: drop lru_add_drain_all from shmem_wait_for_pins
  2017-11-05  0:28         ` Hugh Dickins
@ 2017-11-05  7:37           ` Michal Hocko
  -1 siblings, 0 replies; 27+ messages in thread
From: Michal Hocko @ 2017-11-05  7:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, Peter Zijlstra, Thomas Gleixner, Johannes Weiner,
	Mel Gorman, Tejun Heo, LKML, David Herrmann

On Sat 04-11-17 17:28:05, Hugh Dickins wrote:
> On Fri, 3 Nov 2017, Michal Hocko wrote:
> > On Fri 03-11-17 00:46:18, Hugh Dickins wrote:
> > > 
> > > NAK.  shmem_wait_for_pins() is waiting for temporary pins on the pages
> > > to go away, and using lru_add_drain_all() in the usual way, to lower
> > > the refcount of pages temporarily pinned in a pagevec somewhere.  Page
> > > count is touched by draining pagevecs: I'm surprised to see you say
> > > that it isn't - or have pagevec page references been eliminated by
> > > a recent commit that I missed?
> > 
> > I must be missing something here. __pagevec_lru_add_fn is merely about
> > moving the page onto the appropriate LRU list, pagevec_move_tail only
> > rotates, lru_deactivate_file_fn moves from active to inactive LRUs,
> > lru_lazyfree_fn moves from anon to file LRUs and activate_page_drain
> > just moves to the active list. None of those operations touch the page
> > count AFAICS. So I would agree that some pages might be pinned outside
> > of the LRU (lru_add_pvec) and thus unreclaimable, but does this really
> > matter? Or what else am I missing?
> 
> Line 213 of mm/swap.c?  Where pagevec_lru_move_fn() calls release_pages()
> to release the extra references (which each page came in with when added).

I am obviously blind. I was staring at that function many times simply
missing this part. My bad and sorry about not taking a deeper look.
Shame...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH -v2] mm: drop hotplug lock from lru_add_drain_all
  2017-11-02  9:36   ` Michal Hocko
@ 2017-11-05  7:45     ` Michal Hocko
  -1 siblings, 0 replies; 27+ messages in thread
From: Michal Hocko @ 2017-11-05  7:45 UTC (permalink / raw)
  To: linux-mm
  Cc: Peter Zijlstra, Thomas Gleixner, Johannes Weiner, Mel Gorman,
	Tejun Heo, LKML

Here is v2, which drops the IRQ disabling in the hotplug callback and
fixes the compile breakage. Patch 1 is wrong, as pointed out by Hugh,
but this one alone should already help.
---
>From 53d3003635bfa619745048aa4b70ca62f714960d Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 2 Nov 2017 10:19:17 +0100
Subject: [PATCH] mm: drop hotplug lock from lru_add_drain_all

Pulling cpu hotplug locks inside core mm functions like
lru_add_drain_all just asks for problems and the recent lockdep splat
[1] proves this. While the usage in that particular case might be
wrong, we should avoid taking the hotplug lock there because
lru_add_drain_all is used in many places. It turns out that this is
actually not all that hard to achieve.

We have done the same thing for drain_all_pages, which is an analogous
case, in a459eeb7b852 ("mm, page_alloc: do not depend on cpu hotplug
locks inside the allocator"). All we have to care about is to handle
      - the work item might be executed on a different cpu in a worker
        from an unbound pool, so it doesn't run pinned on the cpu

      - we have to make sure that we do not race with page_alloc_cpu_dead
        calling lru_add_drain_cpu

The first part is already handled because the worker calls lru_add_drain,
which disables preemption while calling lru_add_drain_cpu on the local
cpu it is draining. The latter is true because page_alloc_cpu_dead
is called on the controlling CPU after the hotplugged CPU has vanished
completely.

[1] http://lkml.kernel.org/r/089e0825eec8955c1f055c83d476@google.com

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 include/linux/swap.h | 1 -
 mm/memory_hotplug.c  | 2 +-
 mm/swap.c            | 9 +--------
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 84255b3da7c1..cfc200673e13 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -331,7 +331,6 @@ extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
-extern void lru_add_drain_all_cpuslocked(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 832a042134f8..c9f6b418be79 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1641,7 +1641,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		goto failed_removal;
 
 	cond_resched();
-	lru_add_drain_all_cpuslocked();
+	lru_add_drain_all();
 	drain_all_pages(zone);
 
 	pfn = scan_movable_pages(start_pfn, end_pfn);
diff --git a/mm/swap.c b/mm/swap.c
index 381e0fe9efbf..8bfdcab9f83e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -688,7 +688,7 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
 
 static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);
 
-void lru_add_drain_all_cpuslocked(void)
+void lru_add_drain_all(void)
 {
 	static DEFINE_MUTEX(lock);
 	static struct cpumask has_work;
@@ -724,13 +724,6 @@ void lru_add_drain_all_cpuslocked(void)
 	mutex_unlock(&lock);
 }
 
-void lru_add_drain_all(void)
-{
-	get_online_cpus();
-	lru_add_drain_all_cpuslocked();
-	put_online_cpus();
-}
-
 /**
  * release_pages - batched put_page()
  * @pages: array of pages to release
-- 
2.14.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2017-11-05  7:45 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-02  9:36 [RFC 0/2] do not depend on cpuhotplug logs in lru_add_drain_all Michal Hocko
2017-11-02  9:36 ` [PATCH 1/2] shmem: drop lru_add_drain_all from shmem_wait_for_pins Michal Hocko
2017-11-03  7:46   ` Hugh Dickins
2017-11-03  8:24     ` Michal Hocko
2017-11-03  9:52       ` David Herrmann
2017-11-05  0:28       ` Hugh Dickins
2017-11-05  7:37         ` Michal Hocko
2017-11-02  9:36 ` [PATCH 2/2] mm: drop hotplug lock from lru_add_drain_all Michal Hocko
2017-11-02 12:37   ` Michal Hocko
2017-11-02 13:02     ` Thomas Gleixner
2017-11-02 13:16       ` Michal Hocko
2017-11-02 13:42         ` Thomas Gleixner
2017-11-03 20:02   ` kbuild test robot
2017-11-05  7:45   ` [PATCH -v2] " Michal Hocko
