* [PATCH v2] mm,oom: speed up select_bad_process() loop.
@ 2016-05-18 12:20 Tetsuo Handa
  2016-05-18 12:51 ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: Tetsuo Handa @ 2016-05-18 12:20 UTC (permalink / raw)
  To: mhocko, akpm, rientjes; +Cc: linux-mm, Tetsuo Handa, Oleg Nesterov

Since commit 3a5dda7a17cf3706 ("oom: prevent unnecessary oom kills or
kernel panics"), select_bad_process() is using for_each_process_thread().

Since oom_unkillable_task() scans all threads in the caller's thread group
and oom_task_origin() scans the signal_struct of the caller's thread group,
we don't need to call oom_unkillable_task() and oom_task_origin() on each
thread. Also, since the !mm test will be done later in oom_badness(), we
don't need to do the !mm test on each thread. Therefore, we only need to do
the TIF_MEMDIE test on each thread.

If we track the number of TIF_MEMDIE threads inside signal_struct, we don't
need to do the TIF_MEMDIE test on each thread. This will allow
select_bad_process() to use for_each_process().

This patch adds a counter to signal_struct for tracking how many
TIF_MEMDIE threads are in a given thread group, and checks it at
oom_scan_process_thread() so that select_bad_process() can use
for_each_process() rather than for_each_process_thread().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/sched.h |  1 +
 mm/oom_kill.c         | 12 ++++++------
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 870a700..1589f8e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -794,6 +794,7 @@ struct signal_struct {
 	struct tty_audit_buf *tty_audit_buf;
 #endif
 
+	atomic_t oom_victims; /* # of TIF_MEMDIE threads in this thread group */
 	/*
 	 * Thread is the potential origin of an oom condition; kill first on
 	 * oom
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index c0e37dd..1ac24e8 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -283,10 +283,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 	 * This task already has access to memory reserves and is being killed.
 	 * Don't allow any other task to have access to the reserves.
 	 */
-	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
-		if (!is_sysrq_oom(oc))
-			return OOM_SCAN_ABORT;
-	}
+	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims))
+		return OOM_SCAN_ABORT;
 	if (!task->mm)
 		return OOM_SCAN_CONTINUE;
 
@@ -307,12 +305,12 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 static struct task_struct *select_bad_process(struct oom_control *oc,
 		unsigned int *ppoints, unsigned long totalpages)
 {
-	struct task_struct *g, *p;
+	struct task_struct *p;
 	struct task_struct *chosen = NULL;
 	unsigned long chosen_points = 0;
 
 	rcu_read_lock();
-	for_each_process_thread(g, p) {
+	for_each_process(p) {
 		unsigned int points;
 
 		switch (oom_scan_process_thread(oc, p, totalpages)) {
@@ -673,6 +671,7 @@ void mark_oom_victim(struct task_struct *tsk)
 	/* OOM killer might race with memcg OOM */
 	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
 		return;
+	atomic_inc(&tsk->signal->oom_victims);
 	/*
 	 * Make sure that the task is woken up from uninterruptible sleep
 	 * if it is frozen because OOM killer wouldn't be able to free
@@ -690,6 +689,7 @@ void exit_oom_victim(struct task_struct *tsk)
 {
 	if (!test_and_clear_tsk_thread_flag(tsk, TIF_MEMDIE))
 		return;
+	atomic_dec(&tsk->signal->oom_victims);
 
 	if (!atomic_dec_return(&oom_victims))
 		wake_up_all(&oom_victims_wait);
-- 
1.8.3.1


* Re: [PATCH v2] mm,oom: speed up select_bad_process() loop.
  2016-05-18 12:20 [PATCH v2] mm,oom: speed up select_bad_process() loop Tetsuo Handa
@ 2016-05-18 12:51 ` Michal Hocko
  2016-05-18 13:30   ` [PATCH v3] " Tetsuo Handa
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2016-05-18 12:51 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, rientjes, linux-mm, Oleg Nesterov

On Wed 18-05-16 21:20:24, Tetsuo Handa wrote:
> Since commit 3a5dda7a17cf3706 ("oom: prevent unnecessary oom kills or
> kernel panics"), select_bad_process() is using for_each_process_thread().
> 
> Since oom_unkillable_task() scans all threads in the caller's thread group
> and oom_task_origin() scans the signal_struct of the caller's thread group,
> we don't need to call oom_unkillable_task() and oom_task_origin() on each
> thread. Also, since the !mm test will be done later in oom_badness(), we
> don't need to do the !mm test on each thread. Therefore, we only need to do
> the TIF_MEMDIE test on each thread.
> 
> If we track the number of TIF_MEMDIE threads inside signal_struct, we don't
> need to do the TIF_MEMDIE test on each thread. This will allow
> select_bad_process() to use for_each_process().

I am wondering whether signal_struct is the best way forward. The oom
killing is more about mm_struct than anything else. We can record that
the mm was oom killed in mm->flags (similar to MMF_OOM_REAPED). I guess
this would require more work at this stage, so maybe starting with
signal_struct is not that bad after all. Just thinking...

> This patch adds a counter to signal_struct for tracking how many
> TIF_MEMDIE threads are in a given thread group, and checks it at
> oom_scan_process_thread() so that select_bad_process() can use
> for_each_process() rather than for_each_process_thread().

In general I do agree that for_each_process is preferable. I guess you
are missing one case here, though (or maybe just forgot to refresh the
patch because the changelog mentions the !mm test):

[...]
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index c0e37dd..1ac24e8 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -283,10 +283,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  	 * This task already has access to memory reserves and is being killed.
>  	 * Don't allow any other task to have access to the reserves.
>  	 */
> -	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
> -		if (!is_sysrq_oom(oc))
> -			return OOM_SCAN_ABORT;
> -	}
> +	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims))
> +		return OOM_SCAN_ABORT;
>  	if (!task->mm)
>  		return OOM_SCAN_CONTINUE;

So let's say that the group leader is gone; now you would skip the whole
thread group, AFAICS. This is an easy way to hide from the OOM killer,
unless I've missed something.

You can safely drop the if (!task->mm) check because oom_badness() does
find_lock_task_mm(), so we will catch this case anyway.
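
For reference, find_lock_task_mm() looks roughly like this in
mm/oom_kill.c of this era - it returns the first live thread with a
non-NULL ->mm, task_lock()ed, so a NULL group_leader->mm is handled
there anyway:

	struct task_struct *find_lock_task_mm(struct task_struct *p)
	{
		struct task_struct *t;

		rcu_read_lock();

		for_each_thread(p, t) {
			task_lock(t);
			if (likely(t->mm))
				goto found;
			task_unlock(t);
		}
		t = NULL;
	found:
		rcu_read_unlock();

		return t;
	}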

Other than that the patch looks good to me.
-- 
Michal Hocko
SUSE Labs


* [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-18 12:51 ` Michal Hocko
@ 2016-05-18 13:30   ` Tetsuo Handa
  2016-05-18 14:15     ` Michal Hocko
                       ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Tetsuo Handa @ 2016-05-18 13:30 UTC (permalink / raw)
  To: mhocko; +Cc: akpm, rientjes, linux-mm, oleg

Michal Hocko wrote:
> On Wed 18-05-16 21:20:24, Tetsuo Handa wrote:
> > Since commit 3a5dda7a17cf3706 ("oom: prevent unnecessary oom kills or
> > kernel panics"), select_bad_process() is using for_each_process_thread().
> > 
> > Since oom_unkillable_task() scans all threads in the caller's thread group
> > and oom_task_origin() scans the signal_struct of the caller's thread group,
> > we don't need to call oom_unkillable_task() and oom_task_origin() on each
> > thread. Also, since the !mm test will be done later in oom_badness(), we
> > don't need to do the !mm test on each thread. Therefore, we only need to do
> > the TIF_MEMDIE test on each thread.
> > 
> > If we track the number of TIF_MEMDIE threads inside signal_struct, we don't
> > need to do the TIF_MEMDIE test on each thread. This will allow
> > select_bad_process() to use for_each_process().
> 
> I am wondering whether signal_struct is the best way forward. The oom
> killing is more about mm_struct than anything else. We can record that
> the mm was oom killed in mm->flags (similar to MMF_OOM_REAPED). I guess
> this would require more work at this stage, so maybe starting with
> signal_struct is not that bad after all. Just thinking...

Even if you call the p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN case a
bug, the (p->flags & PF_KTHREAD) || is_global_init(p) case is still
possible. Thus, I think we can't mark in mm->flags that the mm was OOM
killed.

> 
> > This patch adds a counter to signal_struct for tracking how many
> > TIF_MEMDIE threads are in a given thread group, and checks it at
> > oom_scan_process_thread() so that select_bad_process() can use
> > for_each_process() rather than for_each_process_thread().
> 
> In general I do agree that for_each_process is preferable. I guess you
> are missing one case here, though (or maybe just forgot to refresh the
> patch because the changelog mentions the !mm test):

Oops, I forgot to delete that test. Thanks.
----------------------------------------
From d770bd777e628e9d1ae250249433cf576aae8961 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 18 May 2016 22:17:47 +0900
Subject: [PATCH v3] mm,oom: speed up select_bad_process() loop.

Since commit 3a5dda7a17cf3706 ("oom: prevent unnecessary oom kills or
kernel panics"), select_bad_process() is using for_each_process_thread().

Since oom_unkillable_task() scans all threads in the caller's thread group
and oom_task_origin() scans the signal_struct of the caller's thread group,
we don't need to call oom_unkillable_task() and oom_task_origin() on each
thread. Also, since the !mm test will be done later in oom_badness(), we
don't need to do the !mm test on each thread. Therefore, we only need to do
the TIF_MEMDIE test on each thread.

If we track the number of TIF_MEMDIE threads inside signal_struct, we don't
need to do the TIF_MEMDIE test on each thread. This will allow
select_bad_process() to use for_each_process().

This patch adds a counter to signal_struct for tracking how many
TIF_MEMDIE threads are in a given thread group, and checks it at
oom_scan_process_thread() so that select_bad_process() can use
for_each_process() rather than for_each_process_thread().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/sched.h |  1 +
 mm/oom_kill.c         | 14 ++++++--------
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 870a700..1589f8e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -794,6 +794,7 @@ struct signal_struct {
 	struct tty_audit_buf *tty_audit_buf;
 #endif
 
+	atomic_t oom_victims; /* # of TIF_MEMDIE threads in this thread group */
 	/*
 	 * Thread is the potential origin of an oom condition; kill first on
 	 * oom
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index c0e37dd..8e151d0 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -283,12 +283,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 	 * This task already has access to memory reserves and is being killed.
 	 * Don't allow any other task to have access to the reserves.
 	 */
-	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
-		if (!is_sysrq_oom(oc))
-			return OOM_SCAN_ABORT;
-	}
-	if (!task->mm)
-		return OOM_SCAN_CONTINUE;
+	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims))
+		return OOM_SCAN_ABORT;
 
 	/*
 	 * If task is allocating a lot of memory and has been marked to be
@@ -307,12 +303,12 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 static struct task_struct *select_bad_process(struct oom_control *oc,
 		unsigned int *ppoints, unsigned long totalpages)
 {
-	struct task_struct *g, *p;
+	struct task_struct *p;
 	struct task_struct *chosen = NULL;
 	unsigned long chosen_points = 0;
 
 	rcu_read_lock();
-	for_each_process_thread(g, p) {
+	for_each_process(p) {
 		unsigned int points;
 
 		switch (oom_scan_process_thread(oc, p, totalpages)) {
@@ -673,6 +669,7 @@ void mark_oom_victim(struct task_struct *tsk)
 	/* OOM killer might race with memcg OOM */
 	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
 		return;
+	atomic_inc(&tsk->signal->oom_victims);
 	/*
 	 * Make sure that the task is woken up from uninterruptible sleep
 	 * if it is frozen because OOM killer wouldn't be able to free
@@ -690,6 +687,7 @@ void exit_oom_victim(struct task_struct *tsk)
 {
 	if (!test_and_clear_tsk_thread_flag(tsk, TIF_MEMDIE))
 		return;
+	atomic_dec(&tsk->signal->oom_victims);
 
 	if (!atomic_dec_return(&oom_victims))
 		wake_up_all(&oom_victims_wait);
-- 
1.8.3.1


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-18 13:30   ` [PATCH v3] " Tetsuo Handa
@ 2016-05-18 14:15     ` Michal Hocko
  2016-05-18 21:09       ` Andrew Morton
  2016-05-18 14:44     ` Tetsuo Handa
  2016-05-20  7:50     ` Michal Hocko
  2 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2016-05-18 14:15 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, rientjes, linux-mm, oleg

On Wed 18-05-16 22:30:14, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Wed 18-05-16 21:20:24, Tetsuo Handa wrote:
> > > Since commit 3a5dda7a17cf3706 ("oom: prevent unnecessary oom kills or
> > > kernel panics"), select_bad_process() is using for_each_process_thread().
> > > 
> > > Since oom_unkillable_task() scans all threads in the caller's thread group
> > > and oom_task_origin() scans the signal_struct of the caller's thread group,
> > > we don't need to call oom_unkillable_task() and oom_task_origin() on each
> > > thread. Also, since the !mm test will be done later in oom_badness(), we
> > > don't need to do the !mm test on each thread. Therefore, we only need to do
> > > the TIF_MEMDIE test on each thread.
> > > 
> > > If we track the number of TIF_MEMDIE threads inside signal_struct, we don't
> > > need to do the TIF_MEMDIE test on each thread. This will allow
> > > select_bad_process() to use for_each_process().
> > 
> > I am wondering whether signal_struct is the best way forward. The oom
> > killing is more about mm_struct than anything else. We can record that
> > the mm was oom killed in mm->flags (similar to MMF_OOM_REAPED). I guess
> > this would require more work at this stage, so maybe starting with
> > signal_struct is not that bad after all. Just thinking...
> 
> > Even if you call the p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN case a bug,
> > the (p->flags & PF_KTHREAD) || is_global_init(p) case is still possible.

I couldn't care less about such a case, to be honest, and that is not a
reason to cripple the code for such an insanity. It simply doesn't make
any sense to share init's mm with a different task.

> Thus, I think we can't mark in mm->flags that the mm was OOM killed.

[...]
> From d770bd777e628e9d1ae250249433cf576aae8961 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Wed, 18 May 2016 22:17:47 +0900
> Subject: [PATCH v3] mm,oom: speed up select_bad_process() loop.
> 
> Since commit 3a5dda7a17cf3706 ("oom: prevent unnecessary oom kills or
> kernel panics"), select_bad_process() is using for_each_process_thread().
> 
> Since oom_unkillable_task() scans all threads in the caller's thread group
> and oom_task_origin() scans the signal_struct of the caller's thread group,
> we don't need to call oom_unkillable_task() and oom_task_origin() on each
> thread. Also, since the !mm test will be done later in oom_badness(), we
> don't need to do the !mm test on each thread. Therefore, we only need to do
> the TIF_MEMDIE test on each thread.
> 
> If we track the number of TIF_MEMDIE threads inside signal_struct, we don't
> need to do the TIF_MEMDIE test on each thread. This will allow
> select_bad_process() to use for_each_process().
> 
> This patch adds a counter to signal_struct for tracking how many
> TIF_MEMDIE threads are in a given thread group, and checks it at
> oom_scan_process_thread() so that select_bad_process() can use
> for_each_process() rather than for_each_process_thread().

OK, this looks correct. Strictly speaking, the patch is missing any note
on _why_ this is needed or an improvement. I would add something like
the following:
"
Although the original code was correct, it was quite inefficient because
each thread group was scanned num_threads times, which can be a lot,
especially for processes with many threads. Even though the OOM killer
is an extremely cold path, it is always good to be as effective as
possible when we are inside rcu_read_lock() - aka a non-preemptible
context.
"

> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Oleg Nesterov <oleg@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/sched.h |  1 +
>  mm/oom_kill.c         | 14 ++++++--------
>  2 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 870a700..1589f8e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -794,6 +794,7 @@ struct signal_struct {
>  	struct tty_audit_buf *tty_audit_buf;
>  #endif
>  
> +	atomic_t oom_victims; /* # of TIF_MEMDIE threads in this thread group */
>  	/*
>  	 * Thread is the potential origin of an oom condition; kill first on
>  	 * oom
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index c0e37dd..8e151d0 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -283,12 +283,8 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  	 * This task already has access to memory reserves and is being killed.
>  	 * Don't allow any other task to have access to the reserves.
>  	 */
> -	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
> -		if (!is_sysrq_oom(oc))
> -			return OOM_SCAN_ABORT;
> -	}
> -	if (!task->mm)
> -		return OOM_SCAN_CONTINUE;
> +	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims))
> +		return OOM_SCAN_ABORT;
>  
>  	/*
>  	 * If task is allocating a lot of memory and has been marked to be
> @@ -307,12 +303,12 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
>  static struct task_struct *select_bad_process(struct oom_control *oc,
>  		unsigned int *ppoints, unsigned long totalpages)
>  {
> -	struct task_struct *g, *p;
> +	struct task_struct *p;
>  	struct task_struct *chosen = NULL;
>  	unsigned long chosen_points = 0;
>  
>  	rcu_read_lock();
> -	for_each_process_thread(g, p) {
> +	for_each_process(p) {
>  		unsigned int points;
>  
>  		switch (oom_scan_process_thread(oc, p, totalpages)) {
> @@ -673,6 +669,7 @@ void mark_oom_victim(struct task_struct *tsk)
>  	/* OOM killer might race with memcg OOM */
>  	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
>  		return;
> +	atomic_inc(&tsk->signal->oom_victims);
>  	/*
>  	 * Make sure that the task is woken up from uninterruptible sleep
>  	 * if it is frozen because OOM killer wouldn't be able to free
> @@ -690,6 +687,7 @@ void exit_oom_victim(struct task_struct *tsk)
>  {
>  	if (!test_and_clear_tsk_thread_flag(tsk, TIF_MEMDIE))
>  		return;
> +	atomic_dec(&tsk->signal->oom_victims);
>  
>  	if (!atomic_dec_return(&oom_victims))
>  		wake_up_all(&oom_victims_wait);
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-18 13:30   ` [PATCH v3] " Tetsuo Handa
  2016-05-18 14:15     ` Michal Hocko
@ 2016-05-18 14:44     ` Tetsuo Handa
  2016-05-20  7:50     ` Michal Hocko
  2 siblings, 0 replies; 21+ messages in thread
From: Tetsuo Handa @ 2016-05-18 14:44 UTC (permalink / raw)
  To: mhocko; +Cc: akpm, rientjes, linux-mm, oleg, penguin-kernel

Michal Hocko wrote:
> On Wed 18-05-16 22:30:14, Tetsuo Handa wrote:
> > Even if you call the p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN case a bug,
> > the (p->flags & PF_KTHREAD) || is_global_init(p) case is still possible.
> 
> I couldn't care less about such a case, to be honest, and that is not a
> reason to cripple the code for such an insanity. It simply doesn't make
> any sense to share init's mm with a different task.

It is possible that the global init called vfork(), the child tried to
call execve() with a large argv/envp array, and the child got OOM-killed.

> OK, this looks correct. Strictly speaking, the patch is missing any note
> on _why_ this is needed or an improvement. I would add something like
> the following:
> "
> Although the original code was correct, it was quite inefficient because
> each thread group was scanned num_threads times, which can be a lot,
> especially for processes with many threads. Even though the OOM killer
> is an extremely cold path, it is always good to be as effective as
> possible when we are inside rcu_read_lock() - aka a non-preemptible
> context.
> "

rcu_read_lock() is not always a non-preemptible context. The documentation
for rcu_read_lock() says:

  In non-preemptible RCU implementations (TREE_RCU and TINY_RCU),
  it is illegal to block while in an RCU read-side critical section.
  In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPT
  kernel builds, RCU read-side critical sections may be preempted,
  but explicit blocking is illegal.  Finally, in preemptible RCU
  implementations in real-time (with -rt patchset) kernel builds, RCU
  read-side critical sections may be preempted and they may also block, but
  only when acquiring spinlocks that are subject to priority inheritance.

We will need preempt_disable() if we want to make out_of_memory() return
as fast as possible.
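
A minimal sketch of what that would mean, assuming we simply wrapped the
victim scan (hypothetical, not part of any posted patch):

	/* hypothetical: make the whole victim scan non-preemptible */
	preempt_disable();
	rcu_read_lock();
	/* ... the select_bad_process() loop ... */
	rcu_read_unlock();
	preempt_enable();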

> 
> > Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>

Thank you.


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-18 14:15     ` Michal Hocko
@ 2016-05-18 21:09       ` Andrew Morton
  2016-05-19  6:53         ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2016-05-18 21:09 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tetsuo Handa, rientjes, linux-mm, oleg

On Wed, 18 May 2016 16:15:45 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> > This patch adds a counter to signal_struct for tracking how many
> > TIF_MEMDIE threads are in a given thread group, and checks it at
> > oom_scan_process_thread() so that select_bad_process() can use
> > for_each_process() rather than for_each_process_thread().
> 
> OK, this looks correct. Strictly speaking, the patch is missing any note
> on _why_ this is needed or an improvement. I would add something like
> the following:
> "
> Although the original code was correct, it was quite inefficient because
> each thread group was scanned num_threads times, which can be a lot,
> especially for processes with many threads. Even though the OOM killer
> is an extremely cold path, it is always good to be as effective as
> possible when we are inside rcu_read_lock() - aka a non-preemptible
> context.
> "

This sounds quite rubbery to me.  Lots of code calls
for_each_process_thread() and presumably that isn't causing problems. 
We're bloating up the signal_struct to solve some problem on a
rarely-called slowpath with no evidence that there is actually a
problem to be solved.


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-18 21:09       ` Andrew Morton
@ 2016-05-19  6:53         ` Michal Hocko
  2016-05-19  7:17           ` Michal Hocko
  2016-05-20  1:50           ` Oleg Nesterov
  0 siblings, 2 replies; 21+ messages in thread
From: Michal Hocko @ 2016-05-19  6:53 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Tetsuo Handa, rientjes, linux-mm, oleg

On Wed 18-05-16 14:09:32, Andrew Morton wrote:
> On Wed, 18 May 2016 16:15:45 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> 
> > > This patch adds a counter to signal_struct for tracking how many
> > > TIF_MEMDIE threads are in a given thread group, and checks it at
> > > oom_scan_process_thread() so that select_bad_process() can use
> > > for_each_process() rather than for_each_process_thread().
> > 
> > OK, this looks correct. Strictly speaking, the patch is missing any note
> > on _why_ this is needed or an improvement. I would add something like
> > the following:
> > "
> > Although the original code was correct, it was quite inefficient because
> > each thread group was scanned num_threads times, which can be a lot,
> > especially for processes with many threads. Even though the OOM killer
> > is an extremely cold path, it is always good to be as effective as
> > possible when we are inside rcu_read_lock() - aka a non-preemptible
> > context.
> > "
> 
> This sounds quite rubbery to me.  Lots of code calls
> for_each_process_thread() and presumably that isn't causing problems. 

Yeah, many paths call for_each_process_thread(), but they are
O(num_threads) while this one is O(num_threads^2).
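
To illustrate with the pre-patch call chain (a rough sketch, not actual
code):

	for_each_process_thread(g, p)			/* visits every thread ... */
		oom_scan_process_thread(oc, p, totalpages)
			oom_unkillable_task(p, NULL, oc->nodemask)
				has_intersects_mems_allowed(p, ...)
					for_each_thread(p, t)	/* ... and rescans its whole group */

so a single thread group of N threads gets scanned up to N times.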

> We're bloating up the signal_struct to solve some problem on a
> rarely-called slowpath with no evidence that there is actually a
> problem to be solved.

signal_struct has some holes[1] so we can stitch it in there. Long term I
would like to move this logic into the mm_struct; it would just be
larger surgery, I guess.

[1]

struct signal_struct {
	atomic_t                   sigcnt;               /*     0     4 */
	atomic_t                   live;                 /*     4     4 */
	int                        nr_threads;           /*     8     4 */

	/* XXX 4 bytes hole, try to pack */

	struct list_head           thread_head;          /*    16    16 */
	wait_queue_head_t          wait_chldexit;        /*    32    24 */
	struct task_struct *       curr_target;          /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	struct sigpending          shared_pending;       /*    64    24 */
	int                        group_exit_code;      /*    88     4 */
	int                        notify_count;         /*    92     4 */
	struct task_struct *       group_exit_task;      /*    96     8 */
	int                        group_stop_count;     /*   104     4 */
	unsigned int               flags;                /*   108     4 */
	unsigned int               is_child_subreaper:1; /*   112:31  4 */
	unsigned int               has_child_subreaper:1; /*   112:30  4 */

	/* XXX 30 bits hole, try to pack */

	int                        posix_timer_id;       /*   116     4 */
	struct list_head           posix_timers;         /*   120    16 */
	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
	struct hrtimer             real_timer;           /*   136    88 */
	/* --- cacheline 3 boundary (192 bytes) was 32 bytes ago --- */
	struct pid *               leader_pid;           /*   224     8 */
	ktime_t                    it_real_incr;         /*   232     8 */
	struct cpu_itimer          it[2];                /*   240    48 */
	/* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
	struct thread_group_cputimer cputimer;           /*   288    32 */
	/* --- cacheline 5 boundary (320 bytes) --- */
	struct task_cputime        cputime_expires;      /*   320    24 */
	atomic_t                   tick_dep_mask;        /*   344     4 */

	/* XXX 4 bytes hole, try to pack */

	struct list_head           cpu_timers[3];        /*   352    48 */
	/* --- cacheline 6 boundary (384 bytes) was 16 bytes ago --- */
	struct pid *               tty_old_pgrp;         /*   400     8 */
	int                        leader;               /*   408     4 */

	/* XXX 4 bytes hole, try to pack */

	struct tty_struct *        tty;                  /*   416     8 */
	struct autogroup *         autogroup;            /*   424     8 */
	seqlock_t                  stats_lock;           /*   432     8 */
	cputime_t                  utime;                /*   440     8 */
	/* --- cacheline 7 boundary (448 bytes) --- */
	cputime_t                  stime;                /*   448     8 */
	cputime_t                  cutime;               /*   456     8 */
	cputime_t                  cstime;               /*   464     8 */
	cputime_t                  gtime;                /*   472     8 */
	cputime_t                  cgtime;               /*   480     8 */
	struct prev_cputime        prev_cputime;         /*   488    24 */
	/* --- cacheline 8 boundary (512 bytes) --- */
	long unsigned int          nvcsw;                /*   512     8 */
	long unsigned int          nivcsw;               /*   520     8 */
	long unsigned int          cnvcsw;               /*   528     8 */
	long unsigned int          cnivcsw;              /*   536     8 */
	long unsigned int          min_flt;              /*   544     8 */
	long unsigned int          maj_flt;              /*   552     8 */
	long unsigned int          cmin_flt;             /*   560     8 */
	long unsigned int          cmaj_flt;             /*   568     8 */
	/* --- cacheline 9 boundary (576 bytes) --- */
	long unsigned int          inblock;              /*   576     8 */
	long unsigned int          oublock;              /*   584     8 */
	long unsigned int          cinblock;             /*   592     8 */
	long unsigned int          coublock;             /*   600     8 */
	long unsigned int          maxrss;               /*   608     8 */
	long unsigned int          cmaxrss;              /*   616     8 */
	struct task_io_accounting  ioac;                 /*   624    56 */
	/* --- cacheline 10 boundary (640 bytes) was 40 bytes ago --- */
	long long unsigned int     sum_sched_runtime;    /*   680     8 */
	struct rlimit              rlim[16];             /*   688   256 */
	/* --- cacheline 14 boundary (896 bytes) was 48 bytes ago --- */
	struct taskstats *         stats;                /*   944     8 */
	oom_flags_t                oom_flags;            /*   952     4 */
	short int                  oom_score_adj;        /*   956     2 */
	short int                  oom_score_adj_min;    /*   958     2 */
	/* --- cacheline 15 boundary (960 bytes) --- */
	struct mutex               cred_guard_mutex;     /*   960    40 */

	/* size: 1000, cachelines: 16, members: 58 */
	/* sum members: 988, holes: 3, sum holes: 12 */
	/* bit holes: 1, sum bit holes: 30 bits */
	/* last cacheline: 40 bytes */
};
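
E.g. (a sketch only, not a committed layout) the new counter could go
right after nr_threads into that first 4-byte hole:

struct signal_struct {
	atomic_t		sigcnt;		/*     0     4 */
	atomic_t		live;		/*     4     4 */
	int			nr_threads;	/*     8     4 */
	atomic_t		oom_victims;	/*    12     4 (fills the hole) */
	struct list_head	thread_head;	/*    16    16 */
	/* ... */
};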
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-19  6:53         ` Michal Hocko
@ 2016-05-19  7:17           ` Michal Hocko
  2016-05-19  8:17             ` Michal Hocko
  2016-05-20  1:50           ` Oleg Nesterov
  1 sibling, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2016-05-19  7:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Tetsuo Handa, rientjes, linux-mm, oleg

On Thu 19-05-16 08:53:29, Michal Hocko wrote:
> On Wed 18-05-16 14:09:32, Andrew Morton wrote:
> > On Wed, 18 May 2016 16:15:45 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > > > This patch adds a counter to signal_struct for tracking how many
> > > > TIF_MEMDIE threads are in a given thread group, and checks it at
> > > > oom_scan_process_thread() so that select_bad_process() can use
> > > > for_each_process() rather than for_each_process_thread().
> > > 
> > > OK, this looks correct. Strictly speaking, the patch is missing any note
> > > on _why_ this is needed or an improvement. I would add something like
> > > the following:
> > > "
> > > Although the original code was correct, it was quite inefficient because
> > > each thread group was scanned num_threads times, which can be a lot,
> > > especially for processes with many threads. Even though the OOM killer
> > > is an extremely cold path, it is always good to be as effective as
> > > possible when we are inside rcu_read_lock() - aka a non-preemptible
> > > context.
> > > "
> > 
> > This sounds quite rubbery to me.  Lots of code calls
> > for_each_process_thread() and presumably that isn't causing problems. 
> 
> Yeah, many paths call for_each_process_thread(), but they are
> O(num_threads) while this one is O(num_threads^2).

And just to clarify, the num_threads^2 is the absolute worst case, which
doesn't happen normally. We would usually be closer to O(num_threads), but
there is no reason to risk pathological cases when we can simply use
for_each_process() to achieve the same.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-19  7:17           ` Michal Hocko
@ 2016-05-19  8:17             ` Michal Hocko
  0 siblings, 0 replies; 21+ messages in thread
From: Michal Hocko @ 2016-05-19  8:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Tetsuo Handa, rientjes, linux-mm, oleg

[Sorry for spamming you]

On Thu 19-05-16 09:17:36, Michal Hocko wrote:
> On Thu 19-05-16 08:53:29, Michal Hocko wrote:
> > On Wed 18-05-16 14:09:32, Andrew Morton wrote:
> > > On Wed, 18 May 2016 16:15:45 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> > > 
> > > > > This patch adds a counter to signal_struct for tracking how many
> > > > > TIF_MEMDIE threads are in a given thread group, and checks it at
> > > > > oom_scan_process_thread() so that select_bad_process() can use
> > > > > for_each_process() rather than for_each_process_thread().
> > > > 
> > > > OK, this looks correct. Strictly speaking, the patch is missing any note
> > > > on _why_ this is needed or an improvement. I would add something like
> > > > the following:
> > > > "
> > > > Although the original code was correct, it was quite inefficient because
> > > > each thread group was scanned num_threads times, which can be a lot,
> > > > especially for processes with many threads. Even though the OOM killer
> > > > is an extremely cold path, it is always good to be as effective as
> > > > possible when we are inside rcu_read_lock() - aka a non-preemptible
> > > > context.
> > > > "
> > > 
> > > This sounds quite rubbery to me.  Lots of code calls
> > > for_each_process_thread() and presumably that isn't causing problems. 
> > 
> > Yeah, many paths call for_each_process_thread(), but they are
> > O(num_threads) while this one is O(num_threads^2).
> 
> And just to clarify, the num_threads^2 is the absolute worst case, which
> doesn't happen normally. We would usually be closer to O(num_threads), but
> there is no reason to risk pathological cases when we can simply use
> for_each_process() to achieve the same.

Blee, fat fingers today... some vim foo removed the rest of the
paragraph, which was:
"
Especially since calculating oom_badness for all threads in the same
thread group is just a pointless waste of cycles (e.g. taking task_lock
etc.).
"

Tetsuo, btw, I guess you can safely drop
		/* Prefer thread group leaders for display purposes */
		if (points == chosen_points && thread_group_leader(chosen))
			continue;

from select_bad_process() because you are now iterating group leaders.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-19  6:53         ` Michal Hocko
  2016-05-19  7:17           ` Michal Hocko
@ 2016-05-20  1:50           ` Oleg Nesterov
  2016-05-20  2:13             ` Oleg Nesterov
  2016-05-20  6:42             ` Michal Hocko
  1 sibling, 2 replies; 21+ messages in thread
From: Oleg Nesterov @ 2016-05-20  1:50 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Tetsuo Handa, rientjes, linux-mm

On 05/19, Michal Hocko wrote:
>
> Long term I
> would like to move this logic into the mm_struct; it would just be
> larger surgery, I guess.

Why can't we do this right now? Just another MMF_ flag, set only once and
never cleared.

And I personally like this change "in general"; if nothing else, I
recently blamed this for_each_process_thread() loop. But if we do this, I
think we should also shift find_lock_task_mm() into this loop.

And this makes me think again that we need something like

	struct task_struct *next_task_with_mm(struct task_struct *p)
	{
		struct task_struct *t;

		p = p->group_leader;
		while ((p = next_task(p)) != &init_task) {
			if (p->flags & PF_KTHREAD)
				continue;

			t = find_lock_task_mm(p);
			if (t)
				return t;
		}

		return NULL;
	}

	#define for_each_task_lock_mm(p) \
		for (p = &init_task; (p = next_task_with_mm(p)); task_unlock(p))

Or we can move task_unlock() into next_task_with_mm(); it can check
mm != NULL or p != &init_task.
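
A hypothetical user of the iterator above would then look like this
(sketch only; note that continue still runs the task_unlock(p) in the
for-increment expression):

	struct task_struct *p;

	for_each_task_lock_mm(p) {
		/* p is task_lock()ed here and p->mm is valid */
		if (test_bit(MMF_OOM_REAPED, &p->mm->flags))
			continue;
		/* ... compute oom_badness() for p ... */
	}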

Oleg.


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-20  1:50           ` Oleg Nesterov
@ 2016-05-20  2:13             ` Oleg Nesterov
  2016-05-20  6:42             ` Michal Hocko
  1 sibling, 0 replies; 21+ messages in thread
From: Oleg Nesterov @ 2016-05-20  2:13 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Tetsuo Handa, rientjes, linux-mm

On 05/20, Oleg Nesterov wrote:
>
> On 05/19, Michal Hocko wrote:
> >
> > Long term I
> > would like to move this logic into the mm_struct; it would just be
> > larger surgery, I guess.
>
> Why can't we do this right now? Just another MMF_ flag, set only once and
> never cleared.

Just in case... yes, "never cleared" is not that simple because
oom_kill_process() can find an OOM_SCORE_ADJ_MIN process (or
is_global_init) later. But to me this just looks like another proof that
select_bad_process() must not pick such a task (mm!) as a victim.

Never mind, I have said many times that I do not understand the OOM
killer, please ignore me. But sorry, I can't resist: I do not think that
"signal_struct has some holes[1] so we can stitch it in there" can excuse
the new member ;)

Oleg.


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-20  1:50           ` Oleg Nesterov
  2016-05-20  2:13             ` Oleg Nesterov
@ 2016-05-20  6:42             ` Michal Hocko
  2016-05-20 22:04               ` Oleg Nesterov
  1 sibling, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2016-05-20  6:42 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Andrew Morton, Tetsuo Handa, rientjes, linux-mm

On Fri 20-05-16 03:50:01, Oleg Nesterov wrote:
> On 05/19, Michal Hocko wrote:
> >
> > Long term I
> > would like to move this logic into the mm_struct; it would just be
> > larger surgery, I guess.
> 
> Why can't we do this right now? Just another MMF_ flag, set only once and
> never cleared.

It is more complicated and so more error prone. We have to sort out
shortcuts which get TIF_MEMDIE without killing first. And we have that
nasty "mm shared between independent processes" case there. I definitely
want to get to this after the merge window, but the oom pile in Andrew's
tree is quite large already. So this patch seems like a good start to
build on top of.

If you feel that this step is not really worth it, I can accept that, of
course; this is an area where we do not have to hurry.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-18 13:30   ` [PATCH v3] " Tetsuo Handa
  2016-05-18 14:15     ` Michal Hocko
  2016-05-18 14:44     ` Tetsuo Handa
@ 2016-05-20  7:50     ` Michal Hocko
  2016-05-20 11:51       ` Tetsuo Handa
  2 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2016-05-20  7:50 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, rientjes, linux-mm, oleg

Here is a follow-up for this patch. As I've mentioned in the other
email, I would like to mark the oom victim in the mm_struct, but that
requires more changes, and the patch simplifies select_bad_process()
nicely already, so I like this patch even now.
---


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-20  7:50     ` Michal Hocko
@ 2016-05-20 11:51       ` Tetsuo Handa
  2016-05-20 12:09         ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: Tetsuo Handa @ 2016-05-20 11:51 UTC (permalink / raw)
  To: mhocko; +Cc: akpm, rientjes, linux-mm, oleg

Michal Hocko wrote:
> Here is a follow-up for this patch. As I've mentioned in the other
> email, I would like to mark the oom victim in the mm_struct, but that
> requires more changes, and the patch simplifies select_bad_process()
> nicely already, so I like this patch even now.
> ---
> From 06fc6821e581f82fb186770d84f5ee28f9fe18c3 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 20 May 2016 09:45:05 +0200
> Subject: [PATCH] mmotm: mmoom-speed-up-select_bad_process-loop-fix
> 
> Do not blow up the signal_struct size. pahole -C signal_struct says:
> 
> struct signal_struct {
> 	atomic_t                   sigcnt;               /*     0     4 */
> 	atomic_t                   live;                 /*     4     4 */
> 	int                        nr_threads;           /*     8     4 */
> 
> 	/* XXX 4 bytes hole, try to pack */
> 
> 	struct list_head           thread_head;          /*    16    16 */
> 
> So we can stick the new counter after nr_threads and keep the size
> of the structure unchanged on 64-bit.
> 
> While we are at it, also remove the thread_group_leader() check from
> select_bad_process() because it is not really needed, as we are
> iterating processes rather than threads.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Or, we can do an equivalent thing without adding "atomic_t oom_victims".
If you prefer the approach below, we can revert "[PATCH v3] mm,oom: speed
up select_bad_process() loop.".

---
 include/linux/oom.h   |  3 ++-
 include/linux/sched.h |  1 -
 mm/memcontrol.c       |  2 +-
 mm/oom_kill.c         | 37 ++++++++++++++++++++++++++++---------
 4 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 8346952..6b4a2f3 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -90,7 +90,8 @@ extern void check_panic_on_oom(struct oom_control *oc,
 			       struct mem_cgroup *memcg);
 
 extern enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
-		struct task_struct *task, unsigned long totalpages);
+					       struct task_struct *task,
+					       bool is_thread_group);
 
 extern bool out_of_memory(struct oom_control *oc);
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 1496c50..b245c72 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -772,7 +772,6 @@ struct signal_struct {
 	 */
 	unsigned long long sum_sched_runtime;
 
-	atomic_t oom_victims; /* # of TIF_MEMDIE threads in this thread group */
 	/*
 	 * We don't bother to synchronize most readers of this at all,
 	 * because there is no reader checking a limit that actually needs
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b3f16ab..a1fa626 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1287,7 +1287,7 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 		css_task_iter_start(&iter->css, &it);
 		while ((task = css_task_iter_next(&it))) {
-			switch (oom_scan_process_thread(&oc, task, totalpages)) {
+			switch (oom_scan_process_thread(&oc, task, false)) {
 			case OOM_SCAN_SELECT:
 				if (chosen)
 					put_task_struct(chosen);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 8e151d0..7d9437c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -273,8 +273,25 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc,
 }
 #endif
 
+static bool has_pending_victim(struct task_struct *p)
+{
+	struct task_struct *t;
+	bool ret = false;
+
+	rcu_read_lock();
+	for_each_thread(p, t) {
+		if (test_tsk_thread_flag(t, TIF_MEMDIE)) {
+			ret = true;
+			break;
+		}
+	}
+	rcu_read_unlock();
+	return ret;
+}
+
 enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
-			struct task_struct *task, unsigned long totalpages)
+					struct task_struct *task,
+					bool is_thread_group)
 {
 	if (oom_unkillable_task(task, NULL, oc->nodemask))
 		return OOM_SCAN_CONTINUE;
@@ -283,8 +300,15 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 	 * This task already has access to memory reserves and is being killed.
 	 * Don't allow any other task to have access to the reserves.
 	 */
-	if (!is_sysrq_oom(oc) && atomic_read(&task->signal->oom_victims))
-		return OOM_SCAN_ABORT;
+	if (is_thread_group) {
+		if (!is_sysrq_oom(oc) && has_pending_victim(task))
+			return OOM_SCAN_ABORT;
+	} else {
+		if (test_tsk_thread_flag(task, TIF_MEMDIE))
+			return OOM_SCAN_ABORT;
+		if (!task->mm)
+			return OOM_SCAN_CONTINUE;
+	}
 
 	/*
 	 * If task is allocating a lot of memory and has been marked to be
@@ -311,7 +335,7 @@ static struct task_struct *select_bad_process(struct oom_control *oc,
 	for_each_process(p) {
 		unsigned int points;
 
-		switch (oom_scan_process_thread(oc, p, totalpages)) {
+		switch (oom_scan_process_thread(oc, p, true)) {
 		case OOM_SCAN_SELECT:
 			chosen = p;
 			chosen_points = ULONG_MAX;
@@ -327,9 +351,6 @@ static struct task_struct *select_bad_process(struct oom_control *oc,
 		points = oom_badness(p, NULL, oc->nodemask, totalpages);
 		if (!points || points < chosen_points)
 			continue;
-		/* Prefer thread group leaders for display purposes */
-		if (points == chosen_points && thread_group_leader(chosen))
-			continue;
 
 		chosen = p;
 		chosen_points = points;
@@ -669,7 +690,6 @@ void mark_oom_victim(struct task_struct *tsk)
 	/* OOM killer might race with memcg OOM */
 	if (test_and_set_tsk_thread_flag(tsk, TIF_MEMDIE))
 		return;
-	atomic_inc(&tsk->signal->oom_victims);
 	/*
 	 * Make sure that the task is woken up from uninterruptible sleep
 	 * if it is frozen because OOM killer wouldn't be able to free
@@ -687,7 +707,6 @@ void exit_oom_victim(struct task_struct *tsk)
 {
 	if (!test_and_clear_tsk_thread_flag(tsk, TIF_MEMDIE))
 		return;
-	atomic_dec(&tsk->signal->oom_victims);
 
 	if (!atomic_dec_return(&oom_victims))
 		wake_up_all(&oom_victims_wait);
-- 
1.8.3.1

Note that "[PATCH v3] mm,oom: speed up select_bad_process() loop."
temporarily broke the oom_task_origin(task) case, for select_bad_process()
might select a task without mm because oom_badness(), which checks for
mm != NULL, will not be called. This regression can be fixed by moving the
oom_task_origin(task) test into oom_badness() and making oom_badness()
return a large value for such a task.

 mm/oom_kill.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 7d9437c..c40c649 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -175,6 +175,15 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 		return 0;
 
 	/*
+	 * If task is allocating a lot of memory and has been marked to be
+	 * killed first if it triggers an oom, then select it.
+	 */
+	if (oom_task_origin(p)) {
+		task_unlock(p);
+		return UINT_MAX - 1;
+	}
+
+	/*
 	 * Do not even consider tasks which are explicitly marked oom
 	 * unkillable or have been already oom reaped.
 	 */
@@ -310,13 +319,6 @@ enum oom_scan_t oom_scan_process_thread(struct oom_control *oc,
 			return OOM_SCAN_CONTINUE;
 	}
 
-	/*
-	 * If task is allocating a lot of memory and has been marked to be
-	 * killed first if it triggers an oom, then select it.
-	 */
-	if (oom_task_origin(task))
-		return OOM_SCAN_SELECT;
-
 	return OOM_SCAN_OK;
 }
 
-- 
1.8.3.1

Presumably, we want to do the oom_task_origin(p) test after the
adj == OOM_SCORE_ADJ_MIN || test_bit(MMF_OOM_REAPED, &p->mm->flags) test,
because an oom_task_origin(p) task could become "not suitable for victims"
after p was selected as an OOM victim and was OOM reaped.



By the way, I noticed that mem_cgroup_out_of_memory() might have a bug in
its return value. It returns true if it hits OOM_SCAN_ABORT after chosen
became non-NULL, and false if it hits OOM_SCAN_ABORT before that. Which is
the expected return value?


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-20 11:51       ` Tetsuo Handa
@ 2016-05-20 12:09         ` Michal Hocko
  2016-05-20 13:41           ` Tetsuo Handa
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2016-05-20 12:09 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, rientjes, linux-mm, oleg

On Fri 20-05-16 20:51:56, Tetsuo Handa wrote:
[...]
> +static bool has_pending_victim(struct task_struct *p)
> +{
> +	struct task_struct *t;
> +	bool ret = false;
> +
> +	rcu_read_lock();
> +	for_each_thread(p, t) {
> +		if (test_tsk_thread_flag(t, TIF_MEMDIE)) {
> +			ret = true;
> +			break;
> +		}
> +	}
> +	rcu_read_unlock();
> +	return ret;
> +}

And so you do not speed up anything in the end, because you have to
iterate all threads anyway, yet you add quite some code on top. No, I do
not like it. This is no longer a cleanup...

[...]
> Note that "[PATCH v3] mm,oom: speed up select_bad_process() loop."
> temporarily broke the oom_task_origin(task) case, for select_bad_process()
> might select a task without mm because oom_badness(), which checks for
> mm != NULL, will not be called.

How can we have oom_task_origin() without an mm? The flag is set
explicitly while doing swapoff or while writing to ksm. We clear the flag
before exiting.
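
For reference, the flag lives in signal->oom_flags and the helpers in
include/linux/oom.h look roughly like this in this era:

	static inline void set_current_oom_origin(void)
	{
		current->signal->oom_flags |= OOM_FLAG_ORIGIN;
	}

	static inline void clear_current_oom_origin(void)
	{
		current->signal->oom_flags &= ~OOM_FLAG_ORIGIN;
	}

	static inline bool oom_task_origin(const struct task_struct *p)
	{
		return !!(p->signal->oom_flags & OOM_FLAG_ORIGIN);
	}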

[...]

> By the way, I noticed that mem_cgroup_out_of_memory() might have a bug in
> its return value. It returns true if it hits OOM_SCAN_ABORT after chosen
> became non-NULL, and false if it hits OOM_SCAN_ABORT before that. Which is
> the expected return value?

true. Care to send a patch?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-20 12:09         ` Michal Hocko
@ 2016-05-20 13:41           ` Tetsuo Handa
  2016-05-20 13:44             ` Tetsuo Handa
  2016-05-20 15:23             ` Michal Hocko
  0 siblings, 2 replies; 21+ messages in thread
From: Tetsuo Handa @ 2016-05-20 13:41 UTC (permalink / raw)
  To: mhocko; +Cc: akpm, rientjes, linux-mm, oleg

Michal Hocko wrote:
> On Fri 20-05-16 20:51:56, Tetsuo Handa wrote:
> [...]
> > +static bool has_pending_victim(struct task_struct *p)
> > +{
> > +	struct task_struct *t;
> > +	bool ret = false;
> > +
> > +	rcu_read_lock();
> > +	for_each_thread(p, t) {
> > +		if (test_tsk_thread_flag(t, TIF_MEMDIE)) {
> > +			ret = true;
> > +			break;
> > +		}
> > +	}
> > +	rcu_read_unlock();
> > +	return ret;
> > +}
> 
> And so you do not speed up anything in the end, because you have to
> iterate all threads anyway, yet you add quite some code on top. No, I do
> not like it. This is no longer a cleanup...

I changed for_each_process_thread() to for_each_process(). This means
O(num_threads^2) task_in_mem_cgroup() and O(num_threads^2)
has_intersects_mems_allowed() are replaced with O(num_threads)
task_in_mem_cgroup() and O(num_threads) has_intersects_mems_allowed()
at the cost of adding O(num_threads) has_pending_victim().

I expect that O(num_threads) (task_in_mem_cgroup() + has_intersects_mems_allowed() +
has_pending_victim()) is faster than O(num_threads^2) (task_in_mem_cgroup() +
has_intersects_mems_allowed()) + O(num_threads) test_tsk_thread_flag().

> 
> [...]
> > Note that "[PATCH v3] mm,oom: speed up select_bad_process() loop."
> > temporarily broke the oom_task_origin(task) case, for select_bad_process()
> > might select a task without mm because oom_badness(), which checks for
> > mm != NULL, will not be called.
> 
> > How can we have oom_task_origin() without an mm? The flag is set
> > explicitly while doing swapoff or while writing to ksm. We clear the flag
> > before exiting.

What if an oom_task_origin(task) task received SIGKILL, but the task was
unable to run for a very long period (e.g. 30 seconds) due to scheduling
priority, and the OOM reaper reaped the task's mm within a second? The
next round of the OOM killer then selects the same task due to
oom_task_origin(task) without doing the MMF_OOM_REAPED test.

Once the OOM reaper has reaped the task's mm (or has given up reaping
it), a subsequent OOM killer invocation should treat that task as if
task->mm were NULL. Moving the oom_task_origin(task) test to after the
test_bit(MMF_OOM_REAPED, &p->mm->flags) test will let the OOM killer see
exactly that "oom_task_origin without mm" case.

> 
> [...]
> 
> > By the way, I noticed that mem_cgroup_out_of_memory() might have a bug in
> > its return value. It returns true if it hits OOM_SCAN_ABORT after chosen
> > became non-NULL, and false if it hits OOM_SCAN_ABORT before that. Which is
> > the expected return value?
> 
> true. Care to send a patch?

I don't know what memory_max_write() wants to do when it finds a TIF_MEMDIE thread
in the given memcg. Thus, I can't tell whether setting chosen to NULL (which means
mem_cgroup_out_of_memory() returns false) is the expected behavior.


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-20 13:41           ` Tetsuo Handa
@ 2016-05-20 13:44             ` Tetsuo Handa
  2016-05-20 15:23             ` Michal Hocko
  1 sibling, 0 replies; 21+ messages in thread
From: Tetsuo Handa @ 2016-05-20 13:44 UTC (permalink / raw)
  To: mhocko; +Cc: akpm, rientjes, linux-mm, oleg

Tetsuo Handa wrote:
> > > By the way, I noticed that mem_cgroup_out_of_memory() might have a bug in
> > > its return value. It returns true if it hits OOM_SCAN_ABORT after chosen
> > > became non-NULL, and false if it hits OOM_SCAN_ABORT before that. Which is
> > > the expected return value?
> > 
> > true. Care to send a patch?
> 
> I don't know what memory_max_write() wants to do when it finds a TIF_MEMDIE thread
> in the given memcg. Thus, I can't tell whether setting chosen to NULL (which means
> mem_cgroup_out_of_memory() returns false) is the expected behavior.
> 
Oops. "true" not "True" in this case meant "1" !?


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-20 13:41           ` Tetsuo Handa
  2016-05-20 13:44             ` Tetsuo Handa
@ 2016-05-20 15:23             ` Michal Hocko
  2016-05-20 15:56               ` Tetsuo Handa
  1 sibling, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2016-05-20 15:23 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, rientjes, linux-mm, oleg

On Fri 20-05-16 22:41:27, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Fri 20-05-16 20:51:56, Tetsuo Handa wrote:
> > [...]
> > > +static bool has_pending_victim(struct task_struct *p)
> > > +{
> > > +	struct task_struct *t;
> > > +	bool ret = false;
> > > +
> > > +	rcu_read_lock();
> > > +	for_each_thread(p, t) {
> > > +		if (test_tsk_thread_flag(t, TIF_MEMDIE)) {
> > > +			ret = true;
> > > +			break;
> > > +		}
> > > +	}
> > > +	rcu_read_unlock();
> > > +	return ret;
> > > +}
> > 
> > And so you do not speed up anything in the end because you have to
> > iterate over all threads anyway, yet you add quite some code on top. No,
> > I do not like it. This is no longer a cleanup...
> 
> I changed for_each_process_thread() to for_each_process(). This means
> O(num_threads^2) task_in_mem_cgroup()

oom_unkillable_task() is called with a NULL memcg there, so we do not call
task_in_mem_cgroup().

> and O(num_threads^2)
> has_intersects_mems_allowed() are replaced with O(num_threads)

I am really confused about why has_intersects_mems_allowed() has to iterate
over all threads. Do we really allow different mempolicies for threads in the
same thread group?

> task_in_mem_cgroup() and O(num_threads) has_intersects_mems_allowed()
> at the cost of adding O(num_threads) has_pending_victim().
> 
> I expect that O(num_threads) (task_in_mem_cgroup() + has_intersects_mems_allowed() +
> has_pending_victim()) is faster than O(num_threads^2) (task_in_mem_cgroup() +
> has_intersects_mems_allowed()) + O(num_threads) test_tsk_thread_flag().

I thought the whole point of the cleanup was to get rid of the O(num_threads)
work because num_threads >> num_processes in most workloads. It seems that we
are still not there because of has_intersects_mems_allowed(), but we should
rather address that than add another O(num_threads) source.
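
(For context, the per-thread walk in question, roughly from mm/oom_kill.c of
that era; mempolicies are set per-task via set_mempolicy(2), which seems to be
why the helper walks the whole thread group:)

	static bool has_intersects_mems_allowed(struct task_struct *start,
						const nodemask_t *mask)
	{
		struct task_struct *tsk;
		bool ret = false;

		rcu_read_lock();
		for_each_thread(start, tsk) {	/* the O(num_threads) part */
			if (mask) {
				/* mempolicy constrained OOM: per-thread policy */
				ret = mempolicy_nodemask_intersects(tsk, mask);
			} else {
				/* otherwise check tsk's cpuset */
				ret = cpuset_mems_allowed_intersects(current, tsk);
			}
			if (ret)
				break;
		}
		rcu_read_unlock();
		return ret;
	}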
 
> > [...]
> > > Note that "[PATCH v3] mm,oom: speed up select_bad_process() loop." temporarily
> > > broke the oom_task_origin(task) case, for oom_select_bad_process() might select
> > > a task without an mm because oom_badness(), which checks for mm != NULL, will
> > > not be called.
> > 
> > How can we have oom_task_origin without mm? The flag is set explicitly
> > while doing swapoff or when writing to ksm. We clear the flag before
> > exiting.
> 
> What if an oom_task_origin(task) task received SIGKILL, but was unable to run
> for a very long period (e.g. 30 seconds) due to scheduling priority, while the
> OOM-reaper reaped the task's mm within a second? The next round of the
> OOM-killer would then select the same task due to oom_task_origin(task)
> without doing the MMF_OOM_REAPED test.

Which is actually the intended behavior. The whole point of oom_task_origin is
to avoid killing somebody else because of a potentially memory hungry
operation (e.g. swapoff) and to kill the initiator instead.
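
For reference, the initiator marks itself around the memory hungry operation,
roughly as in mm/swapfile.c (from memory):

	set_current_oom_origin();	/* volunteer to be killed first on OOM */
	err = try_to_unuse(type, false, 0);	/* may need a lot of memory */
	clear_current_oom_origin();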

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-20 15:23             ` Michal Hocko
@ 2016-05-20 15:56               ` Tetsuo Handa
  2016-05-23  7:55                 ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: Tetsuo Handa @ 2016-05-20 15:56 UTC (permalink / raw)
  To: mhocko; +Cc: akpm, rientjes, linux-mm, oleg

Michal Hocko wrote:
> > > > Note that "[PATCH v3] mm,oom: speed up select_bad_process() loop." temporarily
> > > > broke the oom_task_origin(task) case, for oom_select_bad_process() might select
> > > > a task without an mm because oom_badness(), which checks for mm != NULL, will
> > > > not be called.
> > > 
> > > How can we have oom_task_origin without mm? The flag is set explicitly
> > > while doing swapoff or when writing to ksm. We clear the flag before
> > > exiting.
> > 
> > What if an oom_task_origin(task) task received SIGKILL, but was unable to run
> > for a very long period (e.g. 30 seconds) due to scheduling priority, while the
> > OOM-reaper reaped the task's mm within a second? The next round of the
> > OOM-killer would then select the same task due to oom_task_origin(task)
> > without doing the MMF_OOM_REAPED test.
> 
> Which is actually the intended behavior. The whole point of oom_task_origin is
> to avoid killing somebody else because of a potentially memory hungry
> operation (e.g. swapoff) and to kill the initiator instead.

Is it guaranteed that try_to_unuse() from swapoff is never blocked on memory
allocation (e.g. mmput(), wait_on_page_*())?

If there is a possibility of being blocked on memory allocation, it is not
safe to wait for an oom_task_origin(task) task unconditionally forever.


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-20  6:42             ` Michal Hocko
@ 2016-05-20 22:04               ` Oleg Nesterov
  0 siblings, 0 replies; 21+ messages in thread
From: Oleg Nesterov @ 2016-05-20 22:04 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Tetsuo Handa, rientjes, linux-mm

On 05/20, Michal Hocko wrote:
>
> On Fri 20-05-16 03:50:01, Oleg Nesterov wrote:
> > On 05/19, Michal Hocko wrote:
> > >
> > > Long term I
> > > would like to move this logic into the mm_struct; it would just be a
> > > larger surgery I guess.
> >
> > Why can't we do this right now? Just another MMF_ flag, set only once and
> > never cleared.
>
> It is more complicated and so more error prone.

Sure, but don't we want this anyway in the long term?
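
Roughly, I mean something like this (just a sketch; MMF_OOM_VICTIM is a
made-up name here):

	/* In mark_oom_victim(), once the TIF_MEMDIE test succeeded: */
	if (tsk->mm)	/* the real code needs task_lock()/find_lock_task_mm() */
		set_bit(MMF_OOM_VICTIM, &tsk->mm->flags); /* set once, never cleared */

	/* In oom_scan_process_thread(), instead of the per-thread TIF_MEMDIE test: */
	if (!is_sysrq_oom(oc) && task->mm &&
	    test_bit(MMF_OOM_VICTIM, &task->mm->flags))
		return OOM_SCAN_ABORT;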

> We have to sort out
> shortcuts which get TIF_MEMDIE without killing first.

Yes, but this seems a bit "off-topic" to me... though probably I do not
understand the problem well enough.

> And we have that
> nasty "mm shared between independent processes" case there.

Yes, yes, please see another email.

> If you feel that this step is not really worth it

No, no. Unless I see something which looks "obviously wrong" to me, I won't argue
with this (or any other) change as long as you and Tetsuo agree on it.

I understand that it is very easy to blame the OOM-killer (and the changes in
this area), but it is not easy to fix this code ;)

Oleg.


* Re: [PATCH v3] mm,oom: speed up select_bad_process() loop.
  2016-05-20 15:56               ` Tetsuo Handa
@ 2016-05-23  7:55                 ` Michal Hocko
  0 siblings, 0 replies; 21+ messages in thread
From: Michal Hocko @ 2016-05-23  7:55 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, rientjes, linux-mm, oleg

On Sat 21-05-16 00:56:26, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > > > Note that "[PATCH v3] mm,oom: speed up select_bad_process() loop." temporarily
> > > > > broke the oom_task_origin(task) case, for oom_select_bad_process() might select
> > > > > a task without an mm because oom_badness(), which checks for mm != NULL, will
> > > > > not be called.
> > > > 
> > > > How can we have oom_task_origin without mm? The flag is set explicitly
> > > > while doing swapoff or when writing to ksm. We clear the flag before
> > > > exiting.
> > > 
> > > What if an oom_task_origin(task) task received SIGKILL, but was unable to run
> > > for a very long period (e.g. 30 seconds) due to scheduling priority, while the
> > > OOM-reaper reaped the task's mm within a second? The next round of the
> > > OOM-killer would then select the same task due to oom_task_origin(task)
> > > without doing the MMF_OOM_REAPED test.
> > 
> > Which is actually the intended behavior. The whole point of oom_task_origin is
> > to avoid killing somebody else because of a potentially memory hungry
> > operation (e.g. swapoff) and to kill the initiator instead.
> 
> Is it guaranteed that try_to_unuse() from swapoff is never blocked on memory
> allocation (e.g. mmput(), wait_on_page_*())?

It shouldn't be. All of the waiting should be killable. If not, it is a bug
and should be fixed.
-- 
Michal Hocko
SUSE Labs


Thread overview: 21+ messages
2016-05-18 12:20 [PATCH v2] mm,oom: speed up select_bad_process() loop Tetsuo Handa
2016-05-18 12:51 ` Michal Hocko
2016-05-18 13:30   ` [PATCH v3] " Tetsuo Handa
2016-05-18 14:15     ` Michal Hocko
2016-05-18 21:09       ` Andrew Morton
2016-05-19  6:53         ` Michal Hocko
2016-05-19  7:17           ` Michal Hocko
2016-05-19  8:17             ` Michal Hocko
2016-05-20  1:50           ` Oleg Nesterov
2016-05-20  2:13             ` Oleg Nesterov
2016-05-20  6:42             ` Michal Hocko
2016-05-20 22:04               ` Oleg Nesterov
2016-05-18 14:44     ` Tetsuo Handa
2016-05-20  7:50     ` Michal Hocko
2016-05-20 11:51       ` Tetsuo Handa
2016-05-20 12:09         ` Michal Hocko
2016-05-20 13:41           ` Tetsuo Handa
2016-05-20 13:44             ` Tetsuo Handa
2016-05-20 15:23             ` Michal Hocko
2016-05-20 15:56               ` Tetsuo Handa
2016-05-23  7:55                 ` Michal Hocko
