[PATCH 0/5] core: Convert thread iteration to use for_each[_process]

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/5] core: Convert thread iteration to use for_each[_process]_thread APIs, 1st pile
@ 2014-04-09 16:11 Frederic Weisbecker
  2014-04-09 16:11 ` [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread() Frederic Weisbecker
                   ` (4 more replies)
  0 siblings, 5 replies; 55+ messages in thread
From: Frederic Weisbecker @ 2014-04-09 16:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra, Mathieu Desnoyers, Steven Rostedt

Hi,

This is the first pile of a longer series that convert deprecated and
RCU-unsafe thread group iterators (do_each_thread, while_each_thread)
to use for_each_thread/for_each_process_thread RCU-safe iterators that
have been introduced by Oleg.

for_each[_process]_thread() is already upstream so these patches don't
depend on any pending preparatory work.

So ideally it would be nice if maintainers cherry-pick the patches
corresponding to their own subsystem.

Thanks,
	Frederic
---

Frederic Weisbecker (5):
      sched: Convert thread_group_cputime() to use for_each_thread()
      tracepoint: Convert process iteration to use for_each_process_thread()
      hung_task: Convert process iteration to use for_each_process_thread()
      procfs: Convert process iteration to use for_each_thread()
      sched: Convert tasks iteration to use for_each_process_thread()


 fs/proc/array.c        |  7 ++++---
 fs/proc/base.c         |  4 ++--
 kernel/hung_task.c     |  8 ++++----
 kernel/sched/core.c    | 13 ++++++-------
 kernel/sched/cputime.c | 13 ++++++-------
 kernel/tracepoint.c    | 12 ++++++------
 6 files changed, 28 insertions(+), 29 deletions(-)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-09 16:11 [PATCH 0/5] core: Convert thread iteration to use for_each[_process]_thread APIs, 1st pile Frederic Weisbecker
@ 2014-04-09 16:11 ` Frederic Weisbecker
  2014-04-09 17:12   ` Oleg Nesterov
  2014-04-09 17:16   ` Peter Zijlstra
  2014-04-09 16:11 ` [PATCH 2/5] tracepoint: Convert process iteration to use for_each_process_thread() Frederic Weisbecker
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 55+ messages in thread
From: Frederic Weisbecker @ 2014-04-09 16:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra

do_each_thread/while_each_thread iterators are deprecated by
for_each_thread/for_each_process_thread() APIs.

Lets convert the callers in the cputime code accounting. The ultimate
goal is to remove the struct task_struct::thread_group field and
the corresponding do_each_thread/while_each_thread iterators that are
RCU unsafe.

It also makes thread_group_cputime() eventually RCU-safe.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/cputime.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index a95097c..5d27848 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -281,11 +281,11 @@ static __always_inline bool steal_account_process_tick(void)
 
 /*
  * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
- * tasks (sum on group iteration) belonging to @tsk's group.
+ * tasks (sum on group iteration) belonging to @p's group.
  */
-void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
+void thread_group_cputime(struct task_struct *p, struct task_cputime *times)
 {
-	struct signal_struct *sig = tsk->signal;
+	struct signal_struct *sig = p->signal;
 	cputime_t utime, stime;
 	struct task_struct *t;
 
@@ -295,16 +295,15 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
 
 	rcu_read_lock();
 	/* make sure we can trust tsk->thread_group list */
-	if (!likely(pid_alive(tsk)))
+	if (!likely(pid_alive(p)))
 		goto out;
 
-	t = tsk;
-	do {
+	for_each_thread(p, t) {
 		task_cputime(t, &utime, &stime);
 		times->utime += utime;
 		times->stime += stime;
 		times->sum_exec_runtime += task_sched_runtime(t);
-	} while_each_thread(tsk, t);
+	}
 out:
 	rcu_read_unlock();
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 2/5] tracepoint: Convert process iteration to use for_each_process_thread()
  2014-04-09 16:11 [PATCH 0/5] core: Convert thread iteration to use for_each[_process]_thread APIs, 1st pile Frederic Weisbecker
  2014-04-09 16:11 ` [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread() Frederic Weisbecker
@ 2014-04-09 16:11 ` Frederic Weisbecker
  2014-04-09 16:28   ` Mathieu Desnoyers
  2014-04-09 16:11 ` [PATCH 3/5] hung_task: " Frederic Weisbecker
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 55+ messages in thread
From: Frederic Weisbecker @ 2014-04-09 16:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar,
	Mathieu Desnoyers, Oleg Nesterov, Steven Rostedt

do_each_thread/while_each_thread iterators are deprecated by
for_each_thread/for_each_process_thread() APIs.

Lets convert the callers in the tracepoint code. The ultimate
goal is to remove the struct task_struct::thread_group field and
the corresponding do_each_thread/while_each_thread iterators that are
RCU unsafe.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/tracepoint.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index fb0a38a..00a7e8b 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -561,15 +561,15 @@ static int sys_tracepoint_refcount;
 void syscall_regfunc(void)
 {
 	unsigned long flags;
-	struct task_struct *g, *t;
+	struct task_struct *p, *t;
 
 	if (!sys_tracepoint_refcount) {
 		read_lock_irqsave(&tasklist_lock, flags);
-		do_each_thread(g, t) {
+		for_each_process_thread(p, t) {
 			/* Skip kernel threads. */
 			if (t->mm)
 				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
-		} while_each_thread(g, t);
+		}
 		read_unlock_irqrestore(&tasklist_lock, flags);
 	}
 	sys_tracepoint_refcount++;
@@ -578,14 +578,14 @@ void syscall_regfunc(void)
 void syscall_unregfunc(void)
 {
 	unsigned long flags;
-	struct task_struct *g, *t;
+	struct task_struct *p, *t;
 
 	sys_tracepoint_refcount--;
 	if (!sys_tracepoint_refcount) {
 		read_lock_irqsave(&tasklist_lock, flags);
-		do_each_thread(g, t) {
+		for_each_process_thread(p, t) {
 			clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
-		} while_each_thread(g, t);
+		}
 		read_unlock_irqrestore(&tasklist_lock, flags);
 	}
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 3/5] hung_task: Convert process iteration to use for_each_process_thread()
  2014-04-09 16:11 [PATCH 0/5] core: Convert thread iteration to use for_each[_process]_thread APIs, 1st pile Frederic Weisbecker
  2014-04-09 16:11 ` [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread() Frederic Weisbecker
  2014-04-09 16:11 ` [PATCH 2/5] tracepoint: Convert process iteration to use for_each_process_thread() Frederic Weisbecker
@ 2014-04-09 16:11 ` Frederic Weisbecker
  2014-04-09 17:23   ` Oleg Nesterov
  2014-04-09 16:11 ` [PATCH 4/5] procfs: Convert process iteration to use for_each_thread() Frederic Weisbecker
  2014-04-09 16:11 ` [PATCH 5/5] sched: Convert tasks iteration to use for_each_process_thread() Frederic Weisbecker
  4 siblings, 1 reply; 55+ messages in thread
From: Frederic Weisbecker @ 2014-04-09 16:11 UTC (permalink / raw)
  To: LKML; +Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Oleg Nesterov

do_each_thread/while_each_thread iterators are deprecated by
for_each_thread/for_each_process_thread() APIs.

Lets convert the callers in the hung task detector code. The ultimate
goal is to remove the struct task_struct::thread_group field and
the corresponding do_each_thread/while_each_thread iterators that are
RCU unsafe.

It also makes the hung task threads iteration eventually RCU safe.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/hung_task.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 06bb141..d19744b 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -157,7 +157,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 {
 	int max_count = sysctl_hung_task_check_count;
 	int batch_count = HUNG_TASK_BATCHING;
-	struct task_struct *g, *t;
+	struct task_struct *p, *t;
 
 	/*
 	 * If the system crashed already then all bets are off,
@@ -167,18 +167,18 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		return;
 
 	rcu_read_lock();
-	do_each_thread(g, t) {
+	for_each_process_thread(p, t) {
 		if (!max_count--)
 			goto unlock;
 		if (!--batch_count) {
 			batch_count = HUNG_TASK_BATCHING;
-			if (!rcu_lock_break(g, t))
+			if (!rcu_lock_break(p, t))
 				goto unlock;
 		}
 		/* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */
 		if (t->state == TASK_UNINTERRUPTIBLE)
 			check_hung_task(t, timeout);
-	} while_each_thread(g, t);
+	}
  unlock:
 	rcu_read_unlock();
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 4/5] procfs: Convert process iteration to use for_each_thread()
  2014-04-09 16:11 [PATCH 0/5] core: Convert thread iteration to use for_each[_process]_thread APIs, 1st pile Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2014-04-09 16:11 ` [PATCH 3/5] hung_task: " Frederic Weisbecker
@ 2014-04-09 16:11 ` Frederic Weisbecker
  2014-04-09 16:11 ` [PATCH 5/5] sched: Convert tasks iteration to use for_each_process_thread() Frederic Weisbecker
  4 siblings, 0 replies; 55+ messages in thread
From: Frederic Weisbecker @ 2014-04-09 16:11 UTC (permalink / raw)
  To: LKML; +Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Oleg Nesterov

do_each_thread/while_each_thread iterators are deprecated by
for_each_thread/for_each_process_thread() APIs.

Lets convert the callers in procfs. The ultimate goal is to remove the
struct task_struct::thread_group field and the corresponding
do_each_thread/while_each_thread iterators that are RCU unsafe.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 fs/proc/array.c | 7 ++++---
 fs/proc/base.c  | 4 ++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 656e401..b7df5d5 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -439,12 +439,13 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 
 		/* add up live thread stats at the group level */
 		if (whole) {
-			struct task_struct *t = task;
-			do {
+			struct task_struct *t;
+
+			for_each_thread(task, t) {
 				min_flt += t->min_flt;
 				maj_flt += t->maj_flt;
 				gtime += task_gtime(t);
-			} while_each_thread(task, t);
+			}
 
 			min_flt += sig->min_flt;
 			maj_flt += sig->maj_flt;
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b976062..f595d8e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2437,10 +2437,10 @@ static int do_io_accounting(struct task_struct *task, char *buffer, int whole)
 	}
 
 	if (whole && lock_task_sighand(task, &flags)) {
-		struct task_struct *t = task;
+		struct task_struct *t;
 
 		task_io_accounting_add(&acct, &task->signal->ioac);
-		while_each_thread(task, t)
+		for_each_thread(task, t)
 			task_io_accounting_add(&acct, &t->ioac);
 
 		unlock_task_sighand(task, &flags);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 5/5] sched: Convert tasks iteration to use for_each_process_thread()
  2014-04-09 16:11 [PATCH 0/5] core: Convert thread iteration to use for_each[_process]_thread APIs, 1st pile Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2014-04-09 16:11 ` [PATCH 4/5] procfs: Convert process iteration to use for_each_thread() Frederic Weisbecker
@ 2014-04-09 16:11 ` Frederic Weisbecker
  4 siblings, 0 replies; 55+ messages in thread
From: Frederic Weisbecker @ 2014-04-09 16:11 UTC (permalink / raw)
  To: LKML
  Cc: Frederic Weisbecker, Andrew Morton, Ingo Molnar, Oleg Nesterov,
	Peter Zijlstra

do_each_thread/while_each_thread iterators are deprecated by
for_each_thread/for_each_process_thread() APIs.

Lets convert the callers in the sched subsystem. The ultimate
goal is to remove the struct task_struct::thread_group field and
the corresponding do_each_thread/while_each_thread iterators that are
RCU unsafe.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
---
 kernel/sched/core.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0ff3f34..82df639 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4389,7 +4389,7 @@ void show_state_filter(unsigned long state_filter)
 		"  task                        PC stack   pid father\n");
 #endif
 	rcu_read_lock();
-	do_each_thread(g, p) {
+	for_each_process_thread(g, p) {
 		/*
 		 * reset the NMI-timeout, listing all files on a slow
 		 * console might take a lot of time:
@@ -4397,8 +4397,7 @@ void show_state_filter(unsigned long state_filter)
 		touch_nmi_watchdog();
 		if (!state_filter || (p->state & state_filter))
 			sched_show_task(p);
-	} while_each_thread(g, p);
-
+	}
 	touch_all_softlockup_watchdogs();
 
 #ifdef CONFIG_SCHED_DEBUG
@@ -6997,7 +6996,7 @@ void normalize_rt_tasks(void)
 	struct rq *rq;
 
 	read_lock_irqsave(&tasklist_lock, flags);
-	do_each_thread(g, p) {
+	for_each_process_thread(g, p) {
 		/*
 		 * Only normalize user tasks:
 		 */
@@ -7028,7 +7027,7 @@ void normalize_rt_tasks(void)
 
 		__task_rq_unlock(rq);
 		raw_spin_unlock(&p->pi_lock);
-	} while_each_thread(g, p);
+	}
 
 	read_unlock_irqrestore(&tasklist_lock, flags);
 }
@@ -7217,10 +7216,10 @@ static inline int tg_has_rt_tasks(struct task_group *tg)
 {
 	struct task_struct *g, *p;
 
-	do_each_thread(g, p) {
+	for_each_process_thread(g, p) {
 		if (rt_task(p) && task_rq(p)->rt.tg == tg)
 			return 1;
-	} while_each_thread(g, p);
+	}
 
 	return 0;
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/5] tracepoint: Convert process iteration to use for_each_process_thread()
  2014-04-09 16:11 ` [PATCH 2/5] tracepoint: Convert process iteration to use for_each_process_thread() Frederic Weisbecker
@ 2014-04-09 16:28   ` Mathieu Desnoyers
  2014-04-09 16:40     ` Frederic Weisbecker
  2014-04-09 16:42     ` Steven Rostedt
  0 siblings, 2 replies; 55+ messages in thread
From: Mathieu Desnoyers @ 2014-04-09 16:28 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Andrew Morton, Ingo Molnar, Oleg Nesterov, Steven Rostedt

----- Original Message -----
> From: "Frederic Weisbecker" <fweisbec@gmail.com>
> To: "LKML" <linux-kernel@vger.kernel.org>
> Cc: "Frederic Weisbecker" <fweisbec@gmail.com>, "Andrew Morton" <akpm@linux-foundation.org>, "Ingo Molnar"
> <mingo@kernel.org>, "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>, "Oleg Nesterov" <oleg@redhat.com>, "Steven
> Rostedt" <rostedt@goodmis.org>
> Sent: Wednesday, April 9, 2014 12:11:19 PM
> Subject: [PATCH 2/5] tracepoint: Convert process iteration to use for_each_process_thread()
> 
> do_each_thread/while_each_thread iterators are deprecated by
> for_each_thread/for_each_process_thread() APIs.
> 
> Lets convert the callers in the tracepoint code. The ultimate
> goal is to remove the struct task_struct::thread_group field and
> the corresponding do_each_thread/while_each_thread iterators that are
> RCU unsafe.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> ---
>  kernel/tracepoint.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index fb0a38a..00a7e8b 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -561,15 +561,15 @@ static int sys_tracepoint_refcount;
>  void syscall_regfunc(void)
>  {
>  	unsigned long flags;
> -	struct task_struct *g, *t;
> +	struct task_struct *p, *t;
>  
>  	if (!sys_tracepoint_refcount) {
>  		read_lock_irqsave(&tasklist_lock, flags);
> -		do_each_thread(g, t) {
> +		for_each_process_thread(p, t) {

What are the locking rules for for_each_process_thread() ?

Is it required to hold RCU read-side lock ? (it's not the case here)

Is tasklist_lock read-side lock sufficient ?

A quick glance at those for_each iterator defines in sched.h was not
helpful in finding this information.

Thanks,

Mathieu

>  			/* Skip kernel threads. */
>  			if (t->mm)
>  				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
> -		} while_each_thread(g, t);
> +		}
>  		read_unlock_irqrestore(&tasklist_lock, flags);
>  	}
>  	sys_tracepoint_refcount++;
> @@ -578,14 +578,14 @@ void syscall_regfunc(void)
>  void syscall_unregfunc(void)
>  {
>  	unsigned long flags;
> -	struct task_struct *g, *t;
> +	struct task_struct *p, *t;
>  
>  	sys_tracepoint_refcount--;
>  	if (!sys_tracepoint_refcount) {
>  		read_lock_irqsave(&tasklist_lock, flags);
> -		do_each_thread(g, t) {
> +		for_each_process_thread(p, t) {
>  			clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
> -		} while_each_thread(g, t);
> +		}
>  		read_unlock_irqrestore(&tasklist_lock, flags);
>  	}
>  }
> --
> 1.8.3.1
> 
> 

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/5] tracepoint: Convert process iteration to use for_each_process_thread()
  2014-04-09 16:28   ` Mathieu Desnoyers
@ 2014-04-09 16:40     ` Frederic Weisbecker
  2014-04-09 16:42     ` Steven Rostedt
  1 sibling, 0 replies; 55+ messages in thread
From: Frederic Weisbecker @ 2014-04-09 16:40 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: LKML, Andrew Morton, Ingo Molnar, Oleg Nesterov, Steven Rostedt

On Wed, Apr 09, 2014 at 04:28:35PM +0000, Mathieu Desnoyers wrote:
> ----- Original Message -----
> > From: "Frederic Weisbecker" <fweisbec@gmail.com>
> > To: "LKML" <linux-kernel@vger.kernel.org>
> > Cc: "Frederic Weisbecker" <fweisbec@gmail.com>, "Andrew Morton" <akpm@linux-foundation.org>, "Ingo Molnar"
> > <mingo@kernel.org>, "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>, "Oleg Nesterov" <oleg@redhat.com>, "Steven
> > Rostedt" <rostedt@goodmis.org>
> > Sent: Wednesday, April 9, 2014 12:11:19 PM
> > Subject: [PATCH 2/5] tracepoint: Convert process iteration to use for_each_process_thread()
> > 
> > do_each_thread/while_each_thread iterators are deprecated by
> > for_each_thread/for_each_process_thread() APIs.
> > 
> > Lets convert the callers in the tracepoint code. The ultimate
> > goal is to remove the struct task_struct::thread_group field and
> > the corresponding do_each_thread/while_each_thread iterators that are
> > RCU unsafe.
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  kernel/tracepoint.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> > index fb0a38a..00a7e8b 100644
> > --- a/kernel/tracepoint.c
> > +++ b/kernel/tracepoint.c
> > @@ -561,15 +561,15 @@ static int sys_tracepoint_refcount;
> >  void syscall_regfunc(void)
> >  {
> >  	unsigned long flags;
> > -	struct task_struct *g, *t;
> > +	struct task_struct *p, *t;
> >  
> >  	if (!sys_tracepoint_refcount) {
> >  		read_lock_irqsave(&tasklist_lock, flags);
> > -		do_each_thread(g, t) {
> > +		for_each_process_thread(p, t) {
> 
> What are the locking rules for for_each_process_thread() ?
> 
> Is it required to hold RCU read-side lock ? (it's not the case here)
> 
> Is tasklist_lock read-side lock sufficient ?

It's the same requirements than do_each_thread while_each_thread: tasklist_lock
or RCU. Except that it's really RCU-safe. while_each_thread isn't really
safe due to issues with concurrent exec/de_thread()

Then it depends on your requirement, if you can tolerate concurrent
adds and removals or not. Here tasklist_lock seems required or we may miss
some tasks' syscall traces.

> 
> A quick glance at those for_each iterator defines in sched.h was not
> helpful in finding this information.

Agreed, I'm going to add comments to precise that.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 2/5] tracepoint: Convert process iteration to use for_each_process_thread()
  2014-04-09 16:28   ` Mathieu Desnoyers
  2014-04-09 16:40     ` Frederic Weisbecker
@ 2014-04-09 16:42     ` Steven Rostedt
  2014-04-09 17:05       ` [PATCH 0/2] Was: " Oleg Nesterov
  1 sibling, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2014-04-09 16:42 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar, Oleg Nesterov

On Wed, 9 Apr 2014 16:28:35 +0000 (UTC)
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> ----- Original Message -----
> > From: "Frederic Weisbecker" <fweisbec@gmail.com>
> > To: "LKML" <linux-kernel@vger.kernel.org>
> > Cc: "Frederic Weisbecker" <fweisbec@gmail.com>, "Andrew Morton" <akpm@linux-foundation.org>, "Ingo Molnar"
> > <mingo@kernel.org>, "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>, "Oleg Nesterov" <oleg@redhat.com>, "Steven
> > Rostedt" <rostedt@goodmis.org>
> > Sent: Wednesday, April 9, 2014 12:11:19 PM
> > Subject: [PATCH 2/5] tracepoint: Convert process iteration to use for_each_process_thread()
> > 
> > do_each_thread/while_each_thread iterators are deprecated by
> > for_each_thread/for_each_process_thread() APIs.
> > 
> > Lets convert the callers in the tracepoint code. The ultimate
> > goal is to remove the struct task_struct::thread_group field and
> > the corresponding do_each_thread/while_each_thread iterators that are
> > RCU unsafe.
> > 
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > Cc: Oleg Nesterov <oleg@redhat.com>
> > Cc: Steven Rostedt <rostedt@goodmis.org>
> > Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
> > ---
> >  kernel/tracepoint.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> > index fb0a38a..00a7e8b 100644
> > --- a/kernel/tracepoint.c
> > +++ b/kernel/tracepoint.c
> > @@ -561,15 +561,15 @@ static int sys_tracepoint_refcount;
> >  void syscall_regfunc(void)
> >  {
> >  	unsigned long flags;
> > -	struct task_struct *g, *t;
> > +	struct task_struct *p, *t;
> >  
> >  	if (!sys_tracepoint_refcount) {
> >  		read_lock_irqsave(&tasklist_lock, flags);
> > -		do_each_thread(g, t) {
> > +		for_each_process_thread(p, t) {
> 
> What are the locking rules for for_each_process_thread() ?
> 
> Is it required to hold RCU read-side lock ? (it's not the case here)
> 
> Is tasklist_lock read-side lock sufficient ?

The tasklist_lock is all that is needed. Nothing about the list will
change when that's held. Ideally we would like to start converting
things to rcu, where it's OK if the list actually changes while you are
iterating.

But if you require the list to stay the same while you read all the
tasks, then you need the bigger hammer (tasklist_lock).

Also, this is just about iterating the tasklist list. If something
within the loop requires rcu, then that's a different story.

-- Steve

> 
> A quick glance at those for_each iterator defines in sched.h was not
> helpful in finding this information.
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 0/2] Was: Convert process iteration to use for_each_process_thread()
  2014-04-09 16:42     ` Steven Rostedt
@ 2014-04-09 17:05       ` Oleg Nesterov
  2014-04-09 17:05         ` [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
                           ` (2 more replies)
  0 siblings, 3 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-09 17:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 04/09, Steven Rostedt wrote:
>
> The tasklist_lock is all that is needed.

Yes, but not _irq. Plus this code is buggy ;)

Steven, I sent these patches twice (at least!). This time I didn't
bother to recheck this code, but the patches still apply so perhaps
you can take a look?

Last attempt, I promise ;)

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process()
  2014-04-09 17:05       ` [PATCH 0/2] Was: " Oleg Nesterov
@ 2014-04-09 17:05         ` Oleg Nesterov
  2014-04-10 13:04           ` Steven Rostedt
  2014-04-10 13:06           ` Steven Rostedt
  2014-04-09 17:06         ` [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
  2014-04-10 13:03         ` [PATCH 0/2] Was: Convert process iteration to use for_each_process_thread() Steven Rostedt
  2 siblings, 2 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-09 17:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

syscall_regfunc() and syscall_unregfunc() should set/clear
TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
with copy_process() and miss the new child which was not added to
init_task.tasks list yet.

Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
under tasklist.

While at it,

	- remove _irqsafe from syscall_regfunc/syscall_unregfunc,
	  read_lock(tasklist) doesn't need to disable irqs.

	- change syscall_unregfunc() to check PF_KTHREAD to skip
	  the kernel threads, ->mm != NULL is the common mistake.

	  Note: probably this check should be simply removed, needs
	  another patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/trace/syscall.h |   15 +++++++++++++++
 kernel/fork.c           |    2 ++
 kernel/tracepoint.c     |   12 +++++-------
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index 84bc419..15a954b 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -4,6 +4,7 @@
 #include <linux/tracepoint.h>
 #include <linux/unistd.h>
 #include <linux/ftrace_event.h>
+#include <linux/thread_info.h>
 
 #include <asm/ptrace.h>
 
@@ -31,4 +32,18 @@ struct syscall_metadata {
 	struct ftrace_event_call *exit_event;
 };
 
+#ifdef CONFIG_TRACEPOINTS
+static inline void syscall_tracepoint_update(struct task_struct *p)
+{
+	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
+		set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+	else
+		clear_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+}
+#else
+static inline void syscall_tracepoint_update(struct task_struct *p)
+{
+}
+#endif
+
 #endif /* _TRACE_SYSCALL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 1766d32..e463f99 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1472,7 +1472,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
+	syscall_tracepoint_update(p);
 	write_unlock_irq(&tasklist_lock);
+
 	proc_fork_connector(p);
 	cgroup_post_fork(p);
 	if (clone_flags & CLONE_THREAD)
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 0c05a45..a16754b 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -732,33 +732,31 @@ static int sys_tracepoint_refcount;
 
 void syscall_regfunc(void)
 {
-	unsigned long flags;
 	struct task_struct *g, *t;
 
 	if (!sys_tracepoint_refcount) {
-		read_lock_irqsave(&tasklist_lock, flags);
+		read_lock(&tasklist_lock);
 		do_each_thread(g, t) {
 			/* Skip kernel threads. */
-			if (t->mm)
+			if (!(t->flags & PF_KTHREAD))
 				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
 		} while_each_thread(g, t);
-		read_unlock_irqrestore(&tasklist_lock, flags);
+		read_unlock(&tasklist_lock);
 	}
 	sys_tracepoint_refcount++;
 }
 
 void syscall_unregfunc(void)
 {
-	unsigned long flags;
 	struct task_struct *g, *t;
 
 	sys_tracepoint_refcount--;
 	if (!sys_tracepoint_refcount) {
-		read_lock_irqsave(&tasklist_lock, flags);
+		read_lock(&tasklist_lock);
 		do_each_thread(g, t) {
 			clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
 		} while_each_thread(g, t);
-		read_unlock_irqrestore(&tasklist_lock, flags);
+		read_unlock(&tasklist_lock);
 	}
 }
 #endif
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-09 17:05       ` [PATCH 0/2] Was: " Oleg Nesterov
  2014-04-09 17:05         ` [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
@ 2014-04-09 17:06         ` Oleg Nesterov
  2014-04-10 13:28           ` Steven Rostedt
  2014-04-10 13:03         ` [PATCH 0/2] Was: Convert process iteration to use for_each_process_thread() Steven Rostedt
  2 siblings, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-09 17:06 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

syscall_regfunc() ignores the kernel thread because "it has
no effect", see cc3b13c1 "Don't trace kernel thread syscalls".

However, this means that a user-space task spawned by
call_usermodehelper() won't report the system calls if
kernel_execve() is called when sys_tracepoint_refcount != 0.

Remove this check. Hopefully the unnecessary report from
ret_from_fork path mentioned by cc3b13c1 is fine. In fact
"this is the only case" is not true. Say, kernel_execve()
itself does "int 80" on X86_32. Hopefully fine too.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/tracepoint.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index a16754b..4e1e4ca 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -737,9 +737,7 @@ void syscall_regfunc(void)
 	if (!sys_tracepoint_refcount) {
 		read_lock(&tasklist_lock);
 		do_each_thread(g, t) {
-			/* Skip kernel threads. */
-			if (!(t->flags & PF_KTHREAD))
-				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
+			set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
 		} while_each_thread(g, t);
 		read_unlock(&tasklist_lock);
 	}
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-09 16:11 ` [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread() Frederic Weisbecker
@ 2014-04-09 17:12   ` Oleg Nesterov
  2014-04-09 17:16   ` Peter Zijlstra
  1 sibling, 0 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-09 17:12 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Andrew Morton, Ingo Molnar, Peter Zijlstra

On 04/09, Frederic Weisbecker wrote:
>
> @@ -295,16 +295,15 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  
>  	rcu_read_lock();
>  	/* make sure we can trust tsk->thread_group list */
> -	if (!likely(pid_alive(tsk)))
> +	if (!likely(pid_alive(p)))

You can simply remove this check now. for_each_thread() is always safe.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-09 16:11 ` [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread() Frederic Weisbecker
  2014-04-09 17:12   ` Oleg Nesterov
@ 2014-04-09 17:16   ` Peter Zijlstra
  2014-04-09 17:32     ` Oleg Nesterov
  1 sibling, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2014-04-09 17:16 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Andrew Morton, Ingo Molnar, Oleg Nesterov

On Wed, Apr 09, 2014 at 06:11:18PM +0200, Frederic Weisbecker wrote:
> do_each_thread/while_each_thread iterators are deprecated by
> for_each_thread/for_each_process_thread() APIs.
> 
> Lets convert the callers in the cputime code accounting. The ultimate
> goal is to remove the struct task_struct::thread_group field and
> the corresponding do_each_thread/while_each_thread iterators that are
> RCU unsafe.
> 
> It also makes thread_group_cputime() eventually RCU-safe.

this fails to explain how the current code is broken.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 3/5] hung_task: Convert process iteration to use for_each_process_thread()
  2014-04-09 16:11 ` [PATCH 3/5] hung_task: " Frederic Weisbecker
@ 2014-04-09 17:23   ` Oleg Nesterov
  0 siblings, 0 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-09 17:23 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: LKML, Andrew Morton, Ingo Molnar

On 04/09, Frederic Weisbecker wrote:
>
> @@ -167,18 +167,18 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  		return;
>  
>  	rcu_read_lock();
> -	do_each_thread(g, t) {
> +	for_each_process_thread(p, t) {
>  		if (!max_count--)
>  			goto unlock;
>  		if (!--batch_count) {
>  			batch_count = HUNG_TASK_BATCHING;
> -			if (!rcu_lock_break(g, t))
> +			if (!rcu_lock_break(p, t))

Not sure. And in any case this is suboptimal. Frederic, I'll write
another email tomorrow. In fact I already have the patch which changes
check_hung_uninterruptible_tasks() somewhere.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-09 17:16   ` Peter Zijlstra
@ 2014-04-09 17:32     ` Oleg Nesterov
  2014-04-09 18:30       ` Peter Zijlstra
  0 siblings, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-09 17:32 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 04/09, Peter Zijlstra wrote:
>
> On Wed, Apr 09, 2014 at 06:11:18PM +0200, Frederic Weisbecker wrote:
> > do_each_thread/while_each_thread iterators are deprecated by
> > for_each_thread/for_each_process_thread() APIs.
> >
> > Lets convert the callers in the cputime code accounting. The ultimate
> > goal is to remove the struct task_struct::thread_group field and
> > the corresponding do_each_thread/while_each_thread iterators that are
> > RCU unsafe.
> >
> > It also makes thread_group_cputime() eventually RCU-safe.
>
> this fails to explain how the current code is broken.

while_each_thread(g, t) will loop forever if g exits and removes itself
from ->thread_group. This can happen even if it is the group leader,
de_thread() can do this.

Another problem is that it is used wrongly very often, even if
while_each_thread() was fine people forget to check pid_alive() to ensure
it didn't exit even before we take rcu_read_lock().

for_each_thread(p, t) is always safe. Unless p's task_struct can't go away,
of course.

But there is a difference. Ignoring the bug above

	p = g;
	do {
		printk("pid=%d\n", p->pid);
	} while_each_thread(g, p);

always prints at least one pid.

	for_each_thread(g, p)
		printk("pid=%d\n", p->pid);

can print nothing if g has already exited.

Oleg.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-09 17:32     ` Oleg Nesterov
@ 2014-04-09 18:30       ` Peter Zijlstra
  2014-04-09 19:46         ` Oleg Nesterov
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2014-04-09 18:30 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Wed, Apr 09, 2014 at 07:32:34PM +0200, Oleg Nesterov wrote:
> On 04/09, Peter Zijlstra wrote:
> >
> > On Wed, Apr 09, 2014 at 06:11:18PM +0200, Frederic Weisbecker wrote:
> > > do_each_thread/while_each_thread iterators are deprecated by
> > > for_each_thread/for_each_process_thread() APIs.
> > >
> > > Lets convert the callers in the cputime code accounting. The ultimate
> > > goal is to remove the struct task_struct::thread_group field and
> > > the corresponding do_each_thread/while_each_thread iterators that are
> > > RCU unsafe.
> > >
> > > It also makes thread_group_cputime() eventually RCU-safe.
> >
> > this fails to explain how the current code is broken.
> 
> while_each_thread(g, t) will loop forever if g exits and removes itself
> from ->thread_group. This can happen even if it is the group leader,
> de_thread() can do this.
> 
> Another problem is that it is used wrongly very often, even if
> while_each_thread() was fine people forget to check pid_alive() to ensure
> it didn't exit even before we take rcu_read_lock().
> 
> for_each_thread(p, t) is always safe. Unless p's task_struct can't go away,
> of course.
> 
> But there is a difference. Ignoring the bug above
> 
> 	p = g;
> 	do {
> 		printk("pid=%d\n", p->pid);
> 	} while_each_thread(g, p);
> 
> always prints at least one pid.
> 
> 	for_each_thread(g, p)
> 		printk("pid=%d\n", p->pid);
> 
> can print nothing if g has already exited.

Cute,.. and all that lives in sched.h and I wasn't even Cc'ed :-(

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-09 18:30       ` Peter Zijlstra
@ 2014-04-09 19:46         ` Oleg Nesterov
  2014-04-09 19:49           ` Peter Zijlstra
  2014-04-10  7:56           ` Ingo Molnar
  0 siblings, 2 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-09 19:46 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 04/09, Peter Zijlstra wrote:
>
> Cute,.. and all that lives in sched.h and I wasn't even Cc'ed :-(

Sorry ;) I didn't think you can be interested. I'll cc you in future,
this still needs more work.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-09 19:46         ` Oleg Nesterov
@ 2014-04-09 19:49           ` Peter Zijlstra
  2014-04-10 16:19             ` Peter Zijlstra
  2014-04-10  7:56           ` Ingo Molnar
  1 sibling, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2014-04-09 19:49 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Wed, Apr 09, 2014 at 09:46:03PM +0200, Oleg Nesterov wrote:
> On 04/09, Peter Zijlstra wrote:
> >
> > Cute,.. and all that lives in sched.h and I wasn't even Cc'ed :-(
> 
> Sorry ;) I didn't think you can be interested. I'll cc you in future,
> this still needs more work.

In general anything locking and/or task related, and this is both, is
'interesting'.

Anyway, I'll look over the stuff tomorrow.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-09 19:46         ` Oleg Nesterov
  2014-04-09 19:49           ` Peter Zijlstra
@ 2014-04-10  7:56           ` Ingo Molnar
  1 sibling, 0 replies; 55+ messages in thread
From: Ingo Molnar @ 2014-04-10  7:56 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Peter Zijlstra, Frederic Weisbecker, LKML, Andrew Morton


* Oleg Nesterov <oleg@redhat.com> wrote:

> On 04/09, Peter Zijlstra wrote:
> >
> > Cute,.. and all that lives in sched.h and I wasn't even Cc'ed :-(
> 
> Sorry ;) I didn't think you can be interested. I'll cc you in future,
> this still needs more work.

Please generally Cc: the folks listed in:

  scripts/get_maintainer.pl -f include/linux/sched.h

If maintainers are truly not interested in changes to a file they are 
listed to maintain then they'll get it unlisted in the MAINTAINERS 
patterns.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 0/2] Was: Convert process iteration to use for_each_process_thread()
  2014-04-09 17:05       ` [PATCH 0/2] Was: " Oleg Nesterov
  2014-04-09 17:05         ` [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
  2014-04-09 17:06         ` [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
@ 2014-04-10 13:03         ` Steven Rostedt
  2 siblings, 0 replies; 55+ messages in thread
From: Steven Rostedt @ 2014-04-10 13:03 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Wed, 9 Apr 2014 19:05:05 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 04/09, Steven Rostedt wrote:
> >
> > The tasklist_lock is all that is needed.
> 
> Yes, but not _irq. Plus this code is buggy ;)
> 
> Steven, I sent these patches twice (at least!). This time I didn't
> bother to recheck this code, but the patches still apply so perhaps
> you can take a look?
> 
> Last attempt, I promise ;)

Ah OK, and I actually have time to look at them now :-)

-- Steve

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process()
  2014-04-09 17:05         ` [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
@ 2014-04-10 13:04           ` Steven Rostedt
  2014-04-10 13:33             ` Oleg Nesterov
  2014-04-10 13:06           ` Steven Rostedt
  1 sibling, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2014-04-10 13:04 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Wed, 9 Apr 2014 19:05:42 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> syscall_regfunc() and syscall_unregfunc() should set/clear
> TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
> with copy_process() and miss the new child which was not added to
> init_task.tasks list yet.
> 
> Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
> under tasklist.
> 
> While at it,
> 
> 	- remove _irqsafe from syscall_regfunc/syscall_unregfunc,
> 	  read_lock(tasklist) doesn't need to disable irqs.
> 
> 	- change syscall_unregfunc() to check PF_KTHREAD to skip
> 	  the kernel threads, ->mm != NULL is the common mistake.
> 
> 	  Note: probably this check should be simply removed, needs
> 	  another patch.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

BTW, how important is this fix? Something we should aim for the current
merge window? stable? Or can it wait till 3.16?

-- Steve

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process()
  2014-04-09 17:05         ` [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
  2014-04-10 13:04           ` Steven Rostedt
@ 2014-04-10 13:06           ` Steven Rostedt
  2014-04-10 13:34             ` Oleg Nesterov
  1 sibling, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2014-04-10 13:06 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Wed, 9 Apr 2014 19:05:42 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> syscall_regfunc() and syscall_unregfunc() should set/clear
> TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
> with copy_process() and miss the new child which was not added to
> init_task.tasks list yet.
> 
> Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
> under tasklist.
> 
> While at it,
> 
> 	- remove _irqsafe from syscall_regfunc/syscall_unregfunc,
> 	  read_lock(tasklist) doesn't need to disable irqs.
> 
> 	- change syscall_unregfunc() to check PF_KTHREAD to skip
> 	  the kernel threads, ->mm != NULL is the common mistake.
> 
> 	  Note: probably this check should be simply removed, needs
> 	  another patch.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  include/trace/syscall.h |   15 +++++++++++++++
>  kernel/fork.c           |    2 ++
>  kernel/tracepoint.c     |   12 +++++-------
>  3 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/include/trace/syscall.h b/include/trace/syscall.h
> index 84bc419..15a954b 100644
> --- a/include/trace/syscall.h
> +++ b/include/trace/syscall.h
> @@ -4,6 +4,7 @@
>  #include <linux/tracepoint.h>
>  #include <linux/unistd.h>
>  #include <linux/ftrace_event.h>
> +#include <linux/thread_info.h>
>  
>  #include <asm/ptrace.h>
>  
> @@ -31,4 +32,18 @@ struct syscall_metadata {
>  	struct ftrace_event_call *exit_event;
>  };
>  
> +#ifdef CONFIG_TRACEPOINTS
> +static inline void syscall_tracepoint_update(struct task_struct *p)
> +{
> +	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
> +		set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
> +	else
> +		clear_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
> +}
> +#else
> +static inline void syscall_tracepoint_update(struct task_struct *p)
> +{
> +}
> +#endif
> +
>  #endif /* _TRACE_SYSCALL_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 1766d32..e463f99 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1472,7 +1472,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>  
>  	total_forks++;
>  	spin_unlock(&current->sighand->siglock);
> +	syscall_tracepoint_update(p);
>  	write_unlock_irq(&tasklist_lock);
> +

So this looks to be a fix that probably should go to stable?

>  	proc_fork_connector(p);
>  	cgroup_post_fork(p);
>  	if (clone_flags & CLONE_THREAD)
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 0c05a45..a16754b 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -732,33 +732,31 @@ static int sys_tracepoint_refcount;
>  
>  void syscall_regfunc(void)
>  {
> -	unsigned long flags;
>  	struct task_struct *g, *t;
>  
>  	if (!sys_tracepoint_refcount) {
> -		read_lock_irqsave(&tasklist_lock, flags);
> +		read_lock(&tasklist_lock);
>  		do_each_thread(g, t) {
>  			/* Skip kernel threads. */
> -			if (t->mm)
> +			if (!(t->flags & PF_KTHREAD))
>  				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
>  		} while_each_thread(g, t);
> -		read_unlock_irqrestore(&tasklist_lock, flags);
> +		read_unlock(&tasklist_lock);
>  	}
>  	sys_tracepoint_refcount++;
>  }
>  
>  void syscall_unregfunc(void)
>  {
> -	unsigned long flags;
>  	struct task_struct *g, *t;
>  
>  	sys_tracepoint_refcount--;
>  	if (!sys_tracepoint_refcount) {
> -		read_lock_irqsave(&tasklist_lock, flags);
> +		read_lock(&tasklist_lock);
>  		do_each_thread(g, t) {
>  			clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
>  		} while_each_thread(g, t);
> -		read_unlock_irqrestore(&tasklist_lock, flags);
> +		read_unlock(&tasklist_lock);

These changes should be in a separate patch, as they can wait till 3.16.

-- Steve

>  	}
>  }
>  #endif


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-09 17:06         ` [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
@ 2014-04-10 13:28           ` Steven Rostedt
  2014-04-10 13:38             ` Oleg Nesterov
  0 siblings, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2014-04-10 13:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Wed, 9 Apr 2014 19:06:16 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> syscall_regfunc() ignores the kernel thread because "it has
> no effect", see cc3b13c1 "Don't trace kernel thread syscalls".
> 
> However, this means that a user-space task spawned by
> call_usermodehelper() won't report the system calls if
> kernel_execve() is called when sys_tracepoint_refcount != 0.

What about doing the set there? That is, we could add a check in the
call_userspacehelper() just before it does the do_execve, that if
sys_tracepoint_refcount is set, we set the TIF flag.


-- Steve

> 
> Remove this check. Hopefully the unnecessary report from
> ret_from_fork path mentioned by cc3b13c1 is fine. In fact
> "this is the only case" is not true. Say, kernel_execve()
> itself does "int 80" on X86_32. Hopefully fine too.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/tracepoint.c |    4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index a16754b..4e1e4ca 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -737,9 +737,7 @@ void syscall_regfunc(void)
>  	if (!sys_tracepoint_refcount) {
>  		read_lock(&tasklist_lock);
>  		do_each_thread(g, t) {
> -			/* Skip kernel threads. */
> -			if (!(t->flags & PF_KTHREAD))
> -				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
> +			set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
>  		} while_each_thread(g, t);
>  		read_unlock(&tasklist_lock);
>  	}


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process()
  2014-04-10 13:04           ` Steven Rostedt
@ 2014-04-10 13:33             ` Oleg Nesterov
  0 siblings, 0 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-10 13:33 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 04/10, Steven Rostedt wrote:
>
> On Wed, 9 Apr 2014 19:05:42 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > syscall_regfunc() and syscall_unregfunc() should set/clear
> > TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
> > with copy_process() and miss the new child which was not added to
> > init_task.tasks list yet.
> >
> > Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
> > under tasklist.
> >
> > While at it,
> >
> > 	- remove _irqsafe from syscall_regfunc/syscall_unregfunc,
> > 	  read_lock(tasklist) doesn't need to disable irqs.
> >
> > 	- change syscall_unregfunc() to check PF_KTHREAD to skip
> > 	  the kernel threads, ->mm != NULL is the common mistake.
> >
> > 	  Note: probably this check should be simply removed, needs
> > 	  another patch.
> >
> > Signed-off-by: Oleg Nesterov <oleg@redhat.com>
>
> BTW, how important is this fix? Something we should aim for the current
> merge window? stable? Or can it wait till 3.16?

I think this all can wait till 3.16. The problem is really minor.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process()
  2014-04-10 13:06           ` Steven Rostedt
@ 2014-04-10 13:34             ` Oleg Nesterov
  2014-04-11 15:22               ` Steven Rostedt
  0 siblings, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-10 13:34 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 04/10, Steven Rostedt wrote:
>
> On Wed, 9 Apr 2014 19:05:42 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -1472,7 +1472,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> >
> >  	total_forks++;
> >  	spin_unlock(&current->sighand->siglock);
> > +	syscall_tracepoint_update(p);
> >  	write_unlock_irq(&tasklist_lock);
> > +
>
> So this looks to be a fix that probably should go to stable?

Not sure, this is up to you.

> > @@ -732,33 +732,31 @@ static int sys_tracepoint_refcount;
> >
> >  void syscall_regfunc(void)
> >  {
> > -	unsigned long flags;
> >  	struct task_struct *g, *t;
> >
> >  	if (!sys_tracepoint_refcount) {
> > -		read_lock_irqsave(&tasklist_lock, flags);
> > +		read_lock(&tasklist_lock);
> >  		do_each_thread(g, t) {
> >  			/* Skip kernel threads. */
> > -			if (t->mm)
> > +			if (!(t->flags & PF_KTHREAD))
> >  				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
> >  		} while_each_thread(g, t);
> > -		read_unlock_irqrestore(&tasklist_lock, flags);
> > +		read_unlock(&tasklist_lock);
> >  	}
> >  	sys_tracepoint_refcount++;
> >  }
> >
> >  void syscall_unregfunc(void)
> >  {
> > -	unsigned long flags;
> >  	struct task_struct *g, *t;
> >
> >  	sys_tracepoint_refcount--;
> >  	if (!sys_tracepoint_refcount) {
> > -		read_lock_irqsave(&tasklist_lock, flags);
> > +		read_lock(&tasklist_lock);
> >  		do_each_thread(g, t) {
> >  			clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
> >  		} while_each_thread(g, t);
> > -		read_unlock_irqrestore(&tasklist_lock, flags);
> > +		read_unlock(&tasklist_lock);
>
> These changes should be in a separate patch, as they can wait till 3.16.

OK, I'll split this change.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-10 13:28           ` Steven Rostedt
@ 2014-04-10 13:38             ` Oleg Nesterov
  2014-04-10 14:28               ` Steven Rostedt
  0 siblings, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-10 13:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 04/10, Steven Rostedt wrote:
>
> On Wed, 9 Apr 2014 19:06:16 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > syscall_regfunc() ignores the kernel thread because "it has
> > no effect", see cc3b13c1 "Don't trace kernel thread syscalls".
> >
> > However, this means that a user-space task spawned by
> > call_usermodehelper() won't report the system calls if
> > kernel_execve() is called when sys_tracepoint_refcount != 0.
>
> What about doing the set there? That is, we could add a check in the
> call_userspacehelper() just before it does the do_execve, that if
> sys_tracepoint_refcount is set, we set the TIF flag.

But for what?

And if we do this, ____call_usermodehelper() needs write_lock_irq(tasklist)
to serialize with syscall_*regfunc().

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-10 13:38             ` Oleg Nesterov
@ 2014-04-10 14:28               ` Steven Rostedt
  2014-04-10 14:46                 ` Oleg Nesterov
  0 siblings, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2014-04-10 14:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Thu, 10 Apr 2014 15:38:55 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 04/10, Steven Rostedt wrote:
> >
> > On Wed, 9 Apr 2014 19:06:16 +0200
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > syscall_regfunc() ignores the kernel thread because "it has
> > > no effect", see cc3b13c1 "Don't trace kernel thread syscalls".
> > >
> > > However, this means that a user-space task spawned by
> > > call_usermodehelper() won't report the system calls if
> > > kernel_execve() is called when sys_tracepoint_refcount != 0.
> >
> > What about doing the set there? That is, we could add a check in the
> > call_userspacehelper() just before it does the do_execve, that if
> > sys_tracepoint_refcount is set, we set the TIF flag.
> 
> But for what?

Isn't call_usermodehelper() the reason you added this?

> 
> And if we do this, ____call_usermodehelper() needs write_lock_irq(tasklist)
> to serialize with syscall_*regfunc().

You mean for the slight race between checking if its set and when the
tracepoint is actually activated?

I don't think we really care about that race. I mean, the tracepoint is
activated usually by humans, and if they enabled it just as a usermode
helper is activated, and those are really fast to run, do we even care
if it is missed?

Now, if tracing is on and we need to set the flag, that should take the
task list lock to make sure that we don't miss clearing it. Missing the
set isn't a big deal, but missing the clearing of the flag is.

void tracepoint_check_syscalls(void)
{
	if (!sys_tracepoint_refcount)
		return;

	read_lock(&tasklist_lock);
	/* Make sure it wasn't cleared since taking the lock */
	if (sys_tracepoint_refcount)
		set_tsk_thread_flag(current, TIF_SYSCALL_TRACEPOINT);
	read_unlock(&tasklist_lock);
}

Hmm, probably need to use another lock than tasklist_lock as the
updating of sys_tracepoint_refcount is not done under it. But as this
is all local to tracepoint.c we could easily add something to do the
job

s/read_lock(&tasklist_lock)/spin_lock(&tracepoint_sys_lock)/

-- Steve

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-10 14:28               ` Steven Rostedt
@ 2014-04-10 14:46                 ` Oleg Nesterov
  2014-04-10 15:08                   ` Steven Rostedt
  0 siblings, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-10 14:46 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 04/10, Steven Rostedt wrote:
>
> On Thu, 10 Apr 2014 15:38:55 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > > > However, this means that a user-space task spawned by
> > > > call_usermodehelper() won't report the system calls if
> > > > kernel_execve() is called when sys_tracepoint_refcount != 0.
> > >
> > > What about doing the set there? That is, we could add a check in the
> > > call_userspacehelper() just before it does the do_execve, that if
> > > sys_tracepoint_refcount is set, we set the TIF flag.
> >
> > But for what?
>
> Isn't call_usermodehelper() the reason you added this?

Sure. I meant, why complicate ____call_usermodehelper() and keep the
unnecessary complication (PF_KTHREAD check( in syscall_*regfunc() ?

> > And if we do this, ____call_usermodehelper() needs write_lock_irq(tasklist)
> > to serialize with syscall_*regfunc().
>
> You mean for the slight race between checking if its set and when the
> tracepoint is actually activated?

Or deactivated.

> I don't think we really care about that race.

OK, I won't argue. I agree, the problem is minor, but in this case imho
it is better to do nothing than add the racy fix.

> I mean, the tracepoint is
> activated usually by humans, and if they enabled it just as a usermode
> helper is activated, and those are really fast to run, do we even care
> if it is missed?

A user space task spawned by call_usermodehelper() can do everything, it
can run forever.

> Now, if tracing is on and we need to set the flag, that should take the
> task list lock to make sure that we don't miss clearing it. Missing the
> set isn't a big deal, but missing the clearing of the flag is.
>
> void tracepoint_check_syscalls(void)
> {
> 	if (!sys_tracepoint_refcount)
> 		return;
>
> 	read_lock(&tasklist_lock);
> 	/* Make sure it wasn't cleared since taking the lock */
> 	if (sys_tracepoint_refcount)
> 		set_tsk_thread_flag(current, TIF_SYSCALL_TRACEPOINT);
> 	read_unlock(&tasklist_lock);
> }

And how this can help to avoid the race? We need write_lock_irq().

Perhaps I missed something... and I simply do not understand why do you
want to do this.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-10 14:46                 ` Oleg Nesterov
@ 2014-04-10 15:08                   ` Steven Rostedt
  2014-04-10 17:57                     ` Oleg Nesterov
  0 siblings, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2014-04-10 15:08 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton,
	Ingo Molnar, Hendrik Brueckner

On Thu, 10 Apr 2014 16:46:55 +0200
Oleg Nesterov <oleg@redhat.com> wrote:


> > I mean, the tracepoint is
> > activated usually by humans, and if they enabled it just as a usermode
> > helper is activated, and those are really fast to run, do we even care
> > if it is missed?
> 
> A user space task spawned by call_usermodehelper() can do everything, it
> can run forever.

Sounds nasty ;-)

> 
> > Now, if tracing is on and we need to set the flag, that should take the
> > task list lock to make sure that we don't miss clearing it. Missing the
> > set isn't a big deal, but missing the clearing of the flag is.
> >
> > void tracepoint_check_syscalls(void)
> > {
> > 	if (!sys_tracepoint_refcount)
> > 		return;
> >
> > 	read_lock(&tasklist_lock);
> > 	/* Make sure it wasn't cleared since taking the lock */
> > 	if (sys_tracepoint_refcount)
> > 		set_tsk_thread_flag(current, TIF_SYSCALL_TRACEPOINT);
> > 	read_unlock(&tasklist_lock);
> > }
> 
> And how this can help to avoid the race? We need write_lock_irq().

But you chopped off the last part. Where I replaced tasklist_lock with
a tracepoint specific lock that would synchronize
sys_tracepoint_refcount with the setting of the flags.


> 
> Perhaps I missed something... and I simply do not understand why do you
> want to do this.

Because I'm being an ass ;-)

The real reason I'm doing this debate is more to find out exactly what
the problems are. A learning exercise if you will. I just don't want to
add a regression, as Hendrik (which I just Cc'd) added the commit for a
reason. Perhaps you are correct that we should just go back to the way
things were.

Hendrik, we are debating about removing
cc3b13c11c567c69a6356be98d0c03ff11541d5c as it stops
call_usermodehelper tasks from tracing their syscalls.

If Hendrik has no problems with this, neither do I.

-- Steve

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-09 19:49           ` Peter Zijlstra
@ 2014-04-10 16:19             ` Peter Zijlstra
  2014-04-10 16:32               ` Peter Zijlstra
  2014-04-10 17:29               ` Oleg Nesterov
  0 siblings, 2 replies; 55+ messages in thread
From: Peter Zijlstra @ 2014-04-10 16:19 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Wed, Apr 09, 2014 at 09:49:57PM +0200, Peter Zijlstra wrote:
> Anyway, I'll look over the stuff tomorrow.

That patch was missing useful clues..

Also, I shot __for_each_thread in the head, it seemed useless.

---
 include/linux/sched.h | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 7cb07fd26680..6f995bef45d3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2377,6 +2377,9 @@ extern bool current_is_single_threaded(void);
 /*
  * Careful: do_each_thread/while_each_thread is a double loop so
  *          'break' will not work as expected - use goto instead.
+ *
+ * Deprecated; use for_each_thread() instead; this has serious issues with
+ * g != current and RCU.
  */
 #define do_each_thread(g, t) \
 	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
@@ -2384,11 +2387,8 @@ extern bool current_is_single_threaded(void);
 #define while_each_thread(g, t) \
 	while ((t = next_thread(t)) != g)
 
-#define __for_each_thread(signal, t)	\
-	list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)
-
 #define for_each_thread(p, t)		\
-	__for_each_thread((p)->signal, t)
+	list_for_each_entry_rcu(t, &(p)->signal->thread_head, thread_node)
 
 /* Careful: this is a double loop, 'break' won't work as expected. */
 #define for_each_process_thread(p, t)	\
@@ -2421,7 +2421,8 @@ bool same_thread_group(struct task_struct *p1, struct task_struct *p2)
 	return p1->signal == p2->signal;
 }
 
-static inline struct task_struct *next_thread(const struct task_struct *p)
+static inline __deprecated
+struct task_struct *next_thread(const struct task_struct *p)
 {
 	return list_entry_rcu(p->thread_group.next,
 			      struct task_struct, thread_group);

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-10 16:19             ` Peter Zijlstra
@ 2014-04-10 16:32               ` Peter Zijlstra
  2014-04-10 17:29               ` Oleg Nesterov
  1 sibling, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2014-04-10 16:32 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Thu, Apr 10, 2014 at 06:19:55PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 09, 2014 at 09:49:57PM +0200, Peter Zijlstra wrote:
> > Anyway, I'll look over the stuff tomorrow.
> 
> That patch was missing useful clues..
> 
> Also, I shot __for_each_thread in the head, it seemed useless.

Hmm, do we really want to have for_each_process_thread() ? Having people
explicitly write:

  for_each_process(p) {
  	for_each_thread(p, t) {
	}
  }

isn't that much more effort and its a whole lot more explicit on the
double loop thing.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-10 16:19             ` Peter Zijlstra
  2014-04-10 16:32               ` Peter Zijlstra
@ 2014-04-10 17:29               ` Oleg Nesterov
  2014-04-10 17:36                 ` Peter Zijlstra
  1 sibling, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-10 17:29 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 04/10, Peter Zijlstra wrote:
>
> On Wed, Apr 09, 2014 at 09:49:57PM +0200, Peter Zijlstra wrote:
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2377,6 +2377,9 @@ extern bool current_is_single_threaded(void);
>  /*
>   * Careful: do_each_thread/while_each_thread is a double loop so
>   *          'break' will not work as expected - use goto instead.
> + *
> + * Deprecated; use for_each_thread() instead; this has serious issues with
> + * g != current and RCU.
>   */

Yes, thanks.

>  #define do_each_thread(g, t) \
>  	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
> @@ -2384,11 +2387,8 @@ extern bool current_is_single_threaded(void);
>  #define while_each_thread(g, t) \
>  	while ((t = next_thread(t)) != g)
>  
> -#define __for_each_thread(signal, t)	\
> -	list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)
> -
>  #define for_each_thread(p, t)		\
> -	__for_each_thread((p)->signal, t)
> +	list_for_each_entry_rcu(t, &(p)->signal->thread_head, thread_node)

Why? __for_each_thread(signal) can generate a better code, if we do care.

In fact, ignoring the bad "signal" name, __for_each_thread(signal, t)
even looks better. "signal" represents the whole thread group.

But I won't argue. Besides, this reminds me about CONST_CAST() and making
task_struct->signal "const". This can improve the code generation too.

> +static inline __deprecated
> +struct task_struct *next_thread(const struct task_struct *p)
>  {

Not sure... But probably fine too.

I already killed some users of next_thread(). This reminds me about
next_tid(), probably it should be converted too.

As for, say, __exit_signal() it really needs next_thread(). We can fix
it instead of deprecating, or we can add another one with another name.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-10 17:29               ` Oleg Nesterov
@ 2014-04-10 17:36                 ` Peter Zijlstra
  2014-04-10 17:42                   ` Peter Zijlstra
  2014-04-10 19:15                   ` Oleg Nesterov
  0 siblings, 2 replies; 55+ messages in thread
From: Peter Zijlstra @ 2014-04-10 17:36 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Thu, Apr 10, 2014 at 07:29:38PM +0200, Oleg Nesterov wrote:
> >  #define do_each_thread(g, t) \
> >  	for (g = t = &init_task ; (g = t = next_task(g)) != &init_task ; ) do
> > @@ -2384,11 +2387,8 @@ extern bool current_is_single_threaded(void);
> >  #define while_each_thread(g, t) \
> >  	while ((t = next_thread(t)) != g)
> >  
> > -#define __for_each_thread(signal, t)	\
> > -	list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)
> > -
> >  #define for_each_thread(p, t)		\
> > -	__for_each_thread((p)->signal, t)
> > +	list_for_each_entry_rcu(t, &(p)->signal->thread_head, thread_node)
> 
> Why? __for_each_thread(signal) can generate a better code, if we do care.

Well, there were no users and it wasn't mentioned anywhere.

> In fact, ignoring the bad "signal" name, __for_each_thread(signal, t)
> even looks better. "signal" represents the whole thread group.
> 
> But I won't argue. Besides, this reminds me about CONST_CAST() and making
> task_struct->signal "const". This can improve the code generation too.

Yeah, I always disliked how we mixed up the signal handling and the
thread group stuff. We should probably rename the lot.

> > +static inline __deprecated
> > +struct task_struct *next_thread(const struct task_struct *p)
> >  {
> 
> Not sure... But probably fine too.
> 
> I already killed some users of next_thread(). This reminds me about
> next_tid(), probably it should be converted too.
> 
> As for, say, __exit_signal() it really needs next_thread(). We can fix
> it instead of deprecating, or we can add another one with another name.

Well, your Changelog said that next_thread() was faulty too; if
__exit_signal() is the only site where it is correct we can open-code it
there. If there's more we should probably create a new function and
audit all current sites.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-10 17:36                 ` Peter Zijlstra
@ 2014-04-10 17:42                   ` Peter Zijlstra
  2014-04-10 19:15                   ` Oleg Nesterov
  1 sibling, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2014-04-10 17:42 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Thu, Apr 10, 2014 at 07:36:10PM +0200, Peter Zijlstra wrote:
> > > +static inline __deprecated
> > > +struct task_struct *next_thread(const struct task_struct *p)
> > >  {
> > 
> > Not sure... But probably fine too.

Also, since its used in while_each_thread() each usage of that will also
generate an appropriate warn.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-10 15:08                   ` Steven Rostedt
@ 2014-04-10 17:57                     ` Oleg Nesterov
  2014-04-10 18:14                       ` Oleg Nesterov
  0 siblings, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-10 17:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton,
	Ingo Molnar, Hendrik Brueckner

On 04/10, Steven Rostedt wrote:
>
> On Thu, 10 Apr 2014 16:46:55 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > > void tracepoint_check_syscalls(void)
> > > {
> > > 	if (!sys_tracepoint_refcount)
> > > 		return;
> > >
> > > 	read_lock(&tasklist_lock);
> > > 	/* Make sure it wasn't cleared since taking the lock */
> > > 	if (sys_tracepoint_refcount)
> > > 		set_tsk_thread_flag(current, TIF_SYSCALL_TRACEPOINT);
> > > 	read_unlock(&tasklist_lock);
> > > }
> >
> > And how this can help to avoid the race? We need write_lock_irq().
>
> But you chopped off the last part. Where I replaced tasklist_lock with
> a tracepoint specific lock that would synchronize
> sys_tracepoint_refcount with the setting of the flags.

Yes sure, if we add another lock everything is fine.

> > Perhaps I missed something... and I simply do not understand why do you
> > want to do this.
>
> Because I'm being an ass ;-)

Nothing new, I always knew this ;)

> The real reason I'm doing this debate is more to find out exactly what
> the problems are. A learning exercise if you will. I just don't want to
> add a regression, as Hendrik (which I just Cc'd) added the commit for a
> reason. Perhaps you are correct that we should just go back to the way
> things were.

Sure, this should be verified. Besides, the changelog is very old. It says
"kernel_execve() itself does "int 80" on X86_32.", this is no longer true.

> Hendrik, we are debating about removing
> cc3b13c11c567c69a6356be98d0c03ff11541d5c as it stops
> call_usermodehelper tasks from tracing their syscalls.
>
> If Hendrik has no problems with this, neither do I.

OK.

cc3b13c11c567 mentions ret_from_fork, today copy_thread(PF_KTHREAD) uses
ret_from_kernel_thread on 32bit, and still ret_from_fork on 64 bit but
in the last case it checks PF_KTHREAD... I am wondering why they both
(ret_from_kernel_thread and "1: " label in ret_from_fork) can't simply
call do_exit() at the end?

And, since they do not, every kernel_thread's function (fn argument of
kernel_thread) must call do_exit itself?

Looks a bit strange, I guess I missed something obvious.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-10 17:57                     ` Oleg Nesterov
@ 2014-04-10 18:14                       ` Oleg Nesterov
  2014-04-10 19:00                         ` Oleg Nesterov
  2014-04-10 19:13                         ` Steven Rostedt
  0 siblings, 2 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-10 18:14 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton,
	Ingo Molnar, Hendrik Brueckner

On 04/10, Oleg Nesterov wrote:
>
> On 04/10, Steven Rostedt wrote:
> >
> > Hendrik, we are debating about removing
> > cc3b13c11c567c69a6356be98d0c03ff11541d5c as it stops
> > call_usermodehelper tasks from tracing their syscalls.
> >
> > If Hendrik has no problems with this, neither do I.
>
> OK.
>
> cc3b13c11c567 mentions ret_from_fork, today copy_thread(PF_KTHREAD) uses
> ret_from_kernel_thread on 32bit, and still ret_from_fork on 64 bit but
> in the last case it checks PF_KTHREAD... I am wondering why they both
> (ret_from_kernel_thread and "1: " label in ret_from_fork) can't simply
> call do_exit() at the end?

probably because we need to change all architectures...

> And, since they do not, every kernel_thread's function (fn argument of
> kernel_thread) must call do_exit itself?

Hmm yes. See fb45550d76bb5 "make sure that kernel_thread() callbacks call
do_exit() themselves".

> Looks a bit strange, I guess I missed something obvious.

And I forgot to mention, given that the kernel_thread() callback should
call do_exit() itself, then this part of cc3b13c11c567c69a63

	one case when a kernel thread can reach the
	usual syscall exit tracing path: when we create a kernel thread, the
	child comes to ret_from_fork

is no longer relevant? A PF_KTHREAD child should never return from the
callback and thus it should never do "jmp syscall_exit" ?

Or I am totally confused?

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-10 18:14                       ` Oleg Nesterov
@ 2014-04-10 19:00                         ` Oleg Nesterov
  2014-04-10 19:13                         ` Steven Rostedt
  1 sibling, 0 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-10 19:00 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton,
	Ingo Molnar, Hendrik Brueckner

On 04/10, Oleg Nesterov wrote:
>
> On 04/10, Oleg Nesterov wrote:
> >
> > On 04/10, Steven Rostedt wrote:
> > >
> > > Hendrik, we are debating about removing
> > > cc3b13c11c567c69a6356be98d0c03ff11541d5c as it stops
> > > call_usermodehelper tasks from tracing their syscalls.
> > >
> > > If Hendrik has no problems with this, neither do I.
> >
> > OK.
> >
> > cc3b13c11c567 mentions ret_from_fork, today copy_thread(PF_KTHREAD) uses
> > ret_from_kernel_thread on 32bit, and still ret_from_fork on 64 bit but
> > in the last case it checks PF_KTHREAD... I am wondering why they both
> > (ret_from_kernel_thread and "1: " label in ret_from_fork) can't simply
> > call do_exit() at the end?
>
> probably because we need to change all architectures...

Heh. And because "call *%rbx" can actually return if this kernel thread
execs ;)

But in this case we want TIF_SYSCALL_TRACING, so

> > And, since they do not, every kernel_thread's function (fn argument of
> > kernel_thread) must call do_exit itself?
>
> Hmm yes. See fb45550d76bb5 "make sure that kernel_thread() callbacks call
> do_exit() themselves".
>
> > Looks a bit strange, I guess I missed something obvious.
>
> And I forgot to mention, given that the kernel_thread() callback should
> call do_exit() itself, then this part of cc3b13c11c567c69a63
>
> 	one case when a kernel thread can reach the
> 	usual syscall exit tracing path: when we create a kernel thread, the
> 	child comes to ret_from_fork
>
> is no longer relevant? A PF_KTHREAD child should never return from the
> callback and thus it should never do "jmp syscall_exit" ?

this is still true, I guess.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-10 18:14                       ` Oleg Nesterov
  2014-04-10 19:00                         ` Oleg Nesterov
@ 2014-04-10 19:13                         ` Steven Rostedt
  2014-04-10 19:38                           ` Oleg Nesterov
  1 sibling, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2014-04-10 19:13 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton,
	Ingo Molnar, Hendrik Brueckner

On Thu, 10 Apr 2014 20:14:17 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> And I forgot to mention, given that the kernel_thread() callback should
> call do_exit() itself, then this part of cc3b13c11c567c69a63
> 
> 	one case when a kernel thread can reach the
> 	usual syscall exit tracing path: when we create a kernel thread, the
> 	child comes to ret_from_fork
> 
> is no longer relevant? A PF_KTHREAD child should never return from the
> callback and thus it should never do "jmp syscall_exit" ?
> 

Are you sure. On set up of the kthread, create_kthread() calls
kernel_thread() with "kthread()" as its first parameter.

kernel_thread() then calls do_fork() passing the "kthread" function as
the stack_start parameter, which if you follow where that goes, it gets
to copy_thread() in process_[63][42].c which assigns sp (the function)
to the bx register for the PF_KTHREAD case. But more importantly, it
sets up the stack to have ip pointing to ret_from_kernel_thread (32 bit
version).

The jmp syscall_exit when it goes to return to "userspace" will in
actuality return to ret_from_kernel_thread (32 bit). Which this does:

	call *PT_EBX(%esp)

which calls your handler. But then again, this calls syscall_exit when
done, which probably will never be hit as kthread() calls do_exit()
itself. Perhaps if something goes wrong, syscall_exit can handle any
faults that can happen?

For 64 bit, the check for kernel thread is in ret_from_fork itself.
which does the call *%rbx, but again, if it fails, it then calls
int_ret_from_sys_call, which it may also handle faults.

Looks like kernel threads on 32bit call syscall exit at least once, to
get to ret_from_kernel_thread. Not sure why it does that. Perhaps this
could be another 32bit clean up to make it more like x86_64.

-- Steve

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-10 17:36                 ` Peter Zijlstra
  2014-04-10 17:42                   ` Peter Zijlstra
@ 2014-04-10 19:15                   ` Oleg Nesterov
  2014-04-10 20:55                     ` Peter Zijlstra
  1 sibling, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-10 19:15 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 04/10, Peter Zijlstra wrote:
>
> On Thu, Apr 10, 2014 at 07:29:38PM +0200, Oleg Nesterov wrote:
>
> > > +static inline __deprecated
> > > +struct task_struct *next_thread(const struct task_struct *p)
> > >  {
> >
> > Not sure... But probably fine too.
> >
> > I already killed some users of next_thread(). This reminds me about
> > next_tid(), probably it should be converted too.
> >
> > As for, say, __exit_signal() it really needs next_thread(). We can fix
> > it instead of deprecating, or we can add another one with another name.
>
> Well, your Changelog said that next_thread() was faulty too;

If lockless. Sure.

> if
> __exit_signal() is the only site where it is correct we can open-code it
> there. If there's more we should probably create a new function and
> audit all current sites.

So far I can see only 2 users, __exit_signal() and complete_signal().

May be next_tid(), but in fact it needs next_thread_or_zero(). And this
connects to check_hung_task(). What they actually (OK, perhaps) need is
for_each_thread_continue, like list_for_each_entry_continue_rcu.

We will see. I didn't finish this all because I was distracted (and lazy
of course ;).

My plan (== TODO) was:

	- check all current users, carefully convert those who need
	  a special attention (because the semantics differs a bit)

	- the same for thread_group_empty()

	- reimplement next_thread/while_each_thread via ->thread_head

	- kill task_struct->thread_group

	- all other users can be converted later

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-10 19:13                         ` Steven Rostedt
@ 2014-04-10 19:38                           ` Oleg Nesterov
  2014-04-10 19:55                             ` Steven Rostedt
  0 siblings, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-10 19:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton,
	Ingo Molnar, Hendrik Brueckner

On 04/10, Steven Rostedt wrote:
>
> On Thu, 10 Apr 2014 20:14:17 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
> 
> 
> > And I forgot to mention, given that the kernel_thread() callback should
> > call do_exit() itself, then this part of cc3b13c11c567c69a63
> > 
> > 	one case when a kernel thread can reach the
> > 	usual syscall exit tracing path: when we create a kernel thread, the
> > 	child comes to ret_from_fork
> > 
> > is no longer relevant? A PF_KTHREAD child should never return from the
> > callback and thus it should never do "jmp syscall_exit" ?
> > 
> 
> Are you sure.

Not.

> On set up of the kthread, create_kthread() calls
> kernel_thread() with "kthread()" as its first parameter.
> 
> kernel_thread() then calls do_fork() passing the "kthread" function as
> the stack_start parameter, which if you follow where that goes, it gets
> to copy_thread() in process_[63][42].c which assigns sp (the function)
> to the bx register for the PF_KTHREAD case. But more importantly, it
> sets up the stack to have ip pointing to ret_from_kernel_thread (32 bit
> version).
> 
> The jmp syscall_exit when it goes to return to "userspace" will in
> actuality return to ret_from_kernel_thread (32 bit). Which this does:
> 
> 	call *PT_EBX(%esp)
> 
> which calls your handler. But then again, this calls syscall_exit when
> done, which probably will never be hit as kthread() calls do_exit()
> itself. Perhaps if something goes wrong, syscall_exit can handle any
> faults that can happen?
> 
> For 64 bit, the check for kernel thread is in ret_from_fork itself.
> which does the call *%rbx, but again, if it fails, it then calls
> int_ret_from_sys_call, which it may also handle faults.

See my previous email.

I _think_ that the kernel thread can only return from "call *%rbx" if
it is no longer a kernel thread, iow, do_execve() was called.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-10 19:38                           ` Oleg Nesterov
@ 2014-04-10 19:55                             ` Steven Rostedt
  2014-04-11 12:03                               ` Oleg Nesterov
  0 siblings, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2014-04-10 19:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton,
	Ingo Molnar, Hendrik Brueckner

On Thu, 10 Apr 2014 21:38:20 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> I _think_ that the kernel thread can only return from "call *%rbx" if
> it is no longer a kernel thread, iow, do_execve() was called.

Ah right. But only in special cases.

Actually, it only returns if the function in kernel_thread() returns,
and in the case of ____call_usermodehelper, it does an exec and
returns on success. But if it fails, it needs to call do_exit().

Thus, it's not sufficient to just say "only if do_execve is called" but
to say, if the handler that is called is only allowed to return iff it
did a execve first and succeeded.

-- Steve

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread()
  2014-04-10 19:15                   ` Oleg Nesterov
@ 2014-04-10 20:55                     ` Peter Zijlstra
  0 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2014-04-10 20:55 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Thu, Apr 10, 2014 at 09:15:25PM +0200, Oleg Nesterov wrote:
> So far I can see only 2 users, __exit_signal() and complete_signal().

If that's it we could just open-code them and be done.

> May be next_tid(), but in fact it needs next_thread_or_zero(). And this
> connects to check_hung_task(). What they actually (OK, perhaps) need is
> for_each_thread_continue, like list_for_each_entry_continue_rcu.

That could work I suppose; brain has left the building already..

> We will see. I didn't finish this all because I was distracted (and lazy
> of course ;).

:-)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-10 19:55                             ` Steven Rostedt
@ 2014-04-11 12:03                               ` Oleg Nesterov
  2014-04-11 12:37                                 ` Steven Rostedt
  0 siblings, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-11 12:03 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton,
	Ingo Molnar, Hendrik Brueckner

On 04/10, Steven Rostedt wrote:
>
> On Thu, 10 Apr 2014 21:38:20 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > I _think_ that the kernel thread can only return from "call *%rbx" if
> > it is no longer a kernel thread, iow, do_execve() was called.
>
> Ah right. But only in special cases.
>
> Actually, it only returns if the function in kernel_thread() returns,
> and in the case of ____call_usermodehelper, it does an exec and
> returns on success. But if it fails, it needs to call do_exit().

Yes. If it simply returns the kernel will crash, without start_thread/etc
"int_ret_from_sys_call" can do nothing good.

So we can conclude that "there is only one case when a kernel thread can
reach the usual syscall exit tracing path:" from cc3b13c11c567c69 is no
longer valid and we can kille the PF_KTHREAD check in regfuncs.

> Thus, it's not sufficient to just say "only if do_execve is called" but
> to say, if the handler that is called is only allowed to return iff it
> did a execve first and succeeded.

I do remember, I already told you are pedant!

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-11 12:03                               ` Oleg Nesterov
@ 2014-04-11 12:37                                 ` Steven Rostedt
  0 siblings, 0 replies; 55+ messages in thread
From: Steven Rostedt @ 2014-04-11 12:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton,
	Ingo Molnar, Hendrik Brueckner

On Fri, 11 Apr 2014 14:03:47 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

 
> I do remember, I already told you are pedant!

Point taken ;-)

I'll pull your patches into my 3.16 queue.

Thanks!

-- Steve


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process()
  2014-04-10 13:34             ` Oleg Nesterov
@ 2014-04-11 15:22               ` Steven Rostedt
  2014-04-11 15:58                 ` Oleg Nesterov
  0 siblings, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2014-04-11 15:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Thu, 10 Apr 2014 15:34:35 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 04/10, Steven Rostedt wrote:
> >
> > On Wed, 9 Apr 2014 19:05:42 +0200
> > Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -1472,7 +1472,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> > >
> > >  	total_forks++;
> > >  	spin_unlock(&current->sighand->siglock);
> > > +	syscall_tracepoint_update(p);
> > >  	write_unlock_irq(&tasklist_lock);
> > > +
> >
> > So this looks to be a fix that probably should go to stable?
> 
> Not sure, this is up to you.

I'll have to think about this one, whether or not it should go to
stable.

> 
> > > @@ -732,33 +732,31 @@ static int sys_tracepoint_refcount;
> > >
> > >  void syscall_regfunc(void)
> > >  {
> > > -	unsigned long flags;
> > >  	struct task_struct *g, *t;
> > >
> > >  	if (!sys_tracepoint_refcount) {
> > > -		read_lock_irqsave(&tasklist_lock, flags);
> > > +		read_lock(&tasklist_lock);
> > >  		do_each_thread(g, t) {
> > >  			/* Skip kernel threads. */
> > > -			if (t->mm)
> > > +			if (!(t->flags & PF_KTHREAD))
> > >  				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
> > >  		} while_each_thread(g, t);
> > > -		read_unlock_irqrestore(&tasklist_lock, flags);
> > > +		read_unlock(&tasklist_lock);
> > >  	}
> > >  	sys_tracepoint_refcount++;
> > >  }
> > >
> > >  void syscall_unregfunc(void)
> > >  {
> > > -	unsigned long flags;
> > >  	struct task_struct *g, *t;
> > >
> > >  	sys_tracepoint_refcount--;
> > >  	if (!sys_tracepoint_refcount) {
> > > -		read_lock_irqsave(&tasklist_lock, flags);
> > > +		read_lock(&tasklist_lock);
> > >  		do_each_thread(g, t) {
> > >  			clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
> > >  		} while_each_thread(g, t);
> > > -		read_unlock_irqrestore(&tasklist_lock, flags);
> > > +		read_unlock(&tasklist_lock);
> >
> > These changes should be in a separate patch, as they can wait till 3.16.
> 
> OK, I'll split this change.

Are you going to send a new series?

-- Steve


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process()
  2014-04-11 15:22               ` Steven Rostedt
@ 2014-04-11 15:58                 ` Oleg Nesterov
  2014-04-13 18:58                   ` [PATCH v2 0/3] tracing: syscall_*regfunc() fixes Oleg Nesterov
  0 siblings, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-11 15:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 04/11, Steven Rostedt wrote:
>
> On Thu, 10 Apr 2014 15:34:35 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > On 04/10, Steven Rostedt wrote:
> > >
> > > On Wed, 9 Apr 2014 19:05:42 +0200
> > > Oleg Nesterov <oleg@redhat.com> wrote:
> > >
> > > > --- a/kernel/fork.c
> > > > +++ b/kernel/fork.c
> > > > @@ -1472,7 +1472,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> > > >
> > > >  	total_forks++;
> > > >  	spin_unlock(&current->sighand->siglock);
> > > > +	syscall_tracepoint_update(p);
> > > >  	write_unlock_irq(&tasklist_lock);
> > > > +
> > >
> > > So this looks to be a fix that probably should go to stable?
> >
> > Not sure, this is up to you.
>
> I'll have to think about this one, whether or not it should go to
> stable.

...

> > > These changes should be in a separate patch, as they can wait till 3.16.
> >
> > OK, I'll split this change.
>
> Are you going to send a new series?

Yes, will do. I will split 1/2, and I need to update the changelog
in 2/2.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 0/3] tracing: syscall_*regfunc() fixes
  2014-04-11 15:58                 ` Oleg Nesterov
@ 2014-04-13 18:58                   ` Oleg Nesterov
  2014-04-13 18:58                     ` [PATCH v2 1/3] tracing: fix syscall_*regfunc() vs copy_process() race Oleg Nesterov
                                       ` (4 more replies)
  0 siblings, 5 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-13 18:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 04/11, Oleg Nesterov wrote:
>
> On 04/11, Steven Rostedt wrote:
> >
> > Are you going to send a new series?
>
> Yes, will do. I will split 1/2, and I need to update the changelog
> in 2/2.

Please see the patches.

Frederic! I am sorry, I decided to steal your patch to simplify the
merging and avoid too many changes in the trivial syscall_*regfunc's.
In case you do not object, could you add your sob into 2/3 ? Otherwise
I can wait for your patch, then send v3 on top of it.

Hmm. This is off-topic, but CLONE_KERNEL should die, and kernel_init()
should be spawned without CLONE_SIGHAND. I'll send another patch.

Oleg.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 1/3] tracing: fix syscall_*regfunc() vs copy_process() race
  2014-04-13 18:58                   ` [PATCH v2 0/3] tracing: syscall_*regfunc() fixes Oleg Nesterov
@ 2014-04-13 18:58                     ` Oleg Nesterov
  2014-04-14 23:57                       ` Frederic Weisbecker
  2014-04-13 18:59                     ` [PATCH v2 2/3] tracing: change syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread() Oleg Nesterov
                                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-13 18:58 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

syscall_regfunc() and syscall_unregfunc() should set/clear
TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
with copy_process() and miss the new child which was not added to
the process/thread lists yet.

Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
under tasklist.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/trace/syscall.h |   15 +++++++++++++++
 kernel/fork.c           |    2 ++
 2 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index fed853f..291c282 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -4,6 +4,7 @@
 #include <linux/tracepoint.h>
 #include <linux/unistd.h>
 #include <linux/ftrace_event.h>
+#include <linux/thread_info.h>
 
 #include <asm/ptrace.h>
 
@@ -32,4 +33,18 @@ struct syscall_metadata {
 	struct ftrace_event_call *exit_event;
 };
 
+#ifdef CONFIG_TRACEPOINTS
+static inline void syscall_tracepoint_update(struct task_struct *p)
+{
+	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
+		set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+	else
+		clear_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
+}
+#else
+static inline void syscall_tracepoint_update(struct task_struct *p)
+{
+}
+#endif
+
 #endif /* _TRACE_SYSCALL_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index a17621c..2d55a47 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1484,7 +1484,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 
 	total_forks++;
 	spin_unlock(&current->sighand->siglock);
+	syscall_tracepoint_update(p);
 	write_unlock_irq(&tasklist_lock);
+
 	proc_fork_connector(p);
 	cgroup_post_fork(p);
 	if (clone_flags & CLONE_THREAD)
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 2/3] tracing: change syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread()
  2014-04-13 18:58                   ` [PATCH v2 0/3] tracing: syscall_*regfunc() fixes Oleg Nesterov
  2014-04-13 18:58                     ` [PATCH v2 1/3] tracing: fix syscall_*regfunc() vs copy_process() race Oleg Nesterov
@ 2014-04-13 18:59                     ` Oleg Nesterov
  2014-04-13 18:59                     ` [PATCH v2 3/3] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
                                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-13 18:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

1. Remove _irqsafe from syscall_regfunc/syscall_unregfunc,
   read_lock(tasklist) doesn't need to disable irqs.

2. Change this code to avoid the deprecated do_each_thread()
   and use for_each_process_thread() (stolen from the patch
   from Frederic).

3. Change syscall_regfunc() to check PF_KTHREAD to skip
   the kernel threads, ->mm != NULL is the common mistake.

   Note: probably this check should be simply removed, needs
   another patch.

[fweisbec@gmail.com: s/do_each_thread/for_each_process_thread/]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/tracepoint.c |   24 +++++++++++-------------
 1 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 031cc56..d907b7b 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -742,33 +742,31 @@ static int sys_tracepoint_refcount;
 
 void syscall_regfunc(void)
 {
-	unsigned long flags;
-	struct task_struct *g, *t;
+	struct task_struct *p, *t;
 
 	if (!sys_tracepoint_refcount) {
-		read_lock_irqsave(&tasklist_lock, flags);
-		do_each_thread(g, t) {
+		read_lock(&tasklist_lock);
+		for_each_process_thread(p, t) {
 			/* Skip kernel threads. */
-			if (t->mm)
+			if (!(t->flags & PF_KTHREAD))
 				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
-		} while_each_thread(g, t);
-		read_unlock_irqrestore(&tasklist_lock, flags);
+		}
+		read_unlock(&tasklist_lock);
 	}
 	sys_tracepoint_refcount++;
 }
 
 void syscall_unregfunc(void)
 {
-	unsigned long flags;
-	struct task_struct *g, *t;
+	struct task_struct *p, *t;
 
 	sys_tracepoint_refcount--;
 	if (!sys_tracepoint_refcount) {
-		read_lock_irqsave(&tasklist_lock, flags);
-		do_each_thread(g, t) {
+		read_lock(&tasklist_lock);
+		for_each_process_thread(p, t) {
 			clear_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
-		} while_each_thread(g, t);
-		read_unlock_irqrestore(&tasklist_lock, flags);
+		}
+		read_unlock(&tasklist_lock);
 	}
 }
 #endif
-- 
1.5.5.1



^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 3/3] tracing: syscall_regfunc() should not skip kernel threads
  2014-04-13 18:58                   ` [PATCH v2 0/3] tracing: syscall_*regfunc() fixes Oleg Nesterov
  2014-04-13 18:58                     ` [PATCH v2 1/3] tracing: fix syscall_*regfunc() vs copy_process() race Oleg Nesterov
  2014-04-13 18:59                     ` [PATCH v2 2/3] tracing: change syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread() Oleg Nesterov
@ 2014-04-13 18:59                     ` Oleg Nesterov
  2014-04-14 23:46                     ` [PATCH v2 0/3] tracing: syscall_*regfunc() fixes Frederic Weisbecker
  2014-06-18 14:23                     ` Steven Rostedt
  4 siblings, 0 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-04-13 18:59 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

syscall_regfunc() ignores the kernel threads because "it has no effect",
see cc3b13c1 "Don't trace kernel thread syscalls" which added this check.

However, this means that a user-space task spawned by call_usermodehelper()
will run without TIF_SYSCALL_TRACEPOINT if sys_tracepoint_refcount != 0.

Remove this check. The unnecessary report from ret_from_fork path mentioned
by cc3b13c1 is no longer possible, see See commit fb45550d76bb5 "make sure
that kernel_thread() callbacks call do_exit() themselves".

A kernel_thread() callback can only return and take the int_ret_from_sys_call
path after do_execve() succeeds, otherwise the kernel will crash. But in this
case it is no longer a kernel thread and thus is needs TIF_SYSCALL_TRACEPOINT.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/tracepoint.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index d907b7b..08456ba 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -747,9 +747,7 @@ void syscall_regfunc(void)
 	if (!sys_tracepoint_refcount) {
 		read_lock(&tasklist_lock);
 		for_each_process_thread(p, t) {
-			/* Skip kernel threads. */
-			if (!(t->flags & PF_KTHREAD))
-				set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
+			set_tsk_thread_flag(t, TIF_SYSCALL_TRACEPOINT);
 		}
 		read_unlock(&tasklist_lock);
 	}
-- 
1.5.5.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 0/3] tracing: syscall_*regfunc() fixes
  2014-04-13 18:58                   ` [PATCH v2 0/3] tracing: syscall_*regfunc() fixes Oleg Nesterov
                                       ` (2 preceding siblings ...)
  2014-04-13 18:59                     ` [PATCH v2 3/3] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
@ 2014-04-14 23:46                     ` Frederic Weisbecker
  2014-06-18 14:23                     ` Steven Rostedt
  4 siblings, 0 replies; 55+ messages in thread
From: Frederic Weisbecker @ 2014-04-14 23:46 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Mathieu Desnoyers, LKML, Andrew Morton, Ingo Molnar

On Sun, Apr 13, 2014 at 08:58:28PM +0200, Oleg Nesterov wrote:
> On 04/11, Oleg Nesterov wrote:
> >
> > On 04/11, Steven Rostedt wrote:
> > >
> > > Are you going to send a new series?
> >
> > Yes, will do. I will split 1/2, and I need to update the changelog
> > in 2/2.
> 
> Please see the patches.
> 
> Frederic! I am sorry, I decided to steal your patch to simplify the
> merging and avoid too many changes in the trivial syscall_*regfunc's.
> In case you do not object, could you add your sob into 2/3 ?

Sure, feel free to add it!

Thanks!

> Otherwise
> I can wait for your patch, then send v3 on top of it.
> 
> 
> 
> Hmm. This is off-topic, but CLONE_KERNEL should die, and kernel_init()
> should be spawned without CLONE_SIGHAND. I'll send another patch.
> 
> Oleg.
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 1/3] tracing: fix syscall_*regfunc() vs copy_process() race
  2014-04-13 18:58                     ` [PATCH v2 1/3] tracing: fix syscall_*regfunc() vs copy_process() race Oleg Nesterov
@ 2014-04-14 23:57                       ` Frederic Weisbecker
  0 siblings, 0 replies; 55+ messages in thread
From: Frederic Weisbecker @ 2014-04-14 23:57 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Steven Rostedt, Mathieu Desnoyers, LKML, Andrew Morton, Ingo Molnar

On Sun, Apr 13, 2014 at 08:58:54PM +0200, Oleg Nesterov wrote:
> syscall_regfunc() and syscall_unregfunc() should set/clear
> TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
> with copy_process() and miss the new child which was not added to
> the process/thread lists yet.
> 
> Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
> under tasklist.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>

> ---
>  include/trace/syscall.h |   15 +++++++++++++++
>  kernel/fork.c           |    2 ++
>  2 files changed, 17 insertions(+), 0 deletions(-)
> 
> diff --git a/include/trace/syscall.h b/include/trace/syscall.h
> index fed853f..291c282 100644
> --- a/include/trace/syscall.h
> +++ b/include/trace/syscall.h
> @@ -4,6 +4,7 @@
>  #include <linux/tracepoint.h>
>  #include <linux/unistd.h>
>  #include <linux/ftrace_event.h>
> +#include <linux/thread_info.h>
>  
>  #include <asm/ptrace.h>
>  
> @@ -32,4 +33,18 @@ struct syscall_metadata {
>  	struct ftrace_event_call *exit_event;
>  };
>  
> +#ifdef CONFIG_TRACEPOINTS
> +static inline void syscall_tracepoint_update(struct task_struct *p)
> +{
> +	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
> +		set_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
> +	else
> +		clear_tsk_thread_flag(p, TIF_SYSCALL_TRACEPOINT);
> +}
> +#else
> +static inline void syscall_tracepoint_update(struct task_struct *p)
> +{
> +}
> +#endif
> +
>  #endif /* _TRACE_SYSCALL_H */
> diff --git a/kernel/fork.c b/kernel/fork.c
> index a17621c..2d55a47 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1484,7 +1484,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>  
>  	total_forks++;
>  	spin_unlock(&current->sighand->siglock);
> +	syscall_tracepoint_update(p);
>  	write_unlock_irq(&tasklist_lock);
> +
>  	proc_fork_connector(p);
>  	cgroup_post_fork(p);
>  	if (clone_flags & CLONE_THREAD)
> -- 
> 1.5.5.1
> 
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 0/3] tracing: syscall_*regfunc() fixes
  2014-04-13 18:58                   ` [PATCH v2 0/3] tracing: syscall_*regfunc() fixes Oleg Nesterov
                                       ` (3 preceding siblings ...)
  2014-04-14 23:46                     ` [PATCH v2 0/3] tracing: syscall_*regfunc() fixes Frederic Weisbecker
@ 2014-06-18 14:23                     ` Steven Rostedt
  2014-06-18 15:36                       ` Oleg Nesterov
  4 siblings, 1 reply; 55+ messages in thread
From: Steven Rostedt @ 2014-06-18 14:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On Sun, 13 Apr 2014 20:58:28 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 04/11, Oleg Nesterov wrote:
> >
> > On 04/11, Steven Rostedt wrote:
> > >
> > > Are you going to send a new series?
> >
> > Yes, will do. I will split 1/2, and I need to update the changelog
> > in 2/2.
> 
> Please see the patches.
> 

I'm pulling these in now and will start tests, wont be till tomorrow
till I can push them.

If you start a new patch series, either make them a new thread or at
least a reply from the 0/x of the previous version. Having a v2 series
starting at the bottom of the 1/x thread will get lost.

-- Steve

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 0/3] tracing: syscall_*regfunc() fixes
  2014-06-18 14:23                     ` Steven Rostedt
@ 2014-06-18 15:36                       ` Oleg Nesterov
  0 siblings, 0 replies; 55+ messages in thread
From: Oleg Nesterov @ 2014-06-18 15:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, Frederic Weisbecker, LKML, Andrew Morton, Ingo Molnar

On 06/18, Steven Rostedt wrote:
>
> On Sun, 13 Apr 2014 20:58:28 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > On 04/11, Oleg Nesterov wrote:
> > >
> > > On 04/11, Steven Rostedt wrote:
> > > >
> > > > Are you going to send a new series?
> > >
> > > Yes, will do. I will split 1/2, and I need to update the changelog
> > > in 2/2.
> >
> > Please see the patches.
> >
>
> I'm pulling these in now and will start tests, wont be till tomorrow
> till I can push them.

Great, thanks ;)

> If you start a new patch series, either make them a new thread or at
> least a reply from the 0/x of the previous version. Having a v2 series
> starting at the bottom of the 1/x thread will get lost.

Ah, OK, sorry about that.

Oleg.


^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2014-06-18 15:38 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-09 16:11 [PATCH 0/5] core: Convert thread iteration to use for_each[_process]_thread APIs, 1st pile Frederic Weisbecker
2014-04-09 16:11 ` [PATCH 1/5] sched: Convert thread_group_cputime() to use for_each_thread() Frederic Weisbecker
2014-04-09 17:12   ` Oleg Nesterov
2014-04-09 17:16   ` Peter Zijlstra
2014-04-09 17:32     ` Oleg Nesterov
2014-04-09 18:30       ` Peter Zijlstra
2014-04-09 19:46         ` Oleg Nesterov
2014-04-09 19:49           ` Peter Zijlstra
2014-04-10 16:19             ` Peter Zijlstra
2014-04-10 16:32               ` Peter Zijlstra
2014-04-10 17:29               ` Oleg Nesterov
2014-04-10 17:36                 ` Peter Zijlstra
2014-04-10 17:42                   ` Peter Zijlstra
2014-04-10 19:15                   ` Oleg Nesterov
2014-04-10 20:55                     ` Peter Zijlstra
2014-04-10  7:56           ` Ingo Molnar
2014-04-09 16:11 ` [PATCH 2/5] tracepoint: Convert process iteration to use for_each_process_thread() Frederic Weisbecker
2014-04-09 16:28   ` Mathieu Desnoyers
2014-04-09 16:40     ` Frederic Weisbecker
2014-04-09 16:42     ` Steven Rostedt
2014-04-09 17:05       ` [PATCH 0/2] Was: " Oleg Nesterov
2014-04-09 17:05         ` [PATCH RESEND 1/2] tracing: syscall_*regfunc() can race with copy_process() Oleg Nesterov
2014-04-10 13:04           ` Steven Rostedt
2014-04-10 13:33             ` Oleg Nesterov
2014-04-10 13:06           ` Steven Rostedt
2014-04-10 13:34             ` Oleg Nesterov
2014-04-11 15:22               ` Steven Rostedt
2014-04-11 15:58                 ` Oleg Nesterov
2014-04-13 18:58                   ` [PATCH v2 0/3] tracing: syscall_*regfunc() fixes Oleg Nesterov
2014-04-13 18:58                     ` [PATCH v2 1/3] tracing: fix syscall_*regfunc() vs copy_process() race Oleg Nesterov
2014-04-14 23:57                       ` Frederic Weisbecker
2014-04-13 18:59                     ` [PATCH v2 2/3] tracing: change syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread() Oleg Nesterov
2014-04-13 18:59                     ` [PATCH v2 3/3] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
2014-04-14 23:46                     ` [PATCH v2 0/3] tracing: syscall_*regfunc() fixes Frederic Weisbecker
2014-06-18 14:23                     ` Steven Rostedt
2014-06-18 15:36                       ` Oleg Nesterov
2014-04-09 17:06         ` [PATCH RESEND 2/2] tracing: syscall_regfunc() should not skip kernel threads Oleg Nesterov
2014-04-10 13:28           ` Steven Rostedt
2014-04-10 13:38             ` Oleg Nesterov
2014-04-10 14:28               ` Steven Rostedt
2014-04-10 14:46                 ` Oleg Nesterov
2014-04-10 15:08                   ` Steven Rostedt
2014-04-10 17:57                     ` Oleg Nesterov
2014-04-10 18:14                       ` Oleg Nesterov
2014-04-10 19:00                         ` Oleg Nesterov
2014-04-10 19:13                         ` Steven Rostedt
2014-04-10 19:38                           ` Oleg Nesterov
2014-04-10 19:55                             ` Steven Rostedt
2014-04-11 12:03                               ` Oleg Nesterov
2014-04-11 12:37                                 ` Steven Rostedt
2014-04-10 13:03         ` [PATCH 0/2] Was: Convert process iteration to use for_each_process_thread() Steven Rostedt
2014-04-09 16:11 ` [PATCH 3/5] hung_task: " Frederic Weisbecker
2014-04-09 17:23   ` Oleg Nesterov
2014-04-09 16:11 ` [PATCH 4/5] procfs: Convert process iteration to use for_each_thread() Frederic Weisbecker
2014-04-09 16:11 ` [PATCH 5/5] sched: Convert tasks iteration to use for_each_process_thread() Frederic Weisbecker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.