All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] task_work_queue() && keyctl_session_to_parent()
@ 2012-04-12  2:47 Oleg Nesterov
  2012-04-12  2:48 ` [PATCH v2 1/2] task_work_queue: add generic process-context callbacks Oleg Nesterov
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Oleg Nesterov @ 2012-04-12  2:47 UTC (permalink / raw)
  To: Andrew Morton, David Howells, Linus Torvalds
  Cc: David Smith, Frank Ch. Eigler, Larry Woodman, Peter Zijlstra,
	Tejun Heo, linux-kernel

Hello.

So, task_work_queue() again, now with the first user.

Compared to v1, 1/2 adds task_work_cancel(func), it removes
the first work with the same callback. May be it should remove
all and return the list, or we can add task_work_requeue()
instead. Easy to change if needed, although I think a simple
_cancel is enough.

checkpatch.pl dislikes replace_session_keyring() in 2/2, but
I disagree. This code was copy-and-past'ed, and I really think
that the "unnecessary" spaces make it more readable.

Of course, I can "fix" the code to make checkpatch.pl happy.

I tried to test this, seems to work. But please review the
the error-handling in keyctl_session_to_parent().

I do not like when diffstat shows the code bloat, but note that
the numbers will be much better after we kill the no longer used
->replacement_session_keyring and cleanup arch/*/kernel/signal.c

Oleg.

 include/linux/key.h          |    2 +-
 include/linux/sched.h        |    2 +
 include/linux/task_work.h    |   48 +++++++++++++++++++++++
 include/linux/tracehook.h    |   10 ++++-
 kernel/Makefile              |    2 +-
 kernel/exit.c                |    5 ++-
 kernel/fork.c                |    1 +
 kernel/task_work.c           |   77 +++++++++++++++++++++++++++++++++++++
 security/keys/keyctl.c       |   87 +++++++++++++++++++++++++++++++-----------
 security/keys/process_keys.c |   49 -----------------------
 10 files changed, 207 insertions(+), 76 deletions(-)


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/2] task_work_queue: add generic process-context callbacks
  2012-04-12  2:47 [PATCH v2 0/2] task_work_queue() && keyctl_session_to_parent() Oleg Nesterov
@ 2012-04-12  2:48 ` Oleg Nesterov
  2012-04-12  4:00   ` hlist_for_each_entry && pos (Was: task_work_queue) Oleg Nesterov
  2012-04-12  2:48 ` [PATCH v2 2/2] cred: change keyctl_session_to_parent() to use task_work_queue() Oleg Nesterov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2012-04-12  2:48 UTC (permalink / raw)
  To: Andrew Morton, David Howells, Linus Torvalds
  Cc: David Smith, Frank Ch. Eigler, Larry Woodman, Peter Zijlstra,
	Tejun Heo, linux-kernel

Provide a simple mechanism that allows running code in the
(nonatomic) context of the arbitrary task.

The caller does task_work_queue(task, task_work) and this task
executes task_work->func() either from do_notify_resume() or
from do_exit(). The callback can rely on PF_EXITING to detect
the latter case.

"struct task_work" can be embedded in another struct, still it
has "void *data" to handle the most common/simple case.

This allows us to kill the ->replacement_session_keyring hack,
and potentially this can have more users.

Performance-wise, this adds 2 "unlikely(!hlist_empty())" checks
into tracehook_notify_resume() and do_exit(). But at the same
time we can remove the "replacement_session_keyring != NULL"
checks from arch/*/signal.c and exit_creds().

Note: task_work_queue/task_work_run abuses ->pi_lock. This is
only because this lock is already used by lookup_pi_state() to
synchronize with do_exit() setting PF_EXITING. Fortunately the
scope of this lock in task_work.c is really tiny, and the code
is unlikely anyway.

Todo:
	- move clear_thread_flag(TIF_NOTIFY_RESUME) from arch/
	  to tracehook_notify_resume()

	- rename tracehook_notify_resume() and move it into
	  linux/task_work.h

	- m68k and xtensa don't have TIF_NOTIFY_RESUME and thus
	  they can't use this feature.

	  However, ->replacement_session_keyring equally needs
	  this logic, task_work_queue() is not worse.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/sched.h     |    2 +
 include/linux/task_work.h |   48 ++++++++++++++++++++++++++++
 include/linux/tracehook.h |   10 +++++-
 kernel/Makefile           |    2 +-
 kernel/exit.c             |    5 ++-
 kernel/fork.c             |    1 +
 kernel/task_work.c        |   77 +++++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 142 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/task_work.h
 create mode 100644 kernel/task_work.c

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 81a173c..be004ac 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1445,6 +1445,8 @@ struct task_struct {
 	int (*notifier)(void *priv);
 	void *notifier_data;
 	sigset_t *notifier_mask;
+	struct hlist_head task_works;
+
 	struct audit_context *audit_context;
 #ifdef CONFIG_AUDITSYSCALL
 	uid_t loginuid;
diff --git a/include/linux/task_work.h b/include/linux/task_work.h
new file mode 100644
index 0000000..141a2b5
--- /dev/null
+++ b/include/linux/task_work.h
@@ -0,0 +1,48 @@
+#ifndef _LINUX_TASK_WORK_H
+#define _LINUX_TASK_WORK_H	1
+
+#include <linux/sched.h>
+
+struct task_work;
+typedef void (*task_work_func_t)(struct task_work *);
+
+struct task_work {
+	struct hlist_node hlist;
+	task_work_func_t func;
+	void *data;
+};
+
+static inline void
+init_task_work(struct task_work *twork, task_work_func_t func, void *data)
+{
+	twork->func = func;
+	twork->data = data;
+}
+
+#ifdef TIF_NOTIFY_RESUME
+int task_work_queue(struct task_struct *task, struct task_work *twork);
+struct task_work *task_work_cancel(struct task_struct *, task_work_func_t);
+void task_work_run(struct task_struct *task);
+#else
+static inline int
+task_work_queue(struct task_struct *task, struct task_work *twork)
+{
+	return -ENOTSUPP;
+}
+static inline struct task_work *
+task_work_cancel(struct task_struct *task, task_work_func_t func)
+{
+	return NULL;
+}
+static inline void task_work_run(struct task_struct *task)
+{
+}
+#endif	/* TIF_NOTIFY_RESUME */
+
+static inline void exit_task_work(struct task_struct *task)
+{
+	if (unlikely(!hlist_empty(&task->task_works)))
+		task_work_run(task);
+}
+
+#endif	/* _LINUX_TASK_WORK_H */
diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h
index 51bd91d..67dd262 100644
--- a/include/linux/tracehook.h
+++ b/include/linux/tracehook.h
@@ -46,7 +46,7 @@
 #ifndef _LINUX_TRACEHOOK_H
 #define _LINUX_TRACEHOOK_H	1
 
-#include <linux/sched.h>
+#include <linux/task_work.h>
 #include <linux/ptrace.h>
 #include <linux/security.h>
 struct linux_binprm;
@@ -184,6 +184,14 @@ static inline void set_notify_resume(struct task_struct *task)
  */
 static inline void tracehook_notify_resume(struct pt_regs *regs)
 {
+	/*
+	 * The caller just cleared TIF_NOTIFY_RESUME. This barrier
+	 * pairs with task_work_queue()->set_notify_resume() after
+	 * hlist_add_head(task->task_works);
+	 */
+	smp_mb__after_clear_bit();
+	if (unlikely(!hlist_empty(&current->task_works)))
+		task_work_run(current);
 }
 #endif	/* TIF_NOTIFY_RESUME */
 
diff --git a/kernel/Makefile b/kernel/Makefile
index cb41b95..5790f8b 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -10,7 +10,7 @@ obj-y     = fork.o exec_domain.o panic.o printk.o \
 	    kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
 	    hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
 	    notifier.o ksysfs.o cred.o \
-	    async.o range.o groups.o
+	    async.o range.o groups.o task_work.o
 
 ifdef CONFIG_FUNCTION_TRACER
 # Do not trace debug files and internal ftrace files
diff --git a/kernel/exit.c b/kernel/exit.c
index d8bd3b4..dc852c2 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -946,11 +946,14 @@ void do_exit(long code)
 	exit_signals(tsk);  /* sets PF_EXITING */
 	/*
 	 * tsk->flags are checked in the futex code to protect against
-	 * an exiting task cleaning up the robust pi futexes.
+	 * an exiting task cleaning up the robust pi futexes, and in
+	 * task_work_queue() to avoid the race with exit_task_work().
 	 */
 	smp_mb();
 	raw_spin_unlock_wait(&tsk->pi_lock);
 
+	exit_task_work(tsk);
+
 	exit_irq_thread();
 
 	if (unlikely(in_atomic()))
diff --git a/kernel/fork.c b/kernel/fork.c
index b9372a0..d1108ac 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1380,6 +1380,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	 */
 	p->group_leader = p;
 	INIT_LIST_HEAD(&p->thread_group);
+	INIT_HLIST_HEAD(&p->task_works);
 
 	/* Now that the task is set up, run cgroup callbacks if
 	 * necessary. We need to run them before the task is visible
diff --git a/kernel/task_work.c b/kernel/task_work.c
new file mode 100644
index 0000000..a721a5e
--- /dev/null
+++ b/kernel/task_work.c
@@ -0,0 +1,77 @@
+#include <linux/tracehook.h>
+
+#ifdef TIF_NOTIFY_RESUME
+int task_work_queue(struct task_struct *task, struct task_work *twork)
+{
+	unsigned long flags;
+	int err = -ESRCH;
+	/*
+	 * We must not insert the new work if the task has already passed
+	 * exit_task_work(). We rely on do_exit()->raw_spin_unlock_wait()
+	 * and check PF_EXITING under pi_lock.
+	 */
+	raw_spin_lock_irqsave(&task->pi_lock, flags);
+	if (likely(!(task->flags & PF_EXITING))) {
+		hlist_add_head(&twork->hlist, &task->task_works);
+		err = 0;
+	}
+	raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+
+	/* test_and_set_bit() implies mb(), see tracehook_notify_resume(). */
+	if (likely(!err))
+		set_notify_resume(task);
+	return err;
+}
+
+struct task_work *
+task_work_cancel(struct task_struct *task, task_work_func_t func)
+{
+	unsigned long flags;
+	struct task_work *twork;
+	struct hlist_node *pos;
+
+	raw_spin_lock_irqsave(&task->pi_lock, flags);
+	hlist_for_each_entry(twork, pos, &task->task_works, hlist) {
+		if (twork->func == func) {
+			hlist_del(&twork->hlist);
+			goto found;
+		}
+	}
+	twork = NULL;
+ found:
+	raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+
+	return twork;
+}
+
+void task_work_run(struct task_struct *task)
+{
+	struct hlist_head task_works;
+	struct hlist_node *pos;
+
+	raw_spin_lock_irq(&task->pi_lock);
+	hlist_move_list(&task->task_works, &task_works);
+	raw_spin_unlock_irq(&task->pi_lock);
+
+	if (unlikely(hlist_empty(&task_works)))
+		return;
+	/*
+	 * We use hlist to save the space in task_struct, but we want fifo.
+	 * Find the last entry, the list should be short, then process them
+	 * in reverse order.
+	 */
+	for (pos = task_works.first; pos->next; pos = pos->next)
+		;
+
+	for (;;) {
+		struct hlist_node **pprev = pos->pprev;
+		struct task_work *twork = container_of(pos, struct task_work,
+							hlist);
+		twork->func(twork);
+
+		if (pprev == &task_works.first)
+			break;
+		pos = container_of(pprev, struct hlist_node, next);
+	}
+}
+#endif /* TIF_NOTIFY_RESUME */
-- 
1.5.5.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] cred: change keyctl_session_to_parent() to use task_work_queue()
  2012-04-12  2:47 [PATCH v2 0/2] task_work_queue() && keyctl_session_to_parent() Oleg Nesterov
  2012-04-12  2:48 ` [PATCH v2 1/2] task_work_queue: add generic process-context callbacks Oleg Nesterov
@ 2012-04-12  2:48 ` Oleg Nesterov
  2012-04-12  9:29 ` David Howells
  2012-04-12  9:35 ` TIF_NOTIFY_RESUME [was Re: [PATCH v2 1/2] task_work_queue: add generic process-context callbacks] David Howells
  3 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2012-04-12  2:48 UTC (permalink / raw)
  To: Andrew Morton, David Howells, Linus Torvalds
  Cc: David Smith, Frank Ch. Eigler, Larry Woodman, Peter Zijlstra,
	Tejun Heo, linux-kernel

Change keyctl_session_to_parent() to use task_work_queue() and
move key_replace_session_keyring() logic into task_work->func().

Note that we do task_work_cancel() before task_work_queue() to
ensure that only one work can be pending at any time. This is
important, we must not allow user-space to abuse the parent's
->task_works list.

The callback, replace_session_keyring(), checks PF_EXITING.
I guess this is not really needed but looks better.

We do not report the error if we race with the exiting parent
and task_work_queue() fails, this matches the current behaviour.

As a side effect, this fixes the (unlikely) race. The callers
of key_replace_session_keyring() and keyctl_session_to_parent()
lack the necessary barriers, the parent can miss the request.

Now we can remove task_struct->replacement_session_keyring and
related code.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/key.h          |    2 +-
 security/keys/keyctl.c       |   87 +++++++++++++++++++++++++++++++-----------
 security/keys/process_keys.c |   49 -----------------------
 3 files changed, 65 insertions(+), 73 deletions(-)

diff --git a/include/linux/key.h b/include/linux/key.h
index 96933b1..2f00c2a 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -303,7 +303,7 @@ static inline bool key_is_instantiated(const struct key *key)
 extern ctl_table key_sysctls[];
 #endif
 
-extern void key_replace_session_keyring(void);
+#define key_replace_session_keyring()	do { } while (0)
 
 /*
  * the userspace interface
diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c
index fb767c6..00b2e02 100644
--- a/security/keys/keyctl.c
+++ b/security/keys/keyctl.c
@@ -22,6 +22,7 @@
 #include <linux/err.h>
 #include <linux/vmalloc.h>
 #include <linux/security.h>
+#include <linux/task_work.h>
 #include <asm/uaccess.h>
 #include "internal.h"
 
@@ -1409,6 +1410,53 @@ long keyctl_get_security(key_serial_t keyid,
 	return ret;
 }
 
+static void replace_session_keyring(struct task_work *twork)
+{
+	const struct cred *old = current_cred();
+	struct cred *new = twork->data;
+
+	kfree(twork);
+	if (unlikely(current->flags & PF_EXITING)) {
+		put_cred(new);
+		return;
+	}
+
+	new->  uid	= old->  uid;
+	new-> euid	= old-> euid;
+	new-> suid	= old-> suid;
+	new->fsuid	= old->fsuid;
+	new->  gid	= old->  gid;
+	new-> egid	= old-> egid;
+	new-> sgid	= old-> sgid;
+	new->fsgid	= old->fsgid;
+	new->user	= get_uid(old->user);
+	new->user_ns	= new->user->user_ns;
+	new->group_info	= get_group_info(old->group_info);
+
+	new->securebits	= old->securebits;
+	new->cap_inheritable	= old->cap_inheritable;
+	new->cap_permitted	= old->cap_permitted;
+	new->cap_effective	= old->cap_effective;
+	new->cap_bset		= old->cap_bset;
+
+	new->jit_keyring	= old->jit_keyring;
+	new->thread_keyring	= key_get(old->thread_keyring);
+	new->tgcred->tgid	= old->tgcred->tgid;
+	new->tgcred->process_keyring = key_get(old->tgcred->process_keyring);
+
+	security_transfer_creds(new, old);
+
+	commit_creds(new);
+}
+
+static void free_cred_work(struct task_work *twork)
+{
+	if (twork) {
+		put_cred(twork->data);
+		kfree(twork);
+	}
+}
+
 /*
  * Attempt to install the calling process's session keyring on the process's
  * parent process.
@@ -1423,27 +1471,31 @@ long keyctl_get_security(key_serial_t keyid,
  */
 long keyctl_session_to_parent(void)
 {
-#ifdef TIF_NOTIFY_RESUME
 	struct task_struct *me, *parent;
 	const struct cred *mycred, *pcred;
-	struct cred *cred, *oldcred;
+	struct task_work *newwork, *oldwork;
 	key_ref_t keyring_r;
+	struct cred *cred;
 	int ret;
 
 	keyring_r = lookup_user_key(KEY_SPEC_SESSION_KEYRING, 0, KEY_LINK);
 	if (IS_ERR(keyring_r))
 		return PTR_ERR(keyring_r);
 
+	ret = -ENOMEM;
+	newwork = kmalloc(sizeof(struct task_work), GFP_KERNEL);
+	if (!newwork)
+		goto error_keyring;
+
 	/* our parent is going to need a new cred struct, a new tgcred struct
 	 * and new security data, so we allocate them here to prevent ENOMEM in
 	 * our parent */
-	ret = -ENOMEM;
 	cred = cred_alloc_blank();
 	if (!cred)
 		goto error_keyring;
 
 	cred->tgcred->session_keyring = key_ref_to_ptr(keyring_r);
-	keyring_r = NULL;
+	init_task_work(newwork, replace_session_keyring, cred);
 
 	me = current;
 	rcu_read_lock();
@@ -1484,20 +1536,17 @@ long keyctl_session_to_parent(void)
 	    mycred->tgcred->session_keyring->uid != mycred->euid)
 		goto not_permitted;
 
-	/* if there's an already pending keyring replacement, then we replace
-	 * that */
-	oldcred = parent->replacement_session_keyring;
+	/* cancel an already pending keyring replacement */
+	oldwork = task_work_cancel(parent, replace_session_keyring);
 
 	/* the replacement session keyring is applied just prior to userspace
 	 * restarting */
-	parent->replacement_session_keyring = cred;
-	cred = NULL;
-	set_ti_thread_flag(task_thread_info(parent), TIF_NOTIFY_RESUME);
-
+	if (!task_work_queue(parent, newwork))
+		newwork = NULL;
 	write_unlock_irq(&tasklist_lock);
 	rcu_read_unlock();
-	if (oldcred)
-		put_cred(oldcred);
+	free_cred_work(oldwork);
+	free_cred_work(newwork);
 	return 0;
 
 already_same:
@@ -1505,21 +1554,13 @@ already_same:
 not_permitted:
 	write_unlock_irq(&tasklist_lock);
 	rcu_read_unlock();
-	put_cred(cred);
+	free_cred_work(newwork);
 	return ret;
 
 error_keyring:
+	kfree(newwork);
 	key_ref_put(keyring_r);
 	return ret;
-
-#else /* !TIF_NOTIFY_RESUME */
-	/*
-	 * To be removed when TIF_NOTIFY_RESUME has been implemented on
-	 * m68k/xtensa
-	 */
-#warning TIF_NOTIFY_RESUME not implemented
-	return -EOPNOTSUPP;
-#endif /* !TIF_NOTIFY_RESUME */
 }
 
 /*
diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index be7ecb2..6629137 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -827,52 +827,3 @@ error:
 	abort_creds(new);
 	return ret;
 }
-
-/*
- * Replace a process's session keyring on behalf of one of its children when
- * the target  process is about to resume userspace execution.
- */
-void key_replace_session_keyring(void)
-{
-	const struct cred *old;
-	struct cred *new;
-
-	if (!current->replacement_session_keyring)
-		return;
-
-	write_lock_irq(&tasklist_lock);
-	new = current->replacement_session_keyring;
-	current->replacement_session_keyring = NULL;
-	write_unlock_irq(&tasklist_lock);
-
-	if (!new)
-		return;
-
-	old = current_cred();
-	new->  uid	= old->  uid;
-	new-> euid	= old-> euid;
-	new-> suid	= old-> suid;
-	new->fsuid	= old->fsuid;
-	new->  gid	= old->  gid;
-	new-> egid	= old-> egid;
-	new-> sgid	= old-> sgid;
-	new->fsgid	= old->fsgid;
-	new->user	= get_uid(old->user);
-	new->user_ns	= new->user->user_ns;
-	new->group_info	= get_group_info(old->group_info);
-
-	new->securebits	= old->securebits;
-	new->cap_inheritable	= old->cap_inheritable;
-	new->cap_permitted	= old->cap_permitted;
-	new->cap_effective	= old->cap_effective;
-	new->cap_bset		= old->cap_bset;
-
-	new->jit_keyring	= old->jit_keyring;
-	new->thread_keyring	= key_get(old->thread_keyring);
-	new->tgcred->tgid	= old->tgcred->tgid;
-	new->tgcred->process_keyring = key_get(old->tgcred->process_keyring);
-
-	security_transfer_creds(new, old);
-
-	commit_creds(new);
-}
-- 
1.5.5.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* hlist_for_each_entry && pos (Was: task_work_queue)
  2012-04-12  2:48 ` [PATCH v2 1/2] task_work_queue: add generic process-context callbacks Oleg Nesterov
@ 2012-04-12  4:00   ` Oleg Nesterov
  2012-04-12  4:12     ` Linus Torvalds
  2012-04-16 22:35     ` Paul E. McKenney
  0 siblings, 2 replies; 14+ messages in thread
From: Oleg Nesterov @ 2012-04-12  4:00 UTC (permalink / raw)
  To: Andrew Morton, David Howells, Linus Torvalds
  Cc: David Smith, Frank Ch. Eigler, Larry Woodman, Peter Zijlstra,
	Tejun Heo, linux-kernel

On 04/12, Oleg Nesterov wrote:
>
> +	hlist_for_each_entry(twork, pos, &task->task_works, hlist) {

This reminds me.

hlist_for_each_entry_*() do not need "pos", it can be

	#define hlist_for_each_entry(pos, head, member)					\
		for (pos = (void*)(head)->first;					\
		pos && ({ pos = hlist_entry((void*)pos, typeof(*pos), member); 1; });	\
		pos = (void*)(pos)->member.next)

The only problem, is there any possibility to change the callers
somehow??? I even wrote the script which converts them all, but the
patch is huge.

Please see the old (2008-04-21) message I sent to lkml below, today
the diffstat is even "worse":

	152 files changed, 611 insertions(+), 906 deletions(-)

and the patch size is 242k.

No? we can't?

-------------------------------------------------------------------------------
[PATCH 0.01/1] hlist_for_each_entry_xxx: kill the "pos" argument

(The actual patch is huge, 116K, I'll send it offline. This email contains
 the chunk for list.h only).

COMPILE TESTED ONLY (make allyesconfig).

All hlist_for_each_entry_xxx() macros require the "pos" argument, which is not
actually needed and can be removed. See the changes in include/linux/list.h
(note that hlist_for_each_entry_safe() now does prefetch() too).

Now we should convert the callers somehow. Unfortunately, this is not always
easy. Consider this code for example:

	{
		struct hlist_node *tmp1, *tmp2;

		hlist_for_each_entry(pos, tmp1, head,mem)
			do_something(pos);

		hlist_for_each_entry(pos, tmp2, head,mem)
			use(tmp2);
	}

The first hlist_for_each_entry is easy, but the second can't be converted
automatically because "tmp2" is used.

So, this patch

	- copies these macros to "obsolete" __hlist_for_each_entry_xxx()

	- changes hlist_for_each_entry_xxxx() to avoid the "struct hlist_nod*"

	- converts the rest of the kernel to use either new or old macros

For example, the patch for the code above is

	 {
	-	struct hlist_node *tmp1, *tmp2;
	+	struct hlist_node *tmp2;
	 
	-	hlist_for_each_entry(pos, tmp1, head,mem)
	+	hlist_for_each_entry(pos, head,mem)
			do_something(pos);
	 
	-	hlist_for_each_entry(pos, tmp2, head,mem)
	+	__hlist_for_each_entry(pos, tmp2, head,mem)
			use(tmp2);
	 }

I believe it is very easy to convert the users of the obsolete macros, but
this needs separate patches.

Except for changes in include/linux/list.h the patch was generated by this
script:

	#!/usr/bin/perl -w
	use strict;

	my ($N_CNT, $O_CNT, @SRC, $CTXT, @CTXT);

	sub say { printf STDERR "%4d: %s.\n", $., "@_" }

	sub var { return $_->{$_[0]} || next for @CTXT }

	sub parse_line
	{
		if (ord == ord '{') {
			unshift @CTXT, $CTXT = {
				-v_list => \@{$CTXT->{-v_list}},
				-v_rexp =>    $CTXT->{-v_rexp},
			};
		}
		elsif (ord == ord '}') {
			say "WARN! unmatched '}'" unless $CTXT;

			while (my ($v, $c) = each %$CTXT) {
				my $trim = ord $v != ord '-' &&
					!$c->{used} && $c->{trim} || next;

				for my $tr (@$trim) {
					substr($_, $tr->[1], 2, ''),
					substr($_, $tr->[2], $tr->[3], '')
						for $SRC[$tr->[0]];
					$N_CNT++;
				}

				for ($SRC[$c->{decl}]) {
					s/\* \s* $v \b (?: \s* , \s*)?//x || die;
					s/,\s*(?=;)//; s/\bstruct\s+hlist_node\s*;//;
					$_ = '' if /^\s*\\?\s*$/;
				}
			}
			shift @CTXT; $CTXT = $CTXT[0];
		}
		elsif ($CTXT && /\b struct \s+ hlist_node \b ([^;]+)/x) {
			my $v_list = $CTXT->{-v_list};
			for (split /,/, $1) {
				/^\s* \* (\w+) \s* \z/x or next;
				$CTXT->{$1} = { decl => 0+@SRC };
				push @$v_list, $1;
			}
			$CTXT->{-v_rexp} = qr/\b(@{[join '|', @$v_list]})\b/
				if @$v_list;
		}
		elsif (/\b hlist_for_each_entry (?:_continue|_from|_safe|_rcu)? \b/xg) {
			my $u = length $`;
			if (/\G \s* \( .+? (, \s* (\w+) \s*)/x) {
				my $tr = [0+@SRC, $u, $-[1], length $1];
				if (my $v = var $2) { push @{$v->{trim}}, $tr; }
			} else {
				say "suspicious usage of hlist_for_each_entry_xxx()";
			}
			substr $_, $u, 0, '__';
			$O_CNT++;
		}
		elsif (my $re = $CTXT && $CTXT->{-v_rexp}) {
			var($1)->{used}++ while /$re/g;
		}

		push @SRC, $_;
	}

	sub diff_file
	{
		my $file = $_;
		warn "====> $file ...\n";
		($N_CNT, $O_CNT, @SRC, $CTXT, @CTXT) = 0;

		open my $fd, '<', $file or die;
		while (<$fd>) {
			my $comm = s/(\/\*.*)//s && $1;
			parse_line for split /([{}])/, $_, -1;
			while ($comm) {
				push @SRC, $comm;
				$comm = $comm !~ /\*\// && <$fd>;
			}
		}

		warn "WARN! unmatched '{'\n" if $CTXT;
		return warn "WARN! not changed\n" unless $O_CNT;
		warn "stat: $N_CNT from $O_CNT\n";

		open my $diff, "|diff -up --label a/$file $file --label b/$file -";
		print $diff @SRC;
	}


	@ARGV || die "usage: $0 path_to_kernel || list_of_files\n";

	if (-d $ARGV[0]) {
		chdir $ARGV[0] || die "ERR!! can't cd to $ARGV[0]\n";
		warn "grep ...\n";
		@ARGV = sort grep {
			chomp; s/^\.\///; $_ ne 'include/linux/list.h';
		} qx{grep --include='*.[ch]' -rle '\\bhlist_for_each_entry' .};
	}

	diff_file for @ARGV;

(it open files readonly, so can be used safely)

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

 arch/arm/kernel/kprobes.c            |    6 +--
 arch/ia64/kernel/kprobes.c           |    8 ++--
 arch/powerpc/kernel/kprobes.c        |    6 +--
 arch/s390/kernel/kprobes.c           |    6 +--
 arch/sparc64/kernel/kprobes.c        |    6 +--
 arch/sparc64/kernel/ldc.c            |    3 -
 arch/x86/kernel/kprobes.c            |    6 +--
 arch/x86/kvm/mmu.c                   |   20 ++++------
 block/cfq-iosched.c                  |    3 -
 block/elevator.c                     |    4 +-
 crypto/algapi.c                      |    6 +--
 drivers/infiniband/core/cma.c        |    3 -
 drivers/infiniband/core/fmr_pool.c   |    3 -
 drivers/md/raid5.c                   |    6 +--
 drivers/net/macvlan.c                |    6 +--
 drivers/net/pppol2tp.c               |    6 +--
 drivers/net/sunvnet.c                |    3 -
 drivers/net/wireless/zd1201.c        |    7 +--
 fs/dcache.c                          |    3 -
 fs/ecryptfs/messaging.c              |    5 +-
 fs/fat/inode.c                       |    3 -
 fs/gfs2/glock.c                      |    6 +--
 fs/lockd/host.c                      |   17 +++------
 fs/lockd/svcsubs.c                   |    7 +--
 fs/nfsd/nfscache.c                   |    3 -
 fs/ocfs2/dlm/dlmdebug.c              |    3 -
 fs/ocfs2/dlm/dlmrecovery.c           |    6 +--
 fs/xfs/xfs_inode.c                   |    3 -
 include/linux/pci.h                  |    3 -
 include/linux/pid.h                  |    3 -
 include/net/ax25.h                   |    4 +-
 include/net/inet_hashtables.h        |    2 -
 include/net/inet_timewait_sock.h     |    6 +--
 include/net/netrom.h                 |    8 ++--
 include/net/sctp/sctp.h              |    2 -
 include/net/sock.h                   |   10 ++---
 kernel/kprobes.c                     |   36 ++++++++-----------
 kernel/marker.c                      |   15 ++------
 kernel/pid.c                         |    3 -
 kernel/sched.c                       |    6 +--
 kernel/user.c                        |    3 -
 net/8021q/vlan.c                     |    3 -
 net/9p/error.c                       |    2 -
 net/atm/lec.c                        |   64 +++++++++++++++--------------------
 net/ax25/ax25_iface.c                |    3 -
 net/bridge/br_fdb.c                  |   17 +++------
 net/can/af_can.c                     |   21 +++++------
 net/can/proc.c                       |    6 +--
 net/decnet/dn_table.c                |   13 ++-----
 net/ipv4/fib_frontend.c              |   11 ++----
 net/ipv4/fib_hash.c                  |   30 ++++++----------
 net/ipv4/fib_semantics.c             |   23 ++++--------
 net/ipv4/fib_trie.c                  |   25 ++++---------
 net/ipv4/inet_fragment.c             |   10 ++---
 net/ipv4/netfilter/nf_nat_core.c     |    3 -
 net/ipv6/addrlabel.c                 |   14 +++----
 net/ipv6/ip6_fib.c                   |    9 +---
 net/ipv6/xfrm6_tunnel.c              |   12 ++----
 net/netfilter/nf_conntrack_core.c    |   18 +++------
 net/netfilter/nf_conntrack_expect.c  |   12 ++----
 net/netfilter/nf_conntrack_helper.c  |   14 +++----
 net/netfilter/nf_conntrack_netlink.c |   12 ++----
 net/netfilter/nfnetlink_log.c        |    7 +--
 net/netfilter/nfnetlink_queue.c      |   10 ++---
 net/netfilter/xt_RATEEST.c           |    3 -
 net/netfilter/xt_hashlimit.c         |   13 ++-----
 net/sched/sch_htb.c                  |    9 +---
 net/sunrpc/auth.c                    |    5 +-
 net/sunrpc/svcauth.c                 |    3 -
 net/tipc/name_table.c                |    8 +---
 net/xfrm/xfrm_policy.c               |   53 ++++++++++++----------------
 net/xfrm/xfrm_state.c                |   43 ++++++++---------------
 72 files changed, 302 insertions(+), 439 deletions(-)

--- HL/include/linux/list.h~HLIST	2007-10-25 16:22:12.000000000 +0400
+++ HL/include/linux/list.h	2008-04-21 17:17:44.000000000 +0400
@@ -926,58 +926,54 @@ static inline void hlist_add_after_rcu(s
 
 /**
  * hlist_for_each_entry	- iterate over list of given type
- * @tpos:	the type * to use as a loop cursor.
- * @pos:	the &struct hlist_node to use as a loop cursor.
+ * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
  * @member:	the name of the hlist_node within the struct.
  */
-#define hlist_for_each_entry(tpos, pos, head, member)			 \
-	for (pos = (head)->first;					 \
-	     pos && ({ prefetch(pos->next); 1;}) &&			 \
-		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
-	     pos = pos->next)
+#define hlist_for_each_entry(pos, head, member)				\
+	for (pos = (void*)(head)->first; pos && ({			\
+		prefetch(((struct hlist_node*)pos)->next);		\
+		pos = hlist_entry((void*)pos, typeof(*pos), member); 1;	\
+	     }); pos = (void*)(pos)->member.next)
 
 /**
  * hlist_for_each_entry_continue - iterate over a hlist continuing after current point
- * @tpos:	the type * to use as a loop cursor.
- * @pos:	the &struct hlist_node to use as a loop cursor.
+ * @pos:	the type * to use as a loop cursor.
  * @member:	the name of the hlist_node within the struct.
  */
-#define hlist_for_each_entry_continue(tpos, pos, member)		 \
-	for (pos = (pos)->next;						 \
-	     pos && ({ prefetch(pos->next); 1;}) &&			 \
-		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
-	     pos = pos->next)
+#define hlist_for_each_entry_continue(pos, member)			\
+	for (; (pos = (void*)(pos)->member.next) && ({			\
+		prefetch(((struct hlist_node*)pos)->next);		\
+		pos = hlist_entry((void*)pos, typeof(*pos), member); 1;	\
+	     }); )
 
 /**
  * hlist_for_each_entry_from - iterate over a hlist continuing from current point
- * @tpos:	the type * to use as a loop cursor.
- * @pos:	the &struct hlist_node to use as a loop cursor.
+ * @pos:	the type * to use as a loop cursor.
  * @member:	the name of the hlist_node within the struct.
  */
-#define hlist_for_each_entry_from(tpos, pos, member)			 \
-	for (; pos && ({ prefetch(pos->next); 1;}) &&			 \
-		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
-	     pos = pos->next)
+#define hlist_for_each_entry_from(pos, member)				\
+	for (pos = pos ? (void*)&(pos)->member.next : pos; pos && ({	\
+		prefetch(((struct hlist_node*)pos)->next);		\
+		pos = hlist_entry((void*)pos, typeof(*pos), member); 1;	\
+	     }); pos = (void*)(pos)->member.next)
 
 /**
  * hlist_for_each_entry_safe - iterate over list of given type safe against removal of list entry
- * @tpos:	the type * to use as a loop cursor.
- * @pos:	the &struct hlist_node to use as a loop cursor.
+ * @pos:	the type * to use as a loop cursor.
  * @n:		another &struct hlist_node to use as temporary storage
  * @head:	the head for your list.
  * @member:	the name of the hlist_node within the struct.
  */
-#define hlist_for_each_entry_safe(tpos, pos, n, head, member) 		 \
-	for (pos = (head)->first;					 \
-	     pos && ({ n = pos->next; 1; }) && 				 \
-		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
-	     pos = n)
+#define hlist_for_each_entry_safe(pos, n, head, member)			\
+	for (pos = (void*)(head)->first; pos && ({			\
+		n = ((struct hlist_node*)pos)->next; prefetch(n);	\
+		pos = hlist_entry((void*)pos, typeof(*pos), member); 1;	\
+	     }); pos = (void*)n)
 
 /**
  * hlist_for_each_entry_rcu - iterate over rcu list of given type
- * @tpos:	the type * to use as a loop cursor.
- * @pos:	the &struct hlist_node to use as a loop cursor.
+ * @pos:	the type * to use as a loop cursor.
  * @head:	the head for your list.
  * @member:	the name of the hlist_node within the struct.
  *
@@ -985,7 +981,38 @@ static inline void hlist_add_after_rcu(s
  * the _rcu list-mutation primitives such as hlist_add_head_rcu()
  * as long as the traversal is guarded by rcu_read_lock().
  */
-#define hlist_for_each_entry_rcu(tpos, pos, head, member)		 \
+#define hlist_for_each_entry_rcu(pos, head, member)			\
+	for (pos = (void*)(head)->first; rcu_dereference(pos) && ({	\
+		prefetch(((struct hlist_node*)pos)->next);		\
+		pos = hlist_entry((void*)pos, typeof(*pos), member); 1;	\
+	     }); pos = (void*)(pos)->member.next)
+
+/* -------- Obsolete, to be removed soon, do not use -------- */
+
+#define __hlist_for_each_entry(tpos, pos, head, member)			 \
+	for (pos = (head)->first;					 \
+	     pos && ({ prefetch(pos->next); 1;}) &&			 \
+		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
+	     pos = pos->next)
+
+#define __hlist_for_each_entry_continue(tpos, pos, member)		 \
+	for (pos = (pos)->next;						 \
+	     pos && ({ prefetch(pos->next); 1;}) &&			 \
+		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
+	     pos = pos->next)
+
+#define __hlist_for_each_entry_from(tpos, pos, member)			 \
+	for (; pos && ({ prefetch(pos->next); 1;}) &&			 \
+		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
+	     pos = pos->next)
+
+#define __hlist_for_each_entry_safe(tpos, pos, n, head, member) 	 \
+	for (pos = (head)->first;					 \
+	     pos && ({ n = pos->next; 1; }) && 				 \
+		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
+	     pos = n)
+
+#define __hlist_for_each_entry_rcu(tpos, pos, head, member)		 \
 	for (pos = (head)->first;					 \
 	     rcu_dereference(pos) && ({ prefetch(pos->next); 1;}) &&	 \
 		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hlist_for_each_entry && pos (Was: task_work_queue)
  2012-04-12  4:00   ` hlist_for_each_entry && pos (Was: task_work_queue) Oleg Nesterov
@ 2012-04-12  4:12     ` Linus Torvalds
  2012-04-12  4:28       ` Oleg Nesterov
  2012-04-16 22:35     ` Paul E. McKenney
  1 sibling, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2012-04-12  4:12 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Howells, David Smith, Frank Ch. Eigler,
	Larry Woodman, Peter Zijlstra, Tejun Heo, linux-kernel

On Wed, Apr 11, 2012 at 9:00 PM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> This reminds me.
>
> hlist_for_each_entry_*() do not need "pos", it can be
>
>        #define hlist_for_each_entry(pos, head, member)                                 \
>                for (pos = (void*)(head)->first;                                        \
>                pos && ({ pos = hlist_entry((void*)pos, typeof(*pos), member); 1; });   \
>                pos = (void*)(pos)->member.next)

Ugh. I'm not sure that is any better, with the extra casts to hide the
fact that you use the wrong type pointers for it.

Are there any code generation improvements?

Because quite frankly, if there aren't, I think the code churn just
isn't worth it - especially with how ugly the macro is.

This is one of those things where the C99 features would actually be
nice: one of the few features from C++ that I actually liked is the
ability to declare the induction variable. So

  #define hlist_for_each_entry(pos, head, member) \
    for (void *__pos = (head)->first; \
        __pos && ({ pos = hlist_entry(__pos, typeof(*pos), member); 1; });   \
        __pos = __pos->next)

would actually be prettier. That said, "pretty macro" isn't very high
on the list of things to worry about. Not nearly as high as the pain
changing the interface would cause for things that *should* be trivial
(like backporting patches etc).

So I'd really want to see some more tangible advantage.

                 Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hlist_for_each_entry && pos (Was: task_work_queue)
  2012-04-12  4:12     ` Linus Torvalds
@ 2012-04-12  4:28       ` Oleg Nesterov
  2012-04-12  4:39         ` Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Oleg Nesterov @ 2012-04-12  4:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, David Howells, David Smith, Frank Ch. Eigler,
	Larry Woodman, Peter Zijlstra, Tejun Heo, linux-kernel

On 04/11, Linus Torvalds wrote:
>
> On Wed, Apr 11, 2012 at 9:00 PM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > This reminds me.
> >
> > hlist_for_each_entry_*() do not need "pos", it can be
> >
> >        #define hlist_for_each_entry(pos, head, member)                                 \
> >                for (pos = (void*)(head)->first;                 �	\
> >                pos && ({ pos = hlist_entry((void*)pos, typeof(*pos), member); 1; });   \
> >                pos = (void*)(pos)->member.next)
>
> Ugh. I'm not sure that is any better, with the extra casts to hide the
> fact that you use the wrong type pointers for it.
>
> Are there any code generation improvements?

Not sure, I'll check...

> Because quite frankly, if there aren't, I think the code churn just
> isn't worth it - especially with how ugly the macro is.

Ah, personally I think that "how ugly the macro" doesn't matter.
What does matter (imho again), it simplifies the usage.

> This is one of those things where the C99 features would actually be
> nice: one of the few features from C++ that I actually liked is the
> ability to declare the induction variable. So
>
>   #define hlist_for_each_entry(pos, head, member) \
>     for (void *__pos = (head)->first; \

Agreed. But,

	error: 'for' loop initial declaration used outside C99 mode

we should change CFLAGS, I guess. BTW, personally I'd like very much
to use "for (type var; ...")" if this was allowed.

> That said, "pretty macro" isn't very high
> on the list of things to worry about. Not nearly as high as the pain
> changing the interface would cause for things that *should* be trivial
> (like backporting patches etc).

Yes, agreed, that was the question.

> So I'd really want to see some more tangible advantage.

OK, I'll check the code generation just in case.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hlist_for_each_entry && pos (Was: task_work_queue)
  2012-04-12  4:28       ` Oleg Nesterov
@ 2012-04-12  4:39         ` Linus Torvalds
  2012-04-12  5:02           ` Al Viro
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2012-04-12  4:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Howells, David Smith, Frank Ch. Eigler,
	Larry Woodman, Peter Zijlstra, Tejun Heo, linux-kernel

On Wed, Apr 11, 2012 at 9:28 PM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Agreed. But,
>
>        error: 'for' loop initial declaration used outside C99 mode
>
> we should change CFLAGS, I guess. BTW, personally I'd like very much
> to use "for (type var; ...")" if this was allowed.

The sad part is that if we allow that, we also get that *other* insane
C99 variable thing - mixing variables and code.

I *like* getting warnings for confused people who start introducing
variables in the middle of blocks of code. That's not well-contained
like the loop variable.

That said, most of the stuff in C99 are extensions that we used long
before C99, so I guess we might as well just add the stupid flag. And
discourage people from mixing declarations and code other ways (sparse
etc).

                         Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hlist_for_each_entry && pos (Was: task_work_queue)
  2012-04-12  4:39         ` Linus Torvalds
@ 2012-04-12  5:02           ` Al Viro
  0 siblings, 0 replies; 14+ messages in thread
From: Al Viro @ 2012-04-12  5:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Nesterov, Andrew Morton, David Howells, David Smith,
	Frank Ch. Eigler, Larry Woodman, Peter Zijlstra, Tejun Heo,
	linux-kernel

On Wed, Apr 11, 2012 at 09:39:29PM -0700, Linus Torvalds wrote:

> The sad part is that if we allow that, we also get that *other* insane
> C99 variable thing - mixing variables and code.
> 
> I *like* getting warnings for confused people who start introducing
> variables in the middle of blocks of code. That's not well-contained
> like the loop variable.
> 
> That said, most of the stuff in C99 are extensions that we used long
> before C99, so I guess we might as well just add the stupid flag. And
> discourage people from mixing declarations and code other ways (sparse
> etc).

	Yes, but... -std=gnu99 will break one of your pet extensions -
(struct foo){0,1,2} will cease to be accepted in initializers of
static storage duration objects (and -std=c99 will break a *lot* more
than that).  I wouldn't mind going for that (it's not a terribly large
patch, at least it wasn't about a year ago when I've looked at that),
but IIRC you really insisted on using that one...  It mostly boiled
down to things like
-       .lock           = __SPIN_LOCK_UNLOCKED(init_fs.lock),
+       .lock           = __SPIN_LOCK_INITIALIZER(init_fs.lock),
etc.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] cred: change keyctl_session_to_parent() to use task_work_queue()
  2012-04-12  2:47 [PATCH v2 0/2] task_work_queue() && keyctl_session_to_parent() Oleg Nesterov
  2012-04-12  2:48 ` [PATCH v2 1/2] task_work_queue: add generic process-context callbacks Oleg Nesterov
  2012-04-12  2:48 ` [PATCH v2 2/2] cred: change keyctl_session_to_parent() to use task_work_queue() Oleg Nesterov
@ 2012-04-12  9:29 ` David Howells
  2012-04-12 17:34   ` Oleg Nesterov
  2012-04-12  9:35 ` TIF_NOTIFY_RESUME [was Re: [PATCH v2 1/2] task_work_queue: add generic process-context callbacks] David Howells
  3 siblings, 1 reply; 14+ messages in thread
From: David Howells @ 2012-04-12  9:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: dhowells, Andrew Morton, Linus Torvalds, David Smith,
	Frank Ch. Eigler, Larry Woodman, Peter Zijlstra, Tejun Heo,
	linux-kernel

Oleg Nesterov <oleg@redhat.com> wrote:

> Change keyctl_session_to_parent() to use task_work_queue() and
> move key_replace_session_keyring() logic into task_work->func().

I'm generally okay with this, but there are a couple of issues with the patch.

> +static void replace_session_keyring(struct task_work *twork)

Can you keep this in process_keys.c please?  Then everything that actually
updates a process's keyrings is done there.  Admittedly, on that basis, you
can argue that I should move a chunk of keyctl_session_to_parent() there too.

And, also, can you please keep the "key_" on the front of the name?

>  long keyctl_session_to_parent(void)
>  {
> -#ifdef TIF_NOTIFY_RESUME

Unless TIF_NOTIFY_RESUME is defined, this operation cannot be performed and
should generate an error.  I don't see how this happens now.

> +	if (!task_work_queue(parent, newwork))

I hate this type of construct.  "if not function()" indicating the function
succeeded.  Can you make it "== 0" instead?  Also, shouldn't we tell the user
that it failed?

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* TIF_NOTIFY_RESUME [was Re: [PATCH v2 1/2] task_work_queue: add generic process-context callbacks]
  2012-04-12  2:47 [PATCH v2 0/2] task_work_queue() && keyctl_session_to_parent() Oleg Nesterov
                   ` (2 preceding siblings ...)
  2012-04-12  9:29 ` David Howells
@ 2012-04-12  9:35 ` David Howells
  2012-04-12 17:41   ` Oleg Nesterov
  3 siblings, 1 reply; 14+ messages in thread
From: David Howells @ 2012-04-12  9:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: dhowells, Andrew Morton, Linus Torvalds, David Smith,
	Frank Ch. Eigler, Larry Woodman, Peter Zijlstra, Tejun Heo,
	linux-kernel, geert, chris


Oleg Nesterov <oleg@redhat.com> wrote:

> +#ifdef TIF_NOTIFY_RESUME

If we're going to keep this feature, I wonder if it's worth just requiring the
arches lacking this feature to add it at this point, rather than adding all
this conditional logic.

It doesn't look like it should be too hard for m68k, but xtensa looks tricky.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 2/2] cred: change keyctl_session_to_parent() to use task_work_queue()
  2012-04-12  9:29 ` David Howells
@ 2012-04-12 17:34   ` Oleg Nesterov
  0 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2012-04-12 17:34 UTC (permalink / raw)
  To: David Howells
  Cc: Andrew Morton, Linus Torvalds, David Smith, Frank Ch. Eigler,
	Larry Woodman, Peter Zijlstra, Tejun Heo, linux-kernel

On 04/12, David Howells wrote:
>
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > Change keyctl_session_to_parent() to use task_work_queue() and
> > move key_replace_session_keyring() logic into task_work->func().
>
> I'm generally okay with this, but there are a couple of issues with the patch.

Great, thanks.

I'll send v3 soon. I'll also update 1/2 a little bit and add the
3rd patch with the new user of task_work.

> > +static void replace_session_keyring(struct task_work *twork)
>
> Can you keep this in process_keys.c please?  Then everything that actually
> updates a process's keyrings is done there.  Admittedly, on that basis, you
> can argue that I should move a chunk of keyctl_session_to_parent() there too.

Sure. But then I need to export it in internal.h.

> And, also, can you please keep the "key_" on the front of the name?

Oh, yes, just I do not know how to name it.

The obviously good name is the old name, but until we remove the
->replacement_session_keyring code from arch/* we can't use it.

OK, how about key_change_session_keyring() ?

> >  long keyctl_session_to_parent(void)
> >  {
> > -#ifdef TIF_NOTIFY_RESUME
>
> Unless TIF_NOTIFY_RESUME is defined, this operation cannot be performed and
> should generate an error.  I don't see how this happens now.

Yes, see below. I forgot about -EOPNOTSUPP.

> > +	if (!task_work_queue(parent, newwork))
>
> I hate this type of construct.  "if not function()" indicating the function
> succeeded.  Can you make it "== 0" instead?

Agreed. Even better, we can rename "int ret" to "int err" and do

	err = task_work_queue();
	if (!err)
		...;

this also allows us to kill already_same/not_permitted error paths.

> Also, shouldn't we tell the user
> that it failed?

>From the changelog:

	We do not report the error if we race with the exiting parent
	and task_work_queue() fails, this matches the current behaviour.

Yes. task_work_queue() can only fail if it races with the exiting
parent. The window before it calls exit_notify() is small, and this
doesn't differ from the case when the parent does do_exit() right
after we queue the work.

But! As you pointed out, I forgot about TIF_NOTIFY_RESUME problems,
so lets report the error.

Thanks. Before I re-check and send v3, perhaps you can look at the
updated keyctl_session_to_parent() below.

Oleg.


long keyctl_session_to_parent(void)
{
	struct task_struct *me, *parent;
	const struct cred *mycred, *pcred;
	struct task_work *newwork, *oldwork;
	key_ref_t keyring_r;
	struct cred *cred;
	int err;

	keyring_r = lookup_user_key(KEY_SPEC_SESSION_KEYRING, 0, KEY_LINK);
	if (IS_ERR(keyring_r))
		return PTR_ERR(keyring_r);

	err = -ENOMEM;
	newwork = kmalloc(sizeof(struct task_work), GFP_KERNEL);
	if (!newwork)
		goto error_keyring;

	/* our parent is going to need a new cred struct, a new tgcred struct
	 * and new security data, so we allocate them here to prevent ENOMEM in
	 * our parent */
	cred = cred_alloc_blank();
	if (!cred)
		goto error_keyring;

	cred->tgcred->session_keyring = key_ref_to_ptr(keyring_r);
	init_task_work(newwork, replace_session_keyring, cred);

	me = current;
	rcu_read_lock();
	write_lock_irq(&tasklist_lock);

	err = -EPERM;
	oldwork = NULL;
	parent = me->real_parent;

	/* the parent mustn't be init and mustn't be a kernel thread */
	if (parent->pid <= 1 || !parent->mm)
		goto unlock;

	/* the parent must be single threaded */
	if (!thread_group_empty(parent))
		goto unlock;

	/* the parent and the child must have different session keyrings or
	 * there's no point */
	mycred = current_cred();
	pcred = __task_cred(parent);
	if (mycred == pcred ||
	    mycred->tgcred->session_keyring == pcred->tgcred->session_keyring) {
	    	err = 0;
		goto unlock;
	}

	/* the parent must have the same effective ownership and mustn't be
	 * SUID/SGID */
	if (pcred->uid	!= mycred->euid	||
	    pcred->euid	!= mycred->euid	||
	    pcred->suid	!= mycred->euid	||
	    pcred->gid	!= mycred->egid	||
	    pcred->egid	!= mycred->egid	||
	    pcred->sgid	!= mycred->egid)
		goto unlock;

	/* the keyrings must have the same UID */
	if ((pcred->tgcred->session_keyring &&
	     pcred->tgcred->session_keyring->uid != mycred->euid) ||
	    mycred->tgcred->session_keyring->uid != mycred->euid)
		goto unlock;

	/* cancel an already pending keyring replacement */
	oldwork = task_work_cancel(parent, replace_session_keyring);

	/* the replacement session keyring is applied just prior to userspace
	 * restarting */
	err = task_work_queue(parent, newwork);
	if (!err)
		newwork = NULL;
 unlock:
	write_unlock_irq(&tasklist_lock);
	rcu_read_unlock();
	free_cred_work(oldwork);
	free_cred_work(newwork);
	return err;

error_keyring:
	kfree(newwork);
	key_ref_put(keyring_r);
	return err;
}


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: TIF_NOTIFY_RESUME [was Re: [PATCH v2 1/2] task_work_queue: add generic process-context callbacks]
  2012-04-12  9:35 ` TIF_NOTIFY_RESUME [was Re: [PATCH v2 1/2] task_work_queue: add generic process-context callbacks] David Howells
@ 2012-04-12 17:41   ` Oleg Nesterov
  0 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2012-04-12 17:41 UTC (permalink / raw)
  To: David Howells
  Cc: Andrew Morton, Linus Torvalds, David Smith, Frank Ch. Eigler,
	Larry Woodman, Peter Zijlstra, Tejun Heo, linux-kernel, geert,
	chris

On 04/12, David Howells wrote:
>
> Oleg Nesterov <oleg@redhat.com> wrote:
>
> > +#ifdef TIF_NOTIFY_RESUME
>
> If we're going to keep this feature, I wonder if it's worth just requiring the
> arches lacking this feature to add it at this point, rather than adding all
> this conditional logic.

Yes, agreed. But until we update m68k/xtensa task_work has to check
if it is defined.

I'll try to cleanup this in v3. In fact, task_work_queue() makes sense
even without TIF_NOTIFY_RESUME, just it lacks set_notify_resume().

> It doesn't look like it should be too hard for m68k, but xtensa looks tricky.

I was going to add the maintainers ;)

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hlist_for_each_entry && pos (Was: task_work_queue)
  2012-04-12  4:00   ` hlist_for_each_entry && pos (Was: task_work_queue) Oleg Nesterov
  2012-04-12  4:12     ` Linus Torvalds
@ 2012-04-16 22:35     ` Paul E. McKenney
  2012-04-17 20:43       ` Oleg Nesterov
  1 sibling, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2012-04-16 22:35 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Howells, Linus Torvalds, David Smith,
	Frank Ch. Eigler, Larry Woodman, Peter Zijlstra, Tejun Heo,
	linux-kernel

On Thu, Apr 12, 2012 at 06:00:59AM +0200, Oleg Nesterov wrote:
> On 04/12, Oleg Nesterov wrote:
> >
> > +	hlist_for_each_entry(twork, pos, &task->task_works, hlist) {
> 
> This reminds me.
> 
> hlist_for_each_entry_*() do not need "pos", it can be
> 
> 	#define hlist_for_each_entry(pos, head, member)					\
> 		for (pos = (void*)(head)->first;					\
> 		pos && ({ pos = hlist_entry((void*)pos, typeof(*pos), member); 1; });	\
> 		pos = (void*)(pos)->member.next)
> 
> The only problem, is there any possibility to change the callers
> somehow??? I even wrote the script which converts them all, but the
> patch is huge.
> 
> Please see the old (2008-04-21) message I sent to lkml below, today
> the diffstat is even "worse":
> 
> 	152 files changed, 611 insertions(+), 906 deletions(-)
> 
> and the patch size is 242k.
> 
> No? we can't?

Maybe this needs a phased approach:

1.	Add a new API name without the "pos" argument.

2.	Send individual patches to the uses, which allows time to
	clean up stragglers.

3.	Remove the old API name.  If any patches from #2 have been
	ignored, push them with the removal patch.

4.	Rename the new API name to the old one, if desired.

Yeah, cowardly of me, I know.

							Thanx, Paul

> -------------------------------------------------------------------------------
> [PATCH 0.01/1] hlist_for_each_entry_xxx: kill the "pos" argument
> 
> (The actual patch is huge, 116K, I'll send it offline. This email contains
>  the chunk for list.h only).
> 
> COMPILE TESTED ONLY (make allyesconfig).
> 
> All hlist_for_each_entry_xxx() macros require the "pos" argument, which is not
> actually needed and can be removed. See the changes in include/linux/list.h
> (note that hlist_for_each_entry_safe() now does prefetch() too).
> 
> Now we should convert the callers somehow. Unfortunately, this is not always
> easy. Consider this code for example:
> 
> 	{
> 		struct hlist_node *tmp1, *tmp2;
> 
> 		hlist_for_each_entry(pos, tmp1, head,mem)
> 			do_something(pos);
> 
> 		hlist_for_each_entry(pos, tmp2, head,mem)
> 			use(tmp2);
> 	}
> 
> The first hlist_for_each_entry is easy, but the second can't be converted
> automatically because "tmp2" is used.
> 
> So, this patch
> 
> 	- copies these macros to "obsolete" __hlist_for_each_entry_xxx()
> 
> 	- changes hlist_for_each_entry_xxxx() to avoid the "struct hlist_nod*"
> 
> 	- converts the rest of the kernel to use either new or old macros
> 
> For example, the patch for the code above is
> 
> 	 {
> 	-	struct hlist_node *tmp1, *tmp2;
> 	+	struct hlist_node *tmp2;
> 	 
> 	-	hlist_for_each_entry(pos, tmp1, head,mem)
> 	+	hlist_for_each_entry(pos, head,mem)
> 			do_something(pos);
> 	 
> 	-	hlist_for_each_entry(pos, tmp2, head,mem)
> 	+	__hlist_for_each_entry(pos, tmp2, head,mem)
> 			use(tmp2);
> 	 }
> 
> I believe it is very easy to convert the users of the obsolete macros, but
> this needs separate patches.
> 
> Except for changes in include/linux/list.h the patch was generated by this
> script:
> 
> 	#!/usr/bin/perl -w
> 	use strict;
> 
> 	my ($N_CNT, $O_CNT, @SRC, $CTXT, @CTXT);
> 
> 	sub say { printf STDERR "%4d: %s.\n", $., "@_" }
> 
> 	sub var { return $_->{$_[0]} || next for @CTXT }
> 
> 	sub parse_line
> 	{
> 		if (ord == ord '{') {
> 			unshift @CTXT, $CTXT = {
> 				-v_list => \@{$CTXT->{-v_list}},
> 				-v_rexp =>    $CTXT->{-v_rexp},
> 			};
> 		}
> 		elsif (ord == ord '}') {
> 			say "WARN! unmatched '}'" unless $CTXT;
> 
> 			while (my ($v, $c) = each %$CTXT) {
> 				my $trim = ord $v != ord '-' &&
> 					!$c->{used} && $c->{trim} || next;
> 
> 				for my $tr (@$trim) {
> 					substr($_, $tr->[1], 2, ''),
> 					substr($_, $tr->[2], $tr->[3], '')
> 						for $SRC[$tr->[0]];
> 					$N_CNT++;
> 				}
> 
> 				for ($SRC[$c->{decl}]) {
> 					s/\* \s* $v \b (?: \s* , \s*)?//x || die;
> 					s/,\s*(?=;)//; s/\bstruct\s+hlist_node\s*;//;
> 					$_ = '' if /^\s*\\?\s*$/;
> 				}
> 			}
> 			shift @CTXT; $CTXT = $CTXT[0];
> 		}
> 		elsif ($CTXT && /\b struct \s+ hlist_node \b ([^;]+)/x) {
> 			my $v_list = $CTXT->{-v_list};
> 			for (split /,/, $1) {
> 				/^\s* \* (\w+) \s* \z/x or next;
> 				$CTXT->{$1} = { decl => 0+@SRC };
> 				push @$v_list, $1;
> 			}
> 			$CTXT->{-v_rexp} = qr/\b(@{[join '|', @$v_list]})\b/
> 				if @$v_list;
> 		}
> 		elsif (/\b hlist_for_each_entry (?:_continue|_from|_safe|_rcu)? \b/xg) {
> 			my $u = length $`;
> 			if (/\G \s* \( .+? (, \s* (\w+) \s*)/x) {
> 				my $tr = [0+@SRC, $u, $-[1], length $1];
> 				if (my $v = var $2) { push @{$v->{trim}}, $tr; }
> 			} else {
> 				say "suspicious usage of hlist_for_each_entry_xxx()";
> 			}
> 			substr $_, $u, 0, '__';
> 			$O_CNT++;
> 		}
> 		elsif (my $re = $CTXT && $CTXT->{-v_rexp}) {
> 			var($1)->{used}++ while /$re/g;
> 		}
> 
> 		push @SRC, $_;
> 	}
> 
> 	sub diff_file
> 	{
> 		my $file = $_;
> 		warn "====> $file ...\n";
> 		($N_CNT, $O_CNT, @SRC, $CTXT, @CTXT) = 0;
> 
> 		open my $fd, '<', $file or die;
> 		while (<$fd>) {
> 			my $comm = s/(\/\*.*)//s && $1;
> 			parse_line for split /([{}])/, $_, -1;
> 			while ($comm) {
> 				push @SRC, $comm;
> 				$comm = $comm !~ /\*\// && <$fd>;
> 			}
> 		}
> 
> 		warn "WARN! unmatched '{'\n" if $CTXT;
> 		return warn "WARN! not changed\n" unless $O_CNT;
> 		warn "stat: $N_CNT from $O_CNT\n";
> 
> 		open my $diff, "|diff -up --label a/$file $file --label b/$file -";
> 		print $diff @SRC;
> 	}
> 
> 
> 	@ARGV || die "usage: $0 path_to_kernel || list_of_files\n";
> 
> 	if (-d $ARGV[0]) {
> 		chdir $ARGV[0] || die "ERR!! can't cd to $ARGV[0]\n";
> 		warn "grep ...\n";
> 		@ARGV = sort grep {
> 			chomp; s/^\.\///; $_ ne 'include/linux/list.h';
> 		} qx{grep --include='*.[ch]' -rle '\\bhlist_for_each_entry' .};
> 	}
> 
> 	diff_file for @ARGV;
> 
> (it open files readonly, so can be used safely)
> 
> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
> 
>  arch/arm/kernel/kprobes.c            |    6 +--
>  arch/ia64/kernel/kprobes.c           |    8 ++--
>  arch/powerpc/kernel/kprobes.c        |    6 +--
>  arch/s390/kernel/kprobes.c           |    6 +--
>  arch/sparc64/kernel/kprobes.c        |    6 +--
>  arch/sparc64/kernel/ldc.c            |    3 -
>  arch/x86/kernel/kprobes.c            |    6 +--
>  arch/x86/kvm/mmu.c                   |   20 ++++------
>  block/cfq-iosched.c                  |    3 -
>  block/elevator.c                     |    4 +-
>  crypto/algapi.c                      |    6 +--
>  drivers/infiniband/core/cma.c        |    3 -
>  drivers/infiniband/core/fmr_pool.c   |    3 -
>  drivers/md/raid5.c                   |    6 +--
>  drivers/net/macvlan.c                |    6 +--
>  drivers/net/pppol2tp.c               |    6 +--
>  drivers/net/sunvnet.c                |    3 -
>  drivers/net/wireless/zd1201.c        |    7 +--
>  fs/dcache.c                          |    3 -
>  fs/ecryptfs/messaging.c              |    5 +-
>  fs/fat/inode.c                       |    3 -
>  fs/gfs2/glock.c                      |    6 +--
>  fs/lockd/host.c                      |   17 +++------
>  fs/lockd/svcsubs.c                   |    7 +--
>  fs/nfsd/nfscache.c                   |    3 -
>  fs/ocfs2/dlm/dlmdebug.c              |    3 -
>  fs/ocfs2/dlm/dlmrecovery.c           |    6 +--
>  fs/xfs/xfs_inode.c                   |    3 -
>  include/linux/pci.h                  |    3 -
>  include/linux/pid.h                  |    3 -
>  include/net/ax25.h                   |    4 +-
>  include/net/inet_hashtables.h        |    2 -
>  include/net/inet_timewait_sock.h     |    6 +--
>  include/net/netrom.h                 |    8 ++--
>  include/net/sctp/sctp.h              |    2 -
>  include/net/sock.h                   |   10 ++---
>  kernel/kprobes.c                     |   36 ++++++++-----------
>  kernel/marker.c                      |   15 ++------
>  kernel/pid.c                         |    3 -
>  kernel/sched.c                       |    6 +--
>  kernel/user.c                        |    3 -
>  net/8021q/vlan.c                     |    3 -
>  net/9p/error.c                       |    2 -
>  net/atm/lec.c                        |   64 +++++++++++++++--------------------
>  net/ax25/ax25_iface.c                |    3 -
>  net/bridge/br_fdb.c                  |   17 +++------
>  net/can/af_can.c                     |   21 +++++------
>  net/can/proc.c                       |    6 +--
>  net/decnet/dn_table.c                |   13 ++-----
>  net/ipv4/fib_frontend.c              |   11 ++----
>  net/ipv4/fib_hash.c                  |   30 ++++++----------
>  net/ipv4/fib_semantics.c             |   23 ++++--------
>  net/ipv4/fib_trie.c                  |   25 ++++---------
>  net/ipv4/inet_fragment.c             |   10 ++---
>  net/ipv4/netfilter/nf_nat_core.c     |    3 -
>  net/ipv6/addrlabel.c                 |   14 +++----
>  net/ipv6/ip6_fib.c                   |    9 +---
>  net/ipv6/xfrm6_tunnel.c              |   12 ++----
>  net/netfilter/nf_conntrack_core.c    |   18 +++------
>  net/netfilter/nf_conntrack_expect.c  |   12 ++----
>  net/netfilter/nf_conntrack_helper.c  |   14 +++----
>  net/netfilter/nf_conntrack_netlink.c |   12 ++----
>  net/netfilter/nfnetlink_log.c        |    7 +--
>  net/netfilter/nfnetlink_queue.c      |   10 ++---
>  net/netfilter/xt_RATEEST.c           |    3 -
>  net/netfilter/xt_hashlimit.c         |   13 ++-----
>  net/sched/sch_htb.c                  |    9 +---
>  net/sunrpc/auth.c                    |    5 +-
>  net/sunrpc/svcauth.c                 |    3 -
>  net/tipc/name_table.c                |    8 +---
>  net/xfrm/xfrm_policy.c               |   53 ++++++++++++----------------
>  net/xfrm/xfrm_state.c                |   43 ++++++++---------------
>  72 files changed, 302 insertions(+), 439 deletions(-)
> 
> --- HL/include/linux/list.h~HLIST	2007-10-25 16:22:12.000000000 +0400
> +++ HL/include/linux/list.h	2008-04-21 17:17:44.000000000 +0400
> @@ -926,58 +926,54 @@ static inline void hlist_add_after_rcu(s
> 
>  /**
>   * hlist_for_each_entry	- iterate over list of given type
> - * @tpos:	the type * to use as a loop cursor.
> - * @pos:	the &struct hlist_node to use as a loop cursor.
> + * @pos:	the type * to use as a loop cursor.
>   * @head:	the head for your list.
>   * @member:	the name of the hlist_node within the struct.
>   */
> -#define hlist_for_each_entry(tpos, pos, head, member)			 \
> -	for (pos = (head)->first;					 \
> -	     pos && ({ prefetch(pos->next); 1;}) &&			 \
> -		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> -	     pos = pos->next)
> +#define hlist_for_each_entry(pos, head, member)				\
> +	for (pos = (void*)(head)->first; pos && ({			\
> +		prefetch(((struct hlist_node*)pos)->next);		\
> +		pos = hlist_entry((void*)pos, typeof(*pos), member); 1;	\
> +	     }); pos = (void*)(pos)->member.next)
> 
>  /**
>   * hlist_for_each_entry_continue - iterate over a hlist continuing after current point
> - * @tpos:	the type * to use as a loop cursor.
> - * @pos:	the &struct hlist_node to use as a loop cursor.
> + * @pos:	the type * to use as a loop cursor.
>   * @member:	the name of the hlist_node within the struct.
>   */
> -#define hlist_for_each_entry_continue(tpos, pos, member)		 \
> -	for (pos = (pos)->next;						 \
> -	     pos && ({ prefetch(pos->next); 1;}) &&			 \
> -		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> -	     pos = pos->next)
> +#define hlist_for_each_entry_continue(pos, member)			\
> +	for (; (pos = (void*)(pos)->member.next) && ({			\
> +		prefetch(((struct hlist_node*)pos)->next);		\
> +		pos = hlist_entry((void*)pos, typeof(*pos), member); 1;	\
> +	     }); )
> 
>  /**
>   * hlist_for_each_entry_from - iterate over a hlist continuing from current point
> - * @tpos:	the type * to use as a loop cursor.
> - * @pos:	the &struct hlist_node to use as a loop cursor.
> + * @pos:	the type * to use as a loop cursor.
>   * @member:	the name of the hlist_node within the struct.
>   */
> -#define hlist_for_each_entry_from(tpos, pos, member)			 \
> -	for (; pos && ({ prefetch(pos->next); 1;}) &&			 \
> -		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> -	     pos = pos->next)
> +#define hlist_for_each_entry_from(pos, member)				\
> +	for (pos = pos ? (void*)&(pos)->member.next : pos; pos && ({	\
> +		prefetch(((struct hlist_node*)pos)->next);		\
> +		pos = hlist_entry((void*)pos, typeof(*pos), member); 1;	\
> +	     }); pos = (void*)(pos)->member.next)
> 
>  /**
>   * hlist_for_each_entry_safe - iterate over list of given type safe against removal of list entry
> - * @tpos:	the type * to use as a loop cursor.
> - * @pos:	the &struct hlist_node to use as a loop cursor.
> + * @pos:	the type * to use as a loop cursor.
>   * @n:		another &struct hlist_node to use as temporary storage
>   * @head:	the head for your list.
>   * @member:	the name of the hlist_node within the struct.
>   */
> -#define hlist_for_each_entry_safe(tpos, pos, n, head, member) 		 \
> -	for (pos = (head)->first;					 \
> -	     pos && ({ n = pos->next; 1; }) && 				 \
> -		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> -	     pos = n)
> +#define hlist_for_each_entry_safe(pos, n, head, member)			\
> +	for (pos = (void*)(head)->first; pos && ({			\
> +		n = ((struct hlist_node*)pos)->next; prefetch(n);	\
> +		pos = hlist_entry((void*)pos, typeof(*pos), member); 1;	\
> +	     }); pos = (void*)n)
> 
>  /**
>   * hlist_for_each_entry_rcu - iterate over rcu list of given type
> - * @tpos:	the type * to use as a loop cursor.
> - * @pos:	the &struct hlist_node to use as a loop cursor.
> + * @pos:	the type * to use as a loop cursor.
>   * @head:	the head for your list.
>   * @member:	the name of the hlist_node within the struct.
>   *
> @@ -985,7 +981,38 @@ static inline void hlist_add_after_rcu(s
>   * the _rcu list-mutation primitives such as hlist_add_head_rcu()
>   * as long as the traversal is guarded by rcu_read_lock().
>   */
> -#define hlist_for_each_entry_rcu(tpos, pos, head, member)		 \
> +#define hlist_for_each_entry_rcu(pos, head, member)			\
> +	for (pos = (void*)(head)->first; rcu_dereference(pos) && ({	\
> +		prefetch(((struct hlist_node*)pos)->next);		\
> +		pos = hlist_entry((void*)pos, typeof(*pos), member); 1;	\
> +	     }); pos = (void*)(pos)->member.next)
> +
> +/* -------- Obsolete, to be removed soon, do not use -------- */
> +
> +#define __hlist_for_each_entry(tpos, pos, head, member)			 \
> +	for (pos = (head)->first;					 \
> +	     pos && ({ prefetch(pos->next); 1;}) &&			 \
> +		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> +	     pos = pos->next)
> +
> +#define __hlist_for_each_entry_continue(tpos, pos, member)		 \
> +	for (pos = (pos)->next;						 \
> +	     pos && ({ prefetch(pos->next); 1;}) &&			 \
> +		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> +	     pos = pos->next)
> +
> +#define __hlist_for_each_entry_from(tpos, pos, member)			 \
> +	for (; pos && ({ prefetch(pos->next); 1;}) &&			 \
> +		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> +	     pos = pos->next)
> +
> +#define __hlist_for_each_entry_safe(tpos, pos, n, head, member) 	 \
> +	for (pos = (head)->first;					 \
> +	     pos && ({ n = pos->next; 1; }) && 				 \
> +		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> +	     pos = n)
> +
> +#define __hlist_for_each_entry_rcu(tpos, pos, head, member)		 \
>  	for (pos = (head)->first;					 \
>  	     rcu_dereference(pos) && ({ prefetch(pos->next); 1;}) &&	 \
>  		({ tpos = hlist_entry(pos, typeof(*tpos), member); 1;}); \
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: hlist_for_each_entry && pos (Was: task_work_queue)
  2012-04-16 22:35     ` Paul E. McKenney
@ 2012-04-17 20:43       ` Oleg Nesterov
  0 siblings, 0 replies; 14+ messages in thread
From: Oleg Nesterov @ 2012-04-17 20:43 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Andrew Morton, David Howells, Linus Torvalds, David Smith,
	Frank Ch. Eigler, Larry Woodman, Peter Zijlstra, Tejun Heo,
	linux-kernel

On 04/16, Paul E. McKenney wrote:
>
> On Thu, Apr 12, 2012 at 06:00:59AM +0200, Oleg Nesterov wrote:
> >
> > hlist_for_each_entry_*() do not need "pos", it can be
> >
> > 	#define hlist_for_each_entry(pos, head, member)					\
> > 		for (pos = (void*)(head)->first;					\
> > 		pos && ({ pos = hlist_entry((void*)pos, typeof(*pos), member); 1; });	\
> > 		pos = (void*)(pos)->member.next)
> >
> > The only problem, is there any possibility to change the callers
> > somehow??? I even wrote the script which converts them all, but the
> > patch is huge.
> >
> > Please see the old (2008-04-21) message I sent to lkml below, today
> > the diffstat is even "worse":
> >
> > 	152 files changed, 611 insertions(+), 906 deletions(-)
> >
> > and the patch size is 242k.
> >
> > No? we can't?
>
> Maybe this needs a phased approach:
>
> 1.	Add a new API name without the "pos" argument.

and this is the first (or main?) problem. Which name??

> 2.	Send individual patches to the uses, which allows time to
> 	clean up stragglers.
>
> 3.	Remove the old API name.  If any patches from #2 have been
> 	ignored, push them with the removal patch.
>
> 4.	Rename the new API name to the old one, if desired.

Yes, this is much safer, I agree. But I'm afraid that 2. will be
never finished.

And 4. is not trivial anyway, even if it is trivial to generate
the obviously correct patch. Too many trees I guess.


This reminds me I promised to check the code generation. Will do
anyway. But I do not expect any improvement, just it should not
be worse. The only point is too make the usage more simple (you
seem to agree). But at the same time I agree with "the pain
changing the interface" from Linus.

Oleg.


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2012-04-17 20:44 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-12  2:47 [PATCH v2 0/2] task_work_queue() && keyctl_session_to_parent() Oleg Nesterov
2012-04-12  2:48 ` [PATCH v2 1/2] task_work_queue: add generic process-context callbacks Oleg Nesterov
2012-04-12  4:00   ` hlist_for_each_entry && pos (Was: task_work_queue) Oleg Nesterov
2012-04-12  4:12     ` Linus Torvalds
2012-04-12  4:28       ` Oleg Nesterov
2012-04-12  4:39         ` Linus Torvalds
2012-04-12  5:02           ` Al Viro
2012-04-16 22:35     ` Paul E. McKenney
2012-04-17 20:43       ` Oleg Nesterov
2012-04-12  2:48 ` [PATCH v2 2/2] cred: change keyctl_session_to_parent() to use task_work_queue() Oleg Nesterov
2012-04-12  9:29 ` David Howells
2012-04-12 17:34   ` Oleg Nesterov
2012-04-12  9:35 ` TIF_NOTIFY_RESUME [was Re: [PATCH v2 1/2] task_work_queue: add generic process-context callbacks] David Howells
2012-04-12 17:41   ` Oleg Nesterov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.