All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] signalfd: a kernel interface for dumping/restoring pending signals
@ 2012-12-24  8:13 ` Andrey Vagin
  0 siblings, 0 replies; 28+ messages in thread
From: Andrey Vagin @ 2012-12-24  8:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-fsdevel, Andrey Vagin, Serge Hallyn, Oleg Nesterov,
	Andrew Morton, Eric W. Biederman, Al Viro, Pavel Emelyanov,
	Cyrill Gorcunov

The idea is simple. We need to get the siginfo for each signal on dump,
and then return it back on restore.

The first problem is that the kernel doesn’t report complete siginfo-s
in user-space. In a signal handler the kernel strips SI_CODE from
siginfo. When a siginfo is received from signalfd, it has a different
format with fixed sizes of fields. The interface of signalfd was
extended. If a signalfd is created with the flag SFD_RAW, it returns
siginfo in a raw format. We need to choose a queue, so two flags
SFD_GROUP and SFD_PRIVATE were added.

rt_sigqueueinfo looks suitable for restoring signals, but it can’t send
siginfo with a positive si_code, because these codes are reserved for
the kernel. In the real world each person has right to do anything with
himself, so I think a process should able to send any siginfo to itself.

Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>

-- 
1.7.11.7


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 0/4] signalfd: a kernel interface for dumping/restoring pending signals
@ 2012-12-24  8:13 ` Andrey Vagin
  0 siblings, 0 replies; 28+ messages in thread
From: Andrey Vagin @ 2012-12-24  8:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-fsdevel, Andrey Vagin, Serge Hallyn, Oleg Nesterov,
	Andrew Morton, Eric W. Biederman, Al Viro, Pavel Emelyanov,
	Cyrill Gorcunov

The idea is simple. We need to get the siginfo for each signal on dump,
and then return it back on restore.

The first problem is that the kernel doesn’t report complete siginfo-s
in user-space. In a signal handler the kernel strips SI_CODE from
siginfo. When a siginfo is received from signalfd, it has a different
format with fixed sizes of fields. The interface of signalfd was
extended. If a signalfd is created with the flag SFD_RAW, it returns
siginfo in a raw format. We need to choose a queue, so two flags
SFD_GROUP and SFD_PRIVATE were added.

rt_sigqueueinfo looks suitable for restoring signals, but it can’t send
siginfo with a positive si_code, because these codes are reserved for
the kernel. In the real world each person has right to do anything with
himself, so I think a process should able to send any siginfo to itself.

Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>

-- 
1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-24  8:13 ` Andrey Vagin
  (?)
@ 2012-12-24  8:13 ` Andrey Vagin
  2012-12-24 16:53   ` Oleg Nesterov
  -1 siblings, 1 reply; 28+ messages in thread
From: Andrey Vagin @ 2012-12-24  8:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-fsdevel, Andrey Vagin, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

signalfd should be called with the flag SFD_RAW for that.

signalfd_siginfo is not full for siginfo with a negative si_code.
copy_siginfo_to_user() is copied a full siginfo to user-space, if
si_code is negative.  signalfd_copyinfo() doesn't do that and can't be
expanded, because it has not compatiable format with siginfo_t.

Another problem is that a constant __SI_* is removed from si_code.
It's not a problem for ussual applications, because they expect
a defined type of siginfo (internal logic).
When we want to dump pending signals, we can't predict a type of
siginfo, so we should get it from kernel.

This fuctionality is required for checkpointing pending signals.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/signalfd.c                 | 31 +++++++++++++++++++++++++++++--
 include/uapi/linux/signalfd.h |  1 +
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index b534869..95f1444 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -73,6 +73,21 @@ static unsigned int signalfd_poll(struct file *file, poll_table *wait)
 	return events;
 }
 
+static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo,
+					siginfo_t *kinfo)
+{
+	siginfo_t *uinfo = (siginfo_t *) siginfo;
+	int err;
+
+	BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo));
+
+	err = __clear_user(uinfo, sizeof(*uinfo));
+	err |= copy_siginfo_to_user(uinfo, kinfo);
+	err |= __put_user(kinfo->si_code, &uinfo->si_code);
+
+	return err ? -EFAULT: sizeof(*uinfo);
+}
+
 /*
  * Copied from copy_siginfo_to_user() in kernel/signal.c
  */
@@ -205,6 +220,7 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 	struct signalfd_ctx *ctx = file->private_data;
 	struct signalfd_siginfo __user *siginfo;
 	int nonblock = file->f_flags & O_NONBLOCK;
+	bool raw = file->f_flags & SFD_RAW;
 	ssize_t ret, total = 0;
 	siginfo_t info;
 
@@ -217,9 +233,15 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 		ret = signalfd_dequeue(ctx, &info, nonblock);
 		if (unlikely(ret <= 0))
 			break;
-		ret = signalfd_copyinfo(siginfo, &info);
+
+		if (raw)
+			ret = signalfd_copy_raw_info(siginfo, &info);
+		else
+			ret = signalfd_copyinfo(siginfo, &info);
+
 		if (ret < 0)
 			break;
+
 		siginfo++;
 		total += ret;
 		nonblock = 1;
@@ -262,7 +284,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 	BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK);
 
-	if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK))
+	if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SFD_RAW))
 		return -EINVAL;
 
 	if (sizemask != sizeof(sigset_t) ||
@@ -286,6 +308,11 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 				       O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
 		if (ufd < 0)
 			kfree(ctx);
+		else if (flags & SFD_RAW) {
+			struct fd f = fdget(ufd);
+			f.file->f_flags |= flags & SFD_RAW;
+			fdput(f);
+		}
 	} else {
 		struct fd f = fdget(ufd);
 		if (!f.file)
diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h
index 492c6de..bc31849 100644
--- a/include/uapi/linux/signalfd.h
+++ b/include/uapi/linux/signalfd.h
@@ -15,6 +15,7 @@
 /* Flags for signalfd4.  */
 #define SFD_CLOEXEC O_CLOEXEC
 #define SFD_NONBLOCK O_NONBLOCK
+#define SFD_RAW O_DIRECT
 
 struct signalfd_siginfo {
 	__u32 ssi_signo;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/4] signal: add a helper for dequeuing signals from a specified queue
  2012-12-24  8:13 ` Andrey Vagin
  (?)
  (?)
@ 2012-12-24  8:13 ` Andrey Vagin
  2012-12-24 20:52     ` Michael Kerrisk
  -1 siblings, 1 reply; 28+ messages in thread
From: Andrey Vagin @ 2012-12-24  8:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-fsdevel, Andrey Vagin, Ingo Molnar, Peter Zijlstra,
	Serge Hallyn, Oleg Nesterov, Andrew Morton, Eric W. Biederman,
	Al Viro, Pavel Emelyanov, Cyrill Gorcunov

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 include/linux/sched.h |  9 ++++++++-
 kernel/signal.c       | 13 +++++++++----
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 0dd42a0..de9a4bf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2159,7 +2159,14 @@ extern void flush_signals(struct task_struct *);
 extern void __flush_signals(struct task_struct *);
 extern void ignore_signals(struct task_struct *);
 extern void flush_signal_handlers(struct task_struct *, int force_default);
-extern int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info);
+extern int do_dequeue_signal(struct task_struct *tsk,
+				sigset_t *mask, siginfo_t *info, int queue);
+
+static inline int dequeue_signal(struct task_struct *tsk,
+					sigset_t *mask, siginfo_t *info)
+{
+	return do_dequeue_signal(tsk, mask, info, 0);
+}
 
 static inline int dequeue_signal_lock(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 {
diff --git a/kernel/signal.c b/kernel/signal.c
index 0af8868..91bb76f2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -597,17 +597,22 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
  * Dequeue a signal and return the element to the caller, which is
  * expected to free it.
  *
+ * @queue: dequeue from a private queue, if a value is not positive
+ *	   dequeue from a shared queue, if a value if not negative
+ *
  * All callers have to hold the siglock.
  */
-int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
+int do_dequeue_signal(struct task_struct *tsk, sigset_t *mask,
+					siginfo_t *info, int queue)
 {
-	int signr;
+	int signr = 0;
 
 	/* We only dequeue private signals from ourselves, we don't let
 	 * signalfd steal them
 	 */
-	signr = __dequeue_signal(&tsk->pending, mask, info);
-	if (!signr) {
+	if (queue <= 0)
+		signr = __dequeue_signal(&tsk->pending, mask, info);
+	if (!signr && queue >= 0) {
 		signr = __dequeue_signal(&tsk->signal->shared_pending,
 					 mask, info);
 		/*
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/4] signalfd: add ability to choose a private or shared queue
  2012-12-24  8:13 ` Andrey Vagin
                   ` (2 preceding siblings ...)
  (?)
@ 2012-12-24  8:13 ` Andrey Vagin
  2012-12-24 17:05   ` Oleg Nesterov
  2012-12-24 20:53   ` Michael Kerrisk
  -1 siblings, 2 replies; 28+ messages in thread
From: Andrey Vagin @ 2012-12-24  8:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-fsdevel, Andrey Vagin, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

This patch is added two flags SFD_GROUP and SFD_SHARED.
SFD_SHARED - read signals from a shared queue
SFD_PRIVATE - read signals from a private queue

Without this flags or with both flags signals are read from both queues.

This functionality is requered for dumping pending signals.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/signalfd.c                 | 25 +++++++++++++++++++------
 include/uapi/linux/signalfd.h |  2 ++
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index 95f1444..a35aeda 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -170,13 +170,13 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
 }
 
 static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info,
-				int nonblock)
+				int nonblock, int queue)
 {
 	ssize_t ret;
 	DECLARE_WAITQUEUE(wait, current);
 
 	spin_lock_irq(&current->sighand->siglock);
-	ret = dequeue_signal(current, &ctx->sigmask, info);
+	ret = do_dequeue_signal(current, &ctx->sigmask, info, queue);
 	switch (ret) {
 	case 0:
 		if (!nonblock)
@@ -223,14 +223,21 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 	bool raw = file->f_flags & SFD_RAW;
 	ssize_t ret, total = 0;
 	siginfo_t info;
+	int queue = 0;
 
 	count /= sizeof(struct signalfd_siginfo);
 	if (!count)
 		return -EINVAL;
 
+	if (file->f_flags & SFD_GROUP)
+		queue++;
+
+	if (file->f_flags & SFD_PRIVATE)
+		queue--;
+
 	siginfo = (struct signalfd_siginfo __user *) buf;
 	do {
-		ret = signalfd_dequeue(ctx, &info, nonblock);
+		ret = signalfd_dequeue(ctx, &info, nonblock, queue);
 		if (unlikely(ret <= 0))
 			break;
 
@@ -274,6 +281,8 @@ static const struct file_operations signalfd_fops = {
 	.llseek		= noop_llseek,
 };
 
+#define SIGNAFD_FLAGS (SFD_RAW | SFD_GROUP | SFD_PRIVATE)
+
 SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 		size_t, sizemask, int, flags)
 {
@@ -284,7 +293,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 	BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC);
 	BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK);
 
-	if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SFD_RAW))
+	if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SIGNAFD_FLAGS))
 		return -EINVAL;
 
 	if (sizemask != sizeof(sigset_t) ||
@@ -308,9 +317,9 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 				       O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
 		if (ufd < 0)
 			kfree(ctx);
-		else if (flags & SFD_RAW) {
+		else if (flags & SIGNAFD_FLAGS) {
 			struct fd f = fdget(ufd);
-			f.file->f_flags |= flags & SFD_RAW;
+			f.file->f_flags |= flags & SIGNAFD_FLAGS;
 			fdput(f);
 		}
 	} else {
@@ -322,6 +331,10 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 			fdput(f);
 			return -EINVAL;
 		}
+
+		f.file->f_flags = (f.file->f_flags & ~SIGNAFD_FLAGS) |
+						(flags & SIGNAFD_FLAGS);
+
 		spin_lock_irq(&current->sighand->siglock);
 		ctx->sigmask = sigmask;
 		spin_unlock_irq(&current->sighand->siglock);
diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h
index bc31849..9421321 100644
--- a/include/uapi/linux/signalfd.h
+++ b/include/uapi/linux/signalfd.h
@@ -16,6 +16,8 @@
 #define SFD_CLOEXEC O_CLOEXEC
 #define SFD_NONBLOCK O_NONBLOCK
 #define SFD_RAW O_DIRECT
+#define SFD_GROUP O_DIRECTORY
+#define SFD_PRIVATE O_EXCL
 
 struct signalfd_siginfo {
 	__u32 ssi_signo;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/4] signal: allow to send any siginfo to itself
  2012-12-24  8:13 ` Andrey Vagin
                   ` (3 preceding siblings ...)
  (?)
@ 2012-12-24  8:13 ` Andrey Vagin
  -1 siblings, 0 replies; 28+ messages in thread
From: Andrey Vagin @ 2012-12-24  8:13 UTC (permalink / raw)
  To: linux-kernel
  Cc: criu, linux-fsdevel, Andrey Vagin, Serge Hallyn, Oleg Nesterov,
	Andrew Morton, Eric W. Biederman, Al Viro, Pavel Emelyanov,
	Cyrill Gorcunov

A kernel prevents of sending siginfo with positive si_code, because
these codes is reserved for kernel. I think we can allow to send any
siginfo to itself. This operation should not be dangerous.

This functionality is required for restoring signals.

Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Pavel Emelyanov <xemul@parallels.com>
CC: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 kernel/signal.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 91bb76f2..1c83c71 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2957,7 +2957,8 @@ SYSCALL_DEFINE3(rt_sigqueueinfo, pid_t, pid, int, sig,
 	/* Not even root can pretend to send signals from the kernel.
 	 * Nor can they impersonate a kill()/tgkill(), which adds source info.
 	 */
-	if (info.si_code >= 0 || info.si_code == SI_TKILL) {
+	if (((info.si_code >= 0 || info.si_code == SI_TKILL)) &&
+	    (task_pid_vnr(current) != pid)) {
 		/* We used to allow any < 0 si_code */
 		WARN_ON_ONCE(info.si_code < 0);
 		return -EPERM;
@@ -2977,7 +2978,8 @@ long do_rt_tgsigqueueinfo(pid_t tgid, pid_t pid, int sig, siginfo_t *info)
 	/* Not even root can pretend to send signals from the kernel.
 	 * Nor can they impersonate a kill()/tgkill(), which adds source info.
 	 */
-	if (info->si_code >= 0 || info->si_code == SI_TKILL) {
+	if ((info->si_code >= 0 || info->si_code == SI_TKILL) &&
+	    (task_pid_vnr(current) != pid)) {
 		/* We used to allow any < 0 si_code */
 		WARN_ON_ONCE(info->si_code < 0);
 		return -EPERM;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-24  8:13 ` [PATCH 1/4] signalfd: add ability to return siginfo in a raw format Andrey Vagin
@ 2012-12-24 16:53   ` Oleg Nesterov
  2012-12-25  8:29     ` Andrey Wagin
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2012-12-24 16:53 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-kernel, criu, linux-fsdevel, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

On 12/24, Andrey Vagin wrote:
>
> signalfd should be called with the flag SFD_RAW for that.
>
> signalfd_siginfo is not full for siginfo with a negative si_code.
> copy_siginfo_to_user() is copied a full siginfo to user-space, if
> si_code is negative.  signalfd_copyinfo() doesn't do that and can't be
> expanded, because it has not compatiable format with siginfo_t.

Yes, but otoh perhaps we should change (fix) signalfd_siginfo/copyinfo,
its "default" case makes no sense if si_code < 0.

> Another problem is that a constant __SI_* is removed from si_code.

OK, so you add the additional put_user(kinfo->si_code). Again, in
this case we can extend signalfd_siginfo perhaps...

Anyway, the patch doesn't look right.

> +static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo,
> +					siginfo_t *kinfo)
> +{
> +	siginfo_t *uinfo = (siginfo_t *) siginfo;
> +	int err;
> +
> +	BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo));
> +
> +	err = __clear_user(uinfo, sizeof(*uinfo));
> +	err |= copy_siginfo_to_user(uinfo, kinfo);

This probably needs copy_siginfo_to_user32() if is_compat_task...

> +	err |= __put_user(kinfo->si_code, &uinfo->si_code);

__put_user() is not safe? This allows to write to the kernel memory.

> @@ -286,6 +308,11 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>  				       O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
>  		if (ufd < 0)
>  			kfree(ctx);
> +		else if (flags & SFD_RAW) {
> +			struct fd f = fdget(ufd);
> +			f.file->f_flags |= flags & SFD_RAW;

Well, but this is racy. How we can know that fdget(ufd) still
points to the same file created by anon_inode_getfd? Not to
mention f.file can be NULL.

Another CLONE_FILES thread can do close() right after fd_install().
And it can also do dup3().

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/4] signalfd: add ability to choose a private or shared queue
  2012-12-24  8:13 ` [PATCH 3/4] signalfd: add ability to choose a private or shared queue Andrey Vagin
@ 2012-12-24 17:05   ` Oleg Nesterov
  2012-12-24 20:53   ` Michael Kerrisk
  1 sibling, 0 replies; 28+ messages in thread
From: Oleg Nesterov @ 2012-12-24 17:05 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-kernel, criu, linux-fsdevel, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

On 12/24, Andrey Vagin wrote:
>
>  static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info,
> -				int nonblock)
> +				int nonblock, int queue)
>  {
>  	ssize_t ret;
>  	DECLARE_WAITQUEUE(wait, current);
>
>  	spin_lock_irq(&current->sighand->siglock);
> -	ret = dequeue_signal(current, &ctx->sigmask, info);
> +	ret = do_dequeue_signal(current, &ctx->sigmask, info, queue);

Hmm. queue != 0 && !nonblock ?

And in this case you should also update signalfd_poll().

> +	if (file->f_flags & SFD_GROUP)
> +		queue++;
> +
> +	if (file->f_flags & SFD_PRIVATE)
> +		queue--;

To be honest, personally I think this looks ugly. If you add an
argumemt to dequeue_signal() it would be better to use bitmask.
But this is minor.


Well. Perhaps we can avoid all these complications? IIUC, all you
need is to know if the dequeued signal is private or shared. Since
you add SFD_RAW anyway perhaps we can report this fact instead?
This doesn't need any changes in signal.c, signalfd_dequeue() can
easily know which queue dequeue_signal() will use.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/4] signalfd: a kernel interface for dumping/restoring pending signals
@ 2012-12-24 20:51   ` Michael Kerrisk
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Kerrisk @ 2012-12-24 20:51 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-kernel, criu, linux-fsdevel, Serge Hallyn, Oleg Nesterov,
	Andrew Morton, Eric W. Biederman, Al Viro, Pavel Emelyanov,
	Cyrill Gorcunov, Linux API, Michael Kerrisk

[CC+=linux-api]

Hi Andrey,

On Mon, Dec 24, 2012 at 9:13 AM, Andrey Vagin <avagin@openvz.org> wrote:
> The idea is simple. We need to get the siginfo for each signal on dump,
> and then return it back on restore.
>
> The first problem is that the kernel doesn’t report complete siginfo-s
> in user-space. In a signal handler the kernel strips SI_CODE from
> siginfo. When a siginfo is received from signalfd, it has a different
> format with fixed sizes of fields. The interface of signalfd was
> extended. If a signalfd is created with the flag SFD_RAW, it returns
> siginfo in a raw format. We need to choose a queue, so two flags
> SFD_GROUP and SFD_PRIVATE were added.
>
> rt_sigqueueinfo looks suitable for restoring signals, but it can’t send
> siginfo with a positive si_code, because these codes are reserved for
> the kernel. In the real world each person has right to do anything with
> himself, so I think a process should able to send any siginfo to itself.

I see that you CCed me on a couple of the patches in this series.
Thanks for that. But, since most of the series concerns API changes,
it would be good also to CC linux-api on (future revisions of) the
entire series.

I have a few largely cosmetic comments on some of the patches, which
I'll add in the respective threads.

Thanks,

Michael


> Cc: Serge Hallyn <serge.hallyn@canonical.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> CC: Cyrill Gorcunov <gorcunov@openvz.org>
>
> --
> 1.7.11.7
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/4] signalfd: a kernel interface for dumping/restoring pending signals
@ 2012-12-24 20:51   ` Michael Kerrisk
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Kerrisk @ 2012-12-24 20:51 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Serge Hallyn,
	Oleg Nesterov, Andrew Morton, Eric W. Biederman, Al Viro,
	Pavel Emelyanov, Cyrill Gorcunov, Linux API, Michael Kerrisk

[CC+=linux-api]

Hi Andrey,

On Mon, Dec 24, 2012 at 9:13 AM, Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> The idea is simple. We need to get the siginfo for each signal on dump,
> and then return it back on restore.
>
> The first problem is that the kernel doesn’t report complete siginfo-s
> in user-space. In a signal handler the kernel strips SI_CODE from
> siginfo. When a siginfo is received from signalfd, it has a different
> format with fixed sizes of fields. The interface of signalfd was
> extended. If a signalfd is created with the flag SFD_RAW, it returns
> siginfo in a raw format. We need to choose a queue, so two flags
> SFD_GROUP and SFD_PRIVATE were added.
>
> rt_sigqueueinfo looks suitable for restoring signals, but it can’t send
> siginfo with a positive si_code, because these codes are reserved for
> the kernel. In the real world each person has right to do anything with
> himself, so I think a process should able to send any siginfo to itself.

I see that you CCed me on a couple of the patches in this series.
Thanks for that. But, since most of the series concerns API changes,
it would be good also to CC linux-api on (future revisions of) the
entire series.

I have a few largely cosmetic comments on some of the patches, which
I'll add in the respective threads.

Thanks,

Michael


> Cc: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> Cc: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> Cc: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> Cc: Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
> Cc: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> CC: Cyrill Gorcunov <gorcunov-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
>
> --
> 1.7.11.7
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/4] signal: add a helper for dequeuing signals from a specified queue
@ 2012-12-24 20:52     ` Michael Kerrisk
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Kerrisk @ 2012-12-24 20:52 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-kernel, criu, linux-fsdevel, Ingo Molnar, Peter Zijlstra,
	Serge Hallyn, Oleg Nesterov, Andrew Morton, Eric W. Biederman,
	Al Viro, Pavel Emelyanov, Cyrill Gorcunov, linux-man,
	Michael Kerrisk

[CC+=linux-api]

On Mon, Dec 24, 2012 at 9:13 AM, Andrey Vagin <avagin@openvz.org> wrote:
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Serge Hallyn <serge.hallyn@canonical.com>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> CC: Cyrill Gorcunov <gorcunov@openvz.org>
> Signed-off-by: Andrey Vagin <avagin@openvz.org>
> ---
>  include/linux/sched.h |  9 ++++++++-
>  kernel/signal.c       | 13 +++++++++----
>  2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 0dd42a0..de9a4bf 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2159,7 +2159,14 @@ extern void flush_signals(struct task_struct *);
>  extern void __flush_signals(struct task_struct *);
>  extern void ignore_signals(struct task_struct *);
>  extern void flush_signal_handlers(struct task_struct *, int force_default);
> -extern int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info);
> +extern int do_dequeue_signal(struct task_struct *tsk,
> +                               sigset_t *mask, siginfo_t *info, int queue);
> +
> +static inline int dequeue_signal(struct task_struct *tsk,
> +                                       sigset_t *mask, siginfo_t *info)
> +{
> +       return do_dequeue_signal(tsk, mask, info, 0);
> +}
>
>  static inline int dequeue_signal_lock(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
>  {
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 0af8868..91bb76f2 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -597,17 +597,22 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
>   * Dequeue a signal and return the element to the caller, which is
>   * expected to free it.
>   *
> + * @queue: dequeue from a private queue, if a value is not positive

if a value is not positive ==> if value is nonpositive

> + *        dequeue from a shared queue, if a value if not negative

if a value is not negative ==> if value is nonnegative

Thanks,

Michael



-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/4] signal: add a helper for dequeuing signals from a specified queue
@ 2012-12-24 20:52     ` Michael Kerrisk
  0 siblings, 0 replies; 28+ messages in thread
From: Michael Kerrisk @ 2012-12-24 20:52 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, criu-GEFAQzZX7r8dnm+yROfE0A,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Ingo Molnar,
	Peter Zijlstra, Serge Hallyn, Oleg Nesterov, Andrew Morton,
	Eric W. Biederman, Al Viro, Pavel Emelyanov, Cyrill Gorcunov,
	linux-man-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk

[CC+=linux-api]

On Mon, Dec 24, 2012 at 9:13 AM, Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> wrote:
> Cc: Ingo Molnar <mingo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> Cc: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
> Cc: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
> Cc: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
> Cc: Al Viro <viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
> Cc: Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
> CC: Cyrill Gorcunov <gorcunov-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
> Signed-off-by: Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>
> ---
>  include/linux/sched.h |  9 ++++++++-
>  kernel/signal.c       | 13 +++++++++----
>  2 files changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 0dd42a0..de9a4bf 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2159,7 +2159,14 @@ extern void flush_signals(struct task_struct *);
>  extern void __flush_signals(struct task_struct *);
>  extern void ignore_signals(struct task_struct *);
>  extern void flush_signal_handlers(struct task_struct *, int force_default);
> -extern int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info);
> +extern int do_dequeue_signal(struct task_struct *tsk,
> +                               sigset_t *mask, siginfo_t *info, int queue);
> +
> +static inline int dequeue_signal(struct task_struct *tsk,
> +                                       sigset_t *mask, siginfo_t *info)
> +{
> +       return do_dequeue_signal(tsk, mask, info, 0);
> +}
>
>  static inline int dequeue_signal_lock(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
>  {
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 0af8868..91bb76f2 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -597,17 +597,22 @@ static int __dequeue_signal(struct sigpending *pending, sigset_t *mask,
>   * Dequeue a signal and return the element to the caller, which is
>   * expected to free it.
>   *
> + * @queue: dequeue from a private queue, if a value is not positive

if a value is not positive ==> if value is nonpositive

> + *        dequeue from a shared queue, if a value if not negative

if a value is not negative ==> if value is nonnegative

Thanks,

Michael



-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/4] signalfd: add ability to choose a private or shared queue
  2012-12-24  8:13 ` [PATCH 3/4] signalfd: add ability to choose a private or shared queue Andrey Vagin
  2012-12-24 17:05   ` Oleg Nesterov
@ 2012-12-24 20:53   ` Michael Kerrisk
  1 sibling, 0 replies; 28+ messages in thread
From: Michael Kerrisk @ 2012-12-24 20:53 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: linux-kernel, criu, linux-fsdevel, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner, Oleg Nesterov,
	Pavel Emelyanov, Cyrill Gorcunov, Michael Kerrisk, Linux API

[CC+=linux-api]

Andrey,

(Among other things, some suggested wording fixes below to improve the
eventual commit log entry).

On Mon, Dec 24, 2012 at 9:13 AM, Andrey Vagin <avagin@openvz.org> wrote:
> This patch is added two flags SFD_GROUP and SFD_SHARED.
> SFD_SHARED - read signals from a shared queue
> SFD_PRIVATE - read signals from a private queue

These names and comments are not quite meaningful I find. Also, you
use SFD_SHARED here, but the name below is SFD_GROUP. I had to read
further to confirm what I guessed the names meant. How about these
names and comments:

SFD_SHARED_QUEUE - read signals from a shared (process wide) queue
SFD_PER_THREAD_QUEUE - read signals from a per-thread queue

> Without this flags or with both flags signals are read from both queues.

Without these flags

> This functionality is requered for dumping pending signals.

"required"

> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Cc: David Howells <dhowells@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Oleg Nesterov <oleg@redhat.com>
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> CC: Cyrill Gorcunov <gorcunov@openvz.org>
> Signed-off-by: Andrey Vagin <avagin@openvz.org>
> ---
>  fs/signalfd.c                 | 25 +++++++++++++++++++------
>  include/uapi/linux/signalfd.h |  2 ++
>  2 files changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/fs/signalfd.c b/fs/signalfd.c
> index 95f1444..a35aeda 100644
> --- a/fs/signalfd.c
> +++ b/fs/signalfd.c
> @@ -170,13 +170,13 @@ static int signalfd_copyinfo(struct signalfd_siginfo __user *uinfo,
>  }
>
>  static ssize_t signalfd_dequeue(struct signalfd_ctx *ctx, siginfo_t *info,
> -                               int nonblock)
> +                               int nonblock, int queue)
>  {
>         ssize_t ret;
>         DECLARE_WAITQUEUE(wait, current);
>
>         spin_lock_irq(&current->sighand->siglock);
> -       ret = dequeue_signal(current, &ctx->sigmask, info);
> +       ret = do_dequeue_signal(current, &ctx->sigmask, info, queue);
>         switch (ret) {
>         case 0:
>                 if (!nonblock)
> @@ -223,14 +223,21 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
>         bool raw = file->f_flags & SFD_RAW;
>         ssize_t ret, total = 0;
>         siginfo_t info;
> +       int queue = 0;
>
>         count /= sizeof(struct signalfd_siginfo);
>         if (!count)
>                 return -EINVAL;
>
> +       if (file->f_flags & SFD_GROUP)
> +               queue++;
> +
> +       if (file->f_flags & SFD_PRIVATE)
> +               queue--;
> +
>         siginfo = (struct signalfd_siginfo __user *) buf;
>         do {
> -               ret = signalfd_dequeue(ctx, &info, nonblock);
> +               ret = signalfd_dequeue(ctx, &info, nonblock, queue);
>                 if (unlikely(ret <= 0))
>                         break;
>
> @@ -274,6 +281,8 @@ static const struct file_operations signalfd_fops = {
>         .llseek         = noop_llseek,
>  };
>
> +#define SIGNAFD_FLAGS (SFD_RAW | SFD_GROUP | SFD_PRIVATE)

This name is rather obtuse. What is the purpose of grouping these
three flags like this? Yes, I understand how you use the name below,
but the grouping seems arbitrary.

Why not have a grouping of all SFD_ 5 flags?

#define SFD_ALL_FLAGS (SFD_CLOEXEC | SFD_NONBLOCK | SFD_RAW |
SFD_GROUP | SFD_PRIVATE)

And use that appropriately below.

See also the comments below about individual flags.


> +
>  SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>                 size_t, sizemask, int, flags)
>  {
> @@ -284,7 +293,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>         BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC);
>         BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK);
>
> -       if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SFD_RAW))
> +       if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK | SIGNAFD_FLAGS))
>                 return -EINVAL;
>
>         if (sizemask != sizeof(sigset_t) ||
> @@ -308,9 +317,9 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>                                        O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
>                 if (ufd < 0)
>                         kfree(ctx);
> -               else if (flags & SFD_RAW) {
> +               else if (flags & SIGNAFD_FLAGS) {
>                         struct fd f = fdget(ufd);
> -                       f.file->f_flags |= flags & SFD_RAW;
> +                       f.file->f_flags |= flags & SIGNAFD_FLAGS;
>                         fdput(f);
>                 }
>         } else {
> @@ -322,6 +331,10 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>                         fdput(f);
>                         return -EINVAL;
>                 }
> +
> +               f.file->f_flags = (f.file->f_flags & ~SIGNAFD_FLAGS) |
> +                                               (flags & SIGNAFD_FLAGS);
> +
>                 spin_lock_irq(&current->sighand->siglock);
>                 ctx->sigmask = sigmask;
>                 spin_unlock_irq(&current->sighand->siglock);
> diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h
> index bc31849..9421321 100644
> --- a/include/uapi/linux/signalfd.h
> +++ b/include/uapi/linux/signalfd.h
> @@ -16,6 +16,8 @@
>  #define SFD_CLOEXEC O_CLOEXEC
>  #define SFD_NONBLOCK O_NONBLOCK
>  #define SFD_RAW O_DIRECT
> +#define SFD_GROUP O_DIRECTORY
> +#define SFD_PRIVATE O_EXCL

What's the reason for making these flags synonyms of existing O_*
flags, as opposed to just choosing suitable numeric values. (There is
a rationale for this, in the case of CLOEXEC and NONBLOCK, as
indicated by the names, but that rationale doesn't extend to the three
new flags.) Using synonyms here suggests that there's a relationship
between SFD_RAW and O_DIRECT and so on, when there isn't.

>  struct signalfd_siginfo {
>         __u32 ssi_signo;
> --
> 1.7.11.7

Cheers,

Michael

-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-24 16:53   ` Oleg Nesterov
@ 2012-12-25  8:29     ` Andrey Wagin
  2012-12-25 14:30       ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Andrey Wagin @ 2012-12-25  8:29 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, criu, linux-fsdevel, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

2012/12/24 Oleg Nesterov <oleg@redhat.com>:
> On 12/24, Andrey Vagin wrote:
>>
>> signalfd should be called with the flag SFD_RAW for that.
>>
>> signalfd_siginfo is not full for siginfo with a negative si_code.
>> copy_siginfo_to_user() is copied a full siginfo to user-space, if
>> si_code is negative.  signalfd_copyinfo() doesn't do that and can't be
>> expanded, because it has not compatiable format with siginfo_t.
>
> Yes, but otoh perhaps we should change (fix) signalfd_siginfo/copyinfo,
> its "default" case makes no sense if si_code < 0.

Its "default" case makes sense if a signal is sent by sigqueue(pid,sig,ptr).

I'm afraid, we can change (fix) signalfd_copyinfo, because for
negative si_code a whole siginfo should be copied to userspace.
Currently if si_code is unknown, signalfd_copyinfo sets only ssi_ptr
and that can't be changed due to backward compatibility. ssi_ptr is in
the midle of signalfd_siginfo and a sizeof(signalfd_siginfo) is equal
to sizeof(siginfo_t). We don't have space to copied siginfo into
signalfd_siginfo.

If we want to have another format with SFD_RAW, I prefer to have
siginfo_t instead of signalfd_siginfo. Because if si_code is negative,
it should be siginfo_t in any case. A minor thing is that it can be
sent back without modifications.

Oleg, thank you for the comments.

>
>> Another problem is that a constant __SI_* is removed from si_code.
>
> OK, so you add the additional put_user(kinfo->si_code). Again, in
> this case we can extend signalfd_siginfo perhaps...
>
> Anyway, the patch doesn't look right.
>
>> +static int signalfd_copy_raw_info(struct signalfd_siginfo __user *siginfo,
>> +                                     siginfo_t *kinfo)
>> +{
>> +     siginfo_t *uinfo = (siginfo_t *) siginfo;
>> +     int err;
>> +
>> +     BUILD_BUG_ON(sizeof(siginfo_t) != sizeof(struct signalfd_siginfo));
>> +
>> +     err = __clear_user(uinfo, sizeof(*uinfo));
>> +     err |= copy_siginfo_to_user(uinfo, kinfo);
>
> This probably needs copy_siginfo_to_user32() if is_compat_task...
>
>> +     err |= __put_user(kinfo->si_code, &uinfo->si_code);
>
> __put_user() is not safe? This allows to write to the kernel memory.
>
>> @@ -286,6 +308,11 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>>                                      O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
>>               if (ufd < 0)
>>                       kfree(ctx);
>> +             else if (flags & SFD_RAW) {
>> +                     struct fd f = fdget(ufd);
>> +                     f.file->f_flags |= flags & SFD_RAW;
>
> Well, but this is racy. How we can know that fdget(ufd) still
> points to the same file created by anon_inode_getfd? Not to
> mention f.file can be NULL.
>
> Another CLONE_FILES thread can do close() right after fd_install().
> And it can also do dup3().
>
> Oleg.
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-25  8:29     ` Andrey Wagin
@ 2012-12-25 14:30       ` Oleg Nesterov
  2012-12-25 15:27         ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2012-12-25 14:30 UTC (permalink / raw)
  To: Andrey Wagin
  Cc: linux-kernel, criu, linux-fsdevel, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

On 12/25, Andrey Wagin wrote:
>
> 2012/12/24 Oleg Nesterov <oleg@redhat.com>:
> > On 12/24, Andrey Vagin wrote:
> >>
> >> signalfd should be called with the flag SFD_RAW for that.
> >>
> >> signalfd_siginfo is not full for siginfo with a negative si_code.
> >> copy_siginfo_to_user() is copied a full siginfo to user-space, if
> >> si_code is negative.  signalfd_copyinfo() doesn't do that and can't be
> >> expanded, because it has not compatiable format with siginfo_t.
> >
> > Yes, but otoh perhaps we should change (fix) signalfd_siginfo/copyinfo,
> > its "default" case makes no sense if si_code < 0.
>
> Its "default" case makes sense if a signal is sent by sigqueue(pid,sig,ptr).

But it doesn't really work, that is what I was trying to say. And that
is why you want copy_siginfo_to_user ;)

> I'm afraid, we can change (fix) signalfd_copyinfo, because for
> negative si_code a whole siginfo should be copied to userspace.

Exactly, this is what I meant. We simply do not know what this info
contains if it was sent by sigqueueinfo().

> Currently if si_code is unknown, signalfd_copyinfo sets only ssi_ptr
> and that can't be changed due to backward compatibility. ssi_ptr is in
> the midle of signalfd_siginfo and a sizeof(signalfd_siginfo) is equal
> to sizeof(siginfo_t). We don't have space to copied siginfo into
> signalfd_siginfo.

Yes, I understand.

> If we want to have another format with SFD_RAW, I prefer to have
> siginfo_t instead of signalfd_siginfo. Because if si_code is negative,
> it should be siginfo_t in any case. A minor thing is that it can be
> sent back without modifications.

"without modifications" is not actually true, your patch changes the
meaning of ->si_code. Yes, I understand why do you do this, and I am
not going to argue. But it looks a bit sad that, say, sigtimedwait()
and read(SFD_RAW) will return the "same" siginfo_t except the subtle
difference in ->si_code.

What I am trying to say, is that SFD_RAW should be named
SFD_signalfd_siginfo_SUCKS_BUT_WE_CANT_CHANGE_IT_FOR_COMPATIBILITY ;) So
you need another format. And if we add another format we should think
twice. For example, if it is _RAW, why we can't simply do memcpy()
instead of copy_siginfo_to_user() ? Not that I really suggest this.

And if we change the meaning of ->si_code then, perhaps, we should
think what else we should change to avoid SFD_RAW_RAW in future.
Just for example, we can set MSB in ->si_signo if the signal was
private.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-25 14:30       ` Oleg Nesterov
@ 2012-12-25 15:27         ` Oleg Nesterov
  2012-12-25 15:40           ` Pavel Emelyanov
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2012-12-25 15:27 UTC (permalink / raw)
  To: Andrey Wagin
  Cc: linux-kernel, criu, linux-fsdevel, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner,
	Michael Kerrisk, Pavel Emelyanov, Cyrill Gorcunov

On 12/25, Oleg Nesterov wrote:
>
> And if we add another format we should think
> twice.

And btw this applies to the whole series.

I guess that probably you actually need DUMP, not DEQUEUE. but the
latter is not trivial. However, perhaps we can do this assuming that
all other threads are sleeping and nobody can do dequeue_signal().
Say, we can play with ppos/llseek. If *ppos is not zero,
signalfd_dequeue() could dump the nth entry from list or return 0.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-25 15:27         ` Oleg Nesterov
@ 2012-12-25 15:40           ` Pavel Emelyanov
  2012-12-25 16:58             ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Pavel Emelyanov @ 2012-12-25 15:40 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrey Wagin, linux-kernel, criu, linux-fsdevel, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner,
	Michael Kerrisk, Cyrill Gorcunov

On 12/25/2012 07:27 PM, Oleg Nesterov wrote:
> On 12/25, Oleg Nesterov wrote:
>>
>> And if we add another format we should think
>> twice.
> 
> And btw this applies to the whole series.
> 
> I guess that probably you actually need DUMP, not DEQUEUE. but the
> latter is not trivial. However, perhaps we can do this assuming that
> all other threads are sleeping and nobody can do dequeue_signal().
> Say, we can play with ppos/llseek. If *ppos is not zero,
> signalfd_dequeue() could dump the nth entry from list or return 0.

This would be perfect, but isn't it better to preserve the pos
semantics -- we do know size of entry we're about to copy, we can
treat pos as offset in bytes, not in elements.

> Oleg.
> 
> .
> 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-25 15:40           ` Pavel Emelyanov
@ 2012-12-25 16:58             ` Oleg Nesterov
  2012-12-26 14:47               ` [CRIU] " Andrew Vagin
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2012-12-25 16:58 UTC (permalink / raw)
  To: Pavel Emelyanov
  Cc: Andrey Wagin, linux-kernel, criu, linux-fsdevel, Alexander Viro,
	Paul E. McKenney, David Howells, Thomas Gleixner,
	Michael Kerrisk, Cyrill Gorcunov

On 12/25, Pavel Emelyanov wrote:
>
> On 12/25/2012 07:27 PM, Oleg Nesterov wrote:
> >
> > I guess that probably you actually need DUMP, not DEQUEUE. but the
> > latter is not trivial. However, perhaps we can do this assuming that
> > all other threads are sleeping and nobody can do dequeue_signal().
> > Say, we can play with ppos/llseek. If *ppos is not zero,
> > signalfd_dequeue() could dump the nth entry from list or return 0.
>
> This would be perfect, but isn't it better to preserve the pos
> semantics -- we do know size of entry we're about to copy, we can
> treat pos as offset in bytes, not in elements.

nr-of-records looks better (more flexible) than nr-of-bytes to me. And
perhaps we can also encode private-or-shared into ppos. But I will not
argue in any case.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [CRIU] [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-25 16:58             ` Oleg Nesterov
@ 2012-12-26 14:47               ` Andrew Vagin
  2012-12-26 16:31                 ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Andrew Vagin @ 2012-12-26 14:47 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Pavel Emelyanov, David Howells, linux-kernel, criu,
	Cyrill Gorcunov, Andrey Wagin, Alexander Viro, linux-fsdevel,
	Thomas Gleixner, Paul E. McKenney, Michael Kerrisk

[-- Attachment #1: Type: text/plain, Size: 1129 bytes --]

On Tue, Dec 25, 2012 at 05:58:03PM +0100, Oleg Nesterov wrote:
> On 12/25, Pavel Emelyanov wrote:
> >
> > On 12/25/2012 07:27 PM, Oleg Nesterov wrote:
> > >
> > > I guess that probably you actually need DUMP, not DEQUEUE. but the
> > > latter is not trivial. However, perhaps we can do this assuming that
> > > all other threads are sleeping and nobody can do dequeue_signal().
> > > Say, we can play with ppos/llseek. If *ppos is not zero,
> > > signalfd_dequeue() could dump the nth entry from list or return 0.
> >
> > This would be perfect, but isn't it better to preserve the pos
> > semantics -- we do know size of entry we're about to copy, we can
> > treat pos as offset in bytes, not in elements.
> 
> nr-of-records looks better (more flexible) than nr-of-bytes to me. And
> perhaps we can also encode private-or-shared into ppos. But I will not
> argue in any case.

Oleg and Pavel, could you look at these two patches. I implemented in them,
what you described here.

> 
> Oleg.
> 
> _______________________________________________
> CRIU mailing list
> CRIU@openvz.org
> http://lists.openvz.org/mailman/listinfo/criu

[-- Attachment #2: 0001-signal-add-helper-to-get-siginfo-without-removing-fr.patch --]
[-- Type: text/plain, Size: 1944 bytes --]

>From 2b8b475b39f41ca65e623905b2f7d2d9348bfa84 Mon Sep 17 00:00:00 2001
From: Andrey Vagin <avagin@openvz.org>
Date: Thu, 29 Nov 2012 20:51:49 +0400
Subject: [PATCH 1/2] signal: add helper to get siginfo without removing from
 the queue

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 include/linux/sched.h |  2 ++
 kernel/signal.c       | 28 ++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 206bb08..a907854 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2222,6 +2222,8 @@ extern void __flush_signals(struct task_struct *);
 extern void ignore_signals(struct task_struct *);
 extern void flush_signal_handlers(struct task_struct *, int force_default);
 extern int dequeue_signal(struct task_struct *tsk, sigset_t *mask, siginfo_t *info);
+extern int peek_signal(struct task_struct *tsk, sigset_t *mask,
+				siginfo_t *info, int offset, bool group);
 
 static inline int dequeue_signal_lock(struct task_struct *tsk, sigset_t *mask, siginfo_t *info)
 {
diff --git a/kernel/signal.c b/kernel/signal.c
index ac5f5e7..aa71213 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -536,6 +536,34 @@ unblock_all_signals(void)
 	spin_unlock_irqrestore(&current->sighand->siglock, flags);
 }
 
+int peek_signal(struct task_struct *tsk, sigset_t *mask,
+			siginfo_t *info, int offset, bool group)
+{
+	struct sigpending *pending;
+	struct sigqueue *q;
+	int i = 0, ret = 0;
+
+	if (group)
+		pending = &tsk->signal->shared_pending;
+	else
+		pending = &tsk->pending;
+
+	list_for_each_entry(q, &pending->list, list) {
+		if (sigismember(mask, q->info.si_signo))
+			continue;
+
+		if (i == offset) {
+			copy_siginfo(info, &q->info);
+			ret = info->si_signo;
+			break;
+		}
+
+		i++;
+	}
+
+	return ret;
+}
+
 static void collect_signal(int sig, struct sigpending *list, siginfo_t *info)
 {
 	struct sigqueue *q, *first = NULL;
-- 
1.7.11.7


[-- Attachment #3: 0002-signalfd-add-ability-to-get-signal-without-removing-.patch --]
[-- Type: text/plain, Size: 3410 bytes --]

>From 6bf48dd6a77261af0daaa9be19cd13a9a11a008d Mon Sep 17 00:00:00 2001
From: Andrey Vagin <avagin@openvz.org>
Date: Wed, 26 Dec 2012 13:45:54 +0400
Subject: [PATCH 2/2] signalfd: add ability to get signal without removing
 from the queue

lseek sets a sequence number of signal in a queue, then read() returns
siginfo if a signal is exists, otherwise it returns 0. All signals
remain in a queue.

If lseek sets a positive position, signals are taken from a shared queue.
If lseek sets a negative position, signals are taken from a private queue.
If ppos is zero (default), signalfd dequeues signals.

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/signalfd.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index ee60d5f..0ce6fb3 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -48,8 +48,21 @@ void signalfd_cleanup(struct sighand_struct *sighand)
 
 struct signalfd_ctx {
 	sigset_t sigmask;
+	loff_t peek_offset;
 };
 
+static ssize_t signalfd_peek(struct signalfd_ctx *ctx, siginfo_t *info)
+{
+       int ret;
+
+       spin_lock_irq(&current->sighand->siglock);
+       ret = peek_signal(current, &ctx->sigmask, info,
+			abs(ctx->peek_offset) - 1, ctx->peek_offset > 0);
+       spin_unlock_irq(&current->sighand->siglock);
+
+       return ret;
+}
+
 static int signalfd_release(struct inode *inode, struct file *file)
 {
 	kfree(file->private_data);
@@ -230,7 +243,11 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 
 	siginfo = (struct signalfd_siginfo __user *) buf;
 	do {
-		ret = signalfd_dequeue(ctx, &info, nonblock);
+		if (ctx->peek_offset == 0)
+			ret = signalfd_dequeue(ctx, &info, nonblock);
+		else
+			ret = signalfd_peek(ctx, &info);
+
 		if (unlikely(ret <= 0))
 			break;
 
@@ -242,6 +259,13 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 		if (ret < 0)
 			break;
 
+		if (ctx->peek_offset) {
+			if (ctx->peek_offset > 0)
+				ctx->peek_offset++;
+			else
+				ctx->peek_offset--;
+		}
+
 		siginfo++;
 		total += ret;
 		nonblock = 1;
@@ -264,6 +288,24 @@ static int signalfd_show_fdinfo(struct seq_file *m, struct file *f)
 }
 #endif
 
+loff_t signalfd_llseek(struct file *f, loff_t offset, int whence)
+{
+	struct signalfd_ctx *ctx = f->private_data;
+
+	switch (whence) {
+	case SEEK_SET:
+		ctx->peek_offset = offset;
+		break;
+	case SEEK_CUR:
+		ctx->peek_offset += offset;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return ctx->peek_offset;
+}
+
 static const struct file_operations signalfd_fops = {
 #ifdef CONFIG_PROC_FS
 	.show_fdinfo	= signalfd_show_fdinfo,
@@ -271,7 +313,7 @@ static const struct file_operations signalfd_fops = {
 	.release	= signalfd_release,
 	.poll		= signalfd_poll,
 	.read		= signalfd_read,
-	.llseek		= noop_llseek,
+	.llseek		= signalfd_llseek,
 };
 
 SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
@@ -300,6 +342,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 			return -ENOMEM;
 
 		ctx->sigmask = sigmask;
+		ctx->peek_offset = 0;
 
 		ufd = get_unused_fd_flags(flags);
 		if (ufd < 0) {
@@ -321,6 +364,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 		}
 
 		file->f_flags |= flags & SFD_RAW;
+		file->f_mode |= FMODE_LSEEK;
 
 		fd_install(ufd, file);
 	} else {
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [CRIU] [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-26 14:47               ` [CRIU] " Andrew Vagin
@ 2012-12-26 16:31                 ` Oleg Nesterov
  2012-12-27 14:36                   ` Andrey Wagin
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2012-12-26 16:31 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: Pavel Emelyanov, David Howells, linux-kernel, criu,
	Cyrill Gorcunov, Andrey Wagin, Alexander Viro, linux-fsdevel,
	Thomas Gleixner, Paul E. McKenney, Michael Kerrisk

On 12/26, Andrew Vagin wrote:
>
> On Tue, Dec 25, 2012 at 05:58:03PM +0100, Oleg Nesterov wrote:
> > On 12/25, Pavel Emelyanov wrote:
> > >
> > > On 12/25/2012 07:27 PM, Oleg Nesterov wrote:
> > > >
> > > > I guess that probably you actually need DUMP, not DEQUEUE. but the
> > > > latter is not trivial. However, perhaps we can do this assuming that
> > > > all other threads are sleeping and nobody can do dequeue_signal().
> > > > Say, we can play with ppos/llseek. If *ppos is not zero,
> > > > signalfd_dequeue() could dump the nth entry from list or return 0.
> > >
> > > This would be perfect, but isn't it better to preserve the pos
> > > semantics -- we do know size of entry we're about to copy, we can
> > > treat pos as offset in bytes, not in elements.
> >
> > nr-of-records looks better (more flexible) than nr-of-bytes to me. And
> > perhaps we can also encode private-or-shared into ppos. But I will not
> > argue in any case.
>
> Oleg and Pavel, could you look at these two patches. I implemented in them,
> what you described here.

cosmetics nits below, feel free to ignore...

Damn. But after I wrote this email I realized that llseek() probably can't
work. Because peek_offset/f_pos/whatever has to be shared with all processes
which have this file opened.

Suppose that the task forks after sys_signalfd(). Now if parent or child
do llseek this affects them both. This is insane because signalfd is
"strange" to say at least, fork/dup/etc inherits signalfd_ctx but not the
"source" of the data.

So I think we should not use llseek. But, probably we can rely on pread() ?
This way we can avoid the problem above, and this looks even simpler.

> +int peek_signal(struct task_struct *tsk, sigset_t *mask,
> +			siginfo_t *info, int offset, bool group)
> +{
> +	struct sigpending *pending;
> +	struct sigqueue *q;
> +	int i = 0, ret = 0;
> +
> +	if (group)
> +		pending = &tsk->signal->shared_pending;
> +	else
> +		pending = &tsk->pending;
> +
> +	list_for_each_entry(q, &pending->list, list) {
> +		if (sigismember(mask, q->info.si_signo))
> +			continue;
> +
> +		if (i == offset) {
> +			copy_siginfo(info, &q->info);
> +			ret = info->si_signo;
> +			break;
> +		}
> +
> +		i++;
> +	}
> +
> +	return ret;
> +}

This helper is trivial. Any reason it should live in signal.c ? Just put
this code into signalfd_peek(). Besides, this helps if !CONFIG_SIGNALFD.

> If lseek sets a positive position, signals are taken from a shared queue.
> If lseek sets a negative position, signals are taken from a private queue.

Personally, I'd prefer, say,

	#define SIGNALFD_SHARED_OFFSET		big-enough-number

if offset >= SIGNALFD_SHARED_OFFSET we use ->shared_pending.
Again, I won't insist. And if we add SIGNALFD_SHARED_OFFSET
then we should probably define SIGNALFD_PRIVATE_OFFSET as
well for consistency.

>  struct signalfd_ctx {
>  	sigset_t sigmask;
> +	loff_t peek_offset;

Why we can't simply use ->f_pos ?

> @@ -242,6 +259,13 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
>  		if (ret < 0)
>  			break;
>
> +		if (ctx->peek_offset) {
> +			if (ctx->peek_offset > 0)
> +				ctx->peek_offset++;
> +			else
> +				ctx->peek_offset--;

...

> +loff_t signalfd_llseek(struct file *f, loff_t offset, int whence)
> +{
> +	struct signalfd_ctx *ctx = f->private_data;
> +
> +	switch (whence) {
> +	case SEEK_SET:
> +		ctx->peek_offset = offset;
> +		break;
> +	case SEEK_CUR:
> +		ctx->peek_offset += offset;

probably you need some locking (say, f_lock) to ensure that these
peek_offset modifications can't race with each other. If you rely
on f_pos you only need to ensure thar signalfd_llseek() can't race
with itself.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [CRIU] [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-26 16:31                 ` Oleg Nesterov
@ 2012-12-27 14:36                   ` Andrey Wagin
  2012-12-27 15:30                     ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Andrey Wagin @ 2012-12-27 14:36 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Vagin, Pavel Emelyanov, David Howells, linux-kernel, criu,
	Cyrill Gorcunov, Alexander Viro, linux-fsdevel, Thomas Gleixner,
	Paul E. McKenney, Michael Kerrisk

[-- Attachment #1: Type: text/plain, Size: 2042 bytes --]

On Wed, Dec 26, 2012 at 05:31:12PM +0100, Oleg Nesterov wrote:
> On 12/26, Andrew Vagin wrote:
> >
> > On Tue, Dec 25, 2012 at 05:58:03PM +0100, Oleg Nesterov wrote:
> > > On 12/25, Pavel Emelyanov wrote:
> > > >
> > > > On 12/25/2012 07:27 PM, Oleg Nesterov wrote:
> > > > >
> > > > > I guess that probably you actually need DUMP, not DEQUEUE. but the
> > > > > latter is not trivial. However, perhaps we can do this assuming that
> > > > > all other threads are sleeping and nobody can do dequeue_signal().
> > > > > Say, we can play with ppos/llseek. If *ppos is not zero,
> > > > > signalfd_dequeue() could dump the nth entry from list or return 0.
> > > >
> > > > This would be perfect, but isn't it better to preserve the pos
> > > > semantics -- we do know size of entry we're about to copy, we can
> > > > treat pos as offset in bytes, not in elements.
> > >
> > > nr-of-records looks better (more flexible) than nr-of-bytes to me. And
> > > perhaps we can also encode private-or-shared into ppos. But I will not
> > > argue in any case.
> >
> > Oleg and Pavel, could you look at these two patches. I implemented in them,
> > what you described here.
> 
> cosmetics nits below, feel free to ignore...
> 
> Damn. But after I wrote this email I realized that llseek() probably can't
> work. Because peek_offset/f_pos/whatever has to be shared with all processes
> which have this file opened.
> 
> Suppose that the task forks after sys_signalfd(). Now if parent or child
> do llseek this affects them both. This is insane because signalfd is
> "strange" to say at least, fork/dup/etc inherits signalfd_ctx but not the
> "source" of the data.

You are right.

> 
> So I think we should not use llseek. But, probably we can rely on pread() ?
> This way we can avoid the problem above, and this looks even simpler.

Yes. It is a good idea. A new patch is attached to this email. I
implemented pread for signalfd and fixed all your previous comments.

Could you look at this patch. If it's good for you, I will send a whole
serie.

Thanks.

[-- Attachment #2: 0001-signalfd-add-ability-to-get-signal-without-removing-.patch --]
[-- Type: text/plain, Size: 3474 bytes --]

>From 64f8fc4b209c73114622b03a43ea196b01d777f8 Mon Sep 17 00:00:00 2001
From: Andrey Vagin <avagin@openvz.org>
Date: Wed, 26 Dec 2012 13:45:54 +0400
Subject: [PATCH] signalfd: add ability to get signal without removing from
 the queue (v2)

pread(fd, buf, size, pos) with non-zero pos reads siginfo
and leaves signal in a queue.

pos = seq + SIGNALFD_*_OFFSET

seq is a sequence number of a signal in a queue.

SIGNALFD_PRIVATE_OFFSET - read from a private queue.
SIGNALFD_SHARED_OFFSET - read from a shared queue.

v2: llseek() can't be used here, because peek_offset/f_pos/whatever
has to be shared with all processes which have this file opened.

Suppose that the task forks after sys_signalfd(). Now if parent or child
do llseek this affects them both. This is insane because signalfd is
"strange" to say at least, fork/dup/etc inherits signalfd_ctx but not
the" source" of the data. // Oleg Nesterov

Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 fs/signalfd.c                 | 54 ++++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/signalfd.h |  3 +++
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/fs/signalfd.c b/fs/signalfd.c
index ee60d5f..9e4e2fc 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -50,6 +50,50 @@ struct signalfd_ctx {
 	sigset_t sigmask;
 };
 
+static ssize_t signalfd_peek(struct signalfd_ctx *ctx,
+				siginfo_t *info, unsigned long ppos)
+{
+	struct sigpending *pending;
+	struct sigqueue *q;
+	bool group = 0;
+	loff_t seq, i;
+	int ret = 0;
+
+	if (ppos < SIGNALFD_PRIVATE_OFFSET)
+		return -EINVAL;
+
+	if (ppos >= SIGNALFD_SHARED_OFFSET) {
+		seq = ppos - SIGNALFD_SHARED_OFFSET;
+		group = 1;
+	} else
+		seq = ppos - SIGNALFD_PRIVATE_OFFSET;
+
+	spin_lock_irq(&current->sighand->siglock);
+
+	if (group)
+		pending = &current->signal->shared_pending;
+	else
+		pending = &current->pending;
+
+	i = 0;
+	list_for_each_entry(q, &pending->list, list) {
+		if (sigismember(&ctx->sigmask, q->info.si_signo))
+			continue;
+
+		if (i == seq) {
+			copy_siginfo(info, &q->info);
+			ret = info->si_signo;
+			break;
+		}
+
+		i++;
+	}
+
+	spin_unlock_irq(&current->sighand->siglock);
+
+	return ret;
+}
+
 static int signalfd_release(struct inode *inode, struct file *file)
 {
 	kfree(file->private_data);
@@ -230,7 +274,11 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 
 	siginfo = (struct signalfd_siginfo __user *) buf;
 	do {
-		ret = signalfd_dequeue(ctx, &info, nonblock);
+		if (*ppos == 0)
+			ret = signalfd_dequeue(ctx, &info, nonblock);
+		else
+			ret = signalfd_peek(ctx, &info, *ppos);
+
 		if (unlikely(ret <= 0))
 			break;
 
@@ -242,6 +290,9 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
 		if (ret < 0)
 			break;
 
+		if (*ppos)
+			(*ppos)++;
+
 		siginfo++;
 		total += ret;
 		nonblock = 1;
@@ -321,6 +372,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
 		}
 
 		file->f_flags |= flags & SFD_RAW;
+		file->f_mode |= FMODE_PREAD;
 
 		fd_install(ufd, file);
 	} else {
diff --git a/include/uapi/linux/signalfd.h b/include/uapi/linux/signalfd.h
index bc31849..c6eb535 100644
--- a/include/uapi/linux/signalfd.h
+++ b/include/uapi/linux/signalfd.h
@@ -17,6 +17,9 @@
 #define SFD_NONBLOCK O_NONBLOCK
 #define SFD_RAW O_DIRECT
 
+#define SIGNALFD_SHARED_OFFSET (1LL << 62)
+#define SIGNALFD_PRIVATE_OFFSET 1
+
 struct signalfd_siginfo {
 	__u32 ssi_signo;
 	__s32 ssi_errno;
-- 
1.7.11.7


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [CRIU] [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-27 14:36                   ` Andrey Wagin
@ 2012-12-27 15:30                     ` Oleg Nesterov
  2012-12-27 18:40                       ` Andrey Wagin
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2012-12-27 15:30 UTC (permalink / raw)
  To: Andrey Wagin
  Cc: Andrew Vagin, Pavel Emelyanov, David Howells, linux-kernel, criu,
	Cyrill Gorcunov, Alexander Viro, linux-fsdevel, Thomas Gleixner,
	Paul E. McKenney, Michael Kerrisk

On 12/27, Andrey Wagin wrote:
>
> On Wed, Dec 26, 2012 at 05:31:12PM +0100, Oleg Nesterov wrote:
> >
> > So I think we should not use llseek. But, probably we can rely on pread() ?
> > This way we can avoid the problem above, and this looks even simpler.
>
> Yes. It is a good idea. A new patch is attached to this email. I
> implemented pread for signalfd and fixed all your previous comments.
>
> +static ssize_t signalfd_peek(struct signalfd_ctx *ctx,
> +				siginfo_t *info, unsigned long ppos)
> +{
> +	struct sigpending *pending;
> +	struct sigqueue *q;
> +	bool group = 0;
> +	loff_t seq, i;
> +	int ret = 0;
> +
> +	if (ppos < SIGNALFD_PRIVATE_OFFSET)
> +		return -EINVAL;
> +
> +	if (ppos >= SIGNALFD_SHARED_OFFSET) {
> +		seq = ppos - SIGNALFD_SHARED_OFFSET;
> +		group = 1;
> +	} else
> +		seq = ppos - SIGNALFD_PRIVATE_OFFSET;
> +
> +	spin_lock_irq(&current->sighand->siglock);
> +
> +	if (group)
> +		pending = &current->signal->shared_pending;
> +	else
> +		pending = &current->pending;

Oh, this looks overcomplicated. Why do you need "bool group" ? Just do

	if (ppos > SHARED) {
		seq = ppos - SHARED;
		pending = &current->signal->shared_pending;
	} else if (ppos > PRIVATE) {
		seq = ppos - PRIVATE;
		pending = &current->pending;
	} else {
		return -EINVAL;
	}

	spin_lock_irq(&current->sighand->siglock);
	...

note also I made the PRIVATE/SHARED checks more symmetric, but this is minor.

In fact I think "loff_t i" is not needed as well. You can check --seq == 0
instead in the loop below, but this is up to you.

> @@ -230,7 +274,11 @@ static ssize_t signalfd_read(struct file *file, char __user *buf, size_t count,
>
>  	siginfo = (struct signalfd_siginfo __user *) buf;
>  	do {
> -		ret = signalfd_dequeue(ctx, &info, nonblock);
> +		if (*ppos == 0)
> +			ret = signalfd_dequeue(ctx, &info, nonblock);
> +		else
> +			ret = signalfd_peek(ctx, &info, *ppos);

I think it would be better to pass ppos, not *ppos, because ...

> +		if (*ppos)
> +			(*ppos)++;

in this case we can update *ppos in signalfd_peek() and simplify the
code a bit.

Compared to the previous version it is "safe" to change *ppos even if
copy_to_user() fails, *ppos will be "lost" anyway after we return.

> @@ -321,6 +372,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>  		}
>
>  		file->f_flags |= flags & SFD_RAW;
> +		file->f_mode |= FMODE_PREAD;
>
>  		fd_install(ufd, file);

Hmm. Looks like it is based on other patches I didnt see...

But I don't understand how FMODE_PREAD connects to this patch, we need
this flag set even for regular sys_read() ???

> +#define SIGNALFD_SHARED_OFFSET (1LL << 62)

OK... this assumes we are not going to add more SIGNAL_*_OFFSET's, and
probably this is true...

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [CRIU] [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-27 15:30                     ` Oleg Nesterov
@ 2012-12-27 18:40                       ` Andrey Wagin
  2012-12-28 14:12                         ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Andrey Wagin @ 2012-12-27 18:40 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Vagin, Pavel Emelyanov, David Howells, linux-kernel, criu,
	Cyrill Gorcunov, Alexander Viro, linux-fsdevel, Thomas Gleixner,
	Paul E. McKenney, Michael Kerrisk

2012/12/27 Oleg Nesterov <oleg@redhat.com>:
> On 12/27, Andrey Wagin wrote:
>>
>> On Wed, Dec 26, 2012 at 05:31:12PM +0100, Oleg Nesterov wrote:
>> >
>> > So I think we should not use llseek. But, probably we can rely on pread() ?
>> > This way we can avoid the problem above, and this looks even simpler.
>>
>> Yes. It is a good idea. A new patch is attached to this email. I
>> implemented pread for signalfd and fixed all your previous comments.
>>
...
> I think it would be better to pass ppos, not *ppos, because ...
>
>> +             if (*ppos)
>> +                     (*ppos)++;
>
> in this case we can update *ppos in signalfd_peek() and simplify the
> code a bit.
>
> Compared to the previous version it is "safe" to change *ppos even if
> copy_to_user() fails, *ppos will be "lost" anyway after we return.

Yes. But ppos is updated, because pread() reads all siginfo-s from a queue,
which fit in a buffer starting with the initial ppos.

>
>> @@ -321,6 +372,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>>               }
>>
>>               file->f_flags |= flags & SFD_RAW;
>> +             file->f_mode |= FMODE_PREAD;
>>
>>               fd_install(ufd, file);
>
> Hmm. Looks like it is based on other patches I didnt see...
>
> But I don't understand how FMODE_PREAD connects to this patch, we need
> this flag set even for regular sys_read() ???

It doesn't need for sys_read(), but this patch is about pread() and
sys_pread() checks it.

SYSCALL_DEFINE(pread64)(unsigned int fd, char __user *buf,
                        size_t count, loff_t pos)
{
....
        if (f.file) {
                ret = -ESPIPE;
                if (f.file->f_mode & FMODE_PREAD)
                        ret = vfs_read(f.file, buf, count, &pos);


>
>> +#define SIGNALFD_SHARED_OFFSET (1LL << 62)
>
> OK... this assumes we are not going to add more SIGNAL_*_OFFSET's, and
> probably this is true...
>
> Oleg.
>
Thanks for review.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [CRIU] [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-27 18:40                       ` Andrey Wagin
@ 2012-12-28 14:12                         ` Oleg Nesterov
  2012-12-28 14:28                           ` Andrey Wagin
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2012-12-28 14:12 UTC (permalink / raw)
  To: Andrey Wagin
  Cc: Andrew Vagin, Pavel Emelyanov, David Howells, linux-kernel, criu,
	Cyrill Gorcunov, Alexander Viro, linux-fsdevel, Thomas Gleixner,
	Paul E. McKenney, Michael Kerrisk

On 12/27, Andrey Wagin wrote:
>
> 2012/12/27 Oleg Nesterov <oleg@redhat.com>:
> > On 12/27, Andrey Wagin wrote:
> >>
> >> On Wed, Dec 26, 2012 at 05:31:12PM +0100, Oleg Nesterov wrote:
> >> >
> >> > So I think we should not use llseek. But, probably we can rely on pread() ?
> >> > This way we can avoid the problem above, and this looks even simpler.
> >>
> >> Yes. It is a good idea. A new patch is attached to this email. I
> >> implemented pread for signalfd and fixed all your previous comments.
> >>
> ...
> > I think it would be better to pass ppos, not *ppos, because ...
> >
> >> +             if (*ppos)
> >> +                     (*ppos)++;
> >
> > in this case we can update *ppos in signalfd_peek() and simplify the
> > code a bit.
> >
> > Compared to the previous version it is "safe" to change *ppos even if
> > copy_to_user() fails, *ppos will be "lost" anyway after we return.
>
> Yes. But ppos is updated, because pread() reads all siginfo-s from a queue,
> which fit in a buffer starting with the initial ppos.

Can't understand. And I guess you misunderstood.

I meant, we can update *ppos in signalfd_peek() _and_ we can do this
unconditionally even if copy_to_user() failed.

> >> @@ -321,6 +372,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
> >>               }
> >>
> >>               file->f_flags |= flags & SFD_RAW;
> >> +             file->f_mode |= FMODE_PREAD;
> >>
> >>               fd_install(ufd, file);
> >
> > Hmm. Looks like it is based on other patches I didnt see...
> >
> > But I don't understand how FMODE_PREAD connects to this patch, we need
> > this flag set even for regular sys_read() ???
>
> It doesn't need for sys_read(), but this patch is about pread() and
> sys_pread() checks it.
>
> SYSCALL_DEFINE(pread64)(unsigned int fd, char __user *buf,
>                         size_t count, loff_t pos)
> {
> ....
>         if (f.file) {
>                 ret = -ESPIPE;
>                 if (f.file->f_mode & FMODE_PREAD)
>                         ret = vfs_read(f.file, buf, count, &pos);

And sys_read() checks it too, that was my point.

So either this code was already broken before this patch, or FMODE_PREAD
is already set and you do not need this chunk.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [CRIU] [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-28 14:12                         ` Oleg Nesterov
@ 2012-12-28 14:28                           ` Andrey Wagin
  2012-12-28 14:46                             ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Andrey Wagin @ 2012-12-28 14:28 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Vagin, Pavel Emelyanov, David Howells, linux-kernel, criu,
	Cyrill Gorcunov, Alexander Viro, linux-fsdevel, Thomas Gleixner,
	Paul E. McKenney, Michael Kerrisk

2012/12/28 Oleg Nesterov <oleg@redhat.com>:
> On 12/27, Andrey Wagin wrote:
>>
>> 2012/12/27 Oleg Nesterov <oleg@redhat.com>:
>> > On 12/27, Andrey Wagin wrote:
>> >>
>> >> On Wed, Dec 26, 2012 at 05:31:12PM +0100, Oleg Nesterov wrote:
>> >> >
>> >> > So I think we should not use llseek. But, probably we can rely on pread() ?
>> >> > This way we can avoid the problem above, and this looks even simpler.
>> >>
>> >> Yes. It is a good idea. A new patch is attached to this email. I
>> >> implemented pread for signalfd and fixed all your previous comments.
>> >>
>> ...
>> > I think it would be better to pass ppos, not *ppos, because ...
>> >
>> >> +             if (*ppos)
>> >> +                     (*ppos)++;
>> >
>> > in this case we can update *ppos in signalfd_peek() and simplify the
>> > code a bit.
>> >
>> > Compared to the previous version it is "safe" to change *ppos even if
>> > copy_to_user() fails, *ppos will be "lost" anyway after we return.
>>
>> Yes. But ppos is updated, because pread() reads all siginfo-s from a queue,
>> which fit in a buffer starting with the initial ppos.
>
> Can't understand. And I guess you misunderstood.
>
> I meant, we can update *ppos in signalfd_peek() _and_ we can do this
> unconditionally even if copy_to_user() failed.

I see. Any way I fixed that in a new version of this patch. Thanks.

The new patch is a part of
[PATCH 0/3] signalfd: a kernel interface for dumping/restoring pending
signals (v2)

>
>> >> @@ -321,6 +372,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>> >>               }
>> >>
>> >>               file->f_flags |= flags & SFD_RAW;
>> >> +             file->f_mode |= FMODE_PREAD;
>> >>
>> >>               fd_install(ufd, file);
>> >
>> > Hmm. Looks like it is based on other patches I didnt see...
>> >
>> > But I don't understand how FMODE_PREAD connects to this patch, we need
>> > this flag set even for regular sys_read() ???
>>
>> It doesn't need for sys_read(), but this patch is about pread() and
>> sys_pread() checks it.
>>
>> SYSCALL_DEFINE(pread64)(unsigned int fd, char __user *buf,
>>                         size_t count, loff_t pos)
>> {
>> ....
>>         if (f.file) {
>>                 ret = -ESPIPE;
>>                 if (f.file->f_mode & FMODE_PREAD)
>>                         ret = vfs_read(f.file, buf, count, &pos);
>
> And sys_read() checks it too, that was my point.

sys_read() doesn't check this flag. I tryed to remove this code from
this patch and pread returned ESPIPE as expected.

>
> So either this code was already broken before this patch, or FMODE_PREAD
> is already set and you do not need this chunk.

>
> Oleg.
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [CRIU] [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-28 14:28                           ` Andrey Wagin
@ 2012-12-28 14:46                             ` Oleg Nesterov
  2012-12-28 14:48                               ` Andrey Wagin
  0 siblings, 1 reply; 28+ messages in thread
From: Oleg Nesterov @ 2012-12-28 14:46 UTC (permalink / raw)
  To: Andrey Wagin
  Cc: Andrew Vagin, Pavel Emelyanov, David Howells, linux-kernel, criu,
	Cyrill Gorcunov, Alexander Viro, linux-fsdevel, Thomas Gleixner,
	Paul E. McKenney, Michael Kerrisk

On 12/28, Andrey Wagin wrote:
>
> 2012/12/28 Oleg Nesterov <oleg@redhat.com>:
> >> >> @@ -321,6 +372,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
> >> >>               }
> >> >>
> >> >>               file->f_flags |= flags & SFD_RAW;
> >> >> +             file->f_mode |= FMODE_PREAD;
> >> >>
> >> >>               fd_install(ufd, file);
> >> >
> >> > Hmm. Looks like it is based on other patches I didnt see...
> >> >
> >> > But I don't understand how FMODE_PREAD connects to this patch, we need
> >> > this flag set even for regular sys_read() ???
> >>
> >> It doesn't need for sys_read(), but this patch is about pread() and
> >> sys_pread() checks it.
> >>
> >> SYSCALL_DEFINE(pread64)(unsigned int fd, char __user *buf,
> >>                         size_t count, loff_t pos)
> >> {
> >> ....
> >>         if (f.file) {
> >>                 ret = -ESPIPE;
> >>                 if (f.file->f_mode & FMODE_PREAD)
> >>                         ret = vfs_read(f.file, buf, count, &pos);
> >
> > And sys_read() checks it too, that was my point.
>
> sys_read() doesn't check this flag. I tryed to remove this code from
> this patch and pread returned ESPIPE as expected.

sys_read() calls vfs_read() which checks f_mode & FMODE_READ at the
start ?

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [CRIU] [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-28 14:46                             ` Oleg Nesterov
@ 2012-12-28 14:48                               ` Andrey Wagin
  2012-12-28 14:56                                 ` Oleg Nesterov
  0 siblings, 1 reply; 28+ messages in thread
From: Andrey Wagin @ 2012-12-28 14:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Vagin, Pavel Emelyanov, David Howells, linux-kernel, criu,
	Cyrill Gorcunov, Alexander Viro, linux-fsdevel, Thomas Gleixner,
	Paul E. McKenney, Michael Kerrisk

2012/12/28 Oleg Nesterov <oleg@redhat.com>:
> On 12/28, Andrey Wagin wrote:
>>
>> 2012/12/28 Oleg Nesterov <oleg@redhat.com>:
>> >> >> @@ -321,6 +372,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
>> >> >>               }
>> >> >>
>> >> >>               file->f_flags |= flags & SFD_RAW;
>> >> >> +             file->f_mode |= FMODE_PREAD;
>> >> >>
>> >> >>               fd_install(ufd, file);
>> >> >
>> >> > Hmm. Looks like it is based on other patches I didnt see...
>> >> >
>> >> > But I don't understand how FMODE_PREAD connects to this patch, we need
>> >> > this flag set even for regular sys_read() ???
>> >>
>> >> It doesn't need for sys_read(), but this patch is about pread() and
>> >> sys_pread() checks it.
>> >>
>> >> SYSCALL_DEFINE(pread64)(unsigned int fd, char __user *buf,
>> >>                         size_t count, loff_t pos)
>> >> {
>> >> ....
>> >>         if (f.file) {
>> >>                 ret = -ESPIPE;
>> >>                 if (f.file->f_mode & FMODE_PREAD)
>> >>                         ret = vfs_read(f.file, buf, count, &pos);
>> >
>> > And sys_read() checks it too, that was my point.
>>
>> sys_read() doesn't check this flag. I tryed to remove this code from
>> this patch and pread returned ESPIPE as expected.
>
> sys_read() calls vfs_read() which checks f_mode & FMODE_READ at the
> start ?

FMODE_READ != FMODE_PREAD ;)

>
> Oleg.
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [CRIU] [PATCH 1/4] signalfd: add ability to return siginfo in a raw format
  2012-12-28 14:48                               ` Andrey Wagin
@ 2012-12-28 14:56                                 ` Oleg Nesterov
  0 siblings, 0 replies; 28+ messages in thread
From: Oleg Nesterov @ 2012-12-28 14:56 UTC (permalink / raw)
  To: Andrey Wagin
  Cc: Andrew Vagin, Pavel Emelyanov, David Howells, linux-kernel, criu,
	Cyrill Gorcunov, Alexander Viro, linux-fsdevel, Thomas Gleixner,
	Paul E. McKenney, Michael Kerrisk

On 12/28, Andrey Wagin wrote:
>
> 2012/12/28 Oleg Nesterov <oleg@redhat.com>:
> > On 12/28, Andrey Wagin wrote:
> >>
> >> 2012/12/28 Oleg Nesterov <oleg@redhat.com>:
> >> >> >> @@ -321,6 +372,7 @@ SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
> >> >> >>               }
> >> >> >>
> >> >> >>               file->f_flags |= flags & SFD_RAW;
> >> >> >> +             file->f_mode |= FMODE_PREAD;
> >> >> >>
> >> >> >>               fd_install(ufd, file);
> >> >> >
> >> >> > Hmm. Looks like it is based on other patches I didnt see...
> >> >> >
> >> >> > But I don't understand how FMODE_PREAD connects to this patch, we need
> >> >> > this flag set even for regular sys_read() ???
> >> >>
> >> >> It doesn't need for sys_read(), but this patch is about pread() and
> >> >> sys_pread() checks it.
> >> >>
> >> >> SYSCALL_DEFINE(pread64)(unsigned int fd, char __user *buf,
> >> >>                         size_t count, loff_t pos)
> >> >> {
> >> >> ....
> >> >>         if (f.file) {
> >> >>                 ret = -ESPIPE;
> >> >>                 if (f.file->f_mode & FMODE_PREAD)
> >> >>                         ret = vfs_read(f.file, buf, count, &pos);
> >> >
> >> > And sys_read() checks it too, that was my point.
> >>
> >> sys_read() doesn't check this flag. I tryed to remove this code from
> >> this patch and pread returned ESPIPE as expected.
> >
> > sys_read() calls vfs_read() which checks f_mode & FMODE_READ at the
> > start ?
>
> FMODE_READ != FMODE_PREAD ;)

OOPS ;)

I did not even know we have FMODE_PREAD! I perceived this define as
FMODE_READ.

Thanks for correctting me.

Oleg.


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2012-12-28 14:56 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-24  8:13 [PATCH 0/4] signalfd: a kernel interface for dumping/restoring pending signals Andrey Vagin
2012-12-24  8:13 ` Andrey Vagin
2012-12-24  8:13 ` [PATCH 1/4] signalfd: add ability to return siginfo in a raw format Andrey Vagin
2012-12-24 16:53   ` Oleg Nesterov
2012-12-25  8:29     ` Andrey Wagin
2012-12-25 14:30       ` Oleg Nesterov
2012-12-25 15:27         ` Oleg Nesterov
2012-12-25 15:40           ` Pavel Emelyanov
2012-12-25 16:58             ` Oleg Nesterov
2012-12-26 14:47               ` [CRIU] " Andrew Vagin
2012-12-26 16:31                 ` Oleg Nesterov
2012-12-27 14:36                   ` Andrey Wagin
2012-12-27 15:30                     ` Oleg Nesterov
2012-12-27 18:40                       ` Andrey Wagin
2012-12-28 14:12                         ` Oleg Nesterov
2012-12-28 14:28                           ` Andrey Wagin
2012-12-28 14:46                             ` Oleg Nesterov
2012-12-28 14:48                               ` Andrey Wagin
2012-12-28 14:56                                 ` Oleg Nesterov
2012-12-24  8:13 ` [PATCH 2/4] signal: add a helper for dequeuing signals from a specified queue Andrey Vagin
2012-12-24 20:52   ` Michael Kerrisk
2012-12-24 20:52     ` Michael Kerrisk
2012-12-24  8:13 ` [PATCH 3/4] signalfd: add ability to choose a private or shared queue Andrey Vagin
2012-12-24 17:05   ` Oleg Nesterov
2012-12-24 20:53   ` Michael Kerrisk
2012-12-24  8:13 ` [PATCH 4/4] signal: allow to send any siginfo to itself Andrey Vagin
2012-12-24 20:51 ` [PATCH 0/4] signalfd: a kernel interface for dumping/restoring pending signals Michael Kerrisk
2012-12-24 20:51   ` Michael Kerrisk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.