All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH -mm 0/7][AIO] - AIO completion signal notification v6
@ 2007-02-01  9:22 Sébastien Dugué
  2007-02-01  9:26 ` [PATCH -mm 1/7][AIO] - Rework compat_sys_io_submit Sébastien Dugué
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-01  9:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, linux-aio, Bharata B Rao, Christoph Hellwig,
	Suparna Bhattacharya, Ulrich Drepper, Zach Brown, Oleg Nesterov,
	Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion


  Hi Andrew,

  here is the latest rework of the aio notification and listio support
patches with comments from Oleg Nesterov and you addressed.

  Sébastien.

  This set now consists in 7 patches:

	1. rework-compat-sys-io-submit: cleanup the sys_io_submit() compat layer,
	   making it more efficient and laying out the base for the following patches

	2. aio-header-fix-includes: fixes the double inclusion of uio.h in aio.h

	3. aio-fix-access_ok-check: fixes the access_ok() checks

	4. make-good_sigevent-non-static: move good_sigevent into signal.c, rename
	   it to something more meaningfull and make it non-static

	5. make-__sigqueue_alloc-free-non-static: make __sigqueue_alloc() and
	   __sigqueue_free() non static

	6. aio-notify-sig: the AIO completion signal notification

	7. add-listio-syscall-support: adds listio support via a new syscall

  Description are in the individual patches.


  Changes from v5: based on comments from Oleg Nesterov and Andrew Morton

	- Fix for signo range not checked in the SIGEV_THREAD_ID case in
	  good_sigevent()
	- renamed good_sigevent() to sigevent_find_task()

	- Fixed the access_ok checks in sys_io_submit() and
	  compat_sys_io_submit()

	- Made __sigqueue_alloc() and __sigqueue_free() non static
	- Reworked the aio code to directly use __sigqueue_alloc() and
	  __sigqueue_free() and avoid messing with SIGQUEUE_PREALLOC

	- Removed the unneeded PF_EXITING check in aio_setup_sigevent()
	- Added a comment in aio_setup_sigevent() for why there is no task
	  ref leak
	- Changed read_lock(&tasklist_lock) to rcu_read_lock() in
	  aio_setup_sigevent()

	- Fixed an ioctx reference leak in compat_sys_lio_submit()
	- Renamed lio_check() as lio_notify()
	- Added comments to lio_notify() and lio_create()
	- Replaced sigqueue_free() by __sigqueue_free() in lio_notify()

  Changes from v4:
	No comments received for v4, so it must be perfect ;-)
	replaced the IOCB_CMD_GROUP listio implementation with Bharata's
	syscall approach which I find much cleaner.

  Changes from v3:

	- added justification for the compat_sys_io_submit() cleanup - Zach Brown
	- more cleanups in compat_sys_io_submit() to put it in line with
	  sys_io_submit() - Zach Brown

	- Changed "Export good_sigevent()" patch name to "Make good_sigevent()
	  non-static" to better describe what it does. - Christoph Hellwig
	- Reworked good_sigevent() to make it more readable. - Christoph Hellwig

	- Simplified the use of the SIGEV_* constants in signal notification -
	  Christoph Hellwig
	- Take a reference on the target task both for the SIGEV_THREAD_ID and
	  SIGEV_SIGNAL cases.

  Changes from v2:
	- rebased to 2.6.19-rc6-mm2
	- reworked the sys_io_submit() compat layer as suggested by Zach Brown
	- fixed saving of a pointer to a task struct in aio-notify-sig as
	  pointed out by Andrew Morton

  Changes from v1:
	- cleanups suggested by Christoph Hellwig, Badari Pulavarty and
	Zach Brown
	- added lisio patch



^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH -mm 1/7][AIO] - Rework compat_sys_io_submit
  2007-02-01  9:22 [PATCH -mm 0/7][AIO] - AIO completion signal notification v6 Sébastien Dugué
@ 2007-02-01  9:26 ` Sébastien Dugué
  2007-02-01  9:29 ` [PATCH -mm 2/7][AIO] - fix aio.h includes Sébastien Dugué
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-01  9:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, linux-aio, Bharata B Rao, Christoph Hellwig,
	Suparna Bhattacharya, Ulrich Drepper, Zach Brown, Oleg Nesterov,
	Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion


From: Sébastien Dugué <sebastien.dugue@bull.net>

                 compat_sys_io_submit() cleanup


  Cleanup compat_sys_io_submit by duplicating some of the native syscall
logic in the compat layer and directly calling io_submit_one() instead
of fooling the syscall into thinking it is called from a native 64-bit
caller.

  This eliminates:

  - the overhead of copying the nr iocb pointers on the userspace stack

  - the PAGE_SIZE/(sizeof(void *)) limit on the number of iocbs that
    can be submitted.

  This is also needed for the completion notification patch to avoid having
to rewrite each iocb on the caller stack for io_submit_one() to find the
sigevents.


 compat.c |   61 ++++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 34 insertions(+), 27 deletions(-)

Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>

Index: linux-2.6.20-rc6-mm3/fs/compat.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c	2007-01-30 10:05:08.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c	2007-01-30 11:30:29.000000000 +0100
@@ -644,40 +644,47 @@ out:
 	return ret;
 }
 
-static inline long
-copy_iocb(long nr, u32 __user *ptr32, struct iocb __user * __user *ptr64)
-{
-	compat_uptr_t uptr;
-	int i;
-
-	for (i = 0; i < nr; ++i) {
-		if (get_user(uptr, ptr32 + i))
-			return -EFAULT;
-		if (put_user(compat_ptr(uptr), ptr64 + i))
-			return -EFAULT;
-	}
-	return 0;
-}
-
-#define MAX_AIO_SUBMITS 	(PAGE_SIZE/sizeof(struct iocb *))
-
 asmlinkage long
 compat_sys_io_submit(aio_context_t ctx_id, int nr, u32 __user *iocb)
 {
-	struct iocb __user * __user *iocb64; 
-	long ret;
+	struct kioctx *ctx;
+	long ret = 0;
+	int i;
 
 	if (unlikely(nr < 0))
 		return -EINVAL;
 
-	if (nr > MAX_AIO_SUBMITS)
-		nr = MAX_AIO_SUBMITS;
-	
-	iocb64 = compat_alloc_user_space(nr * sizeof(*iocb64));
-	ret = copy_iocb(nr, iocb, iocb64);
-	if (!ret)
-		ret = sys_io_submit(ctx_id, nr, iocb64);
-	return ret;
+	if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)))))
+		return -EFAULT;
+
+	ctx = lookup_ioctx(ctx_id);
+	if (unlikely(!ctx))
+		return -EINVAL;
+
+	for (i=0; i<nr; i++) {
+		compat_uptr_t uptr;
+		struct iocb __user *user_iocb;
+		struct iocb tmp;
+
+		if (unlikely(get_user(uptr, iocb + i))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		user_iocb = compat_ptr(uptr);
+
+		if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		ret = io_submit_one(ctx, user_iocb, &tmp);
+		if (ret)
+			break;
+	}
+
+	put_ioctx(ctx);
+	return i ? i: ret;
 }
 
 struct compat_ncp_mount_data {

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH -mm 2/7][AIO] - fix aio.h includes
  2007-02-01  9:22 [PATCH -mm 0/7][AIO] - AIO completion signal notification v6 Sébastien Dugué
  2007-02-01  9:26 ` [PATCH -mm 1/7][AIO] - Rework compat_sys_io_submit Sébastien Dugué
@ 2007-02-01  9:29 ` Sébastien Dugué
  2007-02-01  9:30 ` [PATCH -mm 3/7][AIO] - Fix access_ok() checks Sébastien Dugué
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-01  9:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, linux-aio, Bharata B Rao, Christoph Hellwig,
	Suparna Bhattacharya, Ulrich Drepper, Zach Brown, Oleg Nesterov,
	Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion


From: Sébastien Dugué <sebastien.dugue@bull.net>

  Fix the double inclusion of linux/uio.h in linux/aio.h



 aio.h |    1 -
 1 file changed, 1 deletion(-)

Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>

Index: linux-2.6.20-rc6-mm3/include/linux/aio.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio.h	2007-01-30 10:05:08.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio.h	2007-01-30 10:11:22.000000000 +0100
@@ -7,7 +7,6 @@
 #include <linux/uio.h>
 
 #include <asm/atomic.h>
-#include <linux/uio.h>
 
 #define AIO_MAXSEGS		4
 #define AIO_KIOGRP_NR_ATOMIC	8

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH -mm 3/7][AIO] - Fix access_ok() checks
  2007-02-01  9:22 [PATCH -mm 0/7][AIO] - AIO completion signal notification v6 Sébastien Dugué
  2007-02-01  9:26 ` [PATCH -mm 1/7][AIO] - Rework compat_sys_io_submit Sébastien Dugué
  2007-02-01  9:29 ` [PATCH -mm 2/7][AIO] - fix aio.h includes Sébastien Dugué
@ 2007-02-01  9:30 ` Sébastien Dugué
  2007-02-01  9:30 ` [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static Sébastien Dugué
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-01  9:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, linux-aio, Bharata B Rao, Christoph Hellwig,
	Suparna Bhattacharya, Ulrich Drepper, Zach Brown, Oleg Nesterov,
	Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion


From: Sébastien Dugué <sebastien.dugue@bull.net>

                   Fix access_ok() checks in sys_io_submit() and 
                            compat_sys_io_submit()


  sys_io_submit() and compat_sys_io_submit() check for the number of
requests (nr) being positive, but the following access_ok() call uses
nr * sizeof(ptr) which can overlow to a negative number.

  Just make sure that nr * sizeof(ptr) is positive in the first place.


 aio.c    |    2 +-
 compat.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>

Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c	2007-01-30 11:28:56.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c	2007-01-30 11:32:42.000000000 +0100
@@ -1630,7 +1630,7 @@ asmlinkage long sys_io_submit(aio_contex
 	long ret = 0;
 	int i;
 
-	if (unlikely(nr < 0))
+	if (unlikely((nr*sizeof(*iocbpp)) < 0))
 		return -EINVAL;
 
 	if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
Index: linux-2.6.20-rc6-mm3/fs/compat.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c	2007-01-30 11:30:29.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c	2007-01-30 11:31:55.000000000 +0100
@@ -651,7 +651,7 @@ compat_sys_io_submit(aio_context_t ctx_i
 	long ret = 0;
 	int i;
 
-	if (unlikely(nr < 0))
+	if (unlikely((nr * sizeof(u32)) < 0))
 		return -EINVAL;
 
 	if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)))))

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static
  2007-02-01  9:22 [PATCH -mm 0/7][AIO] - AIO completion signal notification v6 Sébastien Dugué
                   ` (2 preceding siblings ...)
  2007-02-01  9:30 ` [PATCH -mm 3/7][AIO] - Fix access_ok() checks Sébastien Dugué
@ 2007-02-01  9:30 ` Sébastien Dugué
  2007-02-02 18:00   ` Oleg Nesterov
  2007-02-01  9:31 ` [PATCH -mm 5/7][AIO] - Make __sigqueue_free() and __sigqueue_alloc() non static Sébastien Dugué
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-01  9:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, linux-aio, Bharata B Rao, Christoph Hellwig,
	Suparna Bhattacharya, Ulrich Drepper, Zach Brown, Oleg Nesterov,
	Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion


From: Sébastien Dugué <sebastien.dugue@bull.net>

                Make good_sigevent() non-static and rename it


  Move good_sigevent() from posix-timers.c to signal.c where it belongs,
clean it up, rename it to sigevent_find_task() to better describe what the
function does and make it non-static so that it can be used by other
 subsystems.


 include/linux/signal.h |    1 +
 kernel/posix-timers.c  |   19 +------------------
 kernel/signal.c        |   24 ++++++++++++++++++++++++
 3 files changed, 26 insertions(+), 18 deletions(-)

Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h	2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h	2007-01-30 11:41:36.000000000 +0100
@@ -240,6 +240,7 @@ extern int sigprocmask(int, sigset_t *, 
 
 struct pt_regs;
 extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka, struct pt_regs *regs, void *cookie);
+extern struct task_struct * sigevent_find_task(sigevent_t *);
 
 extern struct kmem_cache *sighand_cachep;
 
Index: linux-2.6.20-rc6-mm3/kernel/posix-timers.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/posix-timers.c	2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/posix-timers.c	2007-01-30 11:41:36.000000000 +0100
@@ -367,23 +367,6 @@ static enum hrtimer_restart posix_timer_
 	return ret;
 }
 
-static struct task_struct * good_sigevent(sigevent_t * event)
-{
-	struct task_struct *rtn = current->group_leader;
-
-	if ((event->sigev_notify & SIGEV_THREAD_ID ) &&
-		(!(rtn = find_task_by_pid(event->sigev_notify_thread_id)) ||
-		 rtn->tgid != current->tgid ||
-		 (event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL))
-		return NULL;
-
-	if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) &&
-	    ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX)))
-		return NULL;
-
-	return rtn;
-}
-
 void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
 {
 	if ((unsigned) clock_id >= MAX_CLOCKS) {
@@ -496,7 +479,7 @@ sys_timer_create(const clockid_t which_c
 		new_timer->it_sigev_value = event.sigev_value;
 
 		read_lock(&tasklist_lock);
-		if ((process = good_sigevent(&event))) {
+		if ((process = sigevent_find_task(&event))) {
 			/*
 			 * We may be setting up this process for another
 			 * thread.  It may be exiting.  To catch this
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c	2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c	2007-01-30 11:41:36.000000000 +0100
@@ -1213,6 +1213,30 @@ int group_send_sig_info(int sig, struct 
 	return ret;
 }
 
+/***
+ * sigevent_find_task - check and get target task from a sigevent.
+ * @event: the sigevent to be checked
+ *
+ * This function must be called with the tasklist_lock held for reading.
+ */
+struct task_struct * sigevent_find_task(sigevent_t * event)
+{
+	struct task_struct *task = NULL;
+
+	if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
+		return NULL;
+
+	if ((event->sigev_notify & SIGEV_THREAD_ID ) == SIGEV_THREAD_ID) {
+		task = find_task_by_pid(event->sigev_notify_thread_id);
+
+		if (!task || task->tgid != current->tgid)
+			task = NULL;
+	} else if (event->sigev_notify == SIGEV_SIGNAL)
+		task = current->group_leader;
+
+	return task;
+}
+
 /*
  * kill_pgrp_info() sends a signal to a process group: this is what the tty
  * control characters do (^C, ^Z etc)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH -mm 5/7][AIO] - Make __sigqueue_free() and __sigqueue_alloc() non static
  2007-02-01  9:22 [PATCH -mm 0/7][AIO] - AIO completion signal notification v6 Sébastien Dugué
                   ` (3 preceding siblings ...)
  2007-02-01  9:30 ` [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static Sébastien Dugué
@ 2007-02-01  9:31 ` Sébastien Dugué
  2007-02-01  9:31 ` [PATCH -mm 6/7][AIO] - AIO completion signal notification Sébastien Dugué
  2007-02-01  9:32 ` [PATCH -mm 7/7][AIO] - Add listio syscall support Sébastien Dugué
  6 siblings, 0 replies; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-01  9:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, linux-aio, Bharata B Rao, Christoph Hellwig,
	Suparna Bhattacharya, Ulrich Drepper, Zach Brown, Oleg Nesterov,
	Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion


From: Sébastien Dugué <sebastien.dugue@bull.net>

             Make __sigqueue_alloc() and __sigqueue_free() non static

  Allow subsystems to directly call into __sigqueue_alloc() and __sigqueue_free.
This is used by the AIO signal notification patch.


 include/linux/signal.h |    3 +++
 kernel/signal.c        |    6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h	2007-01-30 11:41:36.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h	2007-01-30 11:41:39.000000000 +0100
@@ -241,6 +241,9 @@ extern int sigprocmask(int, sigset_t *, 
 struct pt_regs;
 extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka, struct pt_regs *regs, void *cookie);
 extern struct task_struct * sigevent_find_task(sigevent_t *);
+extern struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
+					 int override_rlimit);
+extern void __sigqueue_free(struct sigqueue *q);
 
 extern struct kmem_cache *sighand_cachep;
 
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c	2007-01-30 11:41:36.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c	2007-01-30 11:41:39.000000000 +0100
@@ -268,8 +268,8 @@ next_signal(struct sigpending *pending, 
 	return sig;
 }
 
-static struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
-					 int override_rlimit)
+struct sigqueue *__sigqueue_alloc(struct task_struct *t, gfp_t flags,
+				  int override_rlimit)
 {
 	struct sigqueue *q = NULL;
 	struct user_struct *user;
@@ -295,7 +295,7 @@ static struct sigqueue *__sigqueue_alloc
 	return(q);
 }
 
-static void __sigqueue_free(struct sigqueue *q)
+void __sigqueue_free(struct sigqueue *q)
 {
 	if (q->flags & SIGQUEUE_PREALLOC)
 		return;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH -mm 6/7][AIO] - AIO completion signal notification
  2007-02-01  9:22 [PATCH -mm 0/7][AIO] - AIO completion signal notification v6 Sébastien Dugué
                   ` (4 preceding siblings ...)
  2007-02-01  9:31 ` [PATCH -mm 5/7][AIO] - Make __sigqueue_free() and __sigqueue_alloc() non static Sébastien Dugué
@ 2007-02-01  9:31 ` Sébastien Dugué
  2007-02-01  9:32 ` [PATCH -mm 7/7][AIO] - Add listio syscall support Sébastien Dugué
  6 siblings, 0 replies; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-01  9:31 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, linux-aio, Bharata B Rao, Christoph Hellwig,
	Suparna Bhattacharya, Ulrich Drepper, Zach Brown, Oleg Nesterov,
	Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion


From: Sébastien Dugué <sebastien.dugue@bull.net>

                      AIO completion signal notification

  The current 2.6 kernel does not support notification of user space via
an RT signal upon an asynchronous IO completion. The POSIX specification
states that when an AIO request completes, a signal can be delivered to
the application as notification.

  This patch adds a struct sigevent *aio_sigeventp to the iocb.
The relevant fields (pid, signal number and value) are stored in the kiocb
for use when the request completes.

  That sigevent structure is filled by the application as part of the AIO
request preparation. Upon request completion, the kernel notifies the
application using those sigevent parameters. If SIGEV_NONE has been specified,
then the old behaviour is retained and the application must rely on polling
the completion queue using io_getevents().


  A struct sigevent *aio_sigeventp field is added to struct iocb in
include/linux/aio_abi.h

  A struct aio_notify containing the sigevent parameters is defined in aio.h:

  struct aio_notify {
	struct task_struct	*target;
	__u16			signo;
	__u16			notify;
	sigval_t		value;
  };

  A struct aio_notify ki_notify is added to struct kiocb in include/linux/aio.h

  In io_submit_one(), if the application provided a sigevent then
aio_setup_sigevent() is called which does the following:

	- check access to the user sigevent and make a local copy

	- if the requested notification is SIGEV_NONE, then nothing to do

	- fill in the kiocb->ki_notify fields (notify, signo, value)

	- check sigevent consistency, get the signal target task and
	  save it in kiocb->ki_notify.target

	- preallocate a sigqueue for this event using __sigqueue_alloc()

  Upon request completion, in aio_complete(), if notification is needed for
this request (iocb->ki_notify.notify != SIGEV_NONE), then aio_send_signal()
is called to signal the target task as follows:

	- fill in the siginfo struct to be sent to the task

	- if notify is SIGEV_THREAD_ID then send signal to specific task
	  using send_sigqueue()

	- else send signal to task group using send_5group_sigqueue()



Notes concerning sigqueue preallocation:

 To ensure reliable delivery of completion notification, the sigqueue is
preallocated in the submission path so that there is no chance it can fail
in the completion path.

 Unlike the posix-timers case (currently the single other user of sigqueue
preallocation), where the sigqueue is allocated for the lifetime of the timer
and freed at timer destruction time, the aio case is a bit more tricky due to
the async nature of the whole thing.

  In the aio case, the sigqueue exists for the lifetime of the request, therefore
it must be freed only once the signal for the request completion has been
delivered.  The sigqueue is therefore freed by the signal delivery mechanism
when the signal is collected. The aio submission and completion error paths
do an explicit __sigqueue_free() to free the sigqueue.


 fs/aio.c                |  114 ++++++++++++++++++++++++++++++++++++++++++++++--
 fs/compat.c             |   18 +++++++
 include/linux/aio.h     |   12 +++++
 include/linux/aio_abi.h |    3 -
 kernel/signal.c         |    4 -
 5 files changed, 144 insertions(+), 7 deletions(-)

Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>

Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c	2007-01-30 11:32:42.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c	2007-01-30 11:41:41.000000000 +0100
@@ -419,6 +419,7 @@ static struct kiocb fastcall *__aio_get_
 	req->ki_dtor = NULL;
 	req->private = NULL;
 	req->ki_iovec = NULL;
+	req->ki_notify.sigq = NULL;
 	INIT_LIST_HEAD(&req->ki_run_list);
 
 	/* Check if the completion queue has enough free space to
@@ -465,6 +466,12 @@ static inline void really_put_req(struct
 		req->ki_dtor(req);
 	if (req->ki_iovec != &req->ki_inline_vec)
 		kfree(req->ki_iovec);
+
+	/* Release task ref */
+	if (req->ki_notify.notify == SIGEV_THREAD_ID ||
+	    req->ki_notify.notify == SIGEV_SIGNAL)
+		put_task_struct(req->ki_notify.target);
+
 	kmem_cache_free(kiocb_cachep, req);
 	ctx->reqs_active--;
 
@@ -916,6 +923,78 @@ void fastcall kick_iocb(struct kiocb *io
 }
 EXPORT_SYMBOL(kick_iocb);
 
+static int aio_send_signal(struct aio_notify *notify)
+{
+	struct sigqueue *sigq = notify->sigq;
+	struct siginfo *info = &sigq->info;
+	int ret;
+
+	memset(info, 0, sizeof(struct siginfo));
+
+	info->si_signo = notify->signo;
+	info->si_errno = 0;
+	info->si_code = SI_ASYNCIO;
+	info->si_pid = 0;
+	info->si_uid = 0;
+	info->si_value = notify->value;
+
+	if (notify->notify & SIGEV_THREAD_ID)
+		ret = send_sigqueue(notify->signo, sigq, notify->target);
+	else
+		ret = send_group_sigqueue(notify->signo, sigq, notify->target);
+
+	return ret;
+}
+
+static long aio_setup_sigevent(struct aio_notify *notify,
+			       struct sigevent __user *user_event)
+{
+	sigevent_t event;
+	struct task_struct *target;
+
+	if (copy_from_user(&event, user_event, sizeof (event)))
+		return -EFAULT;
+
+	if (event.sigev_notify == SIGEV_NONE)
+		return 0;
+
+	notify->notify = event.sigev_notify;
+	notify->signo = event.sigev_signo;
+	notify->value = event.sigev_value;
+
+	rcu_read_lock();
+	target = sigevent_find_task(&event);
+
+	if (unlikely(!target))
+		goto out_unlock;
+
+	/*
+	 * At this point, we know that notify is either SIGEV_SIGNAL or
+	 * SIGEV_THREAD_ID and the target task is valid. So get a reference
+	 * on the task, it will be dropped in really_put_req() when
+	 * we're done with the request.
+	 */
+	get_task_struct(target);
+	notify->target = target;
+	rcu_read_unlock();
+
+	notify->sigq = __sigqueue_alloc(current, GFP_KERNEL, 0);
+
+	/*
+	 * The task ref will be released in really_put_req()
+	 * when we dispose of the iocb later on in the submission
+	 * path.
+	 */
+	if (unlikely(!notify->sigq))
+		return -EAGAIN;
+
+	return 0;
+
+out_unlock:
+	read_unlock(&tasklist_lock);
+	return -EINVAL;
+}
+
 /* aio_complete
  *	Called when the io request on the given iocb is complete.
  *	Returns true if this is the last user of the request.  The 
@@ -963,8 +1042,11 @@ int fastcall aio_complete(struct kiocb *
 	 * cancelled requests don't get events, userland was given one
 	 * when the event got cancelled.
 	 */
-	if (kiocbIsCancelled(iocb))
+	if (kiocbIsCancelled(iocb)) {
+		if (iocb->ki_notify.sigq)
+			__sigqueue_free(iocb->ki_notify.sigq);
 		goto put_rq;
+	}
 
 	ring = kmap_atomic(info->ring_pages[0], KM_IRQ1);
 
@@ -994,6 +1076,15 @@ int fastcall aio_complete(struct kiocb *
 	kunmap_atomic(ring, KM_IRQ1);
 
 	pr_debug("added to ring %p at [%lu]\n", iocb, tail);
+
+	if (iocb->ki_notify.notify != SIGEV_NONE) {
+		ret = aio_send_signal(&iocb->ki_notify);
+
+		/* If signal generation failed, release the sigqueue */
+		if (ret)
+			__sigqueue_free(iocb->ki_notify.sigq);
+	}
+
 put_rq:
 	/* everything turned out well, dispose of the aiocb. */
 	ret = __aio_put_req(ctx, iocb);
@@ -1545,8 +1636,7 @@ int fastcall io_submit_one(struct kioctx
 	ssize_t ret;
 
 	/* enforce forwards compatibility on users */
-	if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved2 ||
-		     iocb->aio_reserved3)) {
+	if (unlikely(iocb->aio_reserved1 || iocb->aio_reserved3)) {
 		pr_debug("EINVAL: io_submit: reserve field set\n");
 		return -EINVAL;
 	}
@@ -1555,6 +1645,7 @@ int fastcall io_submit_one(struct kioctx
 	if (unlikely(
 	    (iocb->aio_buf != (unsigned long)iocb->aio_buf) ||
 	    (iocb->aio_nbytes != (size_t)iocb->aio_nbytes) ||
+	    (iocb->aio_sigeventp != (unsigned long)iocb->aio_sigeventp) ||
 	    ((ssize_t)iocb->aio_nbytes < 0)
 	   )) {
 		pr_debug("EINVAL: io_submit: overflow check\n");
@@ -1588,11 +1679,21 @@ int fastcall io_submit_one(struct kioctx
 	init_waitqueue_func_entry(&req->ki_wait.wait, aio_wake_function);
 	INIT_LIST_HEAD(&req->ki_wait.wait.task_list);
  	req->ki_run_list.next = req->ki_run_list.prev = NULL;
+	/* handle setting up the sigevent for POSIX AIO signals */
+	req->ki_notify.notify = SIGEV_NONE;
+
+	if (iocb->aio_sigeventp) {
+		ret = aio_setup_sigevent(&req->ki_notify,
+					 (struct sigevent __user *)(unsigned long)
+					 iocb->aio_sigeventp);
+		if (ret)
+			goto out_put_req;
+	}
 
 	ret = aio_setup_iocb(req);
 
 	if (ret)
-		goto out_put_req;
+		goto out_sigqfree;
 
 	spin_lock_irq(&ctx->ctx_lock);
 	aio_run_iocb(req);
@@ -1605,6 +1706,11 @@ int fastcall io_submit_one(struct kioctx
 	aio_put_req(req);	/* drop extra ref to req */
 	return 0;
 
+out_sigqfree:
+	/* Undo the sigqueue alloc if someting went bad */
+	if (req->ki_notify.sigq)
+		__sigqueue_free(req->ki_notify.sigq);
+
 out_put_req:
 	aio_put_req(req);	/* drop extra ref to req */
 	aio_put_req(req);	/* drop i/o ref to req */
Index: linux-2.6.20-rc6-mm3/include/linux/aio_abi.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio_abi.h	2007-01-30 11:28:56.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio_abi.h	2007-01-30 11:41:41.000000000 +0100
@@ -82,8 +82,9 @@ struct iocb {
 	__u64	aio_nbytes;
 	__s64	aio_offset;
 
+	__u64	aio_sigeventp;	/* pointer to struct sigevent */
+
 	/* extra parameters */
-	__u64	aio_reserved2;	/* TODO: use this for a (struct sigevent *) */
 	__u64	aio_reserved3;
 }; /* 64 bytes */
 
Index: linux-2.6.20-rc6-mm3/include/linux/aio.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio.h	2007-01-30 11:30:50.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio.h	2007-01-30 11:41:41.000000000 +0100
@@ -7,6 +7,7 @@
 #include <linux/uio.h>
 
 #include <asm/atomic.h>
+#include <asm/siginfo.h>
 
 #define AIO_MAXSEGS		4
 #define AIO_KIOGRP_NR_ATOMIC	8
@@ -54,6 +55,14 @@ struct kioctx;
 #define kiocbIsCancelled(iocb)	test_bit(KIF_CANCELLED, &(iocb)->ki_flags)
 #define kiocbIsRestarted(iocb)	test_bit(KIF_RESTARTED, &(iocb)->ki_flags)
 
+struct aio_notify {
+	struct task_struct	*target;
+	__u16			signo;
+	__u16			notify;
+	sigval_t		value;
+	struct sigqueue		*sigq;
+};
+
 /* is there a better place to document function pointer methods? */
 /**
  * ki_retry	-	iocb forward progress callback
@@ -123,6 +132,9 @@ struct kiocb {
 
 	struct list_head	ki_list;	/* the aio core uses this
 						 * for cancellation */
+
+	/* to notify a process on I/O event */
+	struct aio_notify	ki_notify;
 };
 
 #define is_sync_kiocb(iocb)	((iocb)->ki_key == KIOCB_SYNC_KEY)
Index: linux-2.6.20-rc6-mm3/fs/compat.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c	2007-01-30 11:31:55.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c	2007-01-30 11:41:41.000000000 +0100
@@ -665,6 +665,7 @@ compat_sys_io_submit(aio_context_t ctx_i
 		compat_uptr_t uptr;
 		struct iocb __user *user_iocb;
 		struct iocb tmp;
+		struct compat_sigevent __user *uevent;
 
 		if (unlikely(get_user(uptr, iocb + i))) {
 			ret = -EFAULT;
@@ -678,6 +679,23 @@ compat_sys_io_submit(aio_context_t ctx_i
 			break;
 		}
 
+		uevent = (struct compat_sigevent __user *)tmp.aio_sigeventp;
+
+		if (uevent) {
+			struct sigevent __user *event = NULL;
+			struct sigevent kevent;
+
+			event = compat_alloc_user_space(sizeof(*event));
+
+			if (get_compat_sigevent(&kevent, uevent) ||
+			    copy_to_user(event, &kevent, sizeof(*event))) {
+				ret = -EFAULT;
+				break;
+			}
+
+			tmp.aio_sigeventp = (__u64)event;
+		}
+
 		ret = io_submit_one(ctx, user_iocb, &tmp);
 		if (ret)
 			break;
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c	2007-01-30 11:41:39.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c	2007-01-30 11:41:41.000000000 +0100
@@ -1517,7 +1517,7 @@ int send_sigqueue(int sig, struct sigque
 	unsigned long flags;
 	int ret = 0;
 
-	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
+	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC) && q->info.si_code != SI_ASYNCIO);
 
 	/*
 	 * The rcu based delayed sighand destroy makes it possible to
@@ -1568,7 +1568,7 @@ send_group_sigqueue(int sig, struct sigq
 	unsigned long flags;
 	int ret = 0;
 
-	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC));
+	BUG_ON(!(q->flags & SIGQUEUE_PREALLOC) && q->info.si_code != SI_ASYNCIO);
 
 	read_lock(&tasklist_lock);
 	/* Since it_lock is held, p->sighand cannot be NULL. */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH -mm 7/7][AIO] - Add listio syscall support
  2007-02-01  9:22 [PATCH -mm 0/7][AIO] - AIO completion signal notification v6 Sébastien Dugué
                   ` (5 preceding siblings ...)
  2007-02-01  9:31 ` [PATCH -mm 6/7][AIO] - AIO completion signal notification Sébastien Dugué
@ 2007-02-01  9:32 ` Sébastien Dugué
  6 siblings, 0 replies; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-01  9:32 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andrew Morton, linux-aio, Bharata B Rao, Christoph Hellwig,
	Suparna Bhattacharya, Ulrich Drepper, Zach Brown, Oleg Nesterov,
	Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion


From: Bharata B Rao <bharata@in.ibm.com>

This patch provides POSIX listio support by means of a new system call.

long lio_submit(aio_context_t ctx_id, int mode, long nr,
	struct iocb __user * __user *iocbpp, struct sigevent __user *event)

This system call is similar to the io_submit() system call, but takes
two more arguments.

'mode' argument can be LIO_WAIT or LIO_NOWAIT.

'event' argument specifies the signal notification mechanism.

This patch is built on support provided by the aio signal notification
patch by Sebastien. The following two structures together provide
the support for grouping iocbs belonging to a list (lio).

struct aio_notify {
	struct task_struct	*target;
	__u16			signo;
	__u16			notify;
	sigval_t		value;
	struct sigqueue		*sigq;
};

struct lio_event {
	atomic_t		lio_users;
	struct aio_notify	lio_notify;
};

A single lio_event struct is maintained for the list of iocbs.
lio_users holds the number of requests attached to this lio and lio_notify
has the necessary information for generating completion notification
signal.

If the mode is LIO_WAIT, the event argument is ignored and the system
call waits until all the requests of the lio are completed.

If the mode is LIO_NOWAIT, the system call returns immediately after
submitting the io requests and may optionally notify the process on
list io completion depending on the event argument.

Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>
Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Bharata B Rao <bharata@in.ibm.com>
---

 arch/i386/kernel/syscall_table.S |    1 
 arch/x86_64/ia32/ia32entry.S     |    5 -
 fs/aio.c                         |  193 ++++++++++++++++++++++++++++++++++-----
 fs/compat.c                      |  125 +++++++++++++++++++++----
 include/asm-i386/unistd.h        |    3 
 include/asm-x86_64/unistd.h      |    4 
 include/linux/aio.h              |   14 ++
 include/linux/aio_abi.h          |    5 +
 include/linux/syscalls.h         |    2 
 9 files changed, 307 insertions(+), 45 deletions(-)

Index: linux-2.6.20-rc6-mm3/arch/i386/kernel/syscall_table.S
===================================================================
--- linux-2.6.20-rc6-mm3.orig/arch/i386/kernel/syscall_table.S	2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/arch/i386/kernel/syscall_table.S	2007-01-31 15:58:55.000000000 +0100
@@ -320,3 +320,4 @@ ENTRY(sys_call_table)
 	.long sys_getcpu
 	.long sys_epoll_pwait
 	.long sys_lutimesat		/* 320 */
+	.long sys_lio_submit
Index: linux-2.6.20-rc6-mm3/arch/x86_64/ia32/ia32entry.S
===================================================================
--- linux-2.6.20-rc6-mm3.orig/arch/x86_64/ia32/ia32entry.S	2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/arch/x86_64/ia32/ia32entry.S	2007-01-31 15:58:55.000000000 +0100
@@ -714,8 +714,11 @@ ia32_sys_call_table:
 	.quad compat_sys_get_robust_list
 	.quad sys_splice
 	.quad sys_sync_file_range
-	.quad sys_tee
+	.quad sys_tee			/* 315 */
 	.quad compat_sys_vmsplice
 	.quad compat_sys_move_pages
 	.quad sys_getcpu
+	.quad quiet_ni_syscall		/* sys_epoll_wait */
+	.quad quiet_ni_syscall		/* 320: sys_lutimesat */
+	.quad compat_sys_lio_submit
 ia32_syscall_end:		
Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c	2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c	2007-01-31 16:00:57.000000000 +0100
@@ -416,6 +416,7 @@ static struct kiocb fastcall *__aio_get_
 	req->ki_ctx = ctx;
 	req->ki_cancel = NULL;
 	req->ki_retry = NULL;
+	req->ki_lio = NULL;
 	req->ki_dtor = NULL;
 	req->private = NULL;
 	req->ki_iovec = NULL;
@@ -995,6 +996,72 @@ out_unlock:
 	return -EINVAL;
 }
 
+/*
+ * lio_notify
+ * When all iocbs belonging to an lio are complete, this initiates
+ * the completion notification for the lio.
+ *
+ * NOTE: lio is freed here after the last iocb completes, but only
+ * for LIO_NOWAIT case. In case of LIO_WAIT, the submitting thread
+ * waits for iocbs to complete and later frees the lio.
+ */
+void lio_notify(struct lio_event *lio)
+{
+	int ret;
+
+	ret = atomic_dec_and_test(&lio->lio_users);
+
+	if (unlikely(ret) && lio->lio_notify.notify != SIGEV_NONE) {
+		/* last one -> notify process */
+		if (aio_send_signal(&lio->lio_notify))
+			__sigqueue_free(lio->lio_notify.sigq);
+		kfree(lio);
+	}
+}
+
+/*
+ * lio_create
+ * Allocates a lio_event structure which groups the iocbs belonging
+ * to the lio and sets up the completion notification.
+ *
+ * Case 1: LIO_NOWAIT and NULL sigevent - No need to group the iocbs
+ *	and hence lio_event is not created.
+ * Case 2: LIO_NOWAIT and sigvent - lio_event is created and notification
+ *	mechanism is setup.
+ * Case 3: LIO_WAIT - lio_event is created and sigevent is ignored.
+ */
+struct lio_event *lio_create(struct sigevent __user *user_event,
+			int mode)
+{
+	int ret = 0;
+	struct lio_event *lio = NULL;
+
+	if (unlikely((mode == LIO_NOWAIT) && !user_event))
+		return lio;
+
+	lio = kzalloc(sizeof(*lio), GFP_KERNEL);
+
+	if (!lio)
+		return ERR_PTR(-EAGAIN);
+
+	/*
+	 * Grab an initial ref on the lio to avoid races between
+	 * submission and completion.
+	 */
+	atomic_set(&lio->lio_users, 1);
+
+	lio->lio_notify.notify = SIGEV_NONE;
+
+	if (user_event && (mode == LIO_NOWAIT)) {
+		ret = aio_setup_sigevent(&lio->lio_notify, user_event);
+		if (ret) {
+			kfree(lio);
+			return ERR_PTR(ret);
+		}
+	}
+	return lio;
+}
+
 /* aio_complete
  *	Called when the io request on the given iocb is complete.
  *	Returns true if this is the last user of the request.  The 
@@ -1043,6 +1110,9 @@ int fastcall aio_complete(struct kiocb *
 	 * when the event got cancelled.
 	 */
 	if (kiocbIsCancelled(iocb)) {
+		if (iocb->ki_lio)
+			lio_notify(iocb->ki_lio);
+
 		if (iocb->ki_notify.sigq)
 			__sigqueue_free(iocb->ki_notify.sigq);
 		goto put_rq;
@@ -1085,6 +1155,8 @@ int fastcall aio_complete(struct kiocb *
 			__sigqueue_free(iocb->ki_notify.sigq);
 	}
 
+	if (iocb->ki_lio)
+		lio_notify(iocb->ki_lio);
 put_rq:
 	/* everything turned out well, dispose of the aiocb. */
 	ret = __aio_put_req(ctx, iocb);
@@ -1629,7 +1701,7 @@ static int aio_wake_function(wait_queue_
 }
 
 int fastcall io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
-			 struct iocb *iocb)
+			 struct iocb *iocb, struct lio_event *lio)
 {
 	struct kiocb *req;
 	struct file *file;
@@ -1690,6 +1762,9 @@ int fastcall io_submit_one(struct kioctx
 			goto out_put_req;
 	}
 
+	/* Attach this iocb to its lio */
+	req->ki_lio = lio;
+
 	ret = aio_setup_iocb(req);
 
 	if (ret)
@@ -1717,6 +1792,48 @@ out_put_req:
 	return ret;
 }
 
+static int io_submit_group(struct kioctx *ctx, long nr,
+	struct iocb __user * __user *iocbpp, struct lio_event *lio)
+{
+	int i;
+	long ret = 0;
+
+	/*
+	 * AKPM: should this return a partial result if some of the IOs were
+	 * successfully submitted?
+	 */
+	for (i = 0; i < nr; i++) {
+		struct iocb __user *user_iocb;
+		struct iocb tmp;
+
+		if (unlikely(__get_user(user_iocb, iocbpp + i))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
+			ret = -EFAULT;
+			break;
+		}
+
+		if (lio)
+			atomic_inc(&lio->lio_users);
+
+		ret = io_submit_one(ctx, user_iocb, &tmp, lio);
+		if (ret) {
+			if (lio) {
+				/*
+				 * In case of listio, continue with
+				 * the subsequent requests
+				 */
+				atomic_dec(&lio->lio_users);
+			} else
+				break;
+		}
+	}
+	return i ? i : ret;
+}
+
 /* sys_io_submit:
  *	Queue the nr iocbs pointed to by iocbpp for processing.  Returns
  *	the number of iocbs queued.  May return -EINVAL if the aio_context
@@ -1734,7 +1851,6 @@ asmlinkage long sys_io_submit(aio_contex
 {
 	struct kioctx *ctx;
 	long ret = 0;
-	int i;
 
 	if (unlikely((nr*sizeof(*iocbpp)) < 0))
 		return -EINVAL;
@@ -1748,31 +1864,61 @@ asmlinkage long sys_io_submit(aio_contex
 		return -EINVAL;
 	}
 
-	/*
-	 * AKPM: should this return a partial result if some of the IOs were
-	 * successfully submitted?
-	 */
-	for (i=0; i<nr; i++) {
-		struct iocb __user *user_iocb;
-		struct iocb tmp;
+	ret = io_submit_group(ctx, nr, iocbpp, NULL);
 
-		if (unlikely(__get_user(user_iocb, iocbpp + i))) {
-			ret = -EFAULT;
-			break;
-		}
+	put_ioctx(ctx);
+	return ret;
+}
 
-		if (unlikely(copy_from_user(&tmp, user_iocb, sizeof(tmp)))) {
-			ret = -EFAULT;
-			break;
-		}
+asmlinkage long sys_lio_submit(aio_context_t ctx_id, int mode, long nr,
+	struct iocb __user * __user *iocbpp, struct sigevent __user *event)
+{
+	struct kioctx *ctx;
+	struct lio_event *lio = NULL;
+	long ret = 0;
 
-		ret = io_submit_one(ctx, user_iocb, &tmp);
-		if (ret)
-			break;
+	if (unlikely((nr*sizeof(*iocbpp)) < 0))
+		return -EINVAL;
+
+	if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
+		return -EFAULT;
+
+	ctx = lookup_ioctx(ctx_id);
+	if (unlikely(!ctx)) {
+		pr_debug("EINVAL: lio_submit: invalid context id\n");
+		return -EINVAL;
+	}
+
+	lio = lio_create(event, mode);
+
+	ret = PTR_ERR(lio);
+	if (IS_ERR(lio))
+		goto out_put_ctx;
+
+	ret = io_submit_group(ctx, nr, iocbpp, lio);
+
+	/* If we failed to submit even one request just return */
+	if (ret < 0 ) {
+		if (lio)
+			kfree(lio);
+		goto out_put_ctx;
+	}
+
+	/*
+	 * Drop extra ref on the lio now that we're done submitting requests.
+	 */
+	if (lio)
+		lio_notify(lio);
+
+	if (mode == LIO_WAIT) {
+		wait_event(ctx->wait, atomic_read(&lio->lio_users) == 0);
+		kfree(lio);
 	}
 
+out_put_ctx:
 	put_ioctx(ctx);
-	return i ? i : ret;
+
+	return ret;
 }
 
 /* lookup_kiocb
Index: linux-2.6.20-rc6-mm3/fs/compat.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/compat.c	2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/compat.c	2007-01-31 16:03:02.000000000 +0100
@@ -644,24 +644,13 @@ out:
 	return ret;
 }
 
-asmlinkage long
-compat_sys_io_submit(aio_context_t ctx_id, int nr, u32 __user *iocb)
+static int compat_io_submit_group(struct kioctx *ctx, long nr,
+				  u32 __user *iocb, struct lio_event *lio)
 {
-	struct kioctx *ctx;
-	long ret = 0;
 	int i;
+	long ret = 0;
 
-	if (unlikely((nr * sizeof(u32)) < 0))
-		return -EINVAL;
-
-	if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)))))
-		return -EFAULT;
-
-	ctx = lookup_ioctx(ctx_id);
-	if (unlikely(!ctx))
-		return -EINVAL;
-
-	for (i=0; i<nr; i++) {
+	for (i = 0; i < nr; i++) {
 		compat_uptr_t uptr;
 		struct iocb __user *user_iocb;
 		struct iocb tmp;
@@ -696,13 +685,105 @@ compat_sys_io_submit(aio_context_t ctx_i
 			tmp.aio_sigeventp = (__u64)event;
 		}
 
-		ret = io_submit_one(ctx, user_iocb, &tmp);
-		if (ret)
-			break;
+		if (lio)
+			atomic_inc(&lio->lio_users);
+
+		ret = io_submit_one(ctx, user_iocb, &tmp, lio);
+		if (ret) {
+			if (lio) {
+				/*
+				 * In case of listio, continue with
+				 * the subsequent requests
+				 */
+				atomic_dec(&lio->lio_users);
+			} else
+				break;
+		}
+
+
 	}
+	return i ? i : ret;
+}
 
+asmlinkage long
+compat_sys_io_submit(aio_context_t ctx_id, int nr, u32 __user *iocb)
+{
+	struct kioctx *ctx;
+	long ret = 0;
+
+	if (unlikely((nr*sizeof(u32)) < 0))
+		return -EINVAL;
+
+	if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)))))
+		return -EFAULT;
+
+	ctx = lookup_ioctx(ctx_id);
+	if (unlikely(!ctx))
+		return -EINVAL;
+
+	ret = compat_io_submit_group(ctx, nr, iocb, NULL);
 	put_ioctx(ctx);
-	return i ? i: ret;
+	return ret;
+}
+
+asmlinkage long
+compat_sys_lio_submit(aio_context_t ctx_id, int mode, int nr, u32 __user *iocb,
+		struct compat_sigevent __user *sig_user)
+{
+	struct kioctx *ctx;
+	struct lio_event *lio = NULL;
+	struct sigevent __user *event = NULL;
+	long ret = 0;
+
+	if (unlikely((nr*sizeof(u32)) < 0))
+		return -EINVAL;
+
+	if (unlikely(!access_ok(VERIFY_READ, iocb, (nr * sizeof(u32)))))
+		return -EFAULT;
+
+	ctx = lookup_ioctx(ctx_id);
+	if (unlikely(!ctx))
+		return -EINVAL;
+
+	if (sig_user) {
+		struct sigevent kevent;
+		event = compat_alloc_user_space(sizeof(struct sigevent));
+		if (get_compat_sigevent(&kevent, sig_user) ||
+			copy_to_user(event, &kevent, sizeof(struct sigevent))) {
+			ret = -EFAULT;
+			goto out_put_ctx;
+		}
+	}
+
+	lio = lio_create(event, mode);
+
+	ret = PTR_ERR(lio);
+	if (IS_ERR(lio))
+		goto out_put_ctx;
+
+	ret = compat_io_submit_group(ctx, nr, iocb, lio);
+
+	/* If we failed to submit even one request just return */
+	if (ret < 0) {
+		if (lio)
+			kfree(lio);
+		goto out_put_ctx;
+	}
+
+	/*
+	 * Drop extra ref on the lio now that we're done submitting requests.
+	 */
+	if (lio)
+		lio_notify(lio);
+
+	if (mode == LIO_WAIT) {
+		wait_event(ctx->wait, atomic_read(&lio->lio_users) == 0);
+		kfree(lio);
+	}
+
+out_put_ctx:
+	put_ioctx(ctx);
+	return ret;
 }
 
 struct compat_ncp_mount_data {
Index: linux-2.6.20-rc6-mm3/include/asm-i386/unistd.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/asm-i386/unistd.h	2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/asm-i386/unistd.h	2007-01-31 15:58:55.000000000 +0100
@@ -326,10 +326,11 @@
 #define __NR_getcpu		318
 #define __NR_epoll_pwait	319
 #define __NR_lutimesat		320
+#define __NR_lio_submit		321
 
 #ifdef __KERNEL__
 
-#define NR_syscalls 321
+#define NR_syscalls 322
 
 #define __ARCH_WANT_IPC_PARSE_VERSION
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.20-rc6-mm3/include/asm-x86_64/unistd.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/asm-x86_64/unistd.h	2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/asm-x86_64/unistd.h	2007-01-31 15:58:55.000000000 +0100
@@ -619,8 +619,10 @@ __SYSCALL(__NR_sync_file_range, sys_sync
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
+#define __NR_lio_submit		280
+__SYSCALL(__NR_lio_submit, sys_lio_submit)
 
-#define __NR_syscall_max __NR_move_pages
+#define __NR_syscall_max __NR_lio_submit
 
 #ifndef __NO_STUBS
 #define __ARCH_WANT_OLD_READDIR
Index: linux-2.6.20-rc6-mm3/include/linux/aio_abi.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio_abi.h	2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio_abi.h	2007-01-31 15:58:55.000000000 +0100
@@ -45,6 +45,11 @@ enum {
 	IOCB_CMD_PWRITEV = 8,
 };
 
+enum {
+	LIO_WAIT = 0,
+	LIO_NOWAIT = 1,
+};
+
 /* read() from /dev/aio returns these structures. */
 struct io_event {
 	__u64		data;		/* the data field from the iocb */
Index: linux-2.6.20-rc6-mm3/include/linux/aio.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/aio.h	2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/aio.h	2007-01-31 15:58:55.000000000 +0100
@@ -63,6 +63,11 @@ struct aio_notify {
 	struct sigqueue		*sigq;
 };
 
+struct lio_event {
+	atomic_t		lio_users;
+	struct aio_notify	lio_notify;
+};
+
 /* is there a better place to document function pointer methods? */
 /**
  * ki_retry	-	iocb forward progress callback
@@ -117,6 +122,8 @@ struct kiocb {
 	__u64			ki_user_data;	/* user's data for completion */
 	struct wait_bit_queue	ki_wait;
 	loff_t			ki_pos;
+	/* lio this iocb might be attached to */
+	struct lio_event	*ki_lio;
 
 	atomic_t		ki_bio_count;	/* num bio used for this iocb */
 	void			*private;
@@ -225,12 +232,15 @@ struct mm_struct;
 extern void FASTCALL(exit_aio(struct mm_struct *mm));
 extern struct kioctx *lookup_ioctx(unsigned long ctx_id);
 extern int FASTCALL(io_submit_one(struct kioctx *ctx,
-			struct iocb __user *user_iocb, struct iocb *iocb));
+			struct iocb __user *user_iocb, struct iocb *iocb,
+			struct lio_event *lio));
+struct lio_event *lio_create(struct sigevent __user *user_event, int mode);
+void lio_notify(struct lio_event *lio);
 
 /* semi private, but used by the 32bit emulations: */
 struct kioctx *lookup_ioctx(unsigned long ctx_id);
 int FASTCALL(io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
-				  struct iocb *iocb));
+				  struct iocb *iocb, struct lio_event *lio));
 
 #define get_ioctx(kioctx) do {						\
 	BUG_ON(atomic_read(&(kioctx)->users) <= 0);			\
Index: linux-2.6.20-rc6-mm3/include/linux/syscalls.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/syscalls.h	2007-01-31 15:31:59.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/syscalls.h	2007-01-31 15:58:55.000000000 +0100
@@ -317,6 +317,8 @@ asmlinkage long sys_io_getevents(aio_con
 				struct timespec __user *timeout);
 asmlinkage long sys_io_submit(aio_context_t, long,
 				struct iocb __user * __user *);
+asmlinkage long sys_lio_submit(aio_context_t, int, long,
+		struct iocb __user * __user *, struct sigevent __user *);
 asmlinkage long sys_io_cancel(aio_context_t ctx_id, struct iocb __user *iocb,
 			      struct io_event __user *result);
 asmlinkage ssize_t sys_sendfile(int out_fd, int in_fd,

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static
  2007-02-01  9:30 ` [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static Sébastien Dugué
@ 2007-02-02 18:00   ` Oleg Nesterov
  2007-02-05 11:09     ` Sébastien Dugué
  2007-02-05 12:18     ` [PATCH -mm 4/7][AIO] Resend " Sébastien Dugué
  0 siblings, 2 replies; 19+ messages in thread
From: Oleg Nesterov @ 2007-02-02 18:00 UTC (permalink / raw)
  To: S?bastien Dugu?
  Cc: linux-kernel, Andrew Morton, linux-aio, Bharata B Rao,
	Christoph Hellwig, Suparna Bhattacharya, Ulrich Drepper,
	Zach Brown, Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion

On 02/01, S?bastien Dugu? wrote:
> 
> +struct task_struct * sigevent_find_task(sigevent_t * event)
> +{
> +	struct task_struct *task = NULL;
> +
> +	if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
> +		return NULL;
> +
> +	if ((event->sigev_notify & SIGEV_THREAD_ID ) == SIGEV_THREAD_ID) {
> +		task = find_task_by_pid(event->sigev_notify_thread_id);
> +
> +		if (!task || task->tgid != current->tgid)
> +			task = NULL;
> +	} else if (event->sigev_notify == SIGEV_SIGNAL)
> +		task = current->group_leader;
> +
> +	return task;
> +}

I am afraid this is still not right. Consider

	->sigev_notify == SIGEV_THREAD_ID | RANDOM_BIT

Now, the second "if (SIGEV_THREAD_ID)" returns a valid task. However,

	really_put_req:

		if (notify == SIGEV_THREAD_ID || notify == SIGEV_SIGNAL)
			put_task_struct();

doesn't work, so we have task_struct leak.

Worse, this breaks posix-timers. Note that posix-timers allow SIGEV_NONE,
the timer is not queued in that case, we shouldn't do ->sigev_signo check.
This means that aio should check SIGEV_NONE itself.

Also, it is critical for posix-timers that SIGEV_THREAD_ID doesn't come
with another bit (like in the example below), note the code like

	if (sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))
		...

IOW: good_sigevent() in its current form is very cryptic, and it _really_
needs a cleanup, but we should not change its behaviour.

Apart from this, I don't see other problems in the signal related code in
this series.

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static
  2007-02-02 18:00   ` Oleg Nesterov
@ 2007-02-05 11:09     ` Sébastien Dugué
  2007-02-05 12:18     ` [PATCH -mm 4/7][AIO] Resend " Sébastien Dugué
  1 sibling, 0 replies; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-05 11:09 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Andrew Morton, linux-aio, Bharata B Rao,
	Christoph Hellwig, Suparna Bhattacharya, Ulrich Drepper,
	Zach Brown, Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion


  Hi Oleg,

  thanks for your comments.

On Fri, 2 Feb 2007 21:00:39 +0300 Oleg Nesterov <oleg@tv-sign.ru> wrote:

> On 02/01, S?bastien Dugu? wrote:
> > 
> > +struct task_struct * sigevent_find_task(sigevent_t * event)
> > +{
> > +	struct task_struct *task = NULL;
> > +
> > +	if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
> > +		return NULL;
> > +
> > +	if ((event->sigev_notify & SIGEV_THREAD_ID ) == SIGEV_THREAD_ID) {
> > +		task = find_task_by_pid(event->sigev_notify_thread_id);
> > +
> > +		if (!task || task->tgid != current->tgid)
> > +			task = NULL;
> > +	} else if (event->sigev_notify == SIGEV_SIGNAL)
> > +		task = current->group_leader;
> > +
> > +	return task;
> > +}
> 
> I am afraid this is still not right. Consider
> 
> 	->sigev_notify == SIGEV_THREAD_ID | RANDOM_BIT
> 
> Now, the second "if (SIGEV_THREAD_ID)" returns a valid task. However,
> 
> 	really_put_req:
> 
> 		if (notify == SIGEV_THREAD_ID || notify == SIGEV_SIGNAL)
> 			put_task_struct();
> 
> doesn't work, so we have task_struct leak.

  Right, I'll revert back to the old code with cleanups.

> 
> Worse, this breaks posix-timers. Note that posix-timers allow SIGEV_NONE,
> the timer is not queued in that case, we shouldn't do ->sigev_signo check.
> This means that aio should check SIGEV_NONE itself.
> 
> Also, it is critical for posix-timers that SIGEV_THREAD_ID doesn't come
> with another bit (like in the example below), note the code like
> 
> 	if (sigev_notify == (SIGEV_SIGNAL|SIGEV_THREAD_ID))
> 		...
> 
> IOW: good_sigevent() in its current form is very cryptic, and it _really_
> needs a cleanup, but we should not change its behaviour.

  Yep. I must admit that I didn't pay enough attention to this 10-liner,
but in the end it rightfully backfired on me.

> 
> Apart from this, I don't see other problems in the signal related code in
> this series.
> 
> Oleg.
> 

  Thanks again for your review.

  Sébastien.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH -mm 4/7][AIO] Resend - Make good_sigevent() non-static
  2007-02-02 18:00   ` Oleg Nesterov
  2007-02-05 11:09     ` Sébastien Dugué
@ 2007-02-05 12:18     ` Sébastien Dugué
  2007-02-05 13:43       ` Oleg Nesterov
  1 sibling, 1 reply; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-05 12:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oleg Nesterov, Andrew Morton, linux-aio, Bharata B Rao,
	Christoph Hellwig, Suparna Bhattacharya, Ulrich Drepper,
	Zach Brown, Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion


  Andrew,

  Could you replace make-good_sigevent-non-static.patch with this
patch, as it fixes some problems reported by Oleg Nesterov.

  Oleg, chime in if you still see something I missed.

  Hopefully I will have gotten it right this time.

  Sorry for the hassle.

  Sébastien.


From: Sébastien Dugué <sebastien.dugue@bull.net>

                Make good_sigevent() non-static and rename it


  Move good_sigevent() from posix-timers.c to signal.c where it belongs,
clean it up, rename it to sigevent_find_task() to better describe what the
function does and make it non-static so that it can be used by other
 subsystems.


 include/linux/signal.h |    1 +
 kernel/posix-timers.c  |   19 +------------------
 kernel/signal.c        |   24 ++++++++++++++++++++++++
 3 files changed, 26 insertions(+), 18 deletions(-)

Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>


Index: linux-2.6.20-rc6-mm3/include/linux/signal.h
===================================================================
--- linux-2.6.20-rc6-mm3.orig/include/linux/signal.h	2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/include/linux/signal.h	2007-02-05 12:25:12.000000000 +0100
@@ -240,6 +240,7 @@ extern int sigprocmask(int, sigset_t *, 
 
 struct pt_regs;
 extern int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka, struct pt_regs *regs, void *cookie);
+extern struct task_struct * sigevent_find_task(sigevent_t *);
 
 extern struct kmem_cache *sighand_cachep;
 
Index: linux-2.6.20-rc6-mm3/kernel/posix-timers.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/posix-timers.c	2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/posix-timers.c	2007-01-30 11:41:36.000000000 +0100
@@ -367,23 +367,6 @@ static enum hrtimer_restart posix_timer_
 	return ret;
 }
 
-static struct task_struct * good_sigevent(sigevent_t * event)
-{
-	struct task_struct *rtn = current->group_leader;
-
-	if ((event->sigev_notify & SIGEV_THREAD_ID ) &&
-		(!(rtn = find_task_by_pid(event->sigev_notify_thread_id)) ||
-		 rtn->tgid != current->tgid ||
-		 (event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL))
-		return NULL;
-
-	if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) &&
-	    ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX)))
-		return NULL;
-
-	return rtn;
-}
-
 void register_posix_clock(const clockid_t clock_id, struct k_clock *new_clock)
 {
 	if ((unsigned) clock_id >= MAX_CLOCKS) {
@@ -496,7 +479,7 @@ sys_timer_create(const clockid_t which_c
 		new_timer->it_sigev_value = event.sigev_value;
 
 		read_lock(&tasklist_lock);
-		if ((process = good_sigevent(&event))) {
+		if ((process = sigevent_find_task(&event))) {
 			/*
 			 * We may be setting up this process for another
 			 * thread.  It may be exiting.  To catch this
Index: linux-2.6.20-rc6-mm3/kernel/signal.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/kernel/signal.c	2007-01-30 11:41:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/kernel/signal.c	2007-02-05 12:27:33.000000000 +0100
@@ -1213,6 +1213,34 @@ int group_send_sig_info(int sig, struct 
 	return ret;
 }
 
+/***
+ * sigevent_find_task - check and get target task from a sigevent.
+ * @event: the sigevent to be checked
+ *
+ * This function must be called with the tasklist_lock held for reading.
+ */
+struct task_struct * sigevent_find_task(sigevent_t * event)
+{
+	struct task_struct *task = current->group_leader;
+
+	if ((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) {
+		if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
+			return NULL;
+	}
+
+	if (event->sigev_notify & SIGEV_THREAD_ID) {
+		if ((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL)
+			return NULL;
+
+		task = find_task_by_pid(event->sigev_notify_thread_id);
+
+		if (!task || task->tgid != current->tgid)
+			task = NULL;
+	}
+
+	return task;
+}
+
 /*
  * kill_pgrp_info() sends a signal to a process group: this is what the tty
  * control characters do (^C, ^Z etc)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH -mm 4/7][AIO] Resend - Make good_sigevent() non-static
  2007-02-05 12:18     ` [PATCH -mm 4/7][AIO] Resend " Sébastien Dugué
@ 2007-02-05 13:43       ` Oleg Nesterov
  2007-02-05 16:00         ` [PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak Sébastien Dugué
  0 siblings, 1 reply; 19+ messages in thread
From: Oleg Nesterov @ 2007-02-05 13:43 UTC (permalink / raw)
  To: S?bastien Dugu?
  Cc: linux-kernel, Andrew Morton, linux-aio, Bharata B Rao,
	Christoph Hellwig, Suparna Bhattacharya, Ulrich Drepper,
	Zach Brown, Badari Pulavarty, Benjamin LaHaise, Jean Pierre Dion

On 02/05, S?bastien Dugu? wrote:
> 
> +struct task_struct * sigevent_find_task(sigevent_t * event)
> +{
> +	struct task_struct *task = current->group_leader;
> +
> +	if ((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) {
> +		if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX)
> +			return NULL;
> +	}
> +
> +	if (event->sigev_notify & SIGEV_THREAD_ID) {
> +		if ((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL)
> +			return NULL;
> +
> +		task = find_task_by_pid(event->sigev_notify_thread_id);
> +
> +		if (!task || task->tgid != current->tgid)
> +			task = NULL;
> +	}
> +
> +	return task;
> +}

I think this patch is correct, and the code is much more readable than
the old good_sigevet().

But please don't forget that PATCH 6/7 still needs a small fixup, we can
leak a task_struct in really_put_req()

		if (notify == SIGEV_THREAD_ID || notify == SIGEV_SIGNAL)
			put_task_struct(req->ki_notify.target);

because without SIGEV_THREAD_ID ->sigev_notify can have any bit/bits set,
even SIGEV_NONE (along with some other bit, aio_setup_sigevent checks for
== SIGEV_NONE).

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak
  2007-02-05 13:43       ` Oleg Nesterov
@ 2007-02-05 16:00         ` Sébastien Dugué
  2007-02-05 17:13           ` Oleg Nesterov
  0 siblings, 1 reply; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-05 16:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oleg Nesterov, Andrew Morton, linux-aio, Bharata B Rao,
	Christoph Hellwig, Suparna Bhattacharya, Ulrich Drepper,
	Zach Brown, Badari Pulavarty, Jean Pierre Dion


  Andrew,

  here is an incremental patch to fix a possible ref leak when users
do weird things with the ->sigev_notify flags.

  Thanks Oleg for spotting this.

  Sébastien.


From: Sébastien Dugué <sebastien.dugue@bull.net>

         Fix AIO completion signal notification possible ref leak

  Make sure we only accept valid sigev_notify values in aio_setup_sigevent(),
namely SIGEV_NONE, SIGEV_THREAD_ID or SIGEV_SIGNAL.

 aio.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c	2007-02-05 16:50:27.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c	2007-02-05 16:53:43.000000000 +0100
@@ -939,7 +939,7 @@ static int aio_send_signal(struct aio_no
 	info->si_uid = 0;
 	info->si_value = notify->value;
 
-	if (notify->notify & SIGEV_THREAD_ID)
+	if (notify->notify == SIGEV_THREAD_ID)
 		ret = send_sigqueue(notify->signo, sigq, notify->target);
 	else
 		ret = send_group_sigqueue(notify->signo, sigq, notify->target);
@@ -959,6 +959,10 @@ static long aio_setup_sigevent(struct ai
 	if (event.sigev_notify == SIGEV_NONE)
 		return 0;
 
+	if (event.sigev_notify != SIGEV_SIGNAL &&
+	    event.sigev_notify != SIGEV_THREAD_ID)
+		return -EINVAL;
+
 	notify->notify = event.sigev_notify;
 	notify->signo = event.sigev_signo;
 	notify->value = event.sigev_value;

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak
  2007-02-05 16:00         ` [PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak Sébastien Dugué
@ 2007-02-05 17:13           ` Oleg Nesterov
  2007-02-06  8:31             ` Sébastien Dugué
  2007-02-06  9:22             ` [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups Sébastien Dugué
  0 siblings, 2 replies; 19+ messages in thread
From: Oleg Nesterov @ 2007-02-05 17:13 UTC (permalink / raw)
  To: S?bastien Dugu?
  Cc: linux-kernel, Andrew Morton, linux-aio, Bharata B Rao,
	Christoph Hellwig, Suparna Bhattacharya, Ulrich Drepper,
	Zach Brown, Badari Pulavarty, Jean Pierre Dion

On 02/05, S?bastien Dugu? wrote:
> 
>   Make sure we only accept valid sigev_notify values in aio_setup_sigevent(),
> namely SIGEV_NONE, SIGEV_THREAD_ID or SIGEV_SIGNAL.

I think this is correct, but I have another concern (most probably I just
confused looking at non-applied patch), could you re-check?

> @@ -959,6 +959,10 @@ static long aio_setup_sigevent(struct ai
>  	if (event.sigev_notify == SIGEV_NONE)
>  		return 0;
>  
> +	if (event.sigev_notify != SIGEV_SIGNAL &&
> +	    event.sigev_notify != SIGEV_THREAD_ID)
> +		return -EINVAL;
> +
>  	notify->notify = event.sigev_notify;
>  	notify->signo = event.sigev_signo;
>  	notify->value = event.sigev_value;

Ok. But what if sigevent_find_task() fails after that? Doesn't this mean
that really_put_req() will do put_task_struct(NULL) ?

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak
  2007-02-05 17:13           ` Oleg Nesterov
@ 2007-02-06  8:31             ` Sébastien Dugué
  2007-02-06  9:22             ` [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups Sébastien Dugué
  1 sibling, 0 replies; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-06  8:31 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Andrew Morton, linux-aio, Bharata B Rao,
	Christoph Hellwig, Suparna Bhattacharya, Ulrich Drepper,
	Zach Brown, Badari Pulavarty, Jean Pierre Dion

On Mon, 5 Feb 2007 20:13:35 +0300 Oleg Nesterov <oleg@tv-sign.ru> wrote:

> On 02/05, S?bastien Dugu? wrote:
> > 
> >   Make sure we only accept valid sigev_notify values in aio_setup_sigevent(),
> > namely SIGEV_NONE, SIGEV_THREAD_ID or SIGEV_SIGNAL.
> 
> I think this is correct, but I have another concern (most probably I just
> confused looking at non-applied patch), could you re-check?
> 
> > @@ -959,6 +959,10 @@ static long aio_setup_sigevent(struct ai
> >  	if (event.sigev_notify == SIGEV_NONE)
> >  		return 0;
> >  
> > +	if (event.sigev_notify != SIGEV_SIGNAL &&
> > +	    event.sigev_notify != SIGEV_THREAD_ID)
> > +		return -EINVAL;
> > +
> >  	notify->notify = event.sigev_notify;
> >  	notify->signo = event.sigev_signo;
> >  	notify->value = event.sigev_value;
> 
> Ok. But what if sigevent_find_task() fails after that? Doesn't this mean
> that really_put_req() will do put_task_struct(NULL) ?
> 

  Argh, right, a patch to fix that and a couple of other corner cases to
follow soon.

  Thanks Oleg for looking through this.

  Sébastien.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups
  2007-02-05 17:13           ` Oleg Nesterov
  2007-02-06  8:31             ` Sébastien Dugué
@ 2007-02-06  9:22             ` Sébastien Dugué
  2007-02-06 11:05               ` Oleg Nesterov
  1 sibling, 1 reply; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-06  9:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oleg Nesterov, Andrew Morton, linux-aio, Bharata B Rao,
	Christoph Hellwig, Suparna Bhattacharya, Ulrich Drepper,
	Zach Brown, Badari Pulavarty, Jean Pierre Dion


  Andrew, here is another incremental patch which does a bit of cleanup
as well as fixing a possible release on a task ref that was not taken.

  Thanks,

  Sébastien.


From: Sébastien Dugué <sebastien.dugue@bull.net>

	AIO completion signal notification misc fixes and cleanups

  This patches cleans up the notification path and fixes a possible
release on a task ref that was not taken in aio_setup_sigevent().


 aio.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c	2007-02-05 16:53:43.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c	2007-02-06 09:33:55.000000000 +0100
@@ -469,8 +469,7 @@ static inline void really_put_req(struct
 		kfree(req->ki_iovec);
 
 	/* Release task ref */
-	if (req->ki_notify.notify == SIGEV_THREAD_ID ||
-	    req->ki_notify.notify == SIGEV_SIGNAL)
+	if (req->ki_notify.notify != SIGEV_NONE)
 		put_task_struct(req->ki_notify.target);
 
 	kmem_cache_free(kiocb_cachep, req);
@@ -970,8 +969,14 @@ static long aio_setup_sigevent(struct ai
 	rcu_read_lock();
 	target = sigevent_find_task(&event);
 
-	if (unlikely(!target))
+	if (unlikely(!target)) {
+		/*
+		 * Revert notify to SIGEV_NONE so that really_put_req()
+		 * knows that no ref has been taken on a task.
+		 */
+		notify->notify = SIGEV_NONE;
 		goto out_unlock;
+	}
 
 	/*
 	 * At this point, we know that notify is either SIGEV_SIGNAL or
@@ -996,7 +1001,7 @@ static long aio_setup_sigevent(struct ai
 	return 0;
 
 out_unlock:
-	read_unlock(&tasklist_lock);
+	rcu_read_unlock();
 	return -EINVAL;
 }
 
@@ -1763,7 +1768,7 @@ int fastcall io_submit_one(struct kioctx
 					 (struct sigevent __user *)(unsigned long)
 					 iocb->aio_sigeventp);
 		if (ret)
-			goto out_put_req;
+			goto out_sigqfree;
 	}
 
 	/* Attach this iocb to its lio */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups
  2007-02-06  9:22             ` [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups Sébastien Dugué
@ 2007-02-06 11:05               ` Oleg Nesterov
  2007-02-06 11:39                 ` Sébastien Dugué
  2007-02-06 11:57                 ` [PATCH -mm][AIO] AIO completion signal notification small cleanup Sébastien Dugué
  0 siblings, 2 replies; 19+ messages in thread
From: Oleg Nesterov @ 2007-02-06 11:05 UTC (permalink / raw)
  To: S?bastien Dugu?
  Cc: linux-kernel, Andrew Morton, linux-aio, Bharata B Rao,
	Christoph Hellwig, Suparna Bhattacharya, Ulrich Drepper,
	Zach Brown, Badari Pulavarty, Jean Pierre Dion

On 02/06, S?bastien Dugu? wrote:
> 
> @@ -970,8 +969,14 @@ static long aio_setup_sigevent(struct ai
>  	rcu_read_lock();
>  	target = sigevent_find_task(&event);
>  
> -	if (unlikely(!target))
> +	if (unlikely(!target)) {
> +		/*
> +		 * Revert notify to SIGEV_NONE so that really_put_req()
> +		 * knows that no ref has been taken on a task.
> +		 */
> +		notify->notify = SIGEV_NONE;
>  		goto out_unlock;
> +	}

Very minor nit, feel free to ignore.

Isn't it better to move "notify->* = event.*;" down, when we know that
target != NULL. Imho, a bit easier to follow. This way we don't need to
reset notify->notify = SIGEV_NONE.

aio_setup_sigevent() relies on the fact that ->notify = SIGEV_NONE on
entry anyway.

Oleg.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups
  2007-02-06 11:05               ` Oleg Nesterov
@ 2007-02-06 11:39                 ` Sébastien Dugué
  2007-02-06 11:57                 ` [PATCH -mm][AIO] AIO completion signal notification small cleanup Sébastien Dugué
  1 sibling, 0 replies; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-06 11:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: linux-kernel, Andrew Morton, linux-aio, Bharata B Rao,
	Christoph Hellwig, Suparna Bhattacharya, Ulrich Drepper,
	Zach Brown, Badari Pulavarty, Jean Pierre Dion

On Tue, 6 Feb 2007 14:05:39 +0300 Oleg Nesterov <oleg@tv-sign.ru> wrote:

> On 02/06, S?bastien Dugu? wrote:
> > 
> > @@ -970,8 +969,14 @@ static long aio_setup_sigevent(struct ai
> >  	rcu_read_lock();
> >  	target = sigevent_find_task(&event);
> >  
> > -	if (unlikely(!target))
> > +	if (unlikely(!target)) {
> > +		/*
> > +		 * Revert notify to SIGEV_NONE so that really_put_req()
> > +		 * knows that no ref has been taken on a task.
> > +		 */
> > +		notify->notify = SIGEV_NONE;
> >  		goto out_unlock;
> > +	}
> 
> Very minor nit, feel free to ignore.
> 
> Isn't it better to move "notify->* = event.*;" down, when we know that
> target != NULL. Imho, a bit easier to follow. This way we don't need to
> reset notify->notify = SIGEV_NONE.
> 
> aio_setup_sigevent() relies on the fact that ->notify = SIGEV_NONE on
> entry anyway.

  Yep, right, it will make things cleaner.

  Thanks,

  Sébastien.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH -mm][AIO] AIO completion signal notification small cleanup
  2007-02-06 11:05               ` Oleg Nesterov
  2007-02-06 11:39                 ` Sébastien Dugué
@ 2007-02-06 11:57                 ` Sébastien Dugué
  1 sibling, 0 replies; 19+ messages in thread
From: Sébastien Dugué @ 2007-02-06 11:57 UTC (permalink / raw)
  To: linux-kernel
  Cc: Oleg Nesterov, Andrew Morton, linux-aio, Bharata B Rao,
	Christoph Hellwig, Suparna Bhattacharya, Ulrich Drepper,
	Zach Brown, Badari Pulavarty, Jean Pierre Dion


  Andrew, one more cleanup to aio_setup_sigevent() to make it more readable.

  Sébastien.


From: Sébastien Dugué <sebastien.dugue@bull.net>

	AIO completion signal notification small cleanup

  This patch cleans up aio_setup_sigevent() to make it more readable.

 aio.c |   15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

Signed-off-by: Sébastien Dugué <sebastien.dugue@bull.net>


Index: linux-2.6.20-rc6-mm3/fs/aio.c
===================================================================
--- linux-2.6.20-rc6-mm3.orig/fs/aio.c	2007-02-06 09:33:55.000000000 +0100
+++ linux-2.6.20-rc6-mm3/fs/aio.c	2007-02-06 12:43:52.000000000 +0100
@@ -962,21 +962,11 @@ static long aio_setup_sigevent(struct ai
 	    event.sigev_notify != SIGEV_THREAD_ID)
 		return -EINVAL;
 
-	notify->notify = event.sigev_notify;
-	notify->signo = event.sigev_signo;
-	notify->value = event.sigev_value;
-
 	rcu_read_lock();
 	target = sigevent_find_task(&event);
 
-	if (unlikely(!target)) {
-		/*
-		 * Revert notify to SIGEV_NONE so that really_put_req()
-		 * knows that no ref has been taken on a task.
-		 */
-		notify->notify = SIGEV_NONE;
+	if (unlikely(!target))
 		goto out_unlock;
-	}
 
 	/*
 	 * At this point, we know that notify is either SIGEV_SIGNAL or
@@ -988,6 +978,9 @@ static long aio_setup_sigevent(struct ai
 	notify->target = target;
 	rcu_read_unlock();
 
+	notify->notify = event.sigev_notify;
+	notify->signo = event.sigev_signo;
+	notify->value = event.sigev_value;
 	notify->sigq = __sigqueue_alloc(current, GFP_KERNEL, 0);
 
 	/*

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2007-02-06 11:58 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-01  9:22 [PATCH -mm 0/7][AIO] - AIO completion signal notification v6 Sébastien Dugué
2007-02-01  9:26 ` [PATCH -mm 1/7][AIO] - Rework compat_sys_io_submit Sébastien Dugué
2007-02-01  9:29 ` [PATCH -mm 2/7][AIO] - fix aio.h includes Sébastien Dugué
2007-02-01  9:30 ` [PATCH -mm 3/7][AIO] - Fix access_ok() checks Sébastien Dugué
2007-02-01  9:30 ` [PATCH -mm 4/7][AIO] - Make good_sigevent() non-static Sébastien Dugué
2007-02-02 18:00   ` Oleg Nesterov
2007-02-05 11:09     ` Sébastien Dugué
2007-02-05 12:18     ` [PATCH -mm 4/7][AIO] Resend " Sébastien Dugué
2007-02-05 13:43       ` Oleg Nesterov
2007-02-05 16:00         ` [PATCH -mm][AIO] Fix AIO completion signal notification possible ref leak Sébastien Dugué
2007-02-05 17:13           ` Oleg Nesterov
2007-02-06  8:31             ` Sébastien Dugué
2007-02-06  9:22             ` [PATCH -mm][AIO] AIO completion signal notification fixes and cleanups Sébastien Dugué
2007-02-06 11:05               ` Oleg Nesterov
2007-02-06 11:39                 ` Sébastien Dugué
2007-02-06 11:57                 ` [PATCH -mm][AIO] AIO completion signal notification small cleanup Sébastien Dugué
2007-02-01  9:31 ` [PATCH -mm 5/7][AIO] - Make __sigqueue_free() and __sigqueue_alloc() non static Sébastien Dugué
2007-02-01  9:31 ` [PATCH -mm 6/7][AIO] - AIO completion signal notification Sébastien Dugué
2007-02-01  9:32 ` [PATCH -mm 7/7][AIO] - Add listio syscall support Sébastien Dugué

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.