* io_cancel return value change
@ 2014-08-15 19:27 Jeff Moyer
2014-08-18 14:49 ` Benjamin LaHaise
0 siblings, 1 reply; 5+ messages in thread
From: Jeff Moyer @ 2014-08-15 19:27 UTC (permalink / raw)
To: Kent, Ben; +Cc: Zach, linux-fsdevel
Hi, Kent,
Sorry to bring this up long after your patches have gone in--I am
remiss. In the following commit, you changed what was returned by the
io_cancel system call:
commit bec68faaf3ba74ed0dcd5dc3a881b30aec542973
Author: Kent Overstreet <koverstreet@google.com>
Date: Mon May 13 14:45:08 2013 -0700
aio: io_cancel() no longer returns the io_event
Originally, io_cancel() was documented to return the io_event if
cancellation succeeded - the io_event wouldn't be delivered via the ring
buffer like it normally would.
But this isn't what the implementation was actually doing; the only
driver implementing cancellation, the usb gadget code, never returned an
io_event in its cancel function. And aio_complete() was recently changed
to no longer suppress event delivery if the kiocb had been cancelled.
This gets rid of the unused io_event argument to kiocb_cancel() and
kiocb->ki_cancel(), and changes io_cancel() to return -EINPROGRESS if
kiocb->ki_cancel() returned success.
Also tweak the refcounting in kiocb_cancel() to make more sense.
You made a passing reference to previous aio_complete changes as well
that I can't easily track down (it wasn't mentioned in the commit log
that I can tell, and it isn't obvious to me by looking through the patch
set).
At the very least, you need to also include man page updates for this.
Honestly, though, we should not have let this patch set go in as is. If
any software actually wrote code to handle cancellation, this change
would surely cause confusion. That said, I think you got away with this
for two reasons:
1) no drivers implement a cancellation routine (except the crazy usb
gadgetfs thing)
2) userspace code may not even call io_cancel (who knows?), and if they
do, well, see point 1.
My preference would be for this behavior change to be reverted. You
could make a case for keeping the change and updating the
documentation. I'm not convinced 100% one way or the other. So what do
you (and others) think?
Cheers,
Jeff
p.s. I did do a quick Google search, and the only caller of io_cancel I
could find was a test harness. But that's certainly not proof that
nobody does it!
* Re: io_cancel return value change
From: Benjamin LaHaise @ 2014-08-18 14:49 UTC (permalink / raw)
To: Jeff Moyer; +Cc: Kent, Zach, linux-fsdevel, linux-aio
On Fri, Aug 15, 2014 at 03:27:53PM -0400, Jeff Moyer wrote:
> Hi, Kent,
>
> Sorry to bring this up long after your patches have gone in--I am
> remiss. In the following commit, you changed what was returned by the
> io_cancel system call:
>
> commit bec68faaf3ba74ed0dcd5dc3a881b30aec542973
> Author: Kent Overstreet <koverstreet@google.com>
> Date: Mon May 13 14:45:08 2013 -0700
>
> aio: io_cancel() no longer returns the io_event
>
> Originally, io_cancel() was documented to return the io_event if
> cancellation succeeded - the io_event wouldn't be delivered via the ring
> buffer like it normally would.
>
> But this isn't what the implementation was actually doing; the only
> driver implementing cancellation, the usb gadget code, never returned an
> io_event in its cancel function. And aio_complete() was recently changed
> to no longer suppress event delivery if the kiocb had been cancelled.
>
> This gets rid of the unused io_event argument to kiocb_cancel() and
> kiocb->ki_cancel(), and changes io_cancel() to return -EINPROGRESS if
> kiocb->ki_cancel() returned success.
>
> Also tweak the refcounting in kiocb_cancel() to make more sense.
>
> You made a passing reference to previous aio_complete changes as well
> that I can't easily track down (it wasn't mentioned in the commit log
> that I can tell, and it isn't obvious to me by looking through the patch
> set).
>
> At the very least, you need to also include man page updates for this.
> Honestly, though, we should not have let this patch set go in as is. If
Yes, the man page needs to be updated, but this change in the API is
backwards compatible. The existing man page says that it returns 0 on
success and returns an io_event -- anything other than a 0 return implies
that a completion event will be returned later. Cancellation has never
provided a 100% guarantee that it will succeed without providing a
completion event.
> any software actually wrote code to handle cancellation, this change
> would surely cause confusion. That said, I think you got away with this
> for two reasons:
> 1) no drivers implement a cancellation routine (except the crazy usb
> gadgetfs thing)
> 2) userspace code may not even call io_cancel (who knows?), and if they
> do, well, see point 1.
I implemented some test code to verify that the new approach to cancel
works. I have a bunch of other changes against 3.4 that need to be ported
to 3.17, but I just haven't had time to work on them. Below are 3 of the
patches implementing async pipe reads that need forward porting and should
be relatively easy to merge.
> My preference would be for this behavior change to be reverted. You
> could make a case for keeping the change and updating the
> documentation. I'm not convinced 100% one way or the other. So what do
> you (and others) think?
I'm not convinced it needs to be reverted. Anything that's not a file or
block device requires cancellation support, and the in-kernel api changes
were meant to better handle various race conditions.
> Cheers,
> Jeff
>
> p.s. I did do a quick Google search, and the only caller of io_cancel I
> could find was a test harness. But that's certainly not proof that
> nobody does it!
You're ignoring the fact that cancellation is used internally by the kernel
on process exit or io context destruction. Cancel methods have to work, as
otherwise a process will hang at exit.
-ben
--
"Thought is the essence of where you are now."
commit fff36ea0b18906d0a2254b31b1c2ebf05453ac74
Author: Benjamin LaHaise <bcrl@kvack.org>
Date: Tue Jul 23 13:05:44 2013 -0400
aio/pipe: break out the core __pipe_read() from pipe_read() to prep for aio
Move code around from pipe_read() to a helper function __pipe_read() in
preparation for adding async read support to pipes.
diff --git a/fs/pipe.c b/fs/pipe.c
index 1667e6f..2b596ca 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -356,6 +356,103 @@ static const struct pipe_buf_operations packet_pipe_buf_ops = {
.get = generic_pipe_buf_get,
};
+static int __pipe_read(struct pipe_inode_info *pipe, size_t *total_len,
+ struct iovec *iov, ssize_t *ret, int *do_wakeup,
+ struct file *filp)
+{
+ int bufs;
+try_again:
+ bufs = pipe->nrbufs;
+ if (bufs) {
+ int curbuf = pipe->curbuf;
+ struct pipe_buffer *buf = pipe->bufs + curbuf;
+ const struct pipe_buf_operations *ops = buf->ops;
+ void *addr;
+ size_t chars = buf->len;
+ int error, atomic;
+
+ if (chars > *total_len)
+ chars = *total_len;
+
+ error = ops->confirm(pipe, buf);
+ if (error) {
+ if (!*ret)
+ *ret = error;
+ goto out_break;
+ }
+
+ atomic = !iov_fault_in_pages_write(iov, chars);
+redo:
+ addr = ops->map(pipe, buf, atomic);
+ error = pipe_iov_copy_to_user(iov, addr + buf->offset, chars, atomic);
+ ops->unmap(pipe, buf, addr);
+ if (unlikely(error)) {
+ /*
+ * Just retry with the slow path if we failed.
+ */
+ if (atomic) {
+ atomic = 0;
+ goto redo;
+ }
+ if (!*ret)
+ *ret = error;
+ goto out_break;
+ }
+ *ret += chars;
+ buf->offset += chars;
+ buf->len -= chars;
+
+ /* Was it a packet buffer? Clean up and exit */
+ if (buf->flags & PIPE_BUF_FLAG_PACKET) {
+ *total_len = chars;
+ buf->len = 0;
+ }
+
+ if (!buf->len) {
+ buf->ops = NULL;
+ ops->release(pipe, buf);
+ curbuf = (curbuf + 1) & (pipe->buffers - 1);
+ pipe->curbuf = curbuf;
+ pipe->nrbufs = --bufs;
+ *do_wakeup = 1;
+ }
+ *total_len -= chars;
+ if (!*total_len)
+ goto out_break; /* common path: read succeeded */
+ }
+ if (bufs) /* More to do? */
+ goto try_again;
+ if (!pipe->writers)
+ goto out_break;
+ if (!pipe->waiting_writers) {
+ /* syscall merging: Usually we must not sleep
+ * if O_NONBLOCK is set, or if we got some data.
+ * But if a writer sleeps in kernel space, then
+ * we can wait for that data without violating POSIX.
+ */
+ if (*ret)
+ goto out_break;
+ if (filp->f_flags & O_NONBLOCK) {
+ *ret = -EAGAIN;
+ goto out_break;
+ }
+ }
+ if (signal_pending(current)) {
+ if (!*ret)
+ *ret = -ERESTARTSYS;
+ goto out_break;
+ }
+ if (*do_wakeup) {
+ wake_up_interruptible_sync_poll(&pipe->wait, POLLOUT | POLLWRNORM);
+ kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
+ }
+
+ return 0;
+
+out_break:
+ return 1;
+}
+
static ssize_t
pipe_read(struct kiocb *iocb, const struct iovec *_iov,
unsigned long nr_segs, loff_t pos)
@@ -378,90 +475,8 @@ pipe_read(struct kiocb *iocb, const struct iovec *_iov,
mutex_lock(&inode->i_mutex);
pipe = inode->i_pipe;
for (;;) {
- int bufs = pipe->nrbufs;
- if (bufs) {
- int curbuf = pipe->curbuf;
- struct pipe_buffer *buf = pipe->bufs + curbuf;
- const struct pipe_buf_operations *ops = buf->ops;
- void *addr;
- size_t chars = buf->len;
- int error, atomic;
-
- if (chars > total_len)
- chars = total_len;
-
- error = ops->confirm(pipe, buf);
- if (error) {
- if (!ret)
- ret = error;
- break;
- }
-
- atomic = !iov_fault_in_pages_write(iov, chars);
-redo:
- addr = ops->map(pipe, buf, atomic);
- error = pipe_iov_copy_to_user(iov, addr + buf->offset, chars, atomic);
- ops->unmap(pipe, buf, addr);
- if (unlikely(error)) {
- /*
- * Just retry with the slow path if we failed.
- */
- if (atomic) {
- atomic = 0;
- goto redo;
- }
- if (!ret)
- ret = error;
- break;
- }
- ret += chars;
- buf->offset += chars;
- buf->len -= chars;
-
- /* Was it a packet buffer? Clean up and exit */
- if (buf->flags & PIPE_BUF_FLAG_PACKET) {
- total_len = chars;
- buf->len = 0;
- }
-
- if (!buf->len) {
- buf->ops = NULL;
- ops->release(pipe, buf);
- curbuf = (curbuf + 1) & (pipe->buffers - 1);
- pipe->curbuf = curbuf;
- pipe->nrbufs = --bufs;
- do_wakeup = 1;
- }
- total_len -= chars;
- if (!total_len)
- break; /* common path: read succeeded */
- }
- if (bufs) /* More to do? */
- continue;
- if (!pipe->writers)
+ if (__pipe_read(pipe, &total_len, iov, &ret, &do_wakeup, filp))
break;
- if (!pipe->waiting_writers) {
- /* syscall merging: Usually we must not sleep
- * if O_NONBLOCK is set, or if we got some data.
- * But if a writer sleeps in kernel space, then
- * we can wait for that data without violating POSIX.
- */
- if (ret)
- break;
- if (filp->f_flags & O_NONBLOCK) {
- ret = -EAGAIN;
- break;
- }
- }
- if (signal_pending(current)) {
- if (!ret)
- ret = -ERESTARTSYS;
- break;
- }
- if (do_wakeup) {
- wake_up_interruptible_sync_poll(&pipe->wait, POLLOUT | POLLWRNORM);
- kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
- }
pipe_wait(pipe);
}
mutex_unlock(&inode->i_mutex);
commit 187de6fbe66fd70485d53bb45a7a034ad747a7cb
Author: Benjamin LaHaise <bcrl@kvack.org>
Date: Tue Jul 23 14:57:45 2013 -0400
aio, pipe: implement async pipe read
Add code to pipe_read() to implement async reads of pipes without using the
threaded aio core. This will allow optimizations that reduce the number of
context switches.
diff --git a/fs/pipe.c b/fs/pipe.c
index 2b596ca..fa0969c 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -103,6 +103,15 @@ void pipe_wait(struct pipe_inode_info *pipe)
pipe_lock(pipe);
}
+/* Park the iocb on the pipe and return -EIOCBRETRY; a later writer kicks it */
+ssize_t async_pipe_wait(struct pipe_inode_info *pipe, struct kiocb *iocb)
+{
+ BUG_ON(pipe->read_iocb);
+ pipe->read_iocb = iocb;
+ pipe_unlock(pipe);
+ return -EIOCBRETRY;
+}
+
static int
pipe_iov_copy_from_user(void *to, struct iovec *iov, unsigned long len,
int atomic)
@@ -443,6 +452,10 @@ redo:
goto out_break;
}
if (*do_wakeup) {
+ struct kiocb *read_iocb = pipe->read_iocb;
+ pipe->read_iocb = NULL;
+ if (read_iocb)
+ kick_iocb(read_iocb);
wake_up_interruptible_sync_poll(&pipe->wait, POLLOUT | POLLWRNORM);
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
}
@@ -453,6 +466,36 @@ out_break:
return 1;
}
+int pipe_read_cancel(struct kiocb *iocb, struct io_event *event)
+{
+ struct file *filp = iocb->ki_filp;
+ struct inode *inode = filp->f_path.dentry->d_inode;
+ struct pipe_inode_info *pipe;
+ int did_cancel = 0;
+
+ mutex_lock(&inode->i_mutex);
+ pipe = inode->i_pipe;
+
+ if (pipe->read_iocb == iocb) {
+ pipe->read_iocb = NULL;
+ did_cancel = 1;
+ }
+
+ mutex_unlock(&inode->i_mutex);
+
+ if (did_cancel) {
+ ssize_t ret = (long)iocb->private2;
+ kiocbClearCancelled(iocb);
+ aio_put_req(iocb);
+ if (ret == 0)
+ ret = -EINTR;
+ aio_complete(iocb, ret, 0);
+ return -EINPROGRESS;
+ }
+
+ return -EAGAIN;
+}
+
static ssize_t
pipe_read(struct kiocb *iocb, const struct iovec *_iov,
unsigned long nr_segs, loff_t pos)
@@ -463,6 +506,7 @@ pipe_read(struct kiocb *iocb, const struct iovec *_iov,
int do_wakeup;
ssize_t ret;
struct iovec *iov = (struct iovec *)_iov;
+ struct kiocb *read_iocb = NULL;
size_t total_len;
total_len = iov_length(iov, nr_segs);
@@ -472,17 +516,42 @@ pipe_read(struct kiocb *iocb, const struct iovec *_iov,
do_wakeup = 0;
ret = 0;
+
+ if (!is_sync_kiocb(iocb)) {
+ if (iocb->ki_cancel == NULL) {
+ kiocb_set_cancel_fn(iocb, pipe_read_cancel);
+ iocb->private = (void *)total_len;
+ iocb->private2 = (void *)0;
+ iocb->ki_pos = 0;
+ }
+ total_len = (long)iocb->private;
+ ret = (long)iocb->private2;
+ do_wakeup = iocb->ki_pos;
+ }
+
mutex_lock(&inode->i_mutex);
pipe = inode->i_pipe;
for (;;) {
if (__pipe_read(pipe, &total_len, iov, &ret, &do_wakeup, filp))
break;
+ if (!is_sync_kiocb(iocb)) {
+ iocb->private = (void *)total_len;
+ iocb->private2 = (void *)ret;
+ iocb->ki_pos = do_wakeup;
+ return async_pipe_wait(pipe, iocb);
+ }
pipe_wait(pipe);
}
+ if (do_wakeup) {
+ read_iocb = pipe->read_iocb;
+ pipe->read_iocb = NULL;
+ }
mutex_unlock(&inode->i_mutex);
/* Signal writers asynchronously that there is more room. */
if (do_wakeup) {
+ if (read_iocb)
+ kick_iocb(read_iocb);
wake_up_interruptible_sync_poll(&pipe->wait, POLLOUT | POLLWRNORM);
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
}
@@ -506,6 +575,7 @@ pipe_write(struct kiocb *iocb, const struct iovec *_iov,
ssize_t ret;
int do_wakeup;
struct iovec *iov = (struct iovec *)_iov;
+ struct kiocb *read_iocb = NULL;
size_t total_len;
ssize_t chars;
@@ -655,6 +725,11 @@ redo2:
break;
}
if (do_wakeup) {
+ if (pipe->read_iocb) {
+ read_iocb = pipe->read_iocb;
+ pipe->read_iocb = NULL;
+ kick_iocb(read_iocb);
+ }
wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM);
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
do_wakeup = 0;
@@ -664,8 +739,14 @@ redo2:
pipe->waiting_writers--;
}
out:
+ if (do_wakeup) {
+ read_iocb = pipe->read_iocb;
+ pipe->read_iocb = NULL;
+ }
mutex_unlock(&inode->i_mutex);
if (do_wakeup) {
+ if (read_iocb)
+ kick_iocb(read_iocb);
wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM);
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
}
@@ -748,6 +829,7 @@ pipe_poll(struct file *filp, poll_table *wait)
static int
pipe_release(struct inode *inode, int decr, int decw)
{
+ struct kiocb *read_iocb = NULL;
struct pipe_inode_info *pipe;
mutex_lock(&inode->i_mutex);
@@ -758,12 +840,17 @@ pipe_release(struct inode *inode, int decr, int decw)
if (!pipe->readers && !pipe->writers) {
free_pipe_info(inode);
} else {
+ read_iocb = pipe->read_iocb;
+ pipe->read_iocb = NULL;
wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLOUT | POLLRDNORM | POLLWRNORM | POLLERR | POLLHUP);
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
}
mutex_unlock(&inode->i_mutex);
+ if (read_iocb)
+ kick_iocb(read_iocb);
+
return 0;
}
@@ -909,6 +996,7 @@ const struct file_operations read_pipefifo_fops = {
.open = pipe_read_open,
.release = pipe_read_release,
.fasync = pipe_read_fasync,
+ .use_aio_read = true,
};
const struct file_operations write_pipefifo_fops = {
@@ -934,6 +1022,7 @@ const struct file_operations rdwr_pipefifo_fops = {
.open = pipe_rdwr_open,
.release = pipe_rdwr_release,
.fasync = pipe_rdwr_fasync,
+ .use_aio_read = true,
};
struct pipe_inode_info * alloc_pipe_info(struct inode *inode)
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index e1ac1ce..d404e796 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -55,6 +55,7 @@ struct pipe_inode_info {
struct fasync_struct *fasync_writers;
struct inode *inode;
struct pipe_buffer *bufs;
+ struct kiocb *read_iocb;
};
/*
commit 0ff94f4f7612e2161cacf872565df22758190918
Author: Benjamin LaHaise <bcrl@kvack.org>
Date: Tue Jul 23 15:48:20 2013 -0400
aio/pipe: reduce context switches when async pipe read is in same mm as write
For applications that make use of a pipe internally for waking internal
state machines, an async pipe read may occur in the same address space as
the write to wake up that pipe. In this case, 2 context switches can be
eliminated by completing the async read directly in the pipe writer. On
one application making use of this facility, this results in the number of
context switches per second dropping from approximately 35k to 22k.
diff --git a/fs/pipe.c b/fs/pipe.c
index fa0969c..42b1d3c 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -565,6 +565,43 @@ static inline int is_packetized(struct file *file)
return (file->f_flags & O_DIRECT) != 0;
}
+static void pipe_maybe_complete_aio_read(struct pipe_inode_info *pipe)
+{
+ struct kiocb *read_iocb = pipe->read_iocb;
+ pipe->read_iocb = NULL;
+ if (!read_iocb)
+ return;
+ /* Is an awaiting read in the same address space as the current
+ * process? If so, complete the read to avoid the context switch
+ * to the aio work queue.
+ */
+ if (!is_sync_kiocb(read_iocb) &&
+ (read_iocb->ki_ctx->mm == current->mm)) {
+ int do_wakeup;
+ ssize_t ret;
+ size_t total_len;
+
+ total_len = (long)read_iocb->private;
+ ret = (long)read_iocb->private2;
+ do_wakeup = read_iocb->ki_pos;
+
+ for (;;) {
+ if (__pipe_read(pipe, &total_len, read_iocb->ki_iovec,
+ &ret, &do_wakeup, read_iocb->ki_filp))
+ break;
+ read_iocb->private = (void *)total_len;
+ read_iocb->private2 = (void *)ret;
+ read_iocb->ki_pos = do_wakeup;
+ pipe->read_iocb = read_iocb;
+ return;
+ }
+ aio_complete(read_iocb, ret, 0);
+ return;
+ }
+
+ kick_iocb(read_iocb);
+}
+
static ssize_t
pipe_write(struct kiocb *iocb, const struct iovec *_iov,
unsigned long nr_segs, loff_t ppos)
@@ -725,11 +762,7 @@ redo2:
break;
}
if (do_wakeup) {
- if (pipe->read_iocb) {
- read_iocb = pipe->read_iocb;
- pipe->read_iocb = NULL;
- kick_iocb(read_iocb);
- }
+ pipe_maybe_complete_aio_read(pipe);
wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM);
kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
do_wakeup = 0;
@@ -740,6 +773,7 @@ redo2:
}
out:
if (do_wakeup) {
+ pipe_maybe_complete_aio_read(pipe);
read_iocb = pipe->read_iocb;
pipe->read_iocb = NULL;
}
--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org. For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: aart@kvack.org
* Re: io_cancel return value change
From: Jeff Moyer @ 2014-08-18 17:33 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: Kent, Zach, linux-fsdevel, linux-aio
Benjamin LaHaise <bcrl@kvack.org> writes:
>> At the very least, you need to also include man page updates for this.
>> Honestly, though, we should not have let this patch set go in as is. If
>
> Yes, the man page needs to be updated, but this change in the API is
> backwards compatible. The existing man page says that it returns 0 on
> success and returns an io_event -- anything other than a 0 return implies
> that a completion event will be returned later. Cancellation has never
> provided a 100% guarantee that it will succeed without providing a
> completion event.
The man page states:
If the AIO context is found, the event will be canceled and then
copied into the memory pointed to by result without being placed
into the completion queue.
That sounds pretty definitive to me.
>> any software actually wrote code to handle cancellation, this change
>> would surely cause confusion. That said, I think you got away with this
>> for two reasons:
>> 1) no drivers implement a cancellation routine (except the crazy usb
>> gadgetfs thing)
>> 2) userspace code may not even call io_cancel (who knows?), and if they
>> do, well, see point 1.
>
> I implemented some test code to verify that the new approach to cancel
> works.
I have no doubt it works, Ben. :) My main concern is a change to the
kernel<->user ABI.
>> My preference would be for this behavior change to be reverted. You
>> could make a case for keeping the change and updating the
>> documentation. I'm not convinced 100% one way or the other. So what do
>> you (and others) think?
>
> I'm not convinced it needs to be reverted. Anything that's not a file or
> block device requires cancellation support, and the in-kernel api changes
> were meant to better handle various race conditions.
I don't understand why you're pitting this change against working
cancelation support. They seem to be separate issues to me.
>> p.s. I did do a quick Google search, and the only caller of io_cancel I
>> could find was a test harness. But that's certainly not proof that
>> nobody does it!
>
> You're ignoring the fact that cancellation is used internally to the kernel
> on process exit or io context destruction. Cancel methods have to work, as
> otherwise a process will hang at exit.
I was not ignoring that at all. We don't provide backwards
compatibility for kernel modules, and the modules in-tree are changed
when interfaces change. I'm really only concerned about whether we
think it matters that the kernel<->user ABI was changed.
I think at some level we're talking past each other. Hopefully I've
clarified my questions.
Cheers,
Jeff
* Re: io_cancel return value change
From: Benjamin LaHaise @ 2014-08-18 18:15 UTC (permalink / raw)
To: Jeff Moyer; +Cc: Kent, Zach, linux-fsdevel, linux-aio
On Mon, Aug 18, 2014 at 01:33:05PM -0400, Jeff Moyer wrote:
> Benjamin LaHaise <bcrl@kvack.org> writes:
>
> >> At the very least, you need to also include man page updates for this.
> >> Honestly, though, we should not have let this patch set go in as is. If
> >
> > Yes, the man page needs to be updated, but this change in the API is
> > backwards compatible. The existing man page says that it returns 0 on
> > success and returns an io_event -- anything other than a 0 return implies
> > that a completion event will be returned later. Cancellation has never
> > provided a 100% guarantee that it will succeed without providing a
> > completion event.
>
> The man page states:
>
> If the AIO context is found, the event will be canceled and then
> copied into the memory pointed to by result without being placed
> into the completion queue.
>
> That sounds pretty definitive to me.
Cancellation is inherently racy. It is and always has been possible for
an attempt at cancellation to fail, or for the attempt at cancellation to
succeed in the bigger picture yet have the io_cancel() call fail (and cause
an event to be delivered). Applications always have and always will have
to handle an io_cancel() call potentially failing with the kernel delivering
a completion event. You have to read the io_cancel() man page with that
understanding in place.
> I have no doubt it works, Ben. :) My main concern is a change to the
> kernel<->user ABI.
As I said above: an application always has and always will have to handle
the case where an io_cancel() call returns failure. An application that does
not handle an error return from io_cancel() is broken, as the condition can
occur when the cancel races with the kernel calling aio_complete().
> I was not ignoring that at all. We don't provide backwards
> compatibility for kernel modules, and the modules in-tree are changed
> when interfaces change. I'm really only concerned about whether we
> think it matters that the kernel<->user ABI was changed.
And this change is ABI compatible. Applications aren't being made to handle
any failures they haven't already had to cope with. If an application can't
handle a transient io_cancel() failure, then the app is buggy and needs to
be fixed since that condition can already occur when the cancel races with
completion.
-ben
> I think at some level we're talking past each other. Hopefully I've
> clarified my questions.
>
> Cheers,
> Jeff
--
"Thought is the essence of where you are now."
* Re: io_cancel return value change
From: Jeff Moyer @ 2014-08-18 18:25 UTC (permalink / raw)
To: Benjamin LaHaise; +Cc: Kent, Zach, linux-fsdevel, linux-aio
Benjamin LaHaise <bcrl@kvack.org> writes:
> Cancellation is inherently racy. It is and always has been possible for
> an attempt at cancellation to fail, or for the attempt at cancellation to
> succeed in the bigger picture yet have the io_cancel() call fail (and cause
> an event to be delivered). Applications always have and always will have
> to handle an io_cancel() call potentially failing with the kernel delivering
> a completion event. You have to read the io_cancel() man page with that
> understanding in place.
Sorry, Ben, Zach just pointed out what I was so obviously missing. From
the changelog:
This gets rid of the unused io_event argument to kiocb_cancel() and
kiocb->ki_cancel(), and changes io_cancel() to return -EINPROGRESS if
kiocb->ki_cancel() returned success.
Somehow I managed to skip right over that change to return
-EINPROGRESS. *facepalm*
Cheers,
Jeff