* sendfile and EAGAIN
From: Ulrich Drepper @ 2013-02-25 17:22 UTC
  To: Linux Kernel Mailing List, Al Viro

When using sendfile with a non-blocking output file descriptor for a
socket the operation can cause a partial write because of capacity
issues.  This is nothing critical and the operation could resume after
the output queue is cleared.  The problem is: there is no way to
determine where to resume.

The system call just returns -EAGAIN without any further indication.
The caller doesn't know what to resend.

And this even though the interface of sendfile would be capable of
communicating this information and the man page (I know, it's not
authoritative) describes this behavior as well.

The problem is probably in a few places, here is one (fs/splice.c):

static ssize_t default_file_splice_write(struct pipe_inode_info *pipe,
                                         struct file *out, loff_t *ppos,
                                         size_t len, unsigned int flags)
{
        ssize_t ret;

        ret = splice_from_pipe(pipe, out, ppos, len, flags, write_pipe_buf);
        if (ret > 0)
                *ppos += ret;

        return ret;
}

Note that *ppos is only updated if the call doesn't fail.  We could
also update the position if ret == -EAGAIN.  This would require
re-architecting the system a bit, either to update *ppos in
splice_from_pipe etc. or to communicate the number of bytes actually
written back from the splice_from_pipe call.  In any case, the result
would be that the caller knows where to resume the operation.

I would argue that this doesn't break the ABI.  If existing programs
simply resend packets from the beginning today, they have already
sent an unpredictable number of bytes in the previous sendfile() call,
making the state of the communication unpredictable.

Opinions?  As it stands, I think sendfile() isn't useful with O_NONBLOCK.

* Re: sendfile and EAGAIN
From: Eric Dumazet @ 2013-02-25 19:22 UTC
  To: Ulrich Drepper; +Cc: Linux Kernel Mailing List, Al Viro

On Mon, 2013-02-25 at 12:22 -0500, Ulrich Drepper wrote:
> When using sendfile with a non-blocking output file descriptor for a
> socket the operation can cause a partial write because of capacity
> issues.  This is nothing critical and the operation could resume after
> the output queue is cleared.  The problem is: there is no way to
> determine where to resume.
> 
> The system call just returns -EAGAIN without any further indication.
> The caller doesn't know what to resend.
> 
> And this even though the interface of sendfile would be capable of
> communicating this information and the man page (I know, it's not
> authoritative) describes this behavior as well.
> 
> The problem is probably in a few places, here is one (fs/splice.c):
> 
> static ssize_t default_file_splice_write(struct pipe_inode_info *pipe,
>                                          struct file *out, loff_t *ppos,
>                                          size_t len, unsigned int flags)
> {
>         ssize_t ret;
> 
>         ret = splice_from_pipe(pipe, out, ppos, len, flags, write_pipe_buf);
>         if (ret > 0)
>                 *ppos += ret;
> 
>         return ret;
> }
> 
> Note that *ppos is only updated if the call doesn't fail.  We could
> also update the position if ret == -EAGAIN.  This would require
> re-architecting the system a bit, either to update *ppos in
> splice_from_pipe etc. or to communicate the number of bytes actually
> written back from the splice_from_pipe call.  In any case, the result
> would be that the caller knows where to resume the operation.
> 
> I would argue that this doesn't break the ABI.  If existing programs
> simply resend packets from the beginning today, they have already
> sent an unpredictable number of bytes in the previous sendfile() call,
> making the state of the communication unpredictable.
> 
> Opinions?  As it stands, I think sendfile() isn't useful with O_NONBLOCK.
> --

I don't understand the issue.

sendfile() returns -EAGAIN only if no bytes were copied to the socket.

If some bytes were copied, sendfile() returns the number of bytes,
exactly like write() would do for a partial write.

I guess the following should work (well... with better tests)

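off_t offset = 0;
while (offset < len) {
    /* sendfile() advances offset itself when the pointer is non-NULL */
    res = sendfile(sock, fd, &offset, len - offset);
    if (res < 0) {
        if (errno != EAGAIN)
            break;
        wait_some_event();
    } else if (res == 0) {
        break;  /* end of input file reached */
    }
}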
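
A fuller version of such a loop might look like the sketch below. It is only an illustration under some assumptions: poll() with POLLOUT stands in for wait_some_event(), the helper names send_file_nonblocking and demo_send_file are hypothetical, and the demo uses a temporary file under /tmp that is small enough to fit in the socket buffer, so the EAGAIN branch is present but not exercised by the demo itself.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send len bytes of in_fd, starting at file offset 0, to the non-blocking
 * socket sock.  Returns 0 on success, -1 on error with errno set.
 * (Hypothetical helper name, not part of any library.) */
static int send_file_nonblocking(int sock, int in_fd, off_t len)
{
    off_t offset = 0;

    assert(len >= 0);
    while (offset < len) {
        /* With a non-NULL offset pointer, sendfile() itself advances
         * offset by the number of bytes sent, so res is not added again. */
        ssize_t res = sendfile(sock, in_fd, &offset, (size_t)(len - offset));

        if (res > 0)
            continue;           /* partial or full progress */
        if (res == 0) {         /* input file is shorter than len */
            errno = EIO;
            return -1;
        }
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;

        /* Socket buffer is full: wait until it becomes writable again. */
        struct pollfd pfd = { .fd = sock, .events = POLLOUT };
        if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
            return -1;
    }
    return 0;
}

/* Demo: push a 4096-byte temporary file through an AF_UNIX socket pair and
 * return the number of bytes that arrive, or -1 on failure. */
static long demo_send_file(void)
{
    char path[] = "/tmp/sendfile-demo-XXXXXX";
    char buf[4096];
    int sv[2];
    long got = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    memset(buf, 'x', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf) ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(fd);
        return -1;
    }
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);
    if (send_file_nonblocking(sv[0], fd, (off_t)sizeof(buf)) == 0)
        got = read(sv[1], buf, sizeof(buf));
    close(sv[0]);
    close(sv[1]);
    close(fd);
    return got;
}
```

For transfers larger than the socket buffer, a reader on the other end has to drain the socket, otherwise the poll() call waits indefinitely.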





* Re: sendfile and EAGAIN
From: Ulrich Drepper @ 2013-03-03  1:41 UTC
  To: Eric Dumazet; +Cc: Linux Kernel Mailing List, Al Viro

On Mon, Feb 25, 2013 at 2:22 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> I don't understand the issue.
>
> sendfile() returns -EAGAIN only if no bytes were copied to the socket.

There is something wrong/unexpected/...

I have a program which can use either sendfile or send.  When using
sendfile to transmit a large block (I've seen it with 900k) the
sendfile call does not transmit everything.  The receiver gets only
about 600k.  This is the situation in which I think I've seen EAGAIN
errors from sendfile, but I cannot reproduce it just now.  This is with
sockets of AF_UNIX type.

Are there any limits to take into account?

* Re: sendfile and EAGAIN
From: Eric Dumazet @ 2013-03-03  3:09 UTC
  To: Ulrich Drepper; +Cc: Linux Kernel Mailing List, Al Viro

On Sat, 2013-03-02 at 20:41 -0500, Ulrich Drepper wrote:
> On Mon, Feb 25, 2013 at 2:22 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > I don't understand the issue.
> >
> > sendfile() returns -EAGAIN only if no bytes were copied to the socket.
> 
> There is something wrong/unexpected/...
> 
> I have a program which can use either sendfile or send.  When using
> sendfile to transmit a large block (I've seen it with 900k) the
> sendfile call does not transmit everything.  The receiver gets only
> about 600k.  This is the situation in which I think I've seen EAGAIN
> errors from sendfile, but I cannot reproduce it just now.  This is with
> sockets of AF_UNIX type.

There is no real sendfile() support for AF_UNIX.

It does a copy.

( sock_no_sendpage() fallback )

> 
> Are there any limits to take into account?


It is entirely expected that sendfile() doesn't queue the whole file
if the transport is slower than the producer.

You can't ask for non-blocking operation and expect sendfile() to store
gigabytes of data in the kernel, even if it's only metadata.

Using non-blocking IO means the sender (and the receiver) must be able
to perform several operations until the whole transfer is finished.




* Re: sendfile and EAGAIN
From: Ulrich Drepper @ 2013-03-03  3:16 UTC
  To: Eric Dumazet; +Cc: Linux Kernel Mailing List, Al Viro

On Sat, Mar 2, 2013 at 10:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Using non-blocking IO means the sender (and the receiver) must be able
> to perform several operations until the whole transfer is finished.

Certainly, and this is implemented.  But the receiver never gets the
rest of the data while the sender (most of the time) gets notified
that everything is sent.

I don't have a reduced test case yet.  Hopefully I'll get to it
sometime soon.  For now I worked around it by not using sendfile.

* Re: sendfile and EAGAIN
From: H. Peter Anvin @ 2013-03-03  3:53 UTC
  To: Ulrich Drepper; +Cc: Linux Kernel Mailing List, Al Viro

On 02/25/2013 09:22 AM, Ulrich Drepper wrote:
> When using sendfile with a non-blocking output file descriptor for a
> socket the operation can cause a partial write because of capacity
> issues.  This is nothing critical and the operation could resume after
> the output queue is cleared.  The problem is: there is no way to
> determine where to resume.
> 
> The system call just returns -EAGAIN without any further indication.
> The caller doesn't know what to resend.

This is IMO just a bug.  EAGAIN should only be used in the zero-byte
case and in other cases it should return the number of bytes
transferred, just like all the read/write system calls.

This was clearly also the intent.
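
The convention described here can be illustrated in userspace with a small wrapper around write(). This is only a sketch of the intended semantics, not kernel code: write_some and demo_partial_write are hypothetical names, and the demo assumes a Linux pipe whose capacity is well below 1 MiB (the default is 64 KiB).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Write up to len bytes to a non-blocking fd.  Returns the number of bytes
 * written if any progress was made; returns -1 (errno set, e.g. EAGAIN)
 * only when nothing at all could be written. */
static ssize_t write_some(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n == 0)
            break;
        if (errno == EINTR)
            continue;
        if (done > 0)
            break;      /* report progress; the error resurfaces next call */
        return -1;      /* zero-byte case: the error itself is reported */
    }
    return (ssize_t)done;
}

/* Demo on a non-blocking pipe that nobody reads: result[0] is the first
 * call's return value, result[1] the second call's, and result[2] the
 * errno observed after the second call.  Returns 0 on success. */
static int demo_partial_write(long result[3])
{
    size_t len = (size_t)1 << 20;   /* larger than the pipe capacity */
    char *buf = calloc(1, len);
    int p[2];

    if (buf == NULL || pipe(p) < 0) {
        free(buf);
        return -1;
    }
    fcntl(p[1], F_SETFL, fcntl(p[1], F_GETFL) | O_NONBLOCK);
    result[0] = write_some(p[1], buf, len);   /* fills the pipe */
    result[1] = write_some(p[1], buf, len);   /* pipe is already full */
    result[2] = errno;
    close(p[0]);
    close(p[1]);
    free(buf);
    return 0;
}
```

The first call reports a short count because some bytes fit, and only the second call, which transfers nothing, fails with EAGAIN. A caller can therefore always tell where to resume.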

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


* Re: sendfile and EAGAIN
From: Eric Wong @ 2013-03-04 10:28 UTC
  To: Ulrich Drepper; +Cc: Eric Dumazet, Linux Kernel Mailing List, Al Viro

Ulrich Drepper <drepper@gmail.com> wrote:
> On Mon, Feb 25, 2013 at 2:22 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > I don't understand the issue.
> >
> > sendfile() returns -EAGAIN only if no bytes were copied to the socket.
> 
> There is something wrong/unexpected/...
> 
> I have a program which can use either sendfile or send.  When using
> sendfile to transmit a large block (I've seen it with 900k) the
> sendfile call does not transmit everything.  The receiver gets only
> about 600k.  This is the situation in which I think I've seen EAGAIN
> errors from sendfile, but I cannot reproduce it just now.  This is with
> sockets of AF_UNIX type.

If you manage to reproduce it, can you pass an offset to sendfile() and
see if the offset changed when you get EAGAIN?
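
One way to run that check is sketched below. The struct sendfile_probe type and the probe_sendfile and demo_fill_socket helpers are hypothetical names made up for this illustration. The demo assumes Linux, a writable /tmp, and an AF_UNIX socket buffer smaller than the 4 MiB sparse test file, so that the socket eventually fills up because nobody reads from it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct sendfile_probe {
    ssize_t ret;        /* sendfile() return value */
    int     err;        /* errno if ret < 0, otherwise 0 */
    off_t   before;     /* offset passed to sendfile() */
    off_t   after;      /* offset as left by sendfile() */
};

/* Call sendfile() once and record whether it touched the offset. */
static struct sendfile_probe probe_sendfile(int sock, int in_fd,
                                            off_t offset, size_t count)
{
    struct sendfile_probe p = { .before = offset };

    assert(offset >= 0);
    errno = 0;
    p.ret = sendfile(sock, in_fd, &offset, count);
    p.err = p.ret < 0 ? errno : 0;
    p.after = offset;
    return p;
}

/* Demo: keep sending a sparse 4 MiB temporary file into a non-blocking
 * AF_UNIX socket that nobody reads, and return the probe of the first
 * call that makes no progress. */
static struct sendfile_probe demo_fill_socket(void)
{
    struct sendfile_probe p = { .ret = -1, .err = EINVAL };
    char path[] = "/tmp/sendfile-probe-XXXXXX";
    const off_t size = 4 << 20;
    int sv[2];
    int fd = mkstemp(path);

    if (fd < 0)
        return p;
    unlink(path);
    if (ftruncate(fd, size) < 0 ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(fd);
        return p;
    }
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);
    for (off_t off = 0; off < size; off = p.after) {
        p = probe_sendfile(sv[0], fd, off, (size_t)(size - off));
        if (p.ret <= 0)
            break;      /* error (e.g. EAGAIN) or end of file */
    }
    close(sv[0]);
    close(sv[1]);
    close(fd);
    return p;
}
```

If the failing call reports EAGAIN with after equal to before, no bytes were consumed by that call and the sender can retry from the same offset.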

Also, which kernel version are you using?  Perhaps it's triggered
by memory pressure.

* Re: sendfile and EAGAIN
From: Ulrich Drepper @ 2013-03-05  0:11 UTC
  To: Eric Wong; +Cc: Eric Dumazet, Linux Kernel Mailing List, Al Viro

On Mon, Mar 4, 2013 at 5:28 AM, Eric Wong <normalperson@yhbt.net> wrote:
> If you manage to reproduce it, can you pass an offset to sendfile() and
> see if the offset changed when you get EAGAIN?

I did that and didn't see the offset change.


> Also, which kernel version are you using?  Perhaps it's triggered
> by memory pressure.

That was with the RHEL6.2 kernel.  I don't have the details as to what
patches are included on top of 2.6.32...
