linux-kernel.vger.kernel.org archive mirror
* sendfile
@ 2003-04-30 14:28 Pål Halvorsen
  2003-04-30 16:51 ` sendfile bert hubert
  0 siblings, 1 reply; 22+ messages in thread
From: Pål Halvorsen @ 2003-04-30 14:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: paalh

Hi!

Does sendfile support UDP connections (SOCK_DGRAM)?

Does sendfile remove ALL in-memory data copy operations?

PS! Please cc me as I'm not currently a member of the list.

Best regards,
-ph



* Re: sendfile
  2003-04-30 14:28 sendfile Pål Halvorsen
@ 2003-04-30 16:51 ` bert hubert
  2003-04-30 19:12   ` sendfile Pål Halvorsen
  0 siblings, 1 reply; 22+ messages in thread
From: bert hubert @ 2003-04-30 16:51 UTC (permalink / raw)
  To: Pål Halvorsen; +Cc: linux-kernel

On Wed, Apr 30, 2003 at 04:28:32PM +0200, Pål Halvorsen wrote:
> Hi!
> 
> Does sendfile support UDP connections (SOCK_DGRAM)?

Try it. I bet it doesn't do so, and certainly not usably. Blasting UDP
frames is seldom useful without checks, like NFS performs.

> Does sendfile remove ALL in-memory data copy operations?

Depends. With some network adaptors it might. Definition of 'zero-copy' is
somewhat misty. Some variants of zero-copy would actually be called
'one-copy' or 'minus-one-copy' in other contexts.

Regards,

bert

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO


* Re: sendfile
  2003-04-30 16:51 ` sendfile bert hubert
@ 2003-04-30 19:12   ` Pål Halvorsen
  2003-04-30 19:28     ` sendfile bert hubert
  0 siblings, 1 reply; 22+ messages in thread
From: Pål Halvorsen @ 2003-04-30 19:12 UTC (permalink / raw)
  To: bert hubert; +Cc: linux-kernel, Pål Halvorsen

On Wed, 30 Apr 2003, bert hubert wrote:

> > Does sendfile support UDP connections (SOCK_DGRAM)?
>
> Try it. I bet it doesn't do so, and certainly not usably. Blasting UDP
> frames is seldom useful without checks, like NFS performs.

It could be useful for applications like streaming video where other
protocols on top provide additional functionality or in a multicast
session where TCP might not be appropriate.

> > Does sendfile remove ALL in-memory data copy operations?
>
> Depends. With some network adaptors it might. Definition of 'zero-copy' is
> somewhat misty. Some variants of zero-copy would actually be called
> 'one-copy' or 'minus-one-copy' in other contexts.

But should not the 2.4.X kernels have support for chained sk_buffs (like
the BSD mbufs) meaning that support for scatter-gather I/O from the NIC
should be unnecessary to support zero-copy (i.e., NO in-memory data
copy operations)?

> Regards,
>
> bert

Cheers,
-ph


* Re: sendfile
  2003-04-30 19:12   ` sendfile Pål Halvorsen
@ 2003-04-30 19:28     ` bert hubert
  2003-04-30 21:57       ` sendfile Pål Halvorsen
  0 siblings, 1 reply; 22+ messages in thread
From: bert hubert @ 2003-04-30 19:28 UTC (permalink / raw)
  To: Pål Halvorsen; +Cc: linux-kernel

On Wed, Apr 30, 2003 at 09:12:17PM +0200, Pål Halvorsen wrote:

> It could be useful for applications like streaming video where other
> protocols on top provide additional functionality or in a multicast
> session where TCP might not be appropriate.

sendfile on UDP would try to send gigabits per second over ppp0...

> But should not the 2.4.X kernels have support for chained sk_buffs (like
> the BSD mbufs) meaning that support for scatter-gather I/O from the NIC
> should be unnecessary to support zero-copy (i.e., NO in-memory data
> copy operations)?

No clue what you mean over here. Zero copy means different things to
different people. Sendfile eliminates the 'read(to buffer);write(buffer to
network);' copy. 

Some network drivers again may eliminate the 'copy_with_checksum()' step,
allowing minus-one-copy, in zerocopy reference frame.

Regards,

bert




* Re: sendfile
  2003-04-30 19:28     ` sendfile bert hubert
@ 2003-04-30 21:57       ` Pål Halvorsen
  2003-04-30 22:18         ` sendfile Mark Mielke
  0 siblings, 1 reply; 22+ messages in thread
From: Pål Halvorsen @ 2003-04-30 21:57 UTC (permalink / raw)
  To: bert hubert; +Cc: linux-kernel

On Wed, 30 Apr 2003, bert hubert wrote:

> On Wed, Apr 30, 2003 at 09:12:17PM +0200, Pål Halvorsen wrote:
>
> > It could be useful for applications like streaming video where other
> > protocols on top provide additional functionality or in a multicast
> > session where TCP might not be appropriate.
>
> sendfile on UDP would try to send gigabits per second over ppp0...

YES, I guess sendfile will send "count" bytes as fast as possible using
UDP. However, can't sendfile be called several times, allowing the
sender to keep track of the offset and byte count, e.g., sending the
data needed for one second of video each second? Or does sendfile
close the file/socket after each call (really making it useful for only
whole file transfers at a time like retrieving a www-document)?
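
For reference, sendfile(2) does not close either descriptor, and when it
is given a non-NULL offset pointer it advances that offset itself, so it
can be called repeatedly on the same pair of descriptors. A minimal
sketch of that pattern in C, assuming Linux; paced_sendfile, chunk and
pause_us are hypothetical names chosen for illustration, and whether a
given kernel accepts a UDP socket as out_fd is not settled by this
thread:

```c
#include <sys/types.h>
#include <sys/sendfile.h>
#include <unistd.h>

/*
 * Send `total` bytes of in_fd, starting at `start`, to out_fd in chunks
 * of at most `chunk` bytes, sleeping `pause_us` microseconds between
 * chunks. Returns the number of bytes sent, or -1 on error.
 * The kernel updates `offset` after every call, so neither descriptor
 * has to be reopened or repositioned between calls.
 */
ssize_t paced_sendfile(int out_fd, int in_fd, off_t start, size_t total,
                       size_t chunk, unsigned int pause_us)
{
    off_t offset = start;
    size_t left = total;

    while (left > 0) {
        size_t want = left < chunk ? left : chunk;
        ssize_t n = sendfile(out_fd, in_fd, &offset, want);

        if (n < 0)
            return -1;      /* errno describes the failure */
        if (n == 0)
            break;          /* end of file reached early */
        left -= (size_t)n;
        if (left > 0 && pause_us > 0)
            usleep(pause_us);
    }
    return (ssize_t)(total - left);
}
```

A call such as paced_sendfile(sock, fd, 0, file_size, 64 * 1024, 100000)
would hand the kernel 64 KB every 100 ms; those numbers are placeholders,
not tuned values.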

> > But should not the 2.4.X kernels have support for chained sk_buffs (like
> > the BSD mbufs) meaning that support for scatter-gather I/O from the NIC
> > should be unnecessary to support zero-copy (i.e., NO in-memory data
> > copy operations)?
>
> No clue what you mean over here. Zero copy means different things to
> different people. Sendfile eliminates the 'read(to buffer);write(buffer to
> network);' copy.

First, zero-copy for me is to have no copy operations from one main memory
location to another (not counting the transfer from disk to memory and
from memory to NIC). Thus, I would like to read data into one memory
location and transfer the same data from the same location to the NIC.

I would like to be able to have data in several places in memory
(like reading data from disk into several non-contiguous pages, e.g.
using DMA). Then, I would like to be able to send these data without
moving data to another memory location. If for example data for a packet
is located in two different pages, I'd like to have a sk_buff pointing to
each of these data areas and sending these two data chunks to the NIC
without having to copy the data into one single, continuous memory region
first before sending it to the NIC.

The issue about "chained" sk_buffs is something I read in the Linux
Journal (January issue I think) about sendfile. Taking a very brief look
at the sk_buff code, I think skb->data could be pointing to a
	struct skb_shared_info {
		atomic_t        dataref;
		unsigned int    nr_frags;
		struct sk_buff  *frag_list;
		skb_frag_t      frags[MAX_SKB_FRAGS];
	};
where each "frags" entry holds a pointer to a page, an offset and a size.
Thus, the sk_buff could be able to get data from several memory pages
for a single packet!??
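
The idea of describing one packet by several (page, offset, size)
fragments can be modelled in user space. The sketch below is only an
illustration in plain C: struct frag, struct packet, MAX_FRAGS and the
helper functions are hypothetical stand-ins, not the kernel's skb_frag_t
or skb_shared_info, and the byte sum is a placeholder for a real
checksum. It shows that the payload can be measured and summed in place,
without first being copied into one contiguous buffer:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAGS 8   /* illustrative limit, not the kernel's MAX_SKB_FRAGS */

/* One piece of payload: `size` bytes starting `offset` bytes into `page`. */
struct frag {
    const unsigned char *page;
    size_t offset;
    size_t size;
};

/* A packet whose payload is scattered over up to MAX_FRAGS pieces. */
struct packet {
    unsigned int nr_frags;
    struct frag frags[MAX_FRAGS];
};

/* Append a fragment; returns 0 on success, -1 if the table is full. */
int packet_add_frag(struct packet *pkt, const unsigned char *page,
                    size_t offset, size_t size)
{
    if (pkt->nr_frags >= MAX_FRAGS)
        return -1;
    pkt->frags[pkt->nr_frags].page = page;
    pkt->frags[pkt->nr_frags].offset = offset;
    pkt->frags[pkt->nr_frags].size = size;
    pkt->nr_frags++;
    return 0;
}

/* Total payload length: the sum of all fragment sizes. */
size_t packet_len(const struct packet *pkt)
{
    size_t len = 0;
    for (unsigned int i = 0; i < pkt->nr_frags; i++)
        len += pkt->frags[i].size;
    return len;
}

/* Read every payload byte where it lies; nothing is copied or moved. */
uint32_t packet_byte_sum(const struct packet *pkt)
{
    uint32_t sum = 0;
    for (unsigned int i = 0; i < pkt->nr_frags; i++) {
        const unsigned char *p = pkt->frags[i].page + pkt->frags[i].offset;
        for (size_t j = 0; j < pkt->frags[i].size; j++)
            sum += p[j];
    }
    return sum;
}
```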

However, you will have a data transfer to the CPU calculating the
checksum, but the data will not be put into another memory region (i.e.,
no copy operation).

> Some network drivers again may eliminate the 'copy_with_checksum()' step,
> allowing minus-one-copy, in zerocopy reference frame.

Does this mean that if the NIC cannot perform the checksum on-board, the
Linux communication system performs a "copy_with_checksum" copying the
data to another location when performing the checksum, i.e., always
giving a copy operation?

-ph

> Regards,
>
> bert



* Re: sendfile
  2003-04-30 21:57       ` sendfile Pål Halvorsen
@ 2003-04-30 22:18         ` Mark Mielke
  2003-04-30 22:34           ` sendfile Pål Halvorsen
  0 siblings, 1 reply; 22+ messages in thread
From: Mark Mielke @ 2003-04-30 22:18 UTC (permalink / raw)
  To: Pål Halvorsen; +Cc: bert hubert, linux-kernel

On Wed, Apr 30, 2003 at 11:57:59PM +0200, Pål Halvorsen wrote:
> On Wed, 30 Apr 2003, bert hubert wrote:
> > On Wed, Apr 30, 2003 at 09:12:17PM +0200, Pål Halvorsen wrote:
> > > It could be useful for applications like streaming video where other
> > > protocols on top provide additional functionality or in a multicast
> > > session where TCP might not be appropriate.
> > sendfile on UDP would try to send gigabits per second over ppp0...
> YES, I guess sendfile will send "count" bytes as fast as possible using
> UDP. However, can't sendfile be called several times, allowing the
> sender to keep track of the offset and byte count, e.g., sending the
> data needed for one second of video each second? Or does sendfile
> close the file/socket after each call (really making it useful for only
> whole file transfers at a time like retrieving a www-document)?

At some point, I would wonder 'why'? I've always considered the real
benefit of sendfile() that the system never has to fully swap your
process in, in order to do work on your behalf as would be necessary
with read() and write(). The zero copy architecture doesn't seem
significant to me if you are going to wait between sendfile()
requests.

> > > But should not the 2.4.X kernels have support for chained sk_buffs (like
> > > the BSD mbufs) meaning that support for scatter-gather I/O from the NIC
> > > should be unnecessary to support zero-copy (i.e., NO in-memory data
> > > copy operations)?
> > No clue what you mean over here. Zero copy means different things to
> > different people. Sendfile eliminates the 'read(to buffer);write(buffer to
> > network);' copy.
> First, zero-copy for me is to have no copy operations from one main memory
> location to another (not counting the transfer from disk to memory and
> from memory to NIC). Thus, I would like to read data into one memory
> location and transfer the same data from the same location to the NIC.

To some degree, couldn't sendto() fit this description? (Assuming the kernel
implemented 'zero-copy' on sendto()) The benefit of sendfile() is that
data isn't coming from a memory location. It is coming from disk, meaning
that your process doesn't have to become active in order for work to be
done. In the case of UDP packets, you almost always want a layer on top
that either times the UDP packet output, or sends output in response to
input, mostly defeating the purpose of sendfile()...

mark

-- 
mark@mielke.cc/markm@ncf.ca/markm@nortelnetworks.com __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/



* Re: sendfile
  2003-04-30 22:18         ` sendfile Mark Mielke
@ 2003-04-30 22:34           ` Pål Halvorsen
  2003-05-01  4:28             ` sendfile Mark Mielke
  0 siblings, 1 reply; 22+ messages in thread
From: Pål Halvorsen @ 2003-04-30 22:34 UTC (permalink / raw)
  To: Mark Mielke; +Cc: bert hubert, linux-kernel, Pål Halvorsen

On Wed, 30 Apr 2003, Mark Mielke wrote:

> On Wed, Apr 30, 2003 at 11:57:59PM +0200, Pål Halvorsen wrote:
> > On Wed, 30 Apr 2003, bert hubert wrote:
> > > On Wed, Apr 30, 2003 at 09:12:17PM +0200, Pål Halvorsen wrote:
> > > > It could be useful for applications like streaming video where other
> > > > protocols on top provide additional functionality or in a multicast
> > > > session where TCP might not be appropriate.
> > > sendfile on UDP would try to send gigabits per second over ppp0...
> > YES, I guess sendfile will send "count" bytes as fast as possible using
> > UDP. However, can't sendfile be called several times, allowing the
> > sender to keep track of the offset and byte count, e.g., sending the
> > data needed for one second of video each second? Or does sendfile
> > close the file/socket after each call (really making it useful for only
> > whole file transfers at a time like retrieving a www-document)?
>
> At some point, I would wonder 'why'? I've always considered the real
> benefit of sendfile() that the system never has to fully swap your
> process in, in order to do work on your behalf as would be necessary
> with read() and write(). The zero copy architecture doesn't seem
> significant to me if you are going to wait between sendfile()
> requests.

OK, but what I want to do is to use a sendfile-like ("streamfile") system
call for streaming multimedia data like video, i.e., sending the whole
file requires large buffers at the client (e.g., 4-5 GB for a DVD video).
Thus, I would like to have a sending/transfer rate equal to the
consumption rate.

Sure, I can use read/write, mmap/write, etc. but these include copy
operations and several address space switches. If I can have a system call
saying "send data segment X to client Y" in one system call and no copy
operations, I'll save resources on a heavily loaded machine.....

> > > > But should not the 2.4.X kernels have support for chained sk_buffs (like
> > > > the BSD mbufs) meaning that support for scatter-gather I/O from the NIC
> > > > should be unnecessary to support zero-copy (i.e., NO in-memory data
> > > > copy operations)?
> > > No clue what you mean over here. Zero copy means different things to
> > > different people. Sendfile eliminates the 'read(to buffer);write(buffer to
> > > network);' copy.
> > First, zero-copy for me is to have no copy operations from one main memory
> > location to another (not counting the transfer from disk to memory and
> > from memory to NIC). Thus, I would like to read data into one memory
> > location and transfer the same data from the same location to the NIC.
>
> To some degree, couldn't sendto() fit this description? (Assuming the kernel
> implemented 'zero-copy' on sendto()) The benefit of sendfile() is that
> data isn't coming from a memory location. It is coming from disk, meaning
> that your process doesn't have to become active in order for work to be
> done. In the case of UDP packets, you almost always want a layer on top
> that either times the UDP packet output, or sends output in response to
> input, mostly defeating the purpose of sendfile()...

Maybe, but then I'll have two system calls...
-ph

> mark



* Re: sendfile
  2003-04-30 22:34           ` sendfile Pål Halvorsen
@ 2003-05-01  4:28             ` Mark Mielke
  2003-05-01 15:25               ` sendfile Joseph Malicki
  2003-05-01 21:17               ` sendfile Pål Halvorsen
  0 siblings, 2 replies; 22+ messages in thread
From: Mark Mielke @ 2003-05-01  4:28 UTC (permalink / raw)
  To: Pål Halvorsen; +Cc: bert hubert, linux-kernel

On Thu, May 01, 2003 at 12:34:32AM +0200, Pål Halvorsen wrote:
> On Wed, 30 Apr 2003, Mark Mielke wrote:
> > To some degree, couldn't sendto() fit this description? (Assuming the
> > kernel implemented 'zero-copy' on sendto()) The benefit of sendfile()
> > is that data isn't coming from a memory location. It is coming from disk,
> > meaning that your process doesn't have to become active in order for work
> > to be done. In the case of UDP packets, you almost always want a layer on
> > top that either times the UDP packet output, or sends output in response
> > to input, mostly defeating the purpose of sendfile()...
> Maybe, but then I'll have two system calls...

As I mentioned before, the real benefit to sendfile(), as I understand it, is
that sendfile() makes it unnecessary for the OS to fully activate the calling
process in order to do work for the calling process. Unless you can point out
some other benefit provided by sendfile(), I fail to see how you will do:

    while (1) {
        send_frame_over_udp();
        sleep();
    }

Without two system calls. Whether send_frame_over_udp() uses sendfile() as
you seem to want it to, or whether it just calls sendto(), doesn't make a
difference. Because one of your requirements is that you need to provide a
smooth feed, the primary benefit of sendfile(), that of not having to activate
your process, becomes invalid.

I haven't done timings, or looked deeply at this part of linux-2.5.x,
however, I fail to see why the following code should not meet your
requirements:

    const char *p = mmap(0, length_of_file, PROT_READ, MAP_SHARED, fd, 0);
    off_t offset = 0;

    while (offset < length_of_file)
      {
        /* At most 512 bytes per packet; the last packet may be shorter. */
        int packet_size = min(512, length_of_file - offset);
        send(socket, &p[offset], packet_size, 0);
        offset += packet_size;
        /* Pace the stream: one packet every 1/packets_per_second seconds. */
        usleep(1000000 / packets_per_second);
      }

In theory, send() should be able to provide the zero copy benefits you
are requesting. In practice, it might be a little harder, but in this
case, from my perspective, send() and sendfile() should both provide
equivalent performance. Why would sendfile() perform better than send()?

mark




* Re: sendfile
  2003-05-01  4:28             ` sendfile Mark Mielke
@ 2003-05-01 15:25               ` Joseph Malicki
  2003-05-01 21:17               ` sendfile Pål Halvorsen
  1 sibling, 0 replies; 22+ messages in thread
From: Joseph Malicki @ 2003-05-01 15:25 UTC (permalink / raw)
  To: Mark Mielke, Pål Halvorsen; +Cc: linux-kernel

One major difference I've noticed is the interaction with the VM subsystem.
When you have a large number of processes mmap'ing large files to send(), it
really starts to tickle bugs and performance problems.  sendfile() avoids
this, only needing to use the page cache.

-joe



* Re: sendfile
  2003-05-01  4:28             ` sendfile Mark Mielke
  2003-05-01 15:25               ` sendfile Joseph Malicki
@ 2003-05-01 21:17               ` Pål Halvorsen
  2003-05-01 22:31                 ` sendfile Chris Friesen
  1 sibling, 1 reply; 22+ messages in thread
From: Pål Halvorsen @ 2003-05-01 21:17 UTC (permalink / raw)
  To: Mark Mielke; +Cc: bert hubert, linux-kernel, Pål Halvorsen

On Thu, 1 May 2003, Mark Mielke wrote:

> On Thu, May 01, 2003 at 12:34:32AM +0200, Pål Halvorsen wrote:
> > On Wed, 30 Apr 2003, Mark Mielke wrote:
> > > To some degree, couldn't sendto() fit this description? (Assuming the
> > > kernel implemented 'zero-copy' on sendto()) The benefit of sendfile()
> > > is that data isn't coming from a memory location. It is coming from disk,
> > > meaning that your process doesn't have to become active in order for work
> > > to be done. In the case of UDP packets, you almost always want a layer on
> > > top that either times the UDP packet output, or sends output in response
> > > to input, mostly defeating the purpose of sendfile()...
> > Maybe, but then I'll have two system calls...
>
> As I mentioned before, the real benefit to sendfile(), as I understand it, is
> that sendfile() makes it unnecessary for the OS to fully activate the calling
> process in order to do work for the calling process. Unless you can point out
> some other benefit provided by sendfile(), I fail to see how you will do:
>
>     while (1) {
>         send_frame_over_udp();
>         sleep();
>     }
>
> Without two system calls. Whether send_frame_over_udp() uses sendfile() as
> you seem to want it to, or whether it just calls sendto(), doesn't make a
> difference. Because one of your requirements is that you need to provide a
> smooth feed, the primary benefit of sendfile(), that of not having to activate
> your process, becomes invalid.
>
> I haven't done timings, or looked deeply at this part of linux-2.5.x,
> however, I fail to see why the following code should not meet your
> requirements:
>
>     const char *p = mmap(0, length_of_file, PROT_READ, MAP_SHARED, fd, 0);
>     off_t offset = 0;
>
>     while (offset < length_of_file)
>       {
>         /* At most 512 bytes per packet; the last packet may be shorter. */
>         int packet_size = min(512, length_of_file - offset);
>         send(socket, &p[offset], packet_size, 0);
>         offset += packet_size;
>         /* Pace the stream: one packet every 1/packets_per_second seconds. */
>         usleep(1000000 / packets_per_second);
>       }
>
> In theory, send() should be able to provide the zero copy benefits you
> are requesting. In practice, it might be a little harder, but in this
> case, from my perspective, send() and sendfile() should both provide
> equivalent performance. Why would sendfile() perform better than send()?

As far as I understand mmap/send, you'll have a copy operation in the
kernel here. mmap shares the kernel and user buffer, but when sending, the
packet data is copied to the socket buffer!!??

OK, but I understand that my streaming scenario is not the target
application for sendfile.

Then, I have another question - so that I maybe can implement this myself.
Can the network interface support gather operations - i.e., collecting data
from several places for a packet ("DMA gather copy" from memory to NIC)?

(Like described in
http://delivery.acm.org/10.1145/610000/603774/6345.html?key1=603774&key2=4582281501&coll=portal&dl=ACM&CFID=10149715&CFTOKEN=89922395
- Linux Journal Volume 2003 ,  Issue 105  (January 2003) )

If so, does the sk_buff use the struct skb_shared_info to point to the
different memory regions, or ...?
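
Whether a particular NIC and driver can gather a packet from several
regions is not answered at this point in the thread. The user-space
counterpart of gather I/O is writev(2), which takes an array of
(base, length) pairs and writes them as one sequence. A small sketch,
with write_two_chunks as a hypothetical helper name; note that this only
avoids assembling the pieces in a user-space buffer and says nothing
about copies made inside the kernel:

```c
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Write two non-contiguous buffers with a single system call.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t write_two_chunks(int fd, const void *a, size_t a_len,
                         const void *b, size_t b_len)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)a;   /* iov_base is not const-qualified */
    iov[0].iov_len  = a_len;
    iov[1].iov_base = (void *)b;
    iov[1].iov_len  = b_len;

    return writev(fd, iov, 2);
}
```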

-ph

> mark



* Re: sendfile
  2003-05-01 21:17               ` sendfile Pål Halvorsen
@ 2003-05-01 22:31                 ` Chris Friesen
  2003-05-01 23:32                   ` sendfile Ketil Froyn
  2003-05-02  2:41                   ` sendfile Mark Mielke
  0 siblings, 2 replies; 22+ messages in thread
From: Chris Friesen @ 2003-05-01 22:31 UTC (permalink / raw)
  To: Pål Halvorsen; +Cc: Mark Mielke, bert hubert, linux-kernel

Pål Halvorsen wrote:

> As far as I understand mmap/send, you'll have a copy operation in the
> kernel here. mmap shares the kernel and user buffer, but when sending, the
> packet data is copied to the socket buffer!!??

Yes, there is a copy there.

> OK, but I understand that my streaming scenario is not the target
> application for sendfile.

What stops you from using sendfile (with TCP) to each destination separately, 
with the client only reading from the pipe as needed (presumably with a number 
of frames worth of buffer on the client side)?


Chris

-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com



* Re: sendfile
  2003-05-01 22:31                 ` sendfile Chris Friesen
@ 2003-05-01 23:32                   ` Ketil Froyn
  2003-05-02  9:02                     ` sendfile Bernd Eckenfels
  2003-05-02  2:41                   ` sendfile Mark Mielke
  1 sibling, 1 reply; 22+ messages in thread
From: Ketil Froyn @ 2003-05-01 23:32 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Pål Halvorsen, Mark Mielke, bert hubert, linux-kernel

On Thu, 1 May 2003, Chris Friesen wrote:

> Pål Halvorsen wrote:
>
> > OK, but I understand that my streaming scenario is not the target
> > application for sendfile.
>
> What stops you from using sendfile (with TCP) to each destination
> separately, with the client only reading from the pipe as needed
> (presumably with a number of frames worth of buffer on the client
> side)?

I don't think TCP is suitable for streaming multimedia stuff to clients.
For instance, if a packet does not arrive on the client, it's better to
handle this in the client and skip a frame or show one of worse quality
than to have the video stop while waiting for the server to resend.
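
A client-side sketch of that policy, assuming each UDP datagram carries
a 32-bit sequence number chosen by the sender (the thread does not
define a packet format; struct frame_tracker and tracker_accept are
hypothetical names). Missing frames are counted and skipped rather than
waited for, and frames that arrive late are dropped:

```c
#include <stdint.h>

/* Receiver state: the next sequence number we expect, and a loss count. */
struct frame_tracker {
    uint32_t next_seq;
    uint32_t lost;
};

/*
 * Decide what to do with a frame carrying sequence number `seq`.
 * Returns 1 if the frame should be displayed, 0 if it is a duplicate or
 * arrived after a newer frame was already shown. The signed difference
 * keeps the comparison correct when the counter wraps around.
 */
int tracker_accept(struct frame_tracker *t, uint32_t seq)
{
    if ((int32_t)(seq - t->next_seq) < 0)
        return 0;                       /* late or duplicate: drop it */

    t->lost += seq - t->next_seq;       /* frames skipped in between */
    t->next_seq = seq + 1;
    return 1;
}
```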

Ketil




* Re: sendfile
  2003-05-01 22:31                 ` sendfile Chris Friesen
  2003-05-01 23:32                   ` sendfile Ketil Froyn
@ 2003-05-02  2:41                   ` Mark Mielke
  2003-05-02  4:19                     ` sendfile Chris Friesen
  1 sibling, 1 reply; 22+ messages in thread
From: Mark Mielke @ 2003-05-02  2:41 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Pål Halvorsen, bert hubert, linux-kernel

On Thu, May 01, 2003 at 06:31:05PM -0400, Chris Friesen wrote:
> Pål Halvorsen wrote:
> >As far as I understand mmap/send, you'll have a copy operation in the
> >kernel here. mmap shares the kernel and user buffer, but when sending, the
> >packet data is copied to the socket buffer!!??
> Yes, there is a copy there.

As far as I understand, sendfile() still requires the data to get from the
disk to a page in memory, similar to how send() referencing an mmap()'d page
may cause a page fault, reading the data from disk to a page in memory. One
copy each. I don't know of a kernel interface that lets data be copied from
disk to ethernet card without involving a temporary copy to be in paged
memory at some point in time... perhaps the iSCSI stuff can do this? I dunno.

Somebody else pointed out that mmap() may not yet be implemented completely
optimally. I will have to look at the code before I continue to make my
'in theory' comments, however the following 'NOTE' in the manpage for sendfile
makes me suspect that sendfile() is very similar to mmap()/write():

       -- CUT --
       Presently the descriptor from which data is read cannot correspond to a
       socket, it must correspond to a file which supports mmap()-like  opera-
       tions.
       -- CUT --

> >OK, but I understand that my streaming scenario is not the target
> >application for sendfile.
> What stops you from using sendfile (with TCP) to each destination 
> separately, with the client only reading from the pipe as needed 
> (presumably with a number of frames worth of buffer on the client side)?

TCP isn't very well suited for video feeds. First, it is streamed, which
makes it a little annoying to ensure that only whole frames get through.
Second, its acknowledgement scheme prefers reliability over low latency.
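
The usual workaround for the first point is to add framing on top of the
byte stream, for example by prefixing every frame with its length. A
minimal sketch under that assumption; the 4-byte big-endian prefix and
the names send_frame and recv_frame are illustrative choices, not
something proposed in this thread, and EINTR handling is left out:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <unistd.h>

/* Write exactly len bytes, retrying after short writes. 0 on success. */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes, retrying after short reads. 0 on success. */
static int read_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;      /* error, or EOF in the middle of a frame */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one frame: a 4-byte big-endian length, then the payload. */
int send_frame(int fd, const void *frame, uint32_t len)
{
    uint32_t be_len = htonl(len);

    if (write_all(fd, &be_len, sizeof be_len) != 0)
        return -1;
    return write_all(fd, frame, len);
}

/*
 * Receive one whole frame into buf (capacity cap bytes).
 * Returns the frame length, or -1 on error, EOF or an oversized frame.
 */
long recv_frame(int fd, void *buf, uint32_t cap)
{
    uint32_t be_len, len;

    if (read_all(fd, &be_len, sizeof be_len) != 0)
        return -1;
    len = ntohl(be_len);
    if (len > cap)
        return -1;
    if (read_all(fd, buf, len) != 0)
        return -1;
    return (long)len;
}
```

The receiver only ever hands complete frames to the decoder, however the
stream happens to be segmented on the wire.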

I'm hoping for good things from SCTP. From what I've read, it looks as
if it should provide a compromise between TCP and UDP that is quite
optimal.

mark




* Re: sendfile
  2003-05-02  2:41                   ` sendfile Mark Mielke
@ 2003-05-02  4:19                     ` Chris Friesen
  2003-05-02 21:06                       ` sendfile Mark Mielke
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Friesen @ 2003-05-02  4:19 UTC (permalink / raw)
  To: Mark Mielke; +Cc: Pål Halvorsen, bert hubert, linux-kernel

Mark Mielke wrote:

> As far as I understand, sendfile() still requires the data to get from the
> disk to a page in memory, similar to how send() referencing an mmap()'d page
> may cause a page fault, reading the data from disk to a page in memory. One
> copy each. I don't know of a kernel interface that lets data be copied from
> disk to ethernet card without involving a temporary copy to be in paged
> memory at some point in time... perhaps the iSCSI stuff can do this? I dunno.

According to this:

http://asia.cnet.com/builder/program/dev/0,39009360,39062783,00.htm

using sendfile() is easier on the CPU due to less thrashing of the TLB.


I do get your point about protocol limitations though.

Chris




* Re: sendfile
  2003-05-01 23:32                   ` sendfile Ketil Froyn
@ 2003-05-02  9:02                     ` Bernd Eckenfels
  0 siblings, 0 replies; 22+ messages in thread
From: Bernd Eckenfels @ 2003-05-02  9:02 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.40L0.0305020124050.1874-100000@ketil.hb.local> you wrote:
> I don't think TCP is suitable for streaming multimedia stuff to clients.
> For instance, if a packet does not arrive on the client, it's better to
> handle this in the client and skip a frame or show one of worse quality
> than to have the video stop while waiting for the server to resend.

Yes, this is a problem, but on the other hand, if you want to stream to a
large number of clients, you need to consider deployment and firewalling
issues. 

Nearly all streaming applications out there nowadays offer at least a TCP (or
HTTP) fallback, or use only TCP.

Greetings
Bernd
-- 
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/


* Re: sendfile
  2003-05-02  4:19                     ` sendfile Chris Friesen
@ 2003-05-02 21:06                       ` Mark Mielke
  2003-05-03  0:42                         ` sendfile Miquel van Smoorenburg
                                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Mark Mielke @ 2003-05-02 21:06 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Pål Halvorsen, bert hubert, linux-kernel

On Fri, May 02, 2003 at 12:19:25AM -0400, Chris Friesen wrote:
> According to this:
>   http://asia.cnet.com/builder/program/dev/0,39009360,39062783,00.htm
> using sendfile() is easier on the CPU due to less thrashing of the TLB.

Thanks for the link. It looks quite accurate.

One question it raises in my mind, is whether there would be value in
improving write()/send() such that they detect that the userspace
pointer refers entirely to mmap()'d file pages, and therefore no copy
of data from userspace -> kernelspace should be performed. The pages
could be loaded and accessed directly (as they are with sendfile())
rather than generating a page fault to load the pages. The TLB thrashing
does not need to occur.

I guess the first response to this question would be 'why not use
sendfile()?  it already exists, and people have already begun to use
it...'

My answer is that I don't like sendfile(). It is not-yet-standard, and
is fairly limited. I could just be naive, but I think that:

     write(fd, mmapped_file_pages, length);

Could be transparently mapped to the sendfile() code in the kernel,
minimizing the benefit of sendfile() having its own system call. It all
comes down to optimization. The current implementation of mmap() is not
optimal where mmap()'d file pages are passed as data to system calls.
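
For comparison, the two paths discussed here can be sketched roughly as
follows. This is only an illustration, assuming Linux and its sendfile()
from <sys/sendfile.h>; a local socketpair() stands in for a TCP connection,
and the names send_via_mmap, send_via_sendfile and compare_paths are
hypothetical helpers, not existing APIs. Real code would also loop on
short writes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Path 1: map the file, then write() the mapping to the socket.
 * write() still copies the mapped pages into socket buffers. */
static ssize_t send_via_mmap(int sock, int file_fd, size_t len)
{
    assert(len > 0);    /* mmap() rejects a zero-length mapping */
    void *pages = mmap(NULL, len, PROT_READ, MAP_SHARED, file_fd, 0);
    if (pages == MAP_FAILED)
        return -1;
    ssize_t sent = write(sock, pages, len);
    munmap(pages, len);
    return sent;
}

/* Path 2: ask the kernel to move the file data to the socket itself. */
static ssize_t send_via_sendfile(int sock, int file_fd, size_t len)
{
    off_t offset = 0;   /* explicit offset: the file position is untouched */
    return sendfile(sock, file_fd, &offset, len);
}

/* Hypothetical self-check: returns 0 when both paths deliver the same bytes. */
static int compare_paths(void)
{
    enum { LEN = 4096 };            /* small enough to fit the socket buffer */
    char name[] = "/tmp/sendfile-demo-XXXXXX";
    char payload[LEN], got[LEN];
    int sv[2], rc = -1;

    int fd = mkstemp(name);
    if (fd < 0)
        return -1;
    unlink(name);                   /* file disappears once fd is closed */
    memset(payload, 'x', LEN);
    if (write(fd, payload, LEN) != LEN ||
        socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(fd);
        return -1;
    }

    if (send_via_mmap(sv[0], fd, LEN) == LEN &&
        recv(sv[1], got, LEN, MSG_WAITALL) == LEN &&
        memcmp(payload, got, LEN) == 0 &&
        send_via_sendfile(sv[0], fd, LEN) == LEN &&
        recv(sv[1], got, LEN, MSG_WAITALL) == LEN &&
        memcmp(payload, got, LEN) == 0)
        rc = 0;

    close(sv[0]);
    close(sv[1]);
    close(fd);
    return rc;
}
```

Both helpers deliver the same bytes to the receiver; the difference is only
in how the kernel gets at the data, which is the optimization in question.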

mark




* Re: sendfile
  2003-05-02 21:06                       ` sendfile Mark Mielke
@ 2003-05-03  0:42                         ` Miquel van Smoorenburg
  2003-05-03 15:04                           ` sendfile Mark Mielke
  2003-05-03 12:52                         ` sendfile Pål Halvorsen
  2003-05-03 21:01                         ` sendfile Pål Halvorsen
  2 siblings, 1 reply; 22+ messages in thread
From: Miquel van Smoorenburg @ 2003-05-03  0:42 UTC (permalink / raw)
  To: linux-kernel

In article <20030502210648.GA5322@mark.mielke.cc>,
Mark Mielke  <mark@mark.mielke.cc> wrote:
>One question it raises in my mind, is whether there would be value in
>improving write()/send() such that they detect that the userspace
>pointer refers entirely to mmap()'d file pages, and therefore no copy
>of data from userspace -> kernelspace should be performed.

You mean like
http://hypermail.idiosynkrasia.net/linux-kernel/archived/2003/week00/0056.html

Mike.



* Re: sendfile
  2003-05-02 21:06                       ` sendfile Mark Mielke
  2003-05-03  0:42                         ` sendfile Miquel van Smoorenburg
@ 2003-05-03 12:52                         ` Pål Halvorsen
  2003-05-03 21:01                         ` sendfile Pål Halvorsen
  2 siblings, 0 replies; 22+ messages in thread
From: Pål Halvorsen @ 2003-05-03 12:52 UTC (permalink / raw)
  To: Mark Mielke; +Cc: Chris Friesen, bert hubert, linux-kernel

On Fri, 2 May 2003, Mark Mielke wrote:

> On Fri, May 02, 2003 at 12:19:25AM -0400, Chris Friesen wrote:
> > According to this:
> >   http://asia.cnet.com/builder/program/dev/0,39009360,39062783,00.htm
> > using sendfile() is easier on the CPU due to less thrashing of the TLB.
>
> Thanks for the link. It looks quite accurate.
>
> One question it raises in my mind, is whether there would be value in
> improving write()/send() such that they detect that the userspace
> pointer refers entirely to mmap()'d file pages, and therefore no copy
> of data from userspace -> kernelspace should be performed. The pages
> could be loaded and accessed directly (as they are with sendfile())
> rather than generating a page fault to load the pages. The TLB thrashing
> does not need to occur.
>
> I guess the first response to this question would be 'why not use
> sendfile()?  it already exists, and people have already begun to use
> it...'
>
> My answer is that I don't like sendfile(). It is not-yet-standard, and
> is fairly limited. I could just be naive, but I think that:
>
>      write(fd, mmapped_file_pages, length);
>
> Could be transparently mapped to the sendfile() code in the kernel,
> minimizing the benefit of sendfile() having its own system call. It all
> comes down to optimization. The current implementation of mmap() is not
> optimal where mmap()'d file pages are passed as data to system calls.

This is somewhat similar to what I want to do as well. If sendfile can do
this, why can't we make write/send/... behave similarly, thus removing the
copy operation? Then one could more easily support streaming applications
(or applications needing more control than sendfile offers)!

-ph

> mark



* Re: sendfile
  2003-05-03  0:42                         ` sendfile Miquel van Smoorenburg
@ 2003-05-03 15:04                           ` Mark Mielke
  0 siblings, 0 replies; 22+ messages in thread
From: Mark Mielke @ 2003-05-03 15:04 UTC (permalink / raw)
  To: Miquel van Smoorenburg; +Cc: linux-kernel

On Sat, May 03, 2003 at 12:42:59AM +0000, Miquel van Smoorenburg wrote:
> In article <20030502210648.GA5322@mark.mielke.cc>,
> Mark Mielke  <mark@mark.mielke.cc> wrote:
> >One question it raises in my mind, is whether there would be value in
> >improving write()/send() such that they detect that the userspace
> >pointer refers entirely to mmap()'d file pages, and therefore no copy
> >of data from userspace -> kernelspace should be performed.
> You mean like
> http://hypermail.idiosynkrasia.net/linux-kernel/archived/2003/week00/0056.html

Yes, definitely, and thank you for referring us to work that has already
been done.

mark




* Re: sendfile
  2003-05-02 21:06                       ` sendfile Mark Mielke
  2003-05-03  0:42                         ` sendfile Miquel van Smoorenburg
  2003-05-03 12:52                         ` sendfile Pål Halvorsen
@ 2003-05-03 21:01                         ` Pål Halvorsen
  2003-05-04  0:53                           ` sendfile Miquel van Smoorenburg
  2 siblings, 1 reply; 22+ messages in thread
From: Pål Halvorsen @ 2003-05-03 21:01 UTC (permalink / raw)
  To: Mark Mielke; +Cc: linux-kernel, Pål Halvorsen, miquels


> On Sat, May 03, 2003 at 12:42:59AM +0000, Miquel van Smoorenburg wrote:
> > In article <20030502210648.GA5322@mark.mielke.cc>,
> > Mark Mielke  <mark@mark.mielke.cc> wrote:
> > >One question it raises in my mind, is whether there would be value in
> > >improving write()/send() such that they detect that the userspace
> > >pointer refers entirely to mmap()'d file pages, and therefore no copy
> > >of data from userspace -> kernelspace should be performed.
> > You mean like
> > http://hypermail.idiosynkrasia.net/linux-kernel/archived/2003/week00/0056.html
>
> Yes, definitely, and thank you for referring us to work that has already
> been done.
>
> mark

Does this mean that if you memory map a file and send it through TCP,
you'll have no copy operations transferring data from disk to NIC (except
the DMA transfers disk->memory and memory->NIC)?

Does there exist work implementing this also for UDP?

-ph


* Re: sendfile
  2003-05-03 21:01                         ` sendfile Pål Halvorsen
@ 2003-05-04  0:53                           ` Miquel van Smoorenburg
  0 siblings, 0 replies; 22+ messages in thread
From: Miquel van Smoorenburg @ 2003-05-04  0:53 UTC (permalink / raw)
  To: Pål Halvorsen; +Cc: Mark Mielke, linux-kernel, Pål Halvorsen, miquels

On Sat, 03 May 2003 23:01:21, Pål Halvorsen wrote:
> > On Sat, May 03, 2003 at 12:42:59AM +0000, Miquel van Smoorenburg wrote:
> > > In article <20030502210648.GA5322@mark.mielke.cc>,
> > > Mark Mielke  <mark@mark.mielke.cc> wrote:
> > > >One question it raises in my mind, is whether there would be value in
> > > >improving write()/send() such that they detect that the userspace
> > > >pointer refers entirely to mmap()'d file pages, and therefore no copy
> > > >of data from userspace -> kernelspace should be performed.
> > > You mean like
> > > http://hypermail.idiosynkrasia.net/linux-kernel/archived/2003/week00/0056.html
> >
> > Yes, definitely, and thank you for referring us to work that has already
> > been done.
> >
> > mark
>
> Does this mean that if you memory map a file and send it through TCP,
> you'll have no copy operations transferring data from disk to NIC (except
> the DMA transfers disk->memory and memory->NIC)?

No. I just referred to an earlier discussion about this topic. That doesn't
mean it has been implemented. In fact, if you actually read that discussion
you'll see that it probably won't be implemented at all.
  Mike.
-- 
| Miquel van Smoorenburg        | "I know one million ways, to always pick |
| miquels@{drinkel.,}cistron.nl |  the wrong fantasy" - the Black Crowes.  |


* sendfile
@ 2001-05-24  8:44 Pål Halvorsen
  0 siblings, 0 replies; 22+ messages in thread
From: Pål Halvorsen @ 2001-05-24  8:44 UTC (permalink / raw)
  To: linux-kernel, torvalds; +Cc: paalh

Hi!

I'm a Norwegian PhD student looking at zero-copy data paths through the OS
kernel and found sendfile to be interesting. Does this system call remove
all in-memory copy operations, i.e., does it share data buffers between the
file system and the communication system? (I'm sending data from disk to
the network.)

Is there any documentation about sendfile?

PS! I'm not a member of the mailing list so please cc the answers to my
mailing address

Thank you in advance,
-ph
---       . o  o   .  o  .  o ..  o ..  o .. o oo . o  . o o o
         _n_n_n____i_i _++++++_ _______ ________ _+++++++++++_
      *>(____________I I______I I_____I I______I I___________I
 __^__  /ooOOOO OOOOoo  oo ooo  oo   oo oo    oo ooo       ooo  __^__
( ___ )--------------------------------------------------------( ___ )
 | / | Paal Halvorsen   UniK - Center for technology at Kjeller | \ |
 | / |                                       University of Oslo | \ |
 | / | Phone: +47 64844731                               PB. 70 | \ |
 | / | Phone: +47 64844700 (switchboard)       N - 2027 KJELLER | \ |
 |_/_| Fax:   +47 63818146                               Norway |__|
(_____)-- E-mail: paalh@unik.no -- http://www.unik.no/~paalh --(_____)



