* Splice status
@ 2010-07-05  9:26 Ofer Heifetz
  2010-07-05  9:59 ` Changli Gao
  0 siblings, 1 reply; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-05  9:26 UTC (permalink / raw)
  To: netdev

Hi

I have been trying to test splice on kernel 2.6.35_4 (x86) from Samba (v3.4.7) but could not copy more than ~60MB to the Samba server share.

Strace shows that the splice got stuck in blocking mode on the splice call from socket to pipe.

Has anyone managed to get splice from socket to fd to work for large files (up to 4G file size)?

-Ofer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Splice status
  2010-07-05  9:26 Splice status Ofer Heifetz
@ 2010-07-05  9:59 ` Changli Gao
  2010-07-05 10:52   ` Ofer Heifetz
  0 siblings, 1 reply; 25+ messages in thread
From: Changli Gao @ 2010-07-05  9:59 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: netdev

On Mon, Jul 5, 2010 at 5:26 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> Hi
>
> I have been trying to test splice on kernel 2.6.35_4 (x86) from Samba (v3.4.7) but could not copy more than ~60MB to the Samba server share.
>
> Strace shows that the splice got stuck in blocking mode on the splice call from socket to pipe.
>

Did you drain the pipe before calling splice(2) to move data from
socket to pipe?

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-05  9:59 ` Changli Gao
@ 2010-07-05 10:52   ` Ofer Heifetz
  2010-07-05 12:08     ` Changli Gao
  2010-07-05 12:50     ` Eric Dumazet
  0 siblings, 2 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-05 10:52 UTC (permalink / raw)
  To: Changli Gao; +Cc: netdev

I am using Samba, so from my understanding of the source code, it loops and performs splice(sock, pipe) and splice(pipe, fd). There is no flush of any sort in between.

When you say drain you mean to flush all data to pipe?

-Ofer


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Splice status
  2010-07-05 10:52   ` Ofer Heifetz
@ 2010-07-05 12:08     ` Changli Gao
  2010-07-05 12:50     ` Eric Dumazet
  1 sibling, 0 replies; 25+ messages in thread
From: Changli Gao @ 2010-07-05 12:08 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: netdev

On Mon, Jul 5, 2010 at 6:52 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> I am using Samba, so from my understanding of the source code, it loops and performs splice(sock, pipe) and splice(pipe, fd). There is no flush of any sort in between.
>

I checked the function sys_recvfile() and found it is buggy.

                to_write = nread;
                while (to_write > 0) {
                        int thistime;
                        thistime = splice(pipefd[0], NULL, tofd,
                                          &splice_offset, to_write,
                                          SPLICE_F_MOVE);
                        if (thistime == -1) {
                                goto done;
                        }
                        to_write -= thistime;
                }

                total_written += nread;
                count -= nread;

When splice fails, it should drain the pipe. If not, the following
splice(2) into the pipe may hang, because the pipe may not have enough
space for the data read from the socket.

> When you say drain you mean to flush all data to pipe?
>

No. I mean reading all the data out of the pipe.


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-05 10:52   ` Ofer Heifetz
  2010-07-05 12:08     ` Changli Gao
@ 2010-07-05 12:50     ` Eric Dumazet
  2010-07-05 13:47       ` Ofer Heifetz
  2010-07-06  2:01       ` Changli Gao
  1 sibling, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2010-07-05 12:50 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: Changli Gao, netdev

On Monday, 5 July 2010 at 13:52 +0300, Ofer Heifetz wrote:
> I am using Samba, so from my understanding of the source code, it
> loops and performs splice(sock, pipe) and splice(pipe, fd). There is no
> flush of any sort in between.
> 
> When you say drain you mean to flush all data to pipe?
> 

Draining the pipe before the splice() call would only make the bug
trigger less often.

splice(sock, pipe) can block if the caller doesn't use the appropriate
'non-blocking pipe' splice() mode, even if the pipe is empty before the
splice() call.

Last time I checked, splice() code was disabled in samba.

Is it a patched version ?

Samba should add SPLICE_F_NONBLOCK to the first splice() call (from sock
to pipe).

(You also need a recent kernel, check for details :
http://patchwork.ozlabs.org/patch/34511/ )

diff --git a/source3/lib/recvfile.c b/source3/lib/recvfile.c
index ea01596..65e6f34 100644
--- a/source3/lib/recvfile.c
+++ b/source3/lib/recvfile.c
@@ -182,7 +182,7 @@ ssize_t sys_recvfile(int fromfd,
                int nread, to_write;
 
                nread = splice(fromfd, NULL, pipefd[1], NULL,
-                              MIN(count, 16384), SPLICE_F_MOVE);
+                              MIN(count, 16384), SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
                if (nread == -1) {
                        if (errno == EINTR) {
                                continue;



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-05 12:50     ` Eric Dumazet
@ 2010-07-05 13:47       ` Ofer Heifetz
  2010-07-05 15:34         ` Eric Dumazet
  2010-07-06  2:01       ` Changli Gao
  1 sibling, 1 reply; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-05 13:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, netdev

Hi,

Well, Samba still disables splice support (hard coded). I applied your patch (adding SPLICE_F_NONBLOCK to the splice(sock, pipe)) and managed to write a 4G file to the Samba share.

I did notice that splice is done on buffers of two sizes, 1380 and 2760 bytes (when writing to a share file); I guess that if I can get samba to use bigger buffers it will reduce the number of splice calls and achieve better performance.

I also saw that when re-writing a file, splice does occasionally use the maximum buffer size (~16K).

Need to perform some more testing with samba splice ...

-Ofer

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-05 13:47       ` Ofer Heifetz
@ 2010-07-05 15:34         ` Eric Dumazet
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2010-07-05 15:34 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: Changli Gao, netdev

On Monday, 5 July 2010 at 16:47 +0300, Ofer Heifetz wrote:
> Hi,
> 
> Well, Samba still disables splice support (hard coded), I applied your
> patch (adding the SPLICE_F_NONBLOCK to the splice(sock, pipe)) and I
> managed to write 4G file to Samba share.
> 
> I did notice that the splice is done on buffers in two sizes: 1380 and
> 2760 (when writing to share file), I guess that if I can get samba to
> use bigger buffers it will reduce the splice calls and achieve better
> performance.
> 

Note that if your load increases or the network is faster, splice will
naturally use more data per call. Don't worry.

Also, you can change MIN(count, 16384) to MIN(count, 65536) now that the
real samba bug is known and can be fixed (by the SPLICE_F_NONBLOCK patch
I sent).

(I guess using 16384 instead of 65536 was an attempt to reduce the hang
probability)


> I also saw that when re-writing a file splice does use the maximum
> buffer size (~16K) occasionally.

max is 16 * PAGE_SIZE, 65536 bytes on x86

> 
> Need to perform some more testing with samba splice ...
> 
> -Ofer
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Splice status
  2010-07-05 12:50     ` Eric Dumazet
  2010-07-05 13:47       ` Ofer Heifetz
@ 2010-07-06  2:01       ` Changli Gao
  2010-07-06  2:36         ` Ofer Heifetz
  2010-07-06  3:56         ` Eric Dumazet
  1 sibling, 2 replies; 25+ messages in thread
From: Changli Gao @ 2010-07-06  2:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jens Axboe, Ofer Heifetz, netdev

On Mon, Jul 5, 2010 at 8:50 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Monday, 5 July 2010 at 13:52 +0300, Ofer Heifetz wrote:
>> I am using Samba, so from my understanding of the source code, it
>> loops and performs splice(sock, pipe) and splice(pipe, fd). There is no
>> flush of any sort in between.
>>
>> When you say drain you mean to flush all data to pipe?
>>
>
> Draining pipe before splice() call would only trigger the bug less
> often.

If we don't drain the pipe before calling splice(2), the data spliced
from the pipe may not be what we expect. Then data corruption occurs.

>
> splice(sock, pipe) can block if caller dont use appropriate "non
> blocking pipe' splice() mode, even if pipe is empty before a splice()
> call.

I don't think that is expected. The code of sys_recvfile is much like
the sendfile(2) implementation in the kernel. If sys_recvfile may block
without a non-blocking flag, sendfile(2) may block too.

BTW: Samba can use sendfile(2) instead in sys_recvfile.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-06  2:01       ` Changli Gao
@ 2010-07-06  2:36         ` Ofer Heifetz
  2010-07-06  3:56         ` Eric Dumazet
  1 sibling, 0 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-06  2:36 UTC (permalink / raw)
  To: Changli Gao, Eric Dumazet; +Cc: Jens Axboe, netdev

Regarding your remark about using sendfile(2) instead in sys_recvfile, I have two questions:
1) What will be used if both are enabled in smb.conf?
2) From your experience, which is faster for reading files?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Splice status
  2010-07-06  2:01       ` Changli Gao
  2010-07-06  2:36         ` Ofer Heifetz
@ 2010-07-06  3:56         ` Eric Dumazet
  2010-07-11 13:08           ` Changli Gao
  1 sibling, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2010-07-06  3:56 UTC (permalink / raw)
  To: Changli Gao; +Cc: Jens Axboe, Ofer Heifetz, netdev

On Tuesday, 6 July 2010 at 10:01 +0800, Changli Gao wrote:
> On Mon, Jul 5, 2010 at 8:50 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Monday, 5 July 2010 at 13:52 +0300, Ofer Heifetz wrote:
> >> I am using Samba, so from my understanding of the source code, it
> >> loops and performs splice(sock, pipe) and splice(pipe, fd). There is no
> >> flush of any sort in between.
> >>
> >> When you say drain you mean to flush all data to pipe?
> >>
> >
> > Draining pipe before splice() call would only trigger the bug less
> > often.
> 
> If we don't drain the pipe before calling splice(2), the data spliced
> from pipe maybe not be what we expect. Then data corruption occurs.
> 

This is not true. A pipe is a pipe is a buffer. You don't need it to be
empty when using it. Nowhere in the documentation is that stated.

However, a single skb can fill a pipe, even if it's "empty".



> >
> > splice(sock, pipe) can block if caller dont use appropriate "non
> > blocking pipe' splice() mode, even if pipe is empty before a splice()
> > call.
> 
> I don't think it is expected. The code of sys_recvfile is much like
> the sendfile(2) implementation in kernel. If sys_recvfile may block
> without non_block flag, sendfile(2) may block too.

Then it would be a bug. You might fix it easily.

Using splice() correctly (i.e., not blocking on sock->pipe) should work
too.

Again, you can block on splice(sock, pipe) iff you have a second thread
doing the opposite (pipe->file) in parallel to unblock you. But samba's
recvfile algorithm uses a single thread.

> 
> BTW: Samba can use sendfile(2) instead in sys_recvfile.
> 



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Splice status
  2010-07-06  3:56         ` Eric Dumazet
@ 2010-07-11 13:08           ` Changli Gao
  2010-07-13 11:41             ` Ofer Heifetz
  0 siblings, 1 reply; 25+ messages in thread
From: Changli Gao @ 2010-07-11 13:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jens Axboe, Ofer Heifetz, netdev

On Tue, Jul 6, 2010 at 11:56 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday, 6 July 2010 at 10:01 +0800, Changli Gao wrote:
>>
>> If we don't drain the pipe before calling splice(2), the data spliced
>> from pipe maybe not be what we expect. Then data corruption occurs.
>>
>
> This is not true. A pipe is a pipe is a buffer. You dont need it to be
> empty when using it. Nowhere in documentation its stated.

Do you mean splice(2) empties the pipe buffer before using it as an
output buffer? If not, draining the pipe is needed to avoid data
corruption.

>
> However, a single skb can fill a pipe, even if "its empty"
>

Yes, because tcp_splice_read() doesn't know whether __tcp_splice_read()
returned because the pipe was full.

>
>> >
>> > splice(sock, pipe) can block if caller dont use appropriate "non
>> > blocking pipe' splice() mode, even if pipe is empty before a splice()
>> > call.
>>
>> I don't think it is expected. The code of sys_recvfile is much like
>> the sendfile(2) implementation in kernel. If sys_recvfile may block
>> without non_block flag, sendfile(2) may block too.
>
> Then it would be a bug. You might fix it easily.

It seems reasonable. I'll fix it.

>
> Using splice() correctly (ie, not blocking on sock->pipe) should work
> too.
>
> Again, you can block on splice(sock, pipe), iff you have a second thread
> doing the opposite (pipe->file) in parallel to unblock you. But samba
> recvfile algo is using a single thread.
>




-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-11 13:08           ` Changli Gao
@ 2010-07-13 11:41             ` Ofer Heifetz
  2010-07-13 12:32               ` Changli Gao
  2010-07-13 14:11               ` Eric Dumazet
  0 siblings, 2 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-13 11:41 UTC (permalink / raw)
  To: Changli Gao, Eric Dumazet; +Cc: Jens Axboe, netdev

Hi,

I wanted to let you know that I have been testing Samba splice on a Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worse performance than not using it, and also that when re-writing a file the iowait is high.

iometer using 2G file (file is created before test)

Splice  write cpu% iow%
-----------------------
 No     58    98    0
Yes     14   100   48

iozone using 2G file (file created during test)

Splice  write cpu% iow%  re-write cpu% iow%  
-------------------------------------------
 No     35    85    4    58.2     70    0
Yes     33    85    4    15.7    100   58

Any clue why splice introduces a high iowait?
I noticed samba uses up to 16K per splice syscall; changing samba to request more did not help, so I guess it is a kernel limitation.

-Ofer


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Splice status
  2010-07-13 11:41             ` Ofer Heifetz
@ 2010-07-13 12:32               ` Changli Gao
  2010-07-13 12:42                 ` Ofer Heifetz
  2010-07-13 14:11               ` Eric Dumazet
  1 sibling, 1 reply; 25+ messages in thread
From: Changli Gao @ 2010-07-13 12:32 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: Eric Dumazet, Jens Axboe, netdev

On Tue, Jul 13, 2010 at 7:41 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> Hi,
>
> I wanted to let you know that I have been testing Samba splice on Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worst performance than not using it and also noticed that on re-writing file the iowait is high.
>
> iometer using 2G file (file is created before test)
>
> Splice  write cpu% iow%
> -----------------------
>  No     58    98    0
> Yes     14   100   48
>
> iozone using 2G file (file created during test)
>
> Splice  write cpu% iow%  re-write cpu% iow%
> -------------------------------------------
>  No     35    85    4    58.2     70    0
> Yes     33    85    4    15.7    100   58
>
> Any clue why splice introduces a high iowait?
> I noticed samba uses up to 16K per splice syscall, changing the samba to try more did not help, so I guess it is a kernel limitation.
>
> -Ofer
>

What does the column 'write' mean? And what do you mean by 're-write'?
Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-13 12:32               ` Changli Gao
@ 2010-07-13 12:42                 ` Ofer Heifetz
  2010-07-13 13:58                   ` Changli Gao
  0 siblings, 1 reply; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-13 12:42 UTC (permalink / raw)
  To: Changli Gao; +Cc: Eric Dumazet, Jens Axboe, netdev

Write and re-write numbers are in MB/s.
Iozone's re-write reads a chunk of data and writes it back, so performance for this operation should be quite high thanks to kernel cache usage.

I forgot to mention that I used EXT4 fs.

-Ofer


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Splice status
  2010-07-13 12:42                 ` Ofer Heifetz
@ 2010-07-13 13:58                   ` Changli Gao
  2010-07-13 14:40                     ` Ofer Heifetz
  0 siblings, 1 reply; 25+ messages in thread
From: Changli Gao @ 2010-07-13 13:58 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: Eric Dumazet, Jens Axboe, netdev

On Tue, Jul 13, 2010 at 8:42 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> Write and re-write numbers are in MBps.
> Iozone performs re-write meaning reads a chunk of data and writes it back, so basically the performance for this operation should be quiet high since kernel caches usage.
>
> I forgot to mention that I used EXT4 fs.

Maybe it is caused by this line in generic_file_splice_write():

                balance_dirty_pages_ratelimited_nr(mapping, nr_pages);

Please try to test it again without this line.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-13 11:41             ` Ofer Heifetz
  2010-07-13 12:32               ` Changli Gao
@ 2010-07-13 14:11               ` Eric Dumazet
  2010-07-14 15:08                 ` Ofer Heifetz
                                   ` (2 more replies)
  1 sibling, 3 replies; 25+ messages in thread
From: Eric Dumazet @ 2010-07-13 14:11 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: Changli Gao, Jens Axboe, netdev

On Tuesday, 13 July 2010 at 14:41 +0300, Ofer Heifetz wrote:
> Hi,
> 
> I wanted to let you know that I have been testing Samba splice on Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worst performance than not using it and also noticed that on re-writing file the iowait is high.
> 
> iometer using 2G file (file is created before test)
> 
> Splice  write cpu% iow%
> -----------------------
>  No     58    98    0
> Yes     14   100   48
> 
> iozone using 2G file (file created during test)
> 
> Splice  write cpu% iow%  re-write cpu% iow%  
> -------------------------------------------
>  No     35    85    4    58.2     70    0
> Yes     33    85    4    15.7    100   58
> 
> Any clue why splice introduces a high iowait?
> I noticed samba uses up to 16K per splice syscall, changing the samba to try more did not help, so I guess it is a kernel limitation.
> 

splice(socket -> pipe) provides partial buffers (depending on the MTU)

With a typical MTU of 1500 and tcp timestamps, each network frame contains
1448 bytes of payload (1500 - 20 IP header - 20 TCP header - 12 timestamp
option), partially filling one page (of 4096 bytes).

When doing the splice(pipe -> file), the kernel has to coalesce the
partial data, and the amount of written data per syscall is small (about
20 KB).

Without splice(), the write() syscall provides more data, and the vfs
overhead is smaller as the buffer size is a power of two.

Samba uses a 128 KBytes TRANSFER_BUF_SIZE in its default_sys_recvfile()
implementation, it easily outperforms splice() implementation.

You could try extending the pipe size (fcntl(fd, F_SETPIPE_SZ, 256)),
maybe it will be a bit better (and ask splice() for 256*4096 bytes).

I tried this and got about 256Kbytes per splice() call...

# perf report
# Events: 13K
#
# Overhead         Command      Shared Object  Symbol
# ........  ..............  .................  ......
#
     8.69%  splice-fromnet  [kernel.kallsyms]  [k] memcpy
     3.82%  splice-fromnet  [kernel.kallsyms]  [k] kunmap_atomic
     3.51%  splice-fromnet  [kernel.kallsyms]  [k] __block_prepare_write
     2.79%  splice-fromnet  [kernel.kallsyms]  [k] __skb_splice_bits
     2.58%  splice-fromnet  [kernel.kallsyms]  [k] ext3_mark_iloc_dirty
     2.45%  splice-fromnet  [kernel.kallsyms]  [k] do_get_write_access
     2.04%  splice-fromnet  [kernel.kallsyms]  [k] __find_get_block
     1.89%  splice-fromnet  [kernel.kallsyms]  [k] _raw_spin_lock
     1.83%  splice-fromnet  [kernel.kallsyms]  [k] journal_add_journal_head
     1.46%  splice-fromnet  [bnx2x]            [k] bnx2x_rx_int
     1.46%  splice-fromnet  [kernel.kallsyms]  [k] kfree
     1.42%  splice-fromnet  [kernel.kallsyms]  [k] journal_put_journal_head
     1.29%  splice-fromnet  [kernel.kallsyms]  [k] __ext3_get_inode_loc
     1.26%  splice-fromnet  [kernel.kallsyms]  [k] journal_dirty_metadata
     1.25%  splice-fromnet  [kernel.kallsyms]  [k] page_address
     1.20%  splice-fromnet  [kernel.kallsyms]  [k] journal_cancel_revoke
     1.15%  splice-fromnet  [kernel.kallsyms]  [k] tcp_read_sock
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] unlock_buffer
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] pipe_to_file
     1.05%  splice-fromnet  [kernel.kallsyms]  [k] radix_tree_lookup_element
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmap_atomic_prot
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_free
     1.03%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_alloc
     1.01%  splice-fromnet  [bnx2x]            [k] bnx2x_poll



^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-13 13:58                   ` Changli Gao
@ 2010-07-13 14:40                     ` Ofer Heifetz
  0 siblings, 0 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-13 14:40 UTC (permalink / raw)
  To: Changli Gao; +Cc: Eric Dumazet, Jens Axboe, netdev

I profiled the splice iometer write run and noticed that blk_end_request_err is being called many times; it looks like a good candidate for the high iowait. I need to debug its root cause.

-Ofer


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-13 14:11               ` Eric Dumazet
@ 2010-07-14 15:08                 ` Ofer Heifetz
  2010-07-15  3:47                 ` Ofer Heifetz
  2010-07-25 14:47                 ` Ofer Heifetz
  2 siblings, 0 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-14 15:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, Jens Axboe, netdev

Hi,

A strange thing happens when I configure it to use 256K using fcntl: I see that it uses splice with 1460 bytes. I tried making it smaller, but no change.
Any clue why I get a smaller pipe even though I try to resize it to 256K?

BTW I changed the kernel to 2.6.35_rc5.

-Ofer

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com] 
Sent: Tuesday, July 13, 2010 5:12 PM
To: Ofer Heifetz
Cc: Changli Gao; Jens Axboe; netdev@vger.kernel.org
Subject: RE: Splice status

Le mardi 13 juillet 2010 à 14:41 +0300, Ofer Heifetz a écrit :
> Hi,
> 
> I wanted to let you know that I have been testing Samba splice on a Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worse performance than not using it; I also noticed that on re-writing a file the iowait is high.
> 
> iometer using 2G file (file is created before test)
> 
> Splice  write cpu% iow%
> -----------------------
>  No     58    98    0
> Yes     14   100   48
> 
> iozone using 2G file (file created during test)
> 
> Splice  write cpu% iow%  re-write cpu% iow%  
> -------------------------------------------
>  No     35    85    4    58.2     70    0
> Yes     33    85    4    15.7    100   58
> 
> Any clue why splice introduces a high iowait?
> I noticed Samba uses up to 16K per splice syscall; changing Samba to try more did not help, so I guess it is a kernel limitation.
> 

splice(socket -> pipe) provides partial buffers (depending on the MTU)

With typical MTU=1500 and tcp timestamps, each network frame contains
1448 bytes of payload, partially filling one page (of 4096 bytes)

When doing the splice(pipe -> file), kernel has to coalesce partial
data, but amount of written data per syscall() is small (about 20
Kbytes)

Without splice(), the write() syscall provides more data, and vfs
overhead is smaller as buffer size is a power of two.

Samba uses a 128 KBytes TRANSFER_BUF_SIZE in its default_sys_recvfile()
implementation, it easily outperforms splice() implementation.

You could try extending pipe size (fcntl(fd, F_SETPIPE_SZ, 256)), maybe
it will be a bit better. (and ask 256*4096 bytes to splice())

I tried this and got about 256Kbytes per splice() call...

# perf report
# Events: 13K
#
# Overhead         Command      Shared Object  Symbol
# ........  ..............  .................  ......
#
     8.69%  splice-fromnet  [kernel.kallsyms]  [k] memcpy
     3.82%  splice-fromnet  [kernel.kallsyms]  [k] kunmap_atomic
     3.51%  splice-fromnet  [kernel.kallsyms]  [k] __block_prepare_write
     2.79%  splice-fromnet  [kernel.kallsyms]  [k] __skb_splice_bits
     2.58%  splice-fromnet  [kernel.kallsyms]  [k] ext3_mark_iloc_dirty
     2.45%  splice-fromnet  [kernel.kallsyms]  [k] do_get_write_access
     2.04%  splice-fromnet  [kernel.kallsyms]  [k] __find_get_block
     1.89%  splice-fromnet  [kernel.kallsyms]  [k] _raw_spin_lock
     1.83%  splice-fromnet  [kernel.kallsyms]  [k] journal_add_journal_head
     1.46%  splice-fromnet  [bnx2x]            [k] bnx2x_rx_int
     1.46%  splice-fromnet  [kernel.kallsyms]  [k] kfree
     1.42%  splice-fromnet  [kernel.kallsyms]  [k] journal_put_journal_head
     1.29%  splice-fromnet  [kernel.kallsyms]  [k] __ext3_get_inode_loc
     1.26%  splice-fromnet  [kernel.kallsyms]  [k] journal_dirty_metadata
     1.25%  splice-fromnet  [kernel.kallsyms]  [k] page_address
     1.20%  splice-fromnet  [kernel.kallsyms]  [k] journal_cancel_revoke
     1.15%  splice-fromnet  [kernel.kallsyms]  [k] tcp_read_sock
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] unlock_buffer
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] pipe_to_file
     1.05%  splice-fromnet  [kernel.kallsyms]  [k] radix_tree_lookup_element
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmap_atomic_prot
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_free
     1.03%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_alloc
     1.01%  splice-fromnet  [bnx2x]            [k] bnx2x_poll



^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-13 14:11               ` Eric Dumazet
  2010-07-14 15:08                 ` Ofer Heifetz
@ 2010-07-15  3:47                 ` Ofer Heifetz
  2010-07-25 14:47                 ` Ofer Heifetz
  2 siblings, 0 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-15  3:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, Jens Axboe, netdev

Hi,

I managed to get splice to use up to 64K, which looks to me like a Samba limitation (an smb.conf SO_RCVBUF limitation, I think), but I still do not get any performance improvement from splice: the write numbers for splice are about the same as for regular read/write, despite avoiding copy_to_user and copy_from_user.

-Ofer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: Splice status
  2010-07-13 14:11               ` Eric Dumazet
  2010-07-14 15:08                 ` Ofer Heifetz
  2010-07-15  3:47                 ` Ofer Heifetz
@ 2010-07-25 14:47                 ` Ofer Heifetz
  2010-07-26  7:41                   ` Changli Gao
  2010-07-26 20:37                   ` Jarek Poplawski
  2 siblings, 2 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-25 14:47 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, Jens Axboe, netdev

Hi Eric,

Still trying to get better performance with splice, I noticed that when using splice there is twice the file size worth of memcpy (I placed a counter in memcpy); I verified this via a Samba file transfer and with splice-fromnet/out.

Using splice-fromnet/out with ftrace, I also noticed that data is copied twice, in these routines: skb_splice_bits and pipe_to_file.

I thought that the main goal of splice is to avoid one copy when moving data from the network to a file descriptor.

If there are the same number of memcpys and context switches, plus additional VFS overhead in the Samba copy, it makes sense that splice deteriorates the overall Samba write performance.

-Ofer

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Splice status
  2010-07-25 14:47                 ` Ofer Heifetz
@ 2010-07-26  7:41                   ` Changli Gao
  2010-07-26 20:37                   ` Jarek Poplawski
  1 sibling, 0 replies; 25+ messages in thread
From: Changli Gao @ 2010-07-26  7:41 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: Eric Dumazet, Jens Axboe, netdev, Nick Piggin

On Sun, Jul 25, 2010 at 10:47 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> Hi Eric,
>
> Still trying to get better performance with splice, I noticed that when using splice there are twice the file size memcpy (placed a counter in memcpy), I verified it via samba file transfer and splice-fromnet/out.
>
> I also used the splice-fromnet/out and using ftrace I did notice that data is copied twice using these routines: skb_splice_bits, pipe_to_file.
>
> I thought that the main goal of splice is to refrain from one copy when moving data from network to file descriptor.
>
> If there are the same number of memcpy and context switches and in samba copy additional vfs overhead it makes sense that splice deteriates the samba overall write performance.
>

The support for SPLICE_F_MOVE is removed by Nick in commit
http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=485ddb4b9741bafb70b22e5c1f9b4f37dc3e85bd
.

Nick, can we add it back now?

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Splice status
  2010-07-25 14:47                 ` Ofer Heifetz
  2010-07-26  7:41                   ` Changli Gao
@ 2010-07-26 20:37                   ` Jarek Poplawski
  2010-07-26 20:50                     ` Eric Dumazet
  1 sibling, 1 reply; 25+ messages in thread
From: Jarek Poplawski @ 2010-07-26 20:37 UTC (permalink / raw)
  To: Ofer Heifetz; +Cc: Eric Dumazet, Changli Gao, Jens Axboe, netdev

Ofer Heifetz wrote, On 25.07.2010 16:47:

> Hi Eric,
> 
> Still trying to get better performance with splice, I noticed that when using splice there are twice the file size memcpy (placed a counter in memcpy), I verified it via samba file transfer and splice-fromnet/out.
> 
> I also used the splice-fromnet/out and using ftrace I did notice that data is copied twice using these routines: skb_splice_bits, pipe_to_file.


I'm not sure you're using an optimal NIC for splice, which should use
skbs with almost all data paged (non-linear), like niu or myri10ge.

Jarek P.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: Splice status
  2010-07-26 20:37                   ` Jarek Poplawski
@ 2010-07-26 20:50                     ` Eric Dumazet
  0 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2010-07-26 20:50 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: Ofer Heifetz, Changli Gao, Jens Axboe, netdev

Le lundi 26 juillet 2010 à 22:37 +0200, Jarek Poplawski a écrit :
> Ofer Heifetz wrote, On 25.07.2010 16:47:
> 
> > Hi Eric,
> > 
> > Still trying to get better performance with splice, I noticed that when using splice there are twice the file size memcpy (placed a counter in memcpy), I verified it via samba file transfer and splice-fromnet/out.
> > 
> > I also used the splice-fromnet/out and using ftrace I did notice that data is copied twice using these routines: skb_splice_bits, pipe_to_file.
> 
> 
> I'm not sure you're using optimal NIC for splice, which should use
> skbs with almost all data paged (non-linear), like niu or myri10ge.
> 
> Jarek P.

Yes, I don't think splice() _should_ be faster with a NIC delivering
frames of 1460 (or fewer) bytes, when disk IO should be performed with
4-Kbyte blocks (or a multiple) to get good performance.

sendfile(file -> socket) is fast because the blocks are pages, but
splice(socket -> file) is not fast, unless the NIC is able to perform
TCP receive offload.

To take an analogy, think about libc stdio versus read(2)/write(2)
syscalls. stdio, while doing copies in intermediate buffers, is able to
be faster than read()/write() in most cases.

Using splice() with 1460 bytes frames is like using read()/write()
instead of nice sized buffers given by stdio layer.

zero-copy can hurt badly if the IO sizes are not page aligned.




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: splice status
  2006-04-20 14:29 splice status Jens Axboe
@ 2006-04-20 23:00 ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2006-04-20 23:00 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

On Thu, Apr 20, 2006 at 04:29:03PM +0200, Jens Axboe wrote:
>   converting them to ->splice_read(). There's also a patch there that
>   fixes up loop.

It actually breaks loop in various setups.  You now directly call
do_generic_file_read, which is just a library function for filesystems.
For example, xfs or ocfs actually do need additional locking and/or
other bits before calling it.  So this absolutely has to go through a
file operation.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* splice status
@ 2006-04-20 14:29 Jens Axboe
  2006-04-20 23:00 ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2006-04-20 14:29 UTC (permalink / raw)
  To: linux-kernel

Hi,

Since a lot of splice/tee stuff has been merged, I thought I'd post a
little status report and other potentially useful info.

- splice interfaces should be stable now, I don't envision any further
  changes to the ->splice_read or ->splice_write file_operations hooks
  or the splice syscall. splice now accepts an input or output offset
  like sendfile(), so it doesn't have to rely on ->f_pos in the file
  structure.

- Ditto for the sys_tee syscall.

- sendfile() will be replaced with splice(). sys_sendfile will remain of
  course, this is only an internal thing. The current do_splice_direct()
  is a sendfile() helper. The splice branch in the block git repo has
  a patch to remove generic_file_sendfile() and all its users by
  converting them to ->splice_read(). There's also a patch there that
  fixes up loop. The only remaining users of the file_operations
  .sendfile hook are nfsd/shmem/ext2-xip/relay. That still needs doing.
  The current plan is to merge this stuff post 2.6.17.

I have a little collection of splice test tools that people may find
useful to play with this stuff. It's in a git repo here:

        git://brick.kernel.dk/data/git/splice.git

and snapshots are generated every hour on changes and can be fetched
from

        http://brick.kernel.dk/snaps/

There are tools there to test both splice and tee, a little README
explains the basic principle of them. I'd appreciate people testing and
playing with these tools, just in case we still have some bugs lurking.

Finally, known bugs:

- Some smallish splice reads are buggy. Patch is in splice branch and
  will hopefully be merged whenever Linus gets in front of his computer.

- The ->splice_pipe cache needs to be initialized to NULL on forks. Only
  affects do_splice_direct() usage, so not a problem in current kernels.
  Patch also Linus bound today.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2010-07-26 20:50 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-05  9:26 Splice status Ofer Heifetz
2010-07-05  9:59 ` Changli Gao
2010-07-05 10:52   ` Ofer Heifetz
2010-07-05 12:08     ` Changli Gao
2010-07-05 12:50     ` Eric Dumazet
2010-07-05 13:47       ` Ofer Heifetz
2010-07-05 15:34         ` Eric Dumazet
2010-07-06  2:01       ` Changli Gao
2010-07-06  2:36         ` Ofer Heifetz
2010-07-06  3:56         ` Eric Dumazet
2010-07-11 13:08           ` Changli Gao
2010-07-13 11:41             ` Ofer Heifetz
2010-07-13 12:32               ` Changli Gao
2010-07-13 12:42                 ` Ofer Heifetz
2010-07-13 13:58                   ` Changli Gao
2010-07-13 14:40                     ` Ofer Heifetz
2010-07-13 14:11               ` Eric Dumazet
2010-07-14 15:08                 ` Ofer Heifetz
2010-07-15  3:47                 ` Ofer Heifetz
2010-07-25 14:47                 ` Ofer Heifetz
2010-07-26  7:41                   ` Changli Gao
2010-07-26 20:37                   ` Jarek Poplawski
2010-07-26 20:50                     ` Eric Dumazet
  -- strict thread matches above, loose matches on Subject: below --
2006-04-20 14:29 splice status Jens Axboe
2006-04-20 23:00 ` Christoph Hellwig
