* Splice status @ 2010-07-05 9:26 Ofer Heifetz 2010-07-05 9:59 ` Changli Gao 0 siblings, 1 reply; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-05 9:26 UTC (permalink / raw)
To: netdev

Hi,

I have been trying to test splice on kernel 2.6.35_4 (x86) from Samba (v3.4.7) but could not copy more than ~60MB to the Samba server share.

Strace shows that splice got stuck in blocking mode on the splice call from socket to pipe.

Has anyone managed to get splice from socket to fd to work for large files (up to 4G file size)?

-Ofer

^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Splice status 2010-07-05 9:26 Splice status Ofer Heifetz @ 2010-07-05 9:59 ` Changli Gao 2010-07-05 10:52 ` Ofer Heifetz 0 siblings, 1 reply; 25+ messages in thread
From: Changli Gao @ 2010-07-05 9:59 UTC (permalink / raw)
To: Ofer Heifetz; +Cc: netdev

On Mon, Jul 5, 2010 at 5:26 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> I have been trying to test splice on kernel 2.6.35_4 (x86) from Samba (v3.4.7) but could not copy more than ~60MB to the Samba server share.
>
> Strace shows that the splice got stuck in blocking mode on the splice call from socket to pipe.

Did you drain the pipe before calling splice(2) to move data from socket to pipe?

--
Regards,
Changli Gao(xiaosuo@gmail.com)
* RE: Splice status 2010-07-05 9:59 ` Changli Gao @ 2010-07-05 10:52 ` Ofer Heifetz 2010-07-05 12:08 ` Changli Gao 2010-07-05 12:50 ` Eric Dumazet 0 siblings, 2 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-05 10:52 UTC (permalink / raw)
To: Changli Gao; +Cc: netdev

I am using Samba, so from my understanding of the source code, it loops and performs splice(sock, pipe) and splice(pipe, fd). There is no flush of any sort in between.

When you say drain, do you mean to flush all data to the pipe?

-Ofer
* Re: Splice status 2010-07-05 10:52 ` Ofer Heifetz @ 2010-07-05 12:08 ` Changli Gao 2010-07-05 12:50 ` Eric Dumazet 1 sibling, 0 replies; 25+ messages in thread
From: Changli Gao @ 2010-07-05 12:08 UTC (permalink / raw)
To: Ofer Heifetz; +Cc: netdev

On Mon, Jul 5, 2010 at 6:52 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> I am using Samba, so from my understanding of the source code, it loops and performs splice(sock, pipe) and splice(pipe, fd). There is no flush of any sort in between.
>

I checked the function sys_recvfile() and found it is buggy:

	to_write = nread;
	while (to_write > 0) {
		int thistime;
		thistime = splice(pipefd[0], NULL, tofd, &splice_offset,
				  to_write, SPLICE_F_MOVE);
		if (thistime == -1) {
			goto done;
		}
		to_write -= thistime;
	}

	total_written += nread;
	count -= nread;

When splice fails, it should drain the pipe. If it doesn't, the following splice(2) into the pipe may hang, because the pipe doesn't have enough space for the data read from the socket.

> When you say drain you mean to flush all data to pipe?
>

No. I mean to read all the data out of the pipe.

--
Regards,
Changli Gao(xiaosuo@gmail.com)
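[Editor's note: the drain described above can be sketched as follows — a minimal, hypothetical helper (drain_pipe is not a Samba function) that reads out whatever is left in the pipe after a failed splice(pipe, file), so the next splice(sock, pipe) finds room again.]

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read and discard whatever is left in the pipe's
 * read end, so a later splice(sock -> pipe) finds space again.
 * Returns the number of bytes discarded. */
static ssize_t drain_pipe(int pipe_rd)
{
	char buf[4096];
	ssize_t n, total = 0;
	int flags = fcntl(pipe_rd, F_GETFL);

	/* Go non-blocking so the loop stops as soon as the pipe is empty. */
	fcntl(pipe_rd, F_SETFL, flags | O_NONBLOCK);
	while ((n = read(pipe_rd, buf, sizeof(buf))) > 0)
		total += n;
	fcntl(pipe_rd, F_SETFL, flags);	/* restore the original flags */
	return total;
}
```

In sys_recvfile() this would run in the error path before the goto done, so the pipe is guaranteed empty the next time the function is entered.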
* RE: Splice status 2010-07-05 10:52 ` Ofer Heifetz 2010-07-05 12:08 ` Changli Gao @ 2010-07-05 12:50 ` Eric Dumazet 2010-07-05 13:47 ` Ofer Heifetz 2010-07-06 2:01 ` Changli Gao 1 sibling, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2010-07-05 12:50 UTC (permalink / raw)
To: Ofer Heifetz; +Cc: Changli Gao, netdev

On Monday, July 5, 2010 at 13:52 +0300, Ofer Heifetz wrote:
> I am using Samba, so from my understanding of the source code, it loops and performs splice(sock, pipe) and splice(pipe, fd). There is no flush of any sort in between.
>
> When you say drain you mean to flush all data to pipe?
>

Draining the pipe before the splice() call would only make the bug trigger less often.

splice(sock, pipe) can block if the caller doesn't use the appropriate "non-blocking pipe" splice() mode, even if the pipe is empty before the splice() call.

Last time I checked, splice() code was disabled in Samba. Is it a patched version?

Samba should add SPLICE_F_NONBLOCK to the first splice() call (from sock to pipe).

(You also need a recent kernel; check for details: http://patchwork.ozlabs.org/patch/34511/ )

diff --git a/source3/lib/recvfile.c b/source3/lib/recvfile.c
index ea01596..65e6f34 100644
--- a/source3/lib/recvfile.c
+++ b/source3/lib/recvfile.c
@@ -182,7 +182,7 @@ ssize_t sys_recvfile(int fromfd,
 		int nread, to_write;
 
 		nread = splice(fromfd, NULL, pipefd[1], NULL,
-			       MIN(count, 16384), SPLICE_F_MOVE);
+			       MIN(count, 16384), SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
 		if (nread == -1) {
 			if (errno == EINTR) {
 				continue;
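[Editor's note: the patched loop, in context, can be sketched as a self-contained program — recvfile_sketch is a hypothetical name, not Samba's sys_recvfile; in Samba, fromfd would be the socket, and a real caller would poll() on EAGAIN instead of spinning. The key point is that the fill side (fromfd -> pipe) is non-blocking while the drain side (pipe -> tofd) empties the pipe completely, so leftover pipe data can never deadlock the fill splice.]

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef MIN
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#endif

/* Hypothetical sketch of the fixed sys_recvfile() loop: non-blocking
 * splice into the pipe, full drain out of it. Any splice-able input fd
 * works (a regular file is used in testing; Samba would pass a socket). */
static ssize_t recvfile_sketch(int fromfd, int tofd, size_t count)
{
	int pipefd[2];
	ssize_t total = 0;

	if (pipe(pipefd) == -1)
		return -1;

	while (count > 0) {
		ssize_t nread = splice(fromfd, NULL, pipefd[1], NULL,
				       MIN(count, 16384),
				       SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
		if (nread == -1 && (errno == EINTR || errno == EAGAIN))
			continue;	/* real code would poll() the socket on EAGAIN */
		if (nread <= 0)
			break;		/* error or EOF */

		ssize_t to_write = nread;
		while (to_write > 0) {	/* drain everything we just put in */
			ssize_t n = splice(pipefd[0], NULL, tofd, NULL,
					   to_write, SPLICE_F_MOVE);
			if (n == -1)
				goto out;
			to_write -= n;
		}
		total += nread;
		count -= nread;
	}
out:
	close(pipefd[0]);
	close(pipefd[1]);
	return total;
}
```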
* RE: Splice status 2010-07-05 12:50 ` Eric Dumazet @ 2010-07-05 13:47 ` Ofer Heifetz 2010-07-05 15:34 ` Eric Dumazet 2010-07-06 2:01 ` Changli Gao 1 sibling, 1 reply; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-05 13:47 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Changli Gao, netdev

Hi,

Well, Samba still disables splice support (hard coded). I applied your patch (adding SPLICE_F_NONBLOCK to the splice(sock, pipe)) and I managed to write a 4G file to the Samba share.

I did notice that the splice is done on buffers of two sizes, 1380 and 2760 bytes (when writing to a share file). I guess that if I can get Samba to use bigger buffers it will reduce the number of splice calls and achieve better performance.

I also saw that when re-writing a file, splice does occasionally use the maximum buffer size (~16K).

Need to perform some more testing with Samba splice...

-Ofer
* RE: Splice status 2010-07-05 13:47 ` Ofer Heifetz @ 2010-07-05 15:34 ` Eric Dumazet 0 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2010-07-05 15:34 UTC (permalink / raw)
To: Ofer Heifetz; +Cc: Changli Gao, netdev

On Monday, July 5, 2010 at 16:47 +0300, Ofer Heifetz wrote:
> Well, Samba still disables splice support (hard coded), I applied your
> patch (adding the SPLICE_F_NONBLOCK to the splice(sock, pipe)) and I
> managed to write 4G file to Samba share.
>
> I did notice that the splice is done on buffers in two sizes: 1380 and
> 2760 (when writing to share file), I guess that if I can get samba to
> use bigger buffers it will reduce the splice calls and achieve better
> performance.
>

Note that if your load increases or the network is faster, splice will naturally use more data per call. Don't worry.

Also, you can change MIN(count, 16384) to MIN(count, 65536) now that the real Samba bug is known and can be fixed (by the SPLICE_F_NONBLOCK patch I sent).

(I guess using 16384 instead of 65536 was an attempt to reduce the hang probability.)

> I also saw that when re-writing a file splice does use the maximum
> buffer size (~16K) occasionally.

The max is 16 * PAGE_SIZE, 65536 bytes on x86.

> Need to perform some more testing with samba splice ...
>
> -Ofer
* Re: Splice status 2010-07-05 12:50 ` Eric Dumazet 2010-07-05 13:47 ` Ofer Heifetz @ 2010-07-06 2:01 ` Changli Gao 2010-07-06 2:36 ` Ofer Heifetz 2010-07-06 3:56 ` Eric Dumazet 1 sibling, 2 replies; 25+ messages in thread
From: Changli Gao @ 2010-07-06 2:01 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Jens Axboe, Ofer Heifetz, netdev

On Mon, Jul 5, 2010 at 8:50 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Draining pipe before splice() call would only trigger the bug less
> often.

If we don't drain the pipe before calling splice(2), the data spliced from the pipe may not be what we expect. Then data corruption occurs.

> splice(sock, pipe) can block if caller dont use appropriate "non
> blocking pipe' splice() mode, even if pipe is empty before a splice()
> call.

I don't think that is expected. The code of sys_recvfile is much like the sendfile(2) implementation in the kernel. If sys_recvfile may block without the non-blocking flag, sendfile(2) may block too.

BTW: Samba can use sendfile(2) instead in sys_recvfile.

--
Regards,
Changli Gao(xiaosuo@gmail.com)
* RE: Splice status 2010-07-06 2:01 ` Changli Gao @ 2010-07-06 2:36 ` Ofer Heifetz 2010-07-06 3:56 ` Eric Dumazet 1 sibling, 0 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-06 2:36 UTC (permalink / raw)
To: Changli Gao, Eric Dumazet; +Cc: Jens Axboe, netdev

Regarding your remark about using sendfile(2) in sys_recvfile, I have two questions:

1) What will be used if both are enabled in smb.conf?
2) From your experience, which is faster for reading files?
* Re: Splice status 2010-07-06 2:01 ` Changli Gao 2010-07-06 2:36 ` Ofer Heifetz @ 2010-07-06 3:56 ` Eric Dumazet 2010-07-11 13:08 ` Changli Gao 1 sibling, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2010-07-06 3:56 UTC (permalink / raw)
To: Changli Gao; +Cc: Jens Axboe, Ofer Heifetz, netdev

On Tuesday, July 6, 2010 at 10:01 +0800, Changli Gao wrote:
> If we don't drain the pipe before calling splice(2), the data spliced
> from pipe maybe not be what we expect. Then data corruption occurs.
>

This is not true. A pipe is a buffer. You don't need it to be empty when using it. Nowhere in the documentation is that stated.

However, a single skb can fill a pipe, even if "it's empty".

> > splice(sock, pipe) can block if caller dont use appropriate "non
> > blocking pipe' splice() mode, even if pipe is empty before a splice()
> > call.
>
> I don't think it is expected. The code of sys_recvfile is much like
> the sendfile(2) implementation in kernel. If sys_recvfile may block
> without non_block flag, sendfile(2) may block too.

Then it would be a bug. You might fix it easily.

Using splice() correctly (i.e., not blocking on sock->pipe) should work too.

Again, you can block on splice(sock, pipe) iff you have a second thread doing the opposite (pipe->file) in parallel to unblock you. But the Samba recvfile algorithm uses a single thread.

> BTW: Samba can use sendfile(2) instead in sys_recvfile.
* Re: Splice status 2010-07-06 3:56 ` Eric Dumazet @ 2010-07-11 13:08 ` Changli Gao 2010-07-13 11:41 ` Ofer Heifetz 0 siblings, 1 reply; 25+ messages in thread
From: Changli Gao @ 2010-07-11 13:08 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Jens Axboe, Ofer Heifetz, netdev

On Tue, Jul 6, 2010 at 11:56 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> This is not true. A pipe is a pipe is a buffer. You dont need it to be
> empty when using it. Nowhere in documentation its stated.

Do you mean splice(2) empties the pipe buffer before using it as an output buffer? If not, draining the pipe is needed to avoid data corruption.

> However, a single skb can fill a pipe, even if "its empty"
>

Yes, because tcp_splice_read() doesn't know whether __tcp_splice_read() returned because the pipe was full.

> > I don't think it is expected. The code of sys_recvfile is much like
> > the sendfile(2) implementation in kernel. If sys_recvfile may block
> > without non_block flag, sendfile(2) may block too.
>
> Then it would be a bug. You might fix it easily.

It seems reasonable. I'll fix it.

--
Regards,
Changli Gao(xiaosuo@gmail.com)
* RE: Splice status 2010-07-11 13:08 ` Changli Gao @ 2010-07-13 11:41 ` Ofer Heifetz 2010-07-13 12:32 ` Changli Gao 2010-07-13 14:11 ` Eric Dumazet 0 siblings, 2 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-13 11:41 UTC (permalink / raw)
To: Changli Gao, Eric Dumazet; +Cc: Jens Axboe, netdev

Hi,

I wanted to let you know that I have been testing Samba splice on a Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worse performance than not using it, and also noticed that on re-writing a file the iowait is high.

iometer using a 2G file (file is created before the test):

Splice  write  cpu%  iow%
-------------------------
No      58     98    0
Yes     14     100   48

iozone using a 2G file (file created during the test):

Splice  write  cpu%  iow%  re-write  cpu%  iow%
-----------------------------------------------
No      35     85    4     58.2      70    0
Yes     33     85    4     15.7      100   58

Any clue why splice introduces a high iowait?

I noticed Samba uses up to 16K per splice syscall; changing Samba to request more did not help, so I guess it is a kernel limitation.

-Ofer
* Re: Splice status 2010-07-13 11:41 ` Ofer Heifetz @ 2010-07-13 12:32 ` Changli Gao 2010-07-13 12:42 ` Ofer Heifetz 1 sibling, 1 reply; 25+ messages in thread
From: Changli Gao @ 2010-07-13 12:32 UTC (permalink / raw)
To: Ofer Heifetz; +Cc: Eric Dumazet, Jens Axboe, netdev

On Tue, Jul 13, 2010 at 7:41 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> I wanted to let you know that I have been testing Samba splice on Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worst performance than not using it and also noticed that on re-writing file the iowait is high.
>
> Any clue why splice introduces a high iowait?
> I noticed samba uses up to 16K per splice syscall, changing the samba to try more did not help, so I guess it is a kernel limitation.
>

What does the write column mean? And what do you mean by re-write? Thanks.

--
Regards,
Changli Gao(xiaosuo@gmail.com)
* RE: Splice status 2010-07-13 12:32 ` Changli Gao @ 2010-07-13 12:42 ` Ofer Heifetz 2010-07-13 13:58 ` Changli Gao 0 siblings, 1 reply; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-13 12:42 UTC (permalink / raw)
To: Changli Gao; +Cc: Eric Dumazet, Jens Axboe, netdev

The write and re-write numbers are in MBps.

Iozone's re-write reads a chunk of data and writes it back, so the performance for this operation should be quite high since it uses the kernel caches.

I forgot to mention that I used an EXT4 fs.

-Ofer
* Re: Splice status 2010-07-13 12:42 ` Ofer Heifetz @ 2010-07-13 13:58 ` Changli Gao 2010-07-13 14:40 ` Ofer Heifetz 0 siblings, 1 reply; 25+ messages in thread
From: Changli Gao @ 2010-07-13 13:58 UTC (permalink / raw)
To: Ofer Heifetz; +Cc: Eric Dumazet, Jens Axboe, netdev

On Tue, Jul 13, 2010 at 8:42 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> Write and re-write numbers are in MBps.
> Iozone performs re-write meaning reads a chunk of data and writes it back, so basically the performance for this operation should be quiet high since kernel caches usage.
>
> I forgot to mention that I used EXT4 fs.

Maybe it is caused by this line in generic_file_splice_write():

	balance_dirty_pages_ratelimited_nr(mapping, nr_pages);

Please try to test it again without this line.

--
Regards,
Changli Gao(xiaosuo@gmail.com)
* RE: Splice status 2010-07-13 13:58 ` Changli Gao @ 2010-07-13 14:40 ` Ofer Heifetz 0 siblings, 0 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-13 14:40 UTC (permalink / raw)
To: Changli Gao; +Cc: Eric Dumazet, Jens Axboe, netdev

I profiled the splice iometer write run and noticed that blk_end_request_err is being called many times. It looks like a good candidate for the high iowait; I need to debug its root cause.

-Ofer
* RE: Splice status 2010-07-13 11:41 ` Ofer Heifetz 2010-07-13 12:32 ` Changli Gao @ 2010-07-13 14:11 ` Eric Dumazet 2010-07-14 15:08 ` Ofer Heifetz ` (2 more replies) 1 sibling, 3 replies; 25+ messages in thread
From: Eric Dumazet @ 2010-07-13 14:11 UTC (permalink / raw)
To: Ofer Heifetz; +Cc: Changli Gao, Jens Axboe, netdev

On Tuesday, July 13, 2010 at 14:41 +0300, Ofer Heifetz wrote:
> Any clue why splice introduces a high iowait?
> I noticed samba uses up to 16K per splice syscall, changing the samba to try more did not help, so I guess it is a kernel limitation.
>

splice(socket -> pipe) provides partial buffers (depending on the MTU).

With a typical MTU of 1500 and tcp timestamps, each network frame contains 1448 bytes of payload, partially filling one page (of 4096 bytes).

When doing the splice(pipe -> file), the kernel has to coalesce the partial data, and the amount of written data per syscall is small (about 20 Kbytes).

Without splice(), the write() syscall provides more data, and the vfs overhead is smaller since the buffer size is a power of two. Samba uses a 128 KBytes TRANSFER_BUF_SIZE in its default_sys_recvfile() implementation, so it easily outperforms the splice() implementation.

You could try extending the pipe size (fcntl(fd, F_SETPIPE_SZ, 256)); maybe it will be a bit better. (And ask splice() for 256*4096 bytes.)

I tried this and got about 256 Kbytes per splice() call...

# perf report
# Events: 13K
#
# Overhead  Command         Shared Object      Symbol
# ........  ..............  .................  ......
#
     8.69%  splice-fromnet  [kernel.kallsyms]  [k] memcpy
     3.82%  splice-fromnet  [kernel.kallsyms]  [k] kunmap_atomic
     3.51%  splice-fromnet  [kernel.kallsyms]  [k] __block_prepare_write
     2.79%  splice-fromnet  [kernel.kallsyms]  [k] __skb_splice_bits
     2.58%  splice-fromnet  [kernel.kallsyms]  [k] ext3_mark_iloc_dirty
     2.45%  splice-fromnet  [kernel.kallsyms]  [k] do_get_write_access
     2.04%  splice-fromnet  [kernel.kallsyms]  [k] __find_get_block
     1.89%  splice-fromnet  [kernel.kallsyms]  [k] _raw_spin_lock
     1.83%  splice-fromnet  [kernel.kallsyms]  [k] journal_add_journal_head
     1.46%  splice-fromnet  [bnx2x]            [k] bnx2x_rx_int
     1.46%  splice-fromnet  [kernel.kallsyms]  [k] kfree
     1.42%  splice-fromnet  [kernel.kallsyms]  [k] journal_put_journal_head
     1.29%  splice-fromnet  [kernel.kallsyms]  [k] __ext3_get_inode_loc
     1.26%  splice-fromnet  [kernel.kallsyms]  [k] journal_dirty_metadata
     1.25%  splice-fromnet  [kernel.kallsyms]  [k] page_address
     1.20%  splice-fromnet  [kernel.kallsyms]  [k] journal_cancel_revoke
     1.15%  splice-fromnet  [kernel.kallsyms]  [k] tcp_read_sock
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] unlock_buffer
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] pipe_to_file
     1.05%  splice-fromnet  [kernel.kallsyms]  [k] radix_tree_lookup_element
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmap_atomic_prot
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_free
     1.03%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_alloc
     1.01%  splice-fromnet  [bnx2x]            [k] bnx2x_poll
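[Editor's note: the F_SETPIPE_SZ resize suggested above looks roughly like this on kernels that support it (2.6.35+). On current kernels the fcntl takes a size in bytes, rounds it up, and returns the size actually set; grow_pipe is a hypothetical wrapper, not an API.]

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Grow a pipe beyond the default 16 pages (64K on x86) so a single
 * splice(sock -> pipe) call can move more data per syscall.
 * Returns the pipe size actually set, or -1 on error. */
static int grow_pipe(int pipefd, int bytes)
{
	return fcntl(pipefd, F_SETPIPE_SZ, bytes);
}
```

Unprivileged processes are capped by /proc/sys/fs/pipe-max-size (1 MB by default), so 256 * 4096 bytes sits right at the default limit.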
* RE: Splice status 2010-07-13 14:11 ` Eric Dumazet @ 2010-07-14 15:08 ` Ofer Heifetz 2010-07-15 3:47 ` Ofer Heifetz 2010-07-25 14:47 ` Ofer Heifetz 2 siblings, 0 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-14 15:08 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Changli Gao, Jens Axboe, netdev

Hi,

The strange thing is that when I configure it to use 256K with fcntl, I see that it still splices 1460 bytes at a time; I tried making it smaller, but no change. Any clue why I get a smaller pipe even though I tried resizing it to 256K?

BTW, I changed the kernel to 2.6.35_rc5.

-Ofer
>

splice(socket -> pipe) provides partial buffers (depending on the MTU).

With a typical MTU of 1500 and tcp timestamps, each network frame contains 1448 bytes of payload, partially filling one page (of 4096 bytes).

When doing the splice(pipe -> file), the kernel has to coalesce the partial data, but the amount of written data per syscall is small (about 20 Kbytes).

Without splice(), the write() syscall provides more data, and the vfs overhead is smaller since the buffer size is a power of two. Samba uses a 128-KByte TRANSFER_BUF_SIZE in its default_sys_recvfile() implementation; it easily outperforms the splice() implementation.

You could try extending the pipe size (fcntl(fd, F_SETPIPE_SZ, 256)), maybe it will be a bit better. (And ask for 256*4096 bytes per splice().)

I tried this and got about 256 Kbytes per splice() call...

# perf report
# Events: 13K
#
# Overhead  Command         Shared Object      Symbol
# ........  ..............  .................  ......
#
     8.69%  splice-fromnet  [kernel.kallsyms]  [k] memcpy
     3.82%  splice-fromnet  [kernel.kallsyms]  [k] kunmap_atomic
     3.51%  splice-fromnet  [kernel.kallsyms]  [k] __block_prepare_write
     2.79%  splice-fromnet  [kernel.kallsyms]  [k] __skb_splice_bits
     2.58%  splice-fromnet  [kernel.kallsyms]  [k] ext3_mark_iloc_dirty
     2.45%  splice-fromnet  [kernel.kallsyms]  [k] do_get_write_access
     2.04%  splice-fromnet  [kernel.kallsyms]  [k] __find_get_block
     1.89%  splice-fromnet  [kernel.kallsyms]  [k] _raw_spin_lock
     1.83%  splice-fromnet  [kernel.kallsyms]  [k] journal_add_journal_head
     1.46%  splice-fromnet  [bnx2x]            [k] bnx2x_rx_int
     1.46%  splice-fromnet  [kernel.kallsyms]  [k] kfree
     1.42%  splice-fromnet  [kernel.kallsyms]  [k] journal_put_journal_head
     1.29%  splice-fromnet  [kernel.kallsyms]  [k] __ext3_get_inode_loc
     1.26%  splice-fromnet  [kernel.kallsyms]  [k] journal_dirty_metadata
     1.25%  splice-fromnet  [kernel.kallsyms]  [k] page_address
     1.20%  splice-fromnet  [kernel.kallsyms]  [k] journal_cancel_revoke
     1.15%  splice-fromnet  [kernel.kallsyms]  [k] tcp_read_sock
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] unlock_buffer
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] pipe_to_file
     1.05%  splice-fromnet  [kernel.kallsyms]  [k] radix_tree_lookup_element
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmap_atomic_prot
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_free
     1.03%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_alloc
     1.01%  splice-fromnet  [bnx2x]            [k] bnx2x_poll

^ permalink raw reply	[flat|nested] 25+ messages in thread
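Eric's recipe above — grow the pipe with F_SETPIPE_SZ, ask for 256*4096 bytes per splice(), and drain the pipe fully between calls — can be sketched as a small helper. This is a sketch under stated assumptions, not Samba's actual code: the function name `splice_copy` is mine, error handling is minimal, and it requires Linux (splice(2), and F_SETPIPE_SZ from 2.6.35 on).

```c
/* Sketch of the socket->pipe->file splice loop discussed in this thread.
 * Names and error handling are mine, not Samba's.  Linux-only. */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK ((size_t)(256 * 4096))  /* ask for 256 pages per splice(), as suggested */

/* Copy up to "len" bytes from from_fd to to_fd through a private pipe.
 * In the Samba case from_fd would be the connected socket and to_fd the
 * file, but the same loop works for any fd pair splice() accepts.
 * Returns bytes copied, or -1 if the pipe cannot be created. */
ssize_t splice_copy(int from_fd, int to_fd, size_t len)
{
    int pfd[2];
    ssize_t total = 0;

    if (pipe(pfd) < 0)
        return -1;
    /* Try to enlarge the pipe; ignore failure on kernels without F_SETPIPE_SZ. */
    fcntl(pfd[1], F_SETPIPE_SZ, (int)CHUNK);

    while (len > 0) {
        ssize_t in = splice(from_fd, NULL, pfd[1], NULL,
                            len < CHUNK ? len : CHUNK, SPLICE_F_MOVE);
        if (in <= 0)            /* 0 = EOF, <0 = error */
            break;
        len -= (size_t)in;
        /* Drain the pipe completely before splicing more in -- the
         * "drain" step Changli asked about earlier in the thread. */
        while (in > 0) {
            ssize_t out = splice(pfd[0], NULL, to_fd, NULL,
                                 (size_t)in, SPLICE_F_MOVE);
            if (out <= 0)
                goto done;
            in -= out;
            total += out;
        }
    }
done:
    close(pfd[0]);
    close(pfd[1]);
    return total;
}
```

Even with a loop like this, the per-call payload of the socket->pipe splice still depends on how much skb data has queued on the socket, which may be why Ofer sees 1460-byte transfers when little data is pending.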
* RE: Splice status 2010-07-13 14:11 ` Eric Dumazet 2010-07-14 15:08 ` Ofer Heifetz @ 2010-07-15 3:47 ` Ofer Heifetz 2010-07-25 14:47 ` Ofer Heifetz 2 siblings, 0 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-15 3:47 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Changli Gao, Jens Axboe, netdev

Hi,

I managed to get splice to use up to 64K, which looks to me like a Samba limitation (an smb.conf SO_RCVBUF limitation, I think), but I still do not get any performance improvement using splice: the write numbers for splice are about the same as for regular read/write, even though it avoids copy_to_user and copy_from_user.

-Ofer

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Tuesday, July 13, 2010 5:12 PM
To: Ofer Heifetz
Cc: Changli Gao; Jens Axboe; netdev@vger.kernel.org
Subject: RE: Splice status

Le mardi 13 juillet 2010 à 14:41 +0300, Ofer Heifetz a écrit :
> Hi,
>
> I wanted to let you know that I have been testing Samba splice on a Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worse performance than not using it, and also noticed that on re-writing a file the iowait is high.
>
> iometer using 2G file (file is created before test)
>
> Splice  write  cpu%  iow%
> -----------------------
> No      58     98    0
> Yes     14     100   48
>
> iozone using 2G file (file created during test)
>
> Splice  write  cpu%  iow%  re-write  cpu%  iow%
> -------------------------------------------
> No      35     85    4     58.2      70    0
> Yes     33     85    4     15.7      100   58
>
> Any clue why splice introduces a high iowait?
>
> I noticed samba uses up to 16K per splice syscall; changing samba to try more did not help, so I guess it is a kernel limitation.
>

splice(socket -> pipe) provides partial buffers (depending on the MTU).

With a typical MTU of 1500 and tcp timestamps, each network frame contains 1448 bytes of payload, partially filling one page (of 4096 bytes).

When doing the splice(pipe -> file), the kernel has to coalesce the partial data, but the amount of written data per syscall is small (about 20 Kbytes).

Without splice(), the write() syscall provides more data, and the vfs overhead is smaller since the buffer size is a power of two. Samba uses a 128-KByte TRANSFER_BUF_SIZE in its default_sys_recvfile() implementation; it easily outperforms the splice() implementation.

You could try extending the pipe size (fcntl(fd, F_SETPIPE_SZ, 256)), maybe it will be a bit better. (And ask for 256*4096 bytes per splice().)

I tried this and got about 256 Kbytes per splice() call...

# perf report
# Events: 13K
#
# Overhead  Command         Shared Object      Symbol
# ........  ..............  .................  ......
#
     8.69%  splice-fromnet  [kernel.kallsyms]  [k] memcpy
     3.82%  splice-fromnet  [kernel.kallsyms]  [k] kunmap_atomic
     3.51%  splice-fromnet  [kernel.kallsyms]  [k] __block_prepare_write
     2.79%  splice-fromnet  [kernel.kallsyms]  [k] __skb_splice_bits
     2.58%  splice-fromnet  [kernel.kallsyms]  [k] ext3_mark_iloc_dirty
     2.45%  splice-fromnet  [kernel.kallsyms]  [k] do_get_write_access
     2.04%  splice-fromnet  [kernel.kallsyms]  [k] __find_get_block
     1.89%  splice-fromnet  [kernel.kallsyms]  [k] _raw_spin_lock
     1.83%  splice-fromnet  [kernel.kallsyms]  [k] journal_add_journal_head
     1.46%  splice-fromnet  [bnx2x]            [k] bnx2x_rx_int
     1.46%  splice-fromnet  [kernel.kallsyms]  [k] kfree
     1.42%  splice-fromnet  [kernel.kallsyms]  [k] journal_put_journal_head
     1.29%  splice-fromnet  [kernel.kallsyms]  [k] __ext3_get_inode_loc
     1.26%  splice-fromnet  [kernel.kallsyms]  [k] journal_dirty_metadata
     1.25%  splice-fromnet  [kernel.kallsyms]  [k] page_address
     1.20%  splice-fromnet  [kernel.kallsyms]  [k] journal_cancel_revoke
     1.15%  splice-fromnet  [kernel.kallsyms]  [k] tcp_read_sock
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] unlock_buffer
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] pipe_to_file
     1.05%  splice-fromnet  [kernel.kallsyms]  [k] radix_tree_lookup_element
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmap_atomic_prot
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_free
     1.03%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_alloc
     1.01%  splice-fromnet  [bnx2x]            [k] bnx2x_poll

^ permalink raw reply	[flat|nested] 25+ messages in thread
* RE: Splice status 2010-07-13 14:11 ` Eric Dumazet 2010-07-14 15:08 ` Ofer Heifetz 2010-07-15 3:47 ` Ofer Heifetz @ 2010-07-25 14:47 ` Ofer Heifetz 2010-07-26 7:41 ` Changli Gao 2010-07-26 20:37 ` Jarek Poplawski 2 siblings, 2 replies; 25+ messages in thread
From: Ofer Heifetz @ 2010-07-25 14:47 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Changli Gao, Jens Axboe, netdev

Hi Eric,

Still trying to get better performance with splice, I noticed that when using splice, memcpy moves twice the file size in bytes (I placed a counter in memcpy); I verified this both via a Samba file transfer and with splice-fromnet/out.

Using splice-fromnet/out with ftrace, I also noticed that the data is copied twice, in these routines: skb_splice_bits and pipe_to_file.

I thought that the main goal of splice was to avoid one copy when moving data from the network to a file descriptor.

If there are the same number of memcpys and context switches as in the samba copy, plus additional vfs overhead, it makes sense that splice deteriorates the overall Samba write performance.

-Ofer

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Tuesday, July 13, 2010 5:12 PM
To: Ofer Heifetz
Cc: Changli Gao; Jens Axboe; netdev@vger.kernel.org
Subject: RE: Splice status

Le mardi 13 juillet 2010 à 14:41 +0300, Ofer Heifetz a écrit :
> Hi,
>
> I wanted to let you know that I have been testing Samba splice on a Marvell 6282 SoC on 2.6.35_rc3 and noticed that it gave worse performance than not using it, and also noticed that on re-writing a file the iowait is high.
>
> iometer using 2G file (file is created before test)
>
> Splice  write  cpu%  iow%
> -----------------------
> No      58     98    0
> Yes     14     100   48
>
> iozone using 2G file (file created during test)
>
> Splice  write  cpu%  iow%  re-write  cpu%  iow%
> -------------------------------------------
> No      35     85    4     58.2      70    0
> Yes     33     85    4     15.7      100   58
>
> Any clue why splice introduces a high iowait?
> I noticed samba uses up to 16K per splice syscall; changing samba to try more did not help, so I guess it is a kernel limitation.
>

splice(socket -> pipe) provides partial buffers (depending on the MTU).

With a typical MTU of 1500 and tcp timestamps, each network frame contains 1448 bytes of payload, partially filling one page (of 4096 bytes).

When doing the splice(pipe -> file), the kernel has to coalesce the partial data, but the amount of written data per syscall is small (about 20 Kbytes).

Without splice(), the write() syscall provides more data, and the vfs overhead is smaller since the buffer size is a power of two. Samba uses a 128-KByte TRANSFER_BUF_SIZE in its default_sys_recvfile() implementation; it easily outperforms the splice() implementation.

You could try extending the pipe size (fcntl(fd, F_SETPIPE_SZ, 256)), maybe it will be a bit better. (And ask for 256*4096 bytes per splice().)

I tried this and got about 256 Kbytes per splice() call...

# perf report
# Events: 13K
#
# Overhead  Command         Shared Object      Symbol
# ........  ..............  .................  ......
#
     8.69%  splice-fromnet  [kernel.kallsyms]  [k] memcpy
     3.82%  splice-fromnet  [kernel.kallsyms]  [k] kunmap_atomic
     3.51%  splice-fromnet  [kernel.kallsyms]  [k] __block_prepare_write
     2.79%  splice-fromnet  [kernel.kallsyms]  [k] __skb_splice_bits
     2.58%  splice-fromnet  [kernel.kallsyms]  [k] ext3_mark_iloc_dirty
     2.45%  splice-fromnet  [kernel.kallsyms]  [k] do_get_write_access
     2.04%  splice-fromnet  [kernel.kallsyms]  [k] __find_get_block
     1.89%  splice-fromnet  [kernel.kallsyms]  [k] _raw_spin_lock
     1.83%  splice-fromnet  [kernel.kallsyms]  [k] journal_add_journal_head
     1.46%  splice-fromnet  [bnx2x]            [k] bnx2x_rx_int
     1.46%  splice-fromnet  [kernel.kallsyms]  [k] kfree
     1.42%  splice-fromnet  [kernel.kallsyms]  [k] journal_put_journal_head
     1.29%  splice-fromnet  [kernel.kallsyms]  [k] __ext3_get_inode_loc
     1.26%  splice-fromnet  [kernel.kallsyms]  [k] journal_dirty_metadata
     1.25%  splice-fromnet  [kernel.kallsyms]  [k] page_address
     1.20%  splice-fromnet  [kernel.kallsyms]  [k] journal_cancel_revoke
     1.15%  splice-fromnet  [kernel.kallsyms]  [k] tcp_read_sock
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] unlock_buffer
     1.09%  splice-fromnet  [kernel.kallsyms]  [k] pipe_to_file
     1.05%  splice-fromnet  [kernel.kallsyms]  [k] radix_tree_lookup_element
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmap_atomic_prot
     1.04%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_free
     1.03%  splice-fromnet  [kernel.kallsyms]  [k] kmem_cache_alloc
     1.01%  splice-fromnet  [bnx2x]            [k] bnx2x_poll

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Splice status 2010-07-25 14:47 ` Ofer Heifetz @ 2010-07-26 7:41 ` Changli Gao 2010-07-26 20:37 ` Jarek Poplawski 1 sibling, 0 replies; 25+ messages in thread
From: Changli Gao @ 2010-07-26 7:41 UTC (permalink / raw)
To: Ofer Heifetz; +Cc: Eric Dumazet, Jens Axboe, netdev, Nick Piggin

On Sun, Jul 25, 2010 at 10:47 PM, Ofer Heifetz <oferh@marvell.com> wrote:
> Hi Eric,
>
> Still trying to get better performance with splice, I noticed that when using splice, memcpy moves twice the file size in bytes (I placed a counter in memcpy); I verified this both via a Samba file transfer and with splice-fromnet/out.
>
> Using splice-fromnet/out with ftrace, I also noticed that the data is copied twice, in these routines: skb_splice_bits and pipe_to_file.
>
> I thought that the main goal of splice was to avoid one copy when moving data from the network to a file descriptor.
>
> If there are the same number of memcpys and context switches as in the samba copy, plus additional vfs overhead, it makes sense that splice deteriorates the overall Samba write performance.
>

Support for SPLICE_F_MOVE was removed by Nick in commit http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commitdiff;h=485ddb4b9741bafb70b22e5c1f9b4f37dc3e85bd

Nick, can we add it back now?

-- Regards, Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Splice status 2010-07-25 14:47 ` Ofer Heifetz 2010-07-26 7:41 ` Changli Gao @ 2010-07-26 20:37 ` Jarek Poplawski 2010-07-26 20:50 ` Eric Dumazet 1 sibling, 1 reply; 25+ messages in thread
From: Jarek Poplawski @ 2010-07-26 20:37 UTC (permalink / raw)
To: Ofer Heifetz; +Cc: Eric Dumazet, Changli Gao, Jens Axboe, netdev

Ofer Heifetz wrote, On 25.07.2010 16:47:

> Hi Eric,
>
> Still trying to get better performance with splice, I noticed that when using splice, memcpy moves twice the file size in bytes (I placed a counter in memcpy); I verified this both via a Samba file transfer and with splice-fromnet/out.
>
> Using splice-fromnet/out with ftrace, I also noticed that the data is copied twice, in these routines: skb_splice_bits and pipe_to_file.

I'm not sure you're using an optimal NIC for splice; it should use skbs with almost all of the data paged (non-linear), like niu or myri10ge.

Jarek P.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: Splice status 2010-07-26 20:37 ` Jarek Poplawski @ 2010-07-26 20:50 ` Eric Dumazet 0 siblings, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2010-07-26 20:50 UTC (permalink / raw)
To: Jarek Poplawski; +Cc: Ofer Heifetz, Changli Gao, Jens Axboe, netdev

Le lundi 26 juillet 2010 à 22:37 +0200, Jarek Poplawski a écrit :
> Ofer Heifetz wrote, On 25.07.2010 16:47:
>
> > Hi Eric,
> >
> > Still trying to get better performance with splice, I noticed that when using splice, memcpy moves twice the file size in bytes (I placed a counter in memcpy); I verified this both via a Samba file transfer and with splice-fromnet/out.
> >
> > Using splice-fromnet/out with ftrace, I also noticed that the data is copied twice, in these routines: skb_splice_bits and pipe_to_file.
>
> I'm not sure you're using an optimal NIC for splice; it should use
> skbs with almost all of the data paged (non-linear), like niu or myri10ge.
>
> Jarek P.

Yes, I don't think splice() _should_ be faster with a NIC delivering frames of 1460 (or fewer) bytes, when disk IO should be performed in 4-Kbyte blocks (or a multiple) to get good performance.

sendfile(file -> socket) is fast because the blocks are pages, but splice(socket -> file) is not fast unless the NIC is able to perform tcp receive offload.

To take an analogy, think about libc stdio versus the read(2)/write(2) syscalls. stdio, while doing copies into intermediate buffers, is able to be faster than read()/write() in most cases. Using splice() with 1460-byte frames is like using read()/write() instead of the nicely sized buffers given by the stdio layer.

Zero-copy can hurt badly if the IO sizes are not page aligned.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* splice status @ 2006-04-20 14:29 Jens Axboe 2006-04-20 23:00 ` Christoph Hellwig 0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2006-04-20 14:29 UTC (permalink / raw)
To: linux-kernel

Hi,

Since a lot of splice/tee stuff has been merged, I thought I'd post a little status report and other potentially useful info.

- splice interfaces should be stable now; I don't envision any further changes to the ->splice_read or ->splice_write file_operations hooks or the splice syscall. splice now accepts an input or output offset like sendfile(), so it doesn't have to rely on ->f_pos in the file structure.

- Ditto for the sys_tee syscall.

- sendfile() will be replaced with splice(). sys_sendfile will remain of course; this is only an internal thing. The current do_splice_direct() is a sendfile() helper. The splice branch in the block git repo has a patch to remove generic_file_sendfile() and all its users by converting them to ->splice_read(). There's also a patch there that fixes up loop. The only remaining users of the file_operations .sendfile hook are nfsd/shmem/ext2-xip/relay. That still needs doing. The current plan is to merge this stuff post 2.6.17.

I have a little collection of splice test tools that people may find useful for playing with this stuff. It's in a git repo here:

git://brick.kernel.dk/data/git/splice.git

and snapshots are generated every hour on changes and can be fetched from

http://brick.kernel.dk/snaps/

There are tools there to test both splice and tee; a little README explains the basic principle of them. I'd appreciate people testing and playing with these tools, just in case we still have some bugs lurking.

Finally, known bugs:

- Some smallish splice reads are buggy. Patch is in the splice branch and will hopefully be merged whenever Linus gets in front of his computer.

- The ->splice_pipe cache needs to be initialized to NULL on forks. Only affects do_splice_direct() usage, so not a problem in current kernels. Patch also Linus-bound today.
-- Jens Axboe ^ permalink raw reply [flat|nested] 25+ messages in thread
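The offset behaviour Jens mentions — splice() accepting an explicit input or output offset like sendfile(), so it need not rely on ->f_pos — can be illustrated with a small helper. This is a sketch, not code from the splice test tools; the function name `splice_from_offset` and its error handling are my own, and it assumes Linux with _GNU_SOURCE.

```c
/* Sketch of splice()'s sendfile()-like offset form: pass an off_t* for
 * the non-pipe side and the kernel advances that local offset instead
 * of the file's own position.  Helper name is mine.  Linux-only. */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Splice up to "len" bytes from "file_fd", starting at byte "start",
 * into the write end of a pipe ("pipe_wr").  The file's f_pos is left
 * untouched.  Returns bytes spliced, or -1 on error. */
ssize_t splice_from_offset(int file_fd, off_t start, int pipe_wr, size_t len)
{
    off_t off = start;          /* the kernel updates this, not f_pos */
    ssize_t total = 0;

    while (len > 0) {
        ssize_t n = splice(file_fd, &off, pipe_wr, NULL, len, 0);
        if (n < 0)
            return -1;
        if (n == 0)             /* hit EOF before "len" bytes */
            break;
        total += n;
        len -= (size_t)n;
    }
    return total;
}
```

Because the caller owns the offset, several such splices can read different ranges of the same open file concurrently without fighting over ->f_pos, which is the same property sendfile() provides.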
* Re: splice status 2006-04-20 14:29 splice status Jens Axboe @ 2006-04-20 23:00 ` Christoph Hellwig 0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2006-04-20 23:00 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel

On Thu, Apr 20, 2006 at 04:29:03PM +0200, Jens Axboe wrote:
> converting them to ->splice_read(). There's also a patch there that
> fixes up loop.

It actually breaks loop in various setups. You now directly call do_generic_file_read, which is just a library function for filesystems. For example, xfs or ocfs actually do need additional locking and/or other bits before calling it. So this absolutely has to go through a file operation.

^ permalink raw reply	[flat|nested] 25+ messages in thread
end of thread, other threads:[~2010-07-26 20:50 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-05  9:26 Splice status Ofer Heifetz
2010-07-05  9:59 ` Changli Gao
2010-07-05 10:52 ` Ofer Heifetz
2010-07-05 12:08 ` Changli Gao
2010-07-05 12:50 ` Eric Dumazet
2010-07-05 13:47 ` Ofer Heifetz
2010-07-05 15:34 ` Eric Dumazet
2010-07-06  2:01 ` Changli Gao
2010-07-06  2:36 ` Ofer Heifetz
2010-07-06  3:56 ` Eric Dumazet
2010-07-11 13:08 ` Changli Gao
2010-07-13 11:41 ` Ofer Heifetz
2010-07-13 12:32 ` Changli Gao
2010-07-13 12:42 ` Ofer Heifetz
2010-07-13 13:58 ` Changli Gao
2010-07-13 14:40 ` Ofer Heifetz
2010-07-13 14:11 ` Eric Dumazet
2010-07-14 15:08 ` Ofer Heifetz
2010-07-15  3:47 ` Ofer Heifetz
2010-07-25 14:47 ` Ofer Heifetz
2010-07-26  7:41 ` Changli Gao
2010-07-26 20:37 ` Jarek Poplawski
2010-07-26 20:50 ` Eric Dumazet
-- strict thread matches above, loose matches on Subject: below --
2006-04-20 14:29 splice status Jens Axboe
2006-04-20 23:00 ` Christoph Hellwig