linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.6.1 sendfile regression
@ 2004-01-10  0:01 Felix von Leitner
  2004-01-10  0:21 ` Andrew Morton
  2004-01-10  5:23 ` Lennert Buytenhek
  0 siblings, 2 replies; 7+ messages in thread
From: Felix von Leitner @ 2004-01-10  0:01 UTC (permalink / raw)
  To: linux-kernel

I'm getting a huge sendfile regression in 2.6.1; when serving a large
file on an IPv4 TCP socket via sendfile64, the transfer starts at about
4 MB (2.6.0: >7 MB) and the hard disk like stays on 100% (which normally
only happens if the file is badly fragmented, fragmentation of this file
is 0%).  Then, suddenly, the network performance drops dramatically,
getting worse and worse.  strace shows that the process is hanging
inside sendfile64 (which should not happen since the socket is
non-blocking).  The process then stays inside sendfile for up to a
minute or so and can't be interrupted or killed in that time.

Please fix!

Felix

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.1 sendfile regression
  2004-01-10  0:01 2.6.1 sendfile regression Felix von Leitner
@ 2004-01-10  0:21 ` Andrew Morton
  2004-01-10  0:52   ` Ram Pai
  2004-01-10  5:23 ` Lennert Buytenhek
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2004-01-10  0:21 UTC (permalink / raw)
  To: Felix von Leitner; +Cc: linux-kernel, Ram Pai

Felix von Leitner <felix-kernel@fefe.de> wrote:
>
> I'm getting a huge sendfile regression in 2.6.1; when serving a large
> file on an IPv4 TCP socket via sendfile64, the transfer starts at about
> 4 MB (2.6.0: >7 MB) and the hard disk like stays on 100% (which normally
> only happens if the file is badly fragmented, fragmentation of this file
> is 0%).  Then, suddenly, the network performance drops dramatically,
> getting worse and worse.  strace shows that the process is hanging
> inside sendfile64 (which should not happen since the socket is
> non-blocking).  The process then stays inside sendfile for up to a
> minute or so and can't be interrupted or killed in that time.
> 

Probably it is the ill-advised readahead tweak.  Does the below patch fix
it up?

diff -puN mm/filemap.c~a mm/filemap.c
--- 25/mm/filemap.c~a	Fri Jan  9 16:20:32 2004
+++ 25-akpm/mm/filemap.c	Fri Jan  9 16:20:46 2004
@@ -596,12 +596,6 @@ void do_generic_mapping_read(struct addr
 	offset = *ppos & ~PAGE_CACHE_MASK;
 	last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT;
 
-	/*
-	 * Let the readahead logic know upfront about all
-	 * the pages we'll need to satisfy this request
-	 */
-	for (; index < last; index++)
-		page_cache_readahead(mapping, ra, filp, index);
 	index = *ppos >> PAGE_CACHE_SHIFT;
 
 	for (;;) {

_


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.1 sendfile regression
  2004-01-10  0:21 ` Andrew Morton
@ 2004-01-10  0:52   ` Ram Pai
  2004-01-10  1:42     ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Ram Pai @ 2004-01-10  0:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Felix von Leitner, linux-kernel

On Fri, 2004-01-09 at 16:21, Andrew Morton wrote:
 
> Probably it is the ill-advised readahead tweak.  Does the below patch fix
> it up?
> 
> diff -puN mm/filemap.c~a mm/filemap.c
> --- 25/mm/filemap.c~a	Fri Jan  9 16:20:32 2004
> +++ 25-akpm/mm/filemap.c	Fri Jan  9 16:20:46 2004
> @@ -596,12 +596,6 @@ void do_generic_mapping_read(struct addr
>  	offset = *ppos & ~PAGE_CACHE_MASK;
>  	last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT;
>  
> -	/*
> -	 * Let the readahead logic know upfront about all
> -	 * the pages we'll need to satisfy this request
> -	 */
> -	for (; index < last; index++)
> -		page_cache_readahead(mapping, ra, filp, index);
>  	index = *ppos >> PAGE_CACHE_SHIFT;
>  
>  	for (;;) {
> 

There is a small mistake in Andrew's patch.  The call to
page_cache_readahead() is missing.

Try this one.


*** linux-2.6.1-rc2/mm/filemap.c        Fri Jan  9 00:47:04 2004
--- linux-2.6.1-rc2/mm/filemap.c.1      Fri Jan  9 00:46:43 2004
***************
*** 587,608 ****
                             read_actor_t actor)
  {
        struct inode *inode = mapping->host;
!       unsigned long index, offset, last;
        struct page *cached_page;
        int error;

        cached_page = NULL;
        index = *ppos >> PAGE_CACHE_SHIFT;
        offset = *ppos & ~PAGE_CACHE_MASK;
-       last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT;
- 
-       /*
-        * Let the readahead logic know upfront about all
-        * the pages we'll need to satisfy this request
-        */
-       for (; index < last; index++)
-               page_cache_readahead(mapping, ra, filp, index);
-       index = *ppos >> PAGE_CACHE_SHIFT;

        for (;;) {
                struct page *page;
--- 587,599 ----
                             read_actor_t actor)
  {
        struct inode *inode = mapping->host;
!       unsigned long index, offset;
        struct page *cached_page;
        int error;

        cached_page = NULL;
        index = *ppos >> PAGE_CACHE_SHIFT;
        offset = *ppos & ~PAGE_CACHE_MASK;

        for (;;) {
                struct page *page;
***************
*** 621,626 ****
--- 612,618 ----
                }

                cond_resched();
+               page_cache_readahead(mapping, ra, filp, index);

                nr = nr - offset;
  find_page:



> _
> 
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.1 sendfile regression
  2004-01-10  0:52   ` Ram Pai
@ 2004-01-10  1:42     ` Andrew Morton
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2004-01-10  1:42 UTC (permalink / raw)
  To: Ram Pai; +Cc: felix-kernel, linux-kernel

Ram Pai <linuxram@us.ibm.com> wrote:
>
> There is a small mistake in Andrew's patch.  The call to
>  page_cache_readahead() is missing.
> 
>  Try this one.

Third time lucky.

diff -puN mm/filemap.c~readahead-partial-backout mm/filemap.c
--- 25/mm/filemap.c~readahead-partial-backout	2004-01-09 17:41:14.000000000 -0800
+++ 25-akpm/mm/filemap.c	2004-01-09 17:41:14.000000000 -0800
@@ -587,22 +587,13 @@ void do_generic_mapping_read(struct addr
 			     read_actor_t actor)
 {
 	struct inode *inode = mapping->host;
-	unsigned long index, offset, last;
+	unsigned long index, offset;
 	struct page *cached_page;
 	int error;
 
 	cached_page = NULL;
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
-	last = (*ppos + desc->count) >> PAGE_CACHE_SHIFT;
-
-	/*
-	 * Let the readahead logic know upfront about all
-	 * the pages we'll need to satisfy this request
-	 */
-	for (; index < last; index++)
-		page_cache_readahead(mapping, ra, filp, index);
-	index = *ppos >> PAGE_CACHE_SHIFT;
 
 	for (;;) {
 		struct page *page;
@@ -621,6 +612,7 @@ void do_generic_mapping_read(struct addr
 		}
 
 		cond_resched();
+		page_cache_readahead(mapping, ra, filp, index);
 
 		nr = nr - offset;
 find_page:

_


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.1 sendfile regression
  2004-01-10  0:01 2.6.1 sendfile regression Felix von Leitner
  2004-01-10  0:21 ` Andrew Morton
@ 2004-01-10  5:23 ` Lennert Buytenhek
  2004-01-12  2:58   ` Linus Torvalds
  1 sibling, 1 reply; 7+ messages in thread
From: Lennert Buytenhek @ 2004-01-10  5:23 UTC (permalink / raw)
  To: Felix von Leitner; +Cc: linux-kernel

On Sat, Jan 10, 2004 at 01:01:28AM +0100, Felix von Leitner wrote:

> strace shows that the process is hanging
> inside sendfile64 (which should not happen since the socket is
> non-blocking).

What if the data you're sending is not in the page cache?


--L

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.1 sendfile regression
  2004-01-10  5:23 ` Lennert Buytenhek
@ 2004-01-12  2:58   ` Linus Torvalds
  2004-01-12 11:31     ` Lennert Buytenhek
  0 siblings, 1 reply; 7+ messages in thread
From: Linus Torvalds @ 2004-01-12  2:58 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: Felix von Leitner, linux-kernel



On Sat, 10 Jan 2004, Lennert Buytenhek wrote:

> On Sat, Jan 10, 2004 at 01:01:28AM +0100, Felix von Leitner wrote:
> 
> > strace shows that the process is hanging
> > inside sendfile64 (which should not happen since the socket is
> > non-blocking).
> 
> What if the data you're sending is not in the page cache?

It will always block on the actual page cache, although we could try to 
change that. However, even if it blocks, it should only block at one page 
at a time (or "incidental" blockage due to memory allocations etc).

Blocking for long times implies a bug.

		Linus

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 2.6.1 sendfile regression
  2004-01-12  2:58   ` Linus Torvalds
@ 2004-01-12 11:31     ` Lennert Buytenhek
  0 siblings, 0 replies; 7+ messages in thread
From: Lennert Buytenhek @ 2004-01-12 11:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Felix von Leitner, linux-kernel

On Sun, Jan 11, 2004 at 06:58:55PM -0800, Linus Torvalds wrote:

> > > strace shows that the process is hanging
> > > inside sendfile64 (which should not happen since the socket is
> > > non-blocking).
> > 
> > What if the data you're sending is not in the page cache?
> 
> It will always block on the actual page cache, although we could try to 
> change that.

My impression is that this is the reason why AIO sendfile was attempted.
Although my wild guess is that that probably would only work on raw block
devices and still not on regular filesystem files.


> However, even if it blocks, it should only block at one page 
> at a time (or "incidental" blockage due to memory allocations etc).
> 
> Blocking for long times implies a bug.

(Even if it only blocks a page at a time, it will happily block on the
next page and the one after that as long as there is write space in the
destination socket?)


--L

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2004-01-12 11:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-10  0:01 2.6.1 sendfile regression Felix von Leitner
2004-01-10  0:21 ` Andrew Morton
2004-01-10  0:52   ` Ram Pai
2004-01-10  1:42     ` Andrew Morton
2004-01-10  5:23 ` Lennert Buytenhek
2004-01-12  2:58   ` Linus Torvalds
2004-01-12 11:31     ` Lennert Buytenhek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).